Data Cleaning Capstone Project 2

multiplechoice

How do you check if a value in a Pandas DataFrame column is null?

multiplechoice

What is the Pandas function used to drop missing values?

codevalidated

Find the number of missing values in each column

Perform the calculation and store the results in the variable col_missing_values.
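
As a minimal sketch of one way to count missing values per column, the example below uses a small made-up DataFrame (the data is hypothetical; only the variable name col_missing_values comes from the activity).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the project's dataset.
df = pd.DataFrame({
    "Wait Time Min": [5.0, np.nan, 7.0, np.nan],
    "Service Area": ["Karachi", "Lahore", None, "Karachi"],
})

# isna() flags each missing cell; sum() then counts the flags per column.
col_missing_values = df.isna().sum()
print(col_missing_values)
```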

codevalidated

Drop the columns that have more than 1000 missing values

Drop these columns permanently, since they cannot be used for any purpose.
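
One possible approach, sketched on hypothetical data: count the missing values per column, pick the columns above the threshold, and reassign df without them. The `Promo Code` column is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1500 rows, with one mostly empty column.
df = pd.DataFrame({
    "Trip ID": list(range(1500)),
    "Promo Code": [np.nan] * 1200 + ["X"] * 300,  # made-up column
})

# Count missing values, then keep only the column names above the threshold.
col_missing_values = df.isna().sum()
cols_to_drop = col_missing_values[col_missing_values > 1000].index

# Reassigning df makes the drop permanent.
df = df.drop(columns=cols_to_drop)
print(df.columns.tolist())
```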

codevalidated

Drop the rows that have a `Service Area` other than `Karachi`

Perform this drop permanently in df.
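
A short sketch of filtering by a boolean mask, assuming toy data. Note that with this approach rows whose `Service Area` is missing are dropped as well, because a missing value never equals `Karachi`.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Trip ID": [1, 2, 3, 4],
    "Service Area": ["Karachi", "Lahore", "Karachi", None],
})

# Keep only the Karachi rows and reassign df so the change persists.
df = df[df["Service Area"] == "Karachi"]
print(df)
```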

codevalidated

Fill the missing values of the column `Wait Time Min` with the median of the column

Make sure to apply this change on the original df.
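
A minimal sketch of a median fill on made-up values. Assigning the result back to the column is one way to apply the change to the original df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Wait Time Min": [2.0, np.nan, 4.0, 10.0]})

# median() ignores missing values by default.
median_wait = df["Wait Time Min"].median()

# Write the filled column back into df.
df["Wait Time Min"] = df["Wait Time Min"].fillna(median_wait)
print(df)
```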

codevalidated

Fill all missing values in `Credit ID` using the backward fill method

Make sure to apply this change on the original df.
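
Backward filling copies the next valid value upward into each gap. The sketch below uses invented IDs; note that a gap at the very end of the column has no later value to copy, so it stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps, including one trailing gap.
df = pd.DataFrame({"Credit ID": [np.nan, "C-1", np.nan, "C-2", np.nan]})

# bfill() fills each missing value with the next valid value below it.
df["Credit ID"] = df["Credit ID"].bfill()
print(df)
```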

multiplechoice

How do you handle duplicate values in Pandas?

multiplechoice

How do you extract a substring from a Pandas DataFrame column?

codevalidated

Find and drop duplicate rows based on the `Booking ID`, `Trip ID`, `Car Model`, `Payment Type`, and `Pickup Location` columns while keeping the last row

Make sure to permanently drop these duplicates.
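
A sketch of subset-based de-duplication on hypothetical rows. The `Trip Price` column is included only to show which of the duplicated rows survives.

```python
import pandas as pd

subset_cols = ["Booking ID", "Trip ID", "Car Model", "Payment Type", "Pickup Location"]

# Hypothetical toy data: the first two rows match on every subset column.
df = pd.DataFrame({
    "Booking ID": [1, 1, 2],
    "Trip ID": [10, 10, 20],
    "Car Model": ["A", "A", "B"],
    "Payment Type": ["Cash", "Cash", "Cash"],
    "Pickup Location": ["X", "X", "Y"],
    "Trip Price": [100, 120, 90],
})

# keep="last" retains the final occurrence of each duplicate group.
df = df.drop_duplicates(subset=subset_cols, keep="last")
print(df)
```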

input

How many customers paid with `Credit Card` according to the column `Payment Type`?

Count all the customers whose Payment Type is Credit Card.
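
One way to count matching rows, sketched on made-up values: compare the column to the target string and sum the resulting booleans.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Payment Type": ["Credit Card", "Cash", "Credit Card", None]})

# Each True counts as 1, so the sum is the number of exact matches.
credit_card_count = (df["Payment Type"] == "Credit Card").sum()
print(credit_card_count)
```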

codevalidated

Replace the `Car Type` value `Go Mini` with `GO Mini`

Make sure to apply this change to the original df.
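
A minimal sketch of an exact-value replacement on toy data. Series.replace with two scalars swaps whole cell values only, so other car types are left untouched.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Car Type": ["Go Mini", "GO Mini", "Business", "Go Mini"]})

# Replace whole-cell matches and write the column back to df.
df["Car Type"] = df["Car Type"].replace("Go Mini", "GO Mini")
print(df["Car Type"].tolist())
```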

codevalidated

Find the trips (rows) whose `Car Type` contains the substring `GO` and whose `Pickup Location` contains `University`

Store your selection in the variable edu_trips_with_GO.
Note: complete the previous activity first, since its result affects this selection.
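
The two conditions can be combined with the element-wise `&` operator. This is a sketch on invented rows; the match is case-sensitive, which is why the earlier `Go Mini` fix matters.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Car Type": ["GO Mini", "Go+", "GO+", "Business"],
    "Pickup Location": ["Karachi University", "NED University", "Airport", "University Road"],
})

# na=False treats missing cells as non-matching instead of propagating NaN.
has_go = df["Car Type"].str.contains("GO", na=False)
at_university = df["Pickup Location"].str.contains("University", na=False)

edu_trips_with_GO = df[has_go & at_university]
print(edu_trips_with_GO)
```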

input

How many values in the column `Car Type` end with `+`?
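
A possible way to count such values, shown on made-up data: str.endswith does a literal suffix check, so the `+` needs no escaping. The name plus_count is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Car Type": ["GO+", "GO Mini", "Business+", None]})

# Literal suffix test; missing values count as non-matching.
plus_count = df["Car Type"].str.endswith("+", na=False).sum()
print(plus_count)
```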

multiplechoice

Which of the following is an example of a normalization technique used in data cleaning?

codevalidated

Clean the column `Wait Time Min` by removing outliers

Outliers are defined as any values three or more standard deviations below or above the mean.
Identify the outliers and drop them.
Important Note: Make sure you have correctly solved the previous activities before attempting this one.
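
The standard-deviation rule can be sketched as follows, using invented wait times. This assumes the sample standard deviation that pandas computes by default (ddof=1); the activity does not say which variant is expected.

```python
import pandas as pd

# Hypothetical toy data: 19 ordinary wait times and one extreme value.
df = pd.DataFrame({"Wait Time Min": [4.0, 5.0, 6.0] * 6 + [5.0, 100.0]})

col = df["Wait Time Min"]
mean, std = col.mean(), col.std()  # std() uses ddof=1 by default

# A value is an outlier when it lies 3 or more std away from the mean.
is_outlier = (col - mean).abs() >= 3 * std

# Keep everything that is not an outlier.
df = df[~is_outlier]
print(len(df))
```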

codevalidated

Clean the column `Trip Price` by identifying outliers

Outliers are defined as any values more than 1.5 × IQR below the first quartile or above the third quartile.
Identify the outliers and drop them.
Important Note: Make sure you have correctly solved the previous activities before attempting this one.
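
A sketch of the IQR rule on made-up prices, assuming the usual convention of fences at 1.5 × IQR below the first quartile and above the third quartile, with the default linear interpolation of Series.quantile.

```python
import pandas as pd

# Hypothetical toy data with one extreme price.
df = pd.DataFrame({"Trip Price": [100, 110, 120, 130, 140, 150, 1000]})

q1 = df["Trip Price"].quantile(0.25)
q3 = df["Trip Price"].quantile(0.75)
iqr = q3 - q1

# Fences beyond which a value is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

is_outlier = (df["Trip Price"] < lower) | (df["Trip Price"] > upper)
df = df[~is_outlier]
print(df["Trip Price"].tolist())
```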

codevalidated

Clean the column `Payment Type` by removing invalid values

Invalid values are defined as any value other than Credit Card or Cash.

Select the valid values and store them in the column Payment_Type_Fixed, with invalid values set to NaN. Then select the rows with invalid values and store them in the variable df_invalid_payment_type.
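
One way to do this is sketched below on invented values: Series.where keeps the entries that satisfy a condition and sets the rest to NaN. Under this approach, rows that were already missing also end up in the invalid selection.

```python
import pandas as pd

# Hypothetical toy data; "Wallet" and "cash?" are made-up invalid entries.
df = pd.DataFrame({"Payment Type": ["Cash", "Credit Card", "cash?", "Wallet"]})

valid_values = ["Credit Card", "Cash"]
is_valid = df["Payment Type"].isin(valid_values)

# where() keeps valid entries and replaces everything else with NaN.
df["Payment_Type_Fixed"] = df["Payment Type"].where(is_valid)

# Rows whose fixed value is missing are the invalid ones.
df_invalid_payment_type = df[df["Payment_Type_Fixed"].isna()]
print(df_invalid_payment_type)
```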

codevalidated

Clean the column `Trip Currency` by removing invalid values

Invalid values are defined as any value other than PKR.
Select the rows with invalid values and drop them from the original df.
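
A brief sketch on toy data: select the invalid rows first, then drop them by index. The name invalid_currency is an assumption for illustration. Missing currencies also count as invalid here, since they do not equal PKR.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Trip ID": [1, 2, 3, 4],
    "Trip Currency": ["PKR", "USD", "PKR", None],
})

# Select rows whose currency is anything other than PKR.
invalid_currency = df[df["Trip Currency"] != "PKR"]

# Drop those rows by index label and reassign df.
df = df.drop(invalid_currency.index)
print(df)
```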

codevalidated

Clean the column `Total Distance` by removing invalid values

Invalid values are defined as any value that is not an integer.
Select the valid values and store them in the column Total_Distance_Fixed, with invalid values set to NaN. Then select the rows with invalid values and store them in the variable df_invalid_distance.
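
How to test for an integer depends on how the column is stored, which the activity does not state. The sketch below assumes the values arrive as text and treats a value as valid only if it consists entirely of digits (with an optional leading minus sign); the data is made up.

```python
import pandas as pd

# Hypothetical toy data stored as text.
df = pd.DataFrame({"Total Distance": ["12", "7.5", "abc", "30"]})

# Assumption: an integer is an optional minus sign followed by digits only.
as_text = df["Total Distance"].astype(str).str.strip()
is_int = as_text.str.fullmatch(r"-?\d+")

# Keep valid entries; invalid ones become NaN.
df["Total_Distance_Fixed"] = df["Total Distance"].where(is_int)

# Rows that failed the integer check.
df_invalid_distance = df[~is_int]
print(df_invalid_distance)
```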

Mohamed Rawash
