Practice identifying and dealing with invalid values
Data Cleaning with Pandas

In this project, you'll learn how to identify invalid values from the nature of the data itself: the categorical sets a column may draw from, the ranges its values may fall in, and the data types involved. You'll use string handling and basic statistical notions to make sure the resulting dataset is clean and ready for analysis.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Clean the column `Salary` by removing invalid values

Invalid values are defined as any value that is not an integer.

Select the valid values and store them in the column `Salary_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_salaries`.
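One possible approach is sketched below. It is not the project's official solution: the sample rows are made up, and it assumes that "integer" means a string made only of digits, so negative numbers and decimals are treated as invalid.

```python
import pandas as pd

# Hypothetical sample rows; the project's real dataset is not shown here.
df = pd.DataFrame({"Salary": ["62506", "abc", "104437", "50.5", None]})

# Assumption: a valid salary is a string made only of digits.
# na=False makes missing entries count as invalid instead of propagating NaN.
is_valid = df["Salary"].astype(str).str.fullmatch(r"\d+", na=False)

# Keep valid values, blank out the rest, then convert to a numeric dtype.
df["Salary_Fixed"] = pd.to_numeric(df["Salary"].where(is_valid), errors="coerce")

# Rows whose Salary failed the check.
df_invalid_salaries = df[~is_valid]
```

Because NaN is a floating-point value, `Salary_Fixed` ends up as a float column even though every valid entry is a whole number.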

Clean the column `Zip` by removing invalid values

Invalid values are defined as any value that is not an integer.

Select the valid values and store them in the column `Zip_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_zip`.

Clean the column `ManagerID` by removing invalid values

Invalid values are defined as any value that is not an integer or is an integer below 1.

Select the valid values and store them in the column `ManagerID_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_managerID`.
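When the rule combines a type check with a range check, it can help to convert first and test afterwards. The following is a minimal sketch under assumed sample data, not the reference solution.

```python
import pandas as pd

# Hypothetical sample rows; the project's real dataset is not shown here.
df = pd.DataFrame({"ManagerID": ["22", "0", "-3", "x7", "4.5", "16"]})

# Anything that cannot be read as a number becomes NaN.
as_number = pd.to_numeric(df["ManagerID"], errors="coerce")

# Valid means: a number, with no fractional part, and at least 1.
is_valid = as_number.notna() & (as_number % 1 == 0) & (as_number >= 1)

# Keep valid IDs, replace the rest with NaN.
df["ManagerID_Fixed"] = as_number.where(is_valid)

# Rows whose ManagerID failed any of the three conditions.
df_invalid_managerID = df[~is_valid]
```

Splitting the rule into three boolean conditions keeps each requirement visible and easy to adjust if the lower bound changes.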

Clean the column `Sex` by removing invalid values

Invalid values are defined as any value other than M or F.

Select the valid values and store them in the column `Sex_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_sex`.
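For a column with a fixed set of allowed labels, a membership test is usually enough. The sketch below uses made-up rows and assumes the comparison should be case-sensitive, so a lowercase `f` is treated as invalid; the same pattern would apply to any other categorical column by swapping the list of allowed values.

```python
import pandas as pd

# Hypothetical sample rows; the project's real dataset is not shown here.
df = pd.DataFrame({"Sex": ["M", "F", "Male", "f", "M"]})

# The allowed categories, as stated in the activity.
valid_values = ["M", "F"]

# isin performs an exact, case-sensitive membership test.
is_valid = df["Sex"].isin(valid_values)

# Keep valid labels, replace the rest with NaN.
df["Sex_Fixed"] = df["Sex"].where(is_valid)

# Rows whose label is outside the allowed set.
df_invalid_sex = df[~is_valid]
```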

Clean the column `RaceDesc` by removing invalid values

Invalid values are defined as any value other than [White, Black or African American, Asian, Two or more races, American Indian or Alaska Native, Hispanic].

Select the valid values and store them in the column `RaceDesc_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_race`.

Clean the column `MaritalDesc` by removing invalid values

Invalid values are defined as any value other than Single, Married, Divorced, Separated, or Widowed.

Select the valid values and store them in the column `MaritalDesc_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_marital_status`.

Clean the column `DOB` by removing invalid values

Invalid values are defined as any value that is not a datetime.

Select the valid values and store them in the column `DOB_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_DOB`.
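Date columns can be validated by attempting to parse them. The sketch below assumes, purely for illustration, that dates are written as `YYYY-MM-DD`; the real dataset may use a different format, in which case the `format` argument would need to change. Note that pandas marks a missing datetime as `NaT`, its datetime counterpart of NaN.

```python
import pandas as pd

# Hypothetical sample rows; the date format is an assumption.
df = pd.DataFrame({"DOB": ["1983-07-10", "not a date", "1975-05-05", "1990-13-45"]})

# errors="coerce" turns every unparseable entry into NaT instead of raising.
parsed = pd.to_datetime(df["DOB"], format="%Y-%m-%d", errors="coerce")

# Valid dates are kept as datetimes; invalid ones are NaT.
df["DOB_Fixed"] = parsed

# Rows whose DOB could not be parsed.
df_invalid_DOB = df[parsed.isna()]
```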

Clean the column `DateofHire` by removing invalid values

Invalid values are defined as any value that is not a datetime.

Select the valid values and store them in the column `DateofHire_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_HireDate`.

Clean the column `Email` by removing invalid values

Invalid values are defined as any value that does not contain @.

Select the valid values and store them in the column `Email_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_Email`.

Clean the column `Phone` by removing invalid values

Invalid values are defined as any value that does not contain +.

Select the valid values and store them in the column `Phone_Fixed`, replacing invalid values with NaN. Then select the invalid values and store the result in the variable `df_invalid_Phone`.
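A substring check like this one can be sketched with `str.contains`. One detail worth noting: `+` is a regular-expression quantifier, and `str.contains` treats its pattern as a regex by default, so the sketch passes `regex=False` to match the character literally. The sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows; the project's real dataset is not shown here.
df = pd.DataFrame({"Phone": ["+1 555 0100", "555-0101", None, "+44 20 7946 0000"]})

# regex=False matches "+" as a plain character.
# na=False makes missing entries count as invalid.
is_valid = df["Phone"].str.contains("+", regex=False, na=False)

# Keep valid numbers, replace the rest with NaN.
df["Phone_Fixed"] = df["Phone"].where(is_valid)

# Rows whose Phone does not contain "+".
df_invalid_Phone = df[~is_valid]
```

The same call with `"@"` in place of `"+"` would cover a check for a required character in an email address.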

Author

Mohamed Rawash

This project is part of

Data Cleaning with Pandas