Practice identifying and dealing with invalid values

This lab guides you through cleaning a Human Resources dataset with pandas, focusing on handling invalid values across numeric columns, categories, datetimes, and strings. You'll learn methods to identify, select, and clean invalid data, enhancing your data preprocessing skills. This practical exercise is essential for anyone looking to refine their data analysis capabilities in Python.
Project Created by

Mohamed Rawash

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.


Clean the column `Salary` by removing invalid values

Invalid values are defined as any value that is not an integer.

Select the valid values and store them in the column `Salary_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_salaries`.
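One possible way to approach this activity is sketched below. It is not the lab's official solution: the sample data is made up for illustration, and treating missing entries as invalid is an assumption.

```python
import pandas as pd

# Hypothetical sample data; the lab's real dataset is not reproduced here.
df = pd.DataFrame({"Salary": ["62506", "abc", "50000.5", None, "104437"]})

# Convert to numbers; anything that cannot be converted becomes NaN.
as_number = pd.to_numeric(df["Salary"], errors="coerce")

# A value counts as valid only if it is numeric and has no fractional part.
is_valid = as_number.notna() & (as_number % 1 == 0)

# Keep valid values, replace everything else with NaN.
df["Salary_Fixed"] = as_number.where(is_valid)

# Rows whose original value failed the check.
df_invalid_salaries = df[~is_valid]
```

Because NaN is a floating-point value, `Salary_Fixed` ends up with a float dtype even though every surviving value is a whole number.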


Clean the column `Zip` by removing invalid values

Invalid values are defined as any value that is not an integer.

Select the valid values and store them in the column `Zip_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_zip`.
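As an alternative to numeric coercion, the integer check can also be done on the text itself. The sketch below assumes the column holds strings and uses invented sample values; it is only one way to read "not an integer".

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"Zip": ["1960", "2148", "02A45", "N/A", "1886.5"]})

# Work on trimmed text so stray spaces do not cause false negatives.
zip_text = df["Zip"].astype(str).str.strip()

# Valid only if the whole string consists of digits.
is_valid = zip_text.str.fullmatch(r"\d+")

# Convert the surviving strings to numbers; the rest become NaN.
df["Zip_Fixed"] = pd.to_numeric(zip_text.where(is_valid), errors="coerce")

# Rows whose original value failed the check.
df_invalid_zip = df[~is_valid]
```

A digits-only pattern rejects decimals such as `1886.5` without any arithmetic, which some readers may find easier to reason about than a modulo test.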


Clean the column `ManagerID` by removing invalid values

Invalid values are defined as any value that is not an integer or is an integer below 1.

Select the valid values and store them in the column `ManagerID_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_managerID`.
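This activity adds a range condition to the integer check. A minimal sketch, using invented sample values and not claiming to match the lab's own solution:

```python
import pandas as pd

# Hypothetical sample data with mixed types, for illustration only.
df = pd.DataFrame({"ManagerID": [22, 0, -3, "x", 4.5, 39]})

# Non-numeric entries become NaN.
as_number = pd.to_numeric(df["ManagerID"], errors="coerce")

# Valid = numeric, whole number, and at least 1.
is_valid = as_number.notna() & (as_number % 1 == 0) & (as_number >= 1)

df["ManagerID_Fixed"] = as_number.where(is_valid)
df_invalid_managerID = df[~is_valid]
```

Combining the conditions into a single boolean mask keeps the valid and invalid selections exact complements of each other.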


Clean the column `Sex` by removing invalid values

Invalid values are defined as any value other than M or F.

Select the valid values and store them in the column `Sex_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_sex`.
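For categorical columns, membership in an allowed set is a natural test. The sketch below uses invented sample values and assumes matching is exact and case-sensitive, so `f` and `Male` count as invalid; the same pattern would apply to the other categorical columns with a different allowed list.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"Sex": ["M", "F", "Male", "f", None, "F"]})

# Allowed categories taken from the activity description.
allowed = ["M", "F"]

# True where the value is exactly one of the allowed categories.
is_valid = df["Sex"].isin(allowed)

df["Sex_Fixed"] = df["Sex"].where(is_valid)
df_invalid_sex = df[~is_valid]
```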


Clean the column `RaceDesc` by removing invalid values

Invalid values are defined as any value other than [White, Black or African American, Asian, Two or more races, American Indian or Alaska Native, Hispanic].

Select the valid values and store them in the column `RaceDesc_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_race`.


Clean the column `MaritalDesc` by removing invalid values

Invalid values are defined as any value other than Single, Married, Divorced, Separated, or Widowed.

Select the valid values and store them in the column `MaritalDesc_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_marital_status`.


Clean the column `DOB` by removing invalid values

Invalid values are defined as any value that is not a datetime.

Select the valid values and store them in the column `DOB_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_DOB`.
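Datetime columns can be checked by attempting to parse them. The sketch below uses invented sample values and assumes an ISO-style `%Y-%m-%d` format, which may differ from the format in the lab's dataset.

```python
import pandas as pd

# Hypothetical sample data; the date format is an assumption.
df = pd.DataFrame({"DOB": ["1983-07-10", "1975-02-30", "not a date", "1988-09-19"]})

# Values that do not parse (including impossible dates) become NaT.
parsed = pd.to_datetime(df["DOB"], format="%Y-%m-%d", errors="coerce")

is_valid = parsed.notna()

# pandas marks missing datetimes with NaT, its datetime counterpart of NaN.
df["DOB_Fixed"] = parsed
df_invalid_DOB = df[~is_valid]
```

Note that `isna()` treats NaT as missing, so downstream checks written for NaN still work on the fixed column.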


Clean the column `DateofHire` by removing invalid values

Invalid values are defined as any value that is not a datetime.

Select the valid values and store them in the column `DateofHire_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_HireDate`.


Clean the column `Email` by removing invalid values

Invalid values are defined as any value that does not contain @.

Select the valid values and store them in the column `Email_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_Email`.
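A substring test is enough for the rule as stated. The following sketch uses invented addresses and treats missing entries as invalid, which is an assumption rather than something the activity specifies.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {"Email": ["ann@example.com", "bob.example.com", None, "cy@example.org"]}
)

# Literal substring search; na=False makes missing entries count as invalid.
is_valid = df["Email"].str.contains("@", regex=False, na=False)

df["Email_Fixed"] = df["Email"].where(is_valid)
df_invalid_Email = df[~is_valid]
```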


Clean the column `Phone` by removing invalid values

Invalid values are defined as any value that does not contain +.

Select the valid values and store them in the column `Phone_Fixed`, setting invalid values to NaN. Then select the invalid values and store the result in the variable `df_invalid_Phone`.
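One detail worth watching here is that `+` is a quantifier in regular expressions, and `str.contains` interprets its pattern as a regex by default. The sketch below, on invented phone numbers, searches for a literal `+` by passing `regex=False`.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {"Phone": ["+1-555-0100", "555-0101", "+44 20 7946 0000", "n/a"]}
)

# regex=False searches for the literal character "+";
# as a regex, a bare "+" would be an invalid pattern.
is_valid = df["Phone"].str.contains("+", regex=False, na=False)

df["Phone_Fixed"] = df["Phone"].where(is_valid)
df_invalid_Phone = df[~is_valid]
```

An equivalent option is to keep regex matching and escape the character, for example with the pattern `r"\+"`.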

This project is part of

Data Cleaning with Pandas
