Practice cleaning missing values with California Cities report

In this project, you'll practice identifying and cleaning missing values. Missing values are entries that are simply not present in the data, and they are usually represented as NaN. We'll load a dataset containing a report on cities in California and clean it by handling all the NaN and missing values.
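To make "NaN" concrete, here is a minimal sketch on a tiny invented DataFrame (the values are made up and not taken from the California report; two column names simply mirror the project's). `isna()` marks every missing cell as `True`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: invented values, column names borrowed from the project.
df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "elevation_m": [10.0, np.nan, 35.0],
    "population_total": [1200, 5400, None],  # None becomes NaN in a numeric column
})

# isna() returns a same-shaped DataFrame of booleans: True where a value is missing.
mask = df.isna()
print(df)
print(mask)
```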

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Find the number of missing values in each column and store the result in the variable `col_missing_values`

Make sure you run all the previous cells. Don't worry if you mess up the DataFrame! Just reload it with the first line of the notebook.
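One common way to count missing values per column is to chain `isna()` with `sum()`, since each `True` counts as 1. The sketch below uses a small invented DataFrame rather than the project's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the California cities report.
df = pd.DataFrame({
    "elevation_m": [10.0, np.nan, np.nan, 35.0],
    "latd": [34.1, 37.8, np.nan, 38.6],
    "area_water_percent": [np.nan, np.nan, np.nan, 2.0],
})

# isna() flags each missing cell; sum() adds up the True flags column by column.
col_missing_values = df.isna().sum()
print(col_missing_values)
```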

input

Which column has the most missing values?

Look at the result of question 1 and find which column has the most missing values.

multiplechoice

Which column has the fewest missing values?

You can find the answer in the result of question 1; there's no need to write any code.
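Although the two questions above can be answered by reading the counts, a sketch like the following picks out the extremes programmatically. The data is invented, and `most_missing` / `least_missing` are hypothetical names not used by the project. Note that `idxmax()` and `idxmin()` return the first label on ties.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the project.
df = pd.DataFrame({
    "elevation_m": [10.0, np.nan, np.nan, 35.0],
    "latd": [34.1, 37.8, np.nan, 38.6],
    "area_water_percent": [np.nan, np.nan, np.nan, 2.0],
})
col_missing_values = df.isna().sum()

# idxmax()/idxmin() return the column label with the largest/smallest count.
most_missing = col_missing_values.idxmax()
least_missing = col_missing_values.idxmin()
print(most_missing, least_missing)
```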

input

How many values of the column `elevation_m` are missing?
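For a single column, the same `isna().sum()` idea applies to the selected Series. A minimal sketch with invented values (`elevation_missing` is a hypothetical variable name):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the project.
df = pd.DataFrame({"elevation_m": [10.0, np.nan, 22.5, np.nan, np.nan]})

# Select one column, flag its NaNs, and count them.
elevation_missing = df["elevation_m"].isna().sum()
print(elevation_missing)
```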

codevalidated

Find the total number of missing values in the whole dataset and store it in `df_missing_values`

You must modify the df variable itself. Don't worry if you mess up the DataFrame! Just reload it with the first line of the notebook.
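Summing twice gives a dataset-wide total: the first `sum()` produces per-column counts, the second adds them up. The sketch below runs on a small invented DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the project's DataFrame.
df = pd.DataFrame({
    "elevation_m": [10.0, np.nan, np.nan],
    "latd": [34.1, np.nan, 38.6],
})

# First sum(): NaNs per column. Second sum(): total across all columns.
df_missing_values = df.isna().sum().sum()
print(df_missing_values)
```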

codevalidated

Drop the column `area_water_percent`, since most of its values are missing

You have to drop this column permanently, as we cannot use it for any purpose.
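A minimal sketch of a permanent drop, assuming "permanent" means the change should persist in `df` for later cells. Reassigning the result is shown here; `df.drop(columns=[...], inplace=True)` is an equivalent alternative. The data is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the project.
df = pd.DataFrame({
    "area_total_km2": [12.0, 30.5, 8.1],
    "area_water_percent": [np.nan, np.nan, 1.5],
})

# drop() returns a new DataFrame; reassigning df makes the removal stick.
df = df.drop(columns=["area_water_percent"])
print(df.columns.tolist())
```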

codevalidated

Drop the rows that have missing values and store the resulting DataFrame in the variable `df_narows_dropped`

Do not use `inplace=True`, as it would permanently remove the rows from our DataFrame.
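By default, `dropna()` removes every row containing at least one NaN and returns a new DataFrame, leaving the original untouched. A sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the project.
df = pd.DataFrame({
    "latd": [34.1, np.nan, 38.6],
    "longd": [-118.2, -122.4, np.nan],
    "population_total": [1200.0, 5400.0, 800.0],
})

# Without inplace=True, df is unchanged and the cleaned copy is stored separately.
df_narows_dropped = df.dropna()
print(df_narows_dropped)
```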

codevalidated

Drop the rows that have more than 5 missing values and store the resulting DataFrame in the variable `df_rows_droped`

Use the `thresh` parameter of `dropna`.
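`thresh` counts the *non-missing* values a row needs in order to be kept, so "more than 5 missing" has to be translated into "fewer than `n_columns - 5` present". The sketch below uses an invented 7-column frame with hypothetical column names.

```python
import numpy as np
import pandas as pd

# Hypothetical 7-column toy frame:
# row 0 has 6 NaNs, row 1 has 5 NaNs, row 2 has none.
df = pd.DataFrame(
    [
        [1.0] + [np.nan] * 6,
        [1.0, 2.0] + [np.nan] * 5,
        [1.0] * 7,
    ],
    columns=[f"col_{i}" for i in range(7)],
)

# Keep a row only if it has at least (n_columns - 5) non-missing values,
# i.e. drop rows with more than 5 missing values.
df_rows_droped = df.dropna(thresh=df.shape[1] - 5)
print(df_rows_droped)
```

Row 1 survives because it has exactly 5 missing values, which is not "more than 5".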

codevalidated

Drop the columns that have missing values and store the resulting DataFrame in the variable `df_nacols_dropped`
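Passing `axis=1` (or `axis="columns"`) makes `dropna()` remove columns instead of rows. A sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the project.
df = pd.DataFrame({
    "latd": [34.1, np.nan, 38.6],
    "population_total": [1200.0, 5400.0, 800.0],
    "area_water_percent": [np.nan, np.nan, 1.5],
})

# axis=1: drop every column that contains at least one NaN.
df_nacols_dropped = df.dropna(axis=1)
print(df_nacols_dropped.columns.tolist())
```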

codevalidated

Drop columns with more than 10 missing values and store the resulting DataFrame in the variable `df_cols_droped`
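Combining `axis=1` with `thresh` works the same way as for rows, except the count is taken down each column, so the threshold becomes `number_of_rows - 10`. The 12-row frame and its column names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-row toy frame with invented column names.
df = pd.DataFrame({
    "mostly_empty": [1.0] + [np.nan] * 11,      # 11 missing -> dropped
    "half_empty": [1.0, 2.0] + [np.nan] * 10,   # 10 missing -> kept
    "complete": np.arange(12, dtype=float),     # 0 missing  -> kept
})

# Keep a column only if it has at least (n_rows - 10) non-missing values.
df_cols_droped = df.dropna(axis=1, thresh=len(df) - 10)
print(df_cols_droped.columns.tolist())
```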

codevalidated

Fill the 50 missing values in `elevation_m` with -999. Store your result in the variable `filled_elevation_m`
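`fillna()` replaces every NaN with a given constant. This sketch assumes the result is meant to be the filled Series; the data is invented, and a sentinel like -999 should only be used when you remember to treat it as "unknown" later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the project.
df = pd.DataFrame({"elevation_m": [10.0, np.nan, 35.0, np.nan]})

# Replace each NaN with the sentinel value -999; df itself is not modified.
filled_elevation_m = df["elevation_m"].fillna(-999)
print(filled_elevation_m.tolist())
```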

codevalidated

Fill the 7 missing values in `area_total_km2` with the value 0 permanently, and store your result in the variable `filled_area_total`
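To make the fill permanent, one option is to assign the filled column back into `df`. A sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the project.
df = pd.DataFrame({"area_total_km2": [12.0, np.nan, 8.1]})

# Assign the filled Series back so the change persists in df.
df["area_total_km2"] = df["area_total_km2"].fillna(0)
filled_area_total = df["area_total_km2"]
print(filled_area_total.tolist())
```

Assigning back, rather than calling `fillna(0, inplace=True)` on `df["area_total_km2"]`, avoids chained-assignment problems, where recent pandas versions may warn or leave `df` unchanged.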

codevalidated

Fill the missing values of the column `latd` using backward fill and store your result in the variable `bfill_latd`
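Backward fill copies the next valid value upward into each gap, so a NaN at the very end has nothing to copy and stays missing. A sketch on invented values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the project.
df = pd.DataFrame({"latd": [np.nan, 34.1, np.nan, 37.8, np.nan]})

# bfill(): each NaN takes the next non-missing value below it.
bfill_latd = df["latd"].bfill()
print(bfill_latd.tolist())
```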

codevalidated

Fill the 50 missing values of the column `longd` using forward fill and store your result in the variable `ffill_longd`
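Forward fill is the mirror image: each NaN takes the previous valid value, so a leading NaN stays missing. A sketch on invented values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the project.
df = pd.DataFrame({"longd": [np.nan, -118.2, np.nan, -122.4, np.nan]})

# ffill(): each NaN takes the last non-missing value above it.
ffill_longd = df["longd"].ffill()
print(ffill_longd.tolist())
```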

codevalidated

Fill the 2 missing values of the column `population_total` with the mean of the column and store your result in the variable `mean_total_population`
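Mean imputation computes the column mean (pandas skips NaN when doing so) and passes it to `fillna()`. A sketch on invented values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the project.
df = pd.DataFrame({"population_total": [1000.0, np.nan, 3000.0, np.nan, 5000.0]})

# mean() ignores NaN: (1000 + 3000 + 5000) / 3 = 3000.
mean_total_population = df["population_total"].fillna(df["population_total"].mean())
print(mean_total_population.tolist())
```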

codevalidated

Fill the 5 missing values of the column `area_water_sq_mi` with the median value of the column and store your result in the variable `median_fill`
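Median imputation follows the same pattern with `median()`. The median is less affected than the mean by a single large value, such as the invented 9.0 below.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the project.
df = pd.DataFrame({"area_water_sq_mi": [0.5, np.nan, 2.0, 9.0, np.nan]})

# median() ignores NaN: the median of [0.5, 2.0, 9.0] is 2.0.
median_fill = df["area_water_sq_mi"].fillna(df["area_water_sq_mi"].median())
print(median_fill.tolist())
```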

codevalidated

Fill the 6 missing values of the column `area_land_km2` with the mode value of the column and store your result in the variable `mode_fill`
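`mode()` returns a Series because a column can have several equally common values, so the sketch takes the first one before filling. The data is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name comes from the project.
df = pd.DataFrame({"area_land_km2": [4.0, 4.0, np.nan, 7.5, np.nan]})

# mode() skips NaN and returns a Series; iloc[0] picks the (first) most frequent value.
mode_value = df["area_land_km2"].mode().iloc[0]
mode_fill = df["area_land_km2"].fillna(mode_value)
print(mode_fill.tolist())
```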

multiplechoice

Which of the following snippets will fill the missing values in the DataFrame with zeros and store the result in a variable `filled_df`?
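Calling `fillna()` on the whole DataFrame applies the value to every column at once. A sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the project.
df = pd.DataFrame({"latd": [34.1, np.nan], "longd": [np.nan, -122.4]})

# fillna(0) on the DataFrame replaces every NaN in every column with 0.
filled_df = df.fillna(0)
print(filled_df)
```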

multiplechoice

Which of the following is/are the general structure for filling a column's missing values with a statistic such as the mean, mode, or median?
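The common shape behind the mean, median, and mode fills is `df[col].fillna(<statistic of df[col]>)`. The sketch below wraps that shape in a hypothetical helper, `fill_with_stat`, which is not part of pandas or the project. It runs on invented data.

```python
import numpy as np
import pandas as pd


def fill_with_stat(df: pd.DataFrame, col: str, stat: str) -> pd.Series:
    """Hypothetical helper: fill NaNs in df[col] with its mean, median, or mode."""
    series = df[col]
    if stat == "mean":
        value = series.mean()
    elif stat == "median":
        value = series.median()
    elif stat == "mode":
        value = series.mode().iloc[0]  # mode() may return several values; take the first
    else:
        raise ValueError(f"unsupported statistic: {stat}")
    return series.fillna(value)


# Hypothetical toy column.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0, 3.0]})
by_mean = fill_with_stat(df, "x", "mean")      # NaN -> 7/3
by_median = fill_with_stat(df, "x", "median")  # NaN -> 3.0
by_mode = fill_with_stat(df, "x", "mode")      # NaN -> 3.0
```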

Author

Jawad haider

This project is part of

Data Cleaning with Pandas
