Pandas Capstone Project: Analyzing Covid Data

codevalidated

Read CSV File

Read the covid.csv file into a dataframe named df, using the first column as the index column.
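A minimal sketch of this step, assuming a small in-memory stand-in for covid.csv (the sample rows below are hypothetical, not taken from the real file). With the actual file, the path "covid.csv" would be passed instead of the buffer.

```python
import io
import pandas as pd

# Hypothetical stand-in for covid.csv; the real file has many more rows and columns.
sample_csv = io.StringIO(
    ",continent,location,total_cases\n"
    "0,Asia,India,100.0\n"
    "1,Asia,China,250.0\n"
)

# index_col=0 tells pandas to use the first column as the row index.
df = pd.read_csv(sample_csv, index_col=0)
print(df.head())
```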

multiplechoice

Select the correct shape

Choose the correct shape for the df dataframe.

multiplechoice

Select the correct datatype

Choose the correct data type. There can be multiple correct answers.

multiplechoice

Find the minimum and maximum values

Select the minimum and maximum values of the total_cases column in the COVID-19 dataset stored in the df dataframe.

multiplechoice

Total cases in the COVID-19 dataset

Select the total number of cases in the COVID-19 dataset using the total_cases column in the df dataframe.

multiplechoice

Find the mean cases per day

Select the mean number of new cases per day in the COVID-19 dataset. The answer is rounded to two decimal places.
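The inspection questions above (shape, data types, minimum, maximum, total, and mean) can be explored with a handful of standard pandas calls. The sketch below uses a tiny hypothetical dataframe in place of the real dataset, so its numbers are illustrative only.

```python
import pandas as pd

# Hypothetical sample data standing in for the COVID-19 dataset.
df = pd.DataFrame(
    {
        "total_cases": [10.0, 30.0, 60.0],
        "new_cases": [10.0, 20.0, 30.0],
    }
)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column

lowest = df["total_cases"].min()
highest = df["total_cases"].max()
total = df["total_cases"].sum()
mean_new = round(df["new_cases"].mean(), 2)  # rounded to two decimal places
```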

codevalidated

Select values from a dataframe using indexing

Create a new dataframe named df1 which contains only the continent and location columns from the df dataframe.
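One way to do this is to index the dataframe with a list of column names. The sample data here is hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the COVID-19 dataset.
df = pd.DataFrame(
    {
        "continent": ["Asia", "Africa"],
        "location": ["India", "Zimbabwe"],
        "total_cases": [100.0, 50.0],
    }
)

# Passing a list of names inside [] returns a dataframe with only those columns.
df1 = df[["continent", "location"]]
```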

codevalidated

Drop columns from the dataframe

Drop the iso_code, new_cases_smoothed, new_deaths_smoothed, total_cases_per_million, new_cases_per_million, new_cases_smoothed_per_million, total_deaths_per_million, new_deaths_per_million, and new_deaths_smoothed_per_million columns from the df dataframe.
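A short sketch of dropping columns with `DataFrame.drop`, shown on a hypothetical dataframe that carries only two of the columns named above. The same call works with the full list.

```python
import pandas as pd

# Hypothetical sample data with a subset of the real columns.
df = pd.DataFrame(
    {
        "iso_code": ["IND"],
        "location": ["India"],
        "new_cases_smoothed": [1.5],
        "total_cases": [100.0],
    }
)

cols_to_drop = ["iso_code", "new_cases_smoothed"]

# drop returns a new dataframe, so the result is assigned back to df.
df = df.drop(columns=cols_to_drop)
```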

codevalidated

Add more rows to a dataframe

Add a new row to the df dataframe with the following values:

new_data = {'continent': ['Africa'], 'location': ['Zimbabwe'], 'date': ['2022-12-07'], 'total_cases': [259356.0], 'new_cases': [192.0], 'total_deaths': [5622.0], 'new_deaths': [2.0], 'population_density': [42.729], 'median_age': [19.6], 'aged_65_older': [2.822], 'aged_70_older': [1.845], 'gdp_per_capita': [1899.767], 'cardiovasc_death_rate': [307.846], 'diabetes_prevalence': [1.85], 'life_expectancy': [61.55], 'population': [16320539.0]}
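A possible approach is `pd.concat`, since `DataFrame.append` was removed in pandas 2.0. The sketch below uses a hypothetical two-row dataframe and a shortened version of new_data. Using `ignore_index=True` is an assumption: it gives the new row the next integer label, which only matches the original index if that index is a plain 0-based range.

```python
import pandas as pd

# Hypothetical existing data with a subset of the real columns.
df = pd.DataFrame(
    {
        "continent": ["Asia", "Asia"],
        "location": ["India", "China"],
        "total_cases": [100.0, 250.0],
    }
)

# Shortened version of new_data for illustration.
new_data = {
    "continent": ["Africa"],
    "location": ["Zimbabwe"],
    "total_cases": [259356.0],
}

# Build a one-row dataframe and append it to the bottom of df.
df = pd.concat([df, pd.DataFrame(new_data)], ignore_index=True)
```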

codevalidated

Update a specific cell value in the COVID-19 dataset

Update the value of the total_cases column for the row with index 166620 to 259357.0 in the df dataframe.

codevalidated

Update multiple cell values in the COVID-19 dataset

Update the values of the total_cases column for the rows with index 166620 and 166621 to 259357.0 and 259358.0 respectively.
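Both the single-cell and the multi-cell update can be written with label-based `.loc` assignment. The dataframe below is a hypothetical stand-in that contains only the two index labels mentioned.

```python
import pandas as pd

# Hypothetical rows carrying the index labels used in the task.
df = pd.DataFrame(
    {"total_cases": [259356.0, 259356.0]},
    index=[166620, 166621],
)

# Single cell: row label first, then column label.
df.loc[166620, "total_cases"] = 259357.0

# Several cells: a list of row labels paired with a list of new values.
df.loc[[166620, 166621], "total_cases"] = [259357.0, 259358.0]
```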

codevalidated

Remove rows from the dataframe

Remove the rows with index 166620 and 166621 from the dataframe.
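Rows can be removed by label with `DataFrame.drop`. This sketch uses a hypothetical three-row dataframe.

```python
import pandas as pd

# Hypothetical rows; only the index labels matter here.
df = pd.DataFrame(
    {"total_cases": [1.0, 2.0, 3.0]},
    index=[166619, 166620, 166621],
)

# index= names the row labels to remove; the result is assigned back to df.
df = df.drop(index=[166620, 166621])
```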

codevalidated

Use `.loc` to select rows based on a condition

Select all the rows from the dataframe where the total_cases column is greater than 1000000.0. Store the result in a variable named df_1m.
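A minimal sketch of conditional selection with `.loc`, on hypothetical data. The comparison is strict, so a row with exactly 1000000.0 cases is not selected.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"total_cases": [500000.0, 1000000.0, 2500000.0]})

# The comparison builds a boolean Series; .loc keeps the rows where it is True.
df_1m = df.loc[df["total_cases"] > 1000000.0]
```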

codevalidated

Select specific columns and rows

Select the total_cases and total_deaths columns for the rows with index 5168, 5172 and 163703. Store the result in a variable named df_cases_death.
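`.loc` accepts a list of row labels and a list of column labels together. The values below are hypothetical; only the index labels come from the task.

```python
import pandas as pd

# Hypothetical rows carrying the index labels used in the task, plus one extra.
df = pd.DataFrame(
    {
        "location": ["A", "B", "C", "D"],
        "total_cases": [10.0, 20.0, 30.0, 40.0],
        "total_deaths": [1.0, 2.0, 3.0, 4.0],
    },
    index=[5168, 5172, 9999, 163703],
)

# First list selects rows by label, second list selects columns by name.
df_cases_death = df.loc[[5168, 5172, 163703], ["total_cases", "total_deaths"]]
```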

codevalidated

Sort COVID-19 data in ascending order

Sort the dataframe in ascending order of the total_cases column. Store the result in a variable named df_sorted.

codevalidated

Sort COVID-19 data in descending order

Sort the dataframe in descending order of the total_cases column. Store the result in a variable named df_sorted_desc.

codevalidated

Sort the COVID-19 data by multiple columns

Sort the dataframe in descending order of the total_cases column and then in ascending order of the total_deaths column. Store the result in a variable named df_sorted_multi.
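The three sorting tasks can all be expressed with `sort_values`. The sketch below runs on hypothetical data with a tie in total_cases, so the effect of the second sort key is visible.

```python
import pandas as pd

# Hypothetical sample data; rows 1 and 2 tie on total_cases.
df = pd.DataFrame(
    {
        "total_cases": [100.0, 300.0, 300.0, 50.0],
        "total_deaths": [5.0, 9.0, 2.0, 1.0],
    }
)

# Ascending is the default order.
df_sorted = df.sort_values(by="total_cases")

# ascending=False reverses the order.
df_sorted_desc = df.sort_values(by="total_cases", ascending=False)

# One ascending flag per column: total_cases descending, ties broken by
# total_deaths ascending.
df_sorted_multi = df.sort_values(
    by=["total_cases", "total_deaths"], ascending=[False, True]
)
```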

codevalidated

Add new columns using arithmetic operations

Create a new column named total_cases_per_million in the dataframe df by dividing the total_cases column by the population column.

codevalidated

Using vectorized operations to update a column

Update the total_cases_per_million column in the dataframe df by multiplying it by 1000.
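Both the new column and its update are element-wise (vectorized) operations on whole columns, with no explicit loop. The figures below are hypothetical and chosen so the arithmetic is easy to check.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "total_cases": [500.0, 250.0],
        "population": [1000.0, 1000.0],
    }
)

# Column-by-column division creates the new column in one step.
df["total_cases_per_million"] = df["total_cases"] / df["population"]

# Multiplying the column by a scalar updates every row at once.
df["total_cases_per_million"] = df["total_cases_per_million"] * 1000
```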

codevalidated

Remove columns using `del` statement

Remove the total_cases_per_million column from the df dataframe.
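A brief sketch of the `del` statement on a hypothetical dataframe. Unlike `drop`, it modifies the dataframe in place and removes one column at a time.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "total_cases": [500.0],
        "total_cases_per_million": [0.5],
    }
)

# del removes the column from df directly; nothing is returned.
del df["total_cases_per_million"]
```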

codevalidated

Rename columns

Rename the total_cases column to Total Cases and the total_deaths column to Total Deaths.
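Renaming can be done with `DataFrame.rename` and a mapping from old names to new names. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"total_cases": [100.0], "total_deaths": [5.0]})

# Keys are the current column names, values are the new names.
df = df.rename(columns={"total_cases": "Total Cases", "total_deaths": "Total Deaths"})
```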

codevalidated

Filter COVID-19 data using boolean indexing

Create three dataframe objects named df_india, df_china, and df_greater_new_cases by filtering the df dataframe object using boolean indexing as follows:

For df_india, select all rows from the COVID-19 DataFrame where the location is either "India" or "China".
For df_china, select all rows from the COVID-19 DataFrame where the number of new_cases is between 100000 and 200000.
For df_greater_new_cases, select all rows from the COVID-19 DataFrame where the number of new_cases per day is greater than or equal to 10000.
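The three filters above can be sketched with boolean masks on hypothetical data. Treating "between" as inclusive of both endpoints is an assumption; it matches the default behaviour of `Series.between`.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "location": ["India", "China", "Brazil", "India"],
        "new_cases": [150000.0, 50.0, 12000.0, 9000.0],
    }
)

# isin tests membership in a list of values.
df_india = df[df["location"].isin(["India", "China"])]

# between is inclusive on both ends by default (assumed to match the task).
df_china = df[df["new_cases"].between(100000, 200000)]

# A plain comparison also produces a boolean mask.
df_greater_new_cases = df[df["new_cases"] >= 10000]
```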

codevalidated

Read the data from Covid-19 dataset for visualization

Read the data from the covid.csv file and store it in the df_for_visualization dataframe object. Also parse the date column as a datetime object.
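A minimal sketch of parsing dates while reading, using a hypothetical in-memory stand-in for covid.csv. With the real file, the path "covid.csv" would be passed instead of the buffer.

```python
import io
import pandas as pd

# Hypothetical stand-in for covid.csv.
sample_csv = io.StringIO(
    "location,date,new_cases\n"
    "India,2020-03-01,3\n"
    "India,2020-03-02,5\n"
)

# parse_dates converts the listed columns to datetime64 while reading.
df_for_visualization = pd.read_csv(sample_csv, parse_dates=["date"])
print(df_for_visualization.dtypes)
```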

codevalidated

Filter data by month

Filter the df_for_visualization dataframe object to select only the rows where the date is in the month of March 2020 and location is India. Store the filtered dataframe object in the df_for_plot variable.
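One way to express this filter is with the `.dt` accessor on a datetime column, combined with a location check. The rows below are hypothetical, and the source dataframe is named df_for_visualization here to match the previous activity.

```python
import pandas as pd

# Hypothetical sample data with an already-parsed date column.
df_for_visualization = pd.DataFrame(
    {
        "location": ["India", "India", "India", "China", "India"],
        "date": pd.to_datetime(
            ["2020-02-28", "2020-03-05", "2020-03-20", "2020-03-10", "2021-03-01"]
        ),
        "new_cases": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Combine three boolean masks with &; each mask needs its own parentheses.
is_march_2020 = (df_for_visualization["date"].dt.year == 2020) & (
    df_for_visualization["date"].dt.month == 3
)
is_india = df_for_visualization["location"] == "India"

df_for_plot = df_for_visualization[is_march_2020 & is_india]
```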

multiplechoice

Create a line plot

Plot a line plot using the df_for_plot dataframe object. The x-axis should be the date column and the y-axis should be the new_cases column. Based on the plot, which of the following statements is true?

multiplechoice

Create a bar plot

Plot a bar plot using the df_for_plot dataframe object. The x-axis should be the date column and the y-axis should be the total_deaths column. Based on the plot, which of the following statements is true?

Anurag Verma
