Mastering DataFrame Mutations with Wine Quality data

In this project, you'll practice modifying a pandas DataFrame using a dataset of wine quality measurements. You'll create columns, delete columns, change their types, and more. As usual, we'll also cover when it's OK to modify the data and when it's not.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

input

What is the maximum amount of citric acid in the wine dataset?

Enter the answer to one decimal place.
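
A minimal sketch of how a column maximum could be looked up. The DataFrame here is a toy stand-in with made-up values, and the column name `citric acid` (with a space) is an assumption about how the raw dataset labels it.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({"citric acid": [0.0, 0.04, 0.56, 0.31]})

# Series.max() returns the largest value; round() trims it to one decimal place.
max_citric = round(df["citric acid"].max(), 1)

print(max_citric)
```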

multiplechoice

How many missing values are in the dataset?

Inspect the dataset and your initial analysis to find out whether any values are missing.
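
One common way to count missing values is to combine `isna()` with `sum()`. The sketch below uses a toy DataFrame with made-up values and deliberately planted gaps; it says nothing about what the real dataset contains.

```python
import pandas as pd

# Toy data with two planted gaps (None becomes NaN in float columns).
df = pd.DataFrame({
    "alcohol": [9.4, None, 10.0],
    "pH": [3.51, 3.2, None],
})

# isna() marks each missing cell as True; sum() counts the True values per column.
missing_per_column = df.isna().sum()

# A second sum() collapses the per-column counts into one total.
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)
```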

input

What is the median wine quality?

Enter the answer to one decimal place.
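
As a rough illustration, the median of a column can be read off with `Series.median()`. The values below are made up and are not the dataset's actual quality scores.

```python
import pandas as pd

# Toy quality scores, made up for illustration.
df = pd.DataFrame({"quality": [5, 5, 6, 7]})

# With an even number of rows, median() averages the two middle values.
median_quality = round(df["quality"].median(), 1)

print(median_quality)
```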

codevalidated

Rename DataFrame columns to an appropriate format

Rename the columns so that they use underscores instead of spaces. For example, the old name fixed acidity becomes the new name fixed_acidity. Skip single-word columns. Set inplace=True.
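
One possible approach, sketched on a toy DataFrame with made-up values: build a mapping for the multi-word columns only and pass it to `rename`. The helper name `new_names` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({
    "fixed acidity": [7.4, 7.8],
    "citric acid": [0.0, 0.04],
    "quality": [5, 6],
})

# Map old name -> new name, but only for columns that contain a space.
new_names = {col: col.replace(" ", "_") for col in df.columns if " " in col}

# inplace=True mutates df directly; rename() then returns None.
df.rename(columns=new_names, inplace=True)

print(list(df.columns))
```

Because `inplace=True` changes `df` itself, later activities that refer to names such as total_sulfur_dioxide rely on this step having been run.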

codevalidated

Drop the first and last row

Perform the modification and store in a new variable: df_first_last.
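
A minimal sketch of one way to do this, assuming the index labels are unique. The toy values are made up; `drop` returns a new DataFrame, so the original stays untouched.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.0, 11.2],
    "quality": [5, 5, 6, 6],
})

# Look up the labels of the first and last rows, then drop them by label.
# This assumes unique index labels; duplicated labels would drop extra rows.
df_first_last = df.drop(index=[df.index[0], df.index[-1]])

print(df_first_last)
```

Positional slicing with `df.iloc[1:-1]` would select the same rows; `drop` is used here only because the activity is phrased as a drop.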

codevalidated

Remove maximum total sulfur dioxide from dataset

Locate and remove the row with the maximum value for total_sulfur_dioxide and store in a new variable: df_drop.
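
The sketch below shows one way to locate and remove such a row, using `idxmax` to find the index label of the maximum. The data is a made-up toy example, and the variable `max_label` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({
    "total_sulfur_dioxide": [34.0, 67.0, 289.0, 54.0],
    "quality": [5, 5, 7, 6],
})

# idxmax() returns the index label of the first occurrence of the maximum.
max_label = df["total_sulfur_dioxide"].idxmax()

# drop() returns a new DataFrame without that row; df itself is unchanged.
df_drop = df.drop(index=max_label)

print(df_drop)
```

If several rows shared the maximum, `idxmax` would point only at the first one, so this sketch removes a single row.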

codevalidated

Convert the quality column to float

All columns have a float data type except quality. Create a new column in the df DataFrame named quality_float that contains the values of quality, but with a float type.
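
A short sketch of the conversion with `astype`, on toy data with made-up values. Writing to a new column keeps the original integer column available for comparison.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({"alcohol": [9.4, 9.8], "quality": [5, 6]})

# astype(float) returns a converted copy; the quality column keeps its integer type.
df["quality_float"] = df["quality"].astype(float)

print(df.dtypes)
```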

codevalidated

Remove density, residual sugar and chlorides columns from the dataset

Modify the DataFrame by dropping the three columns density, residual_sugar, and chlorides, and store your result as df_drop_three.
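
Dropping several columns at once can be sketched as follows. The toy values are made up, and the underscore column names assume the earlier rename step has been applied.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({
    "density": [0.9978, 0.9968],
    "residual_sugar": [1.9, 2.6],
    "chlorides": [0.076, 0.098],
    "alcohol": [9.4, 9.8],
})

# drop(columns=...) returns a new DataFrame; df keeps all of its columns.
df_drop_three = df.drop(columns=["density", "residual_sugar", "chlorides"])

print(list(df_drop_three.columns))
```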

codevalidated

Create a new column that calculates the alcohol content in terms of percentage (%)

Compute the alcohol content as a percentage for each data point and store your result in a new column, alcohol_perc.

codevalidated

Evaluate the amount of sulphates and citric acid in the red wine

Create a new column in the DataFrame that contains the sum of sulphates and citric_acid. Store your result in a new column: sulphate_citric_acid.
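
Adding two columns element by element could look like the sketch below. The values are made up and chosen so the sums are exact in floating point.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({
    "sulphates": [0.5, 0.75],
    "citric_acid": [0.25, 0.5],
})

# The + operator on two Series adds them row by row, aligned on the index.
df["sulphate_citric_acid"] = df["sulphates"] + df["citric_acid"]

print(df)
```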

codevalidated

Create a new column that identifies whether the alcohol content is below the dataset's mean alcohol content

Modify the dataset accordingly and store your result in a new column: deviation_alcohol.
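
A minimal sketch, assuming the new column should hold booleans (True when a wine's alcohol content is below the mean). The activity does not state the expected encoding, so treat that as an assumption; the values are made up.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({"alcohol": [9.0, 10.0, 11.0, 14.0]})

# Mean of the toy column: (9 + 10 + 11 + 14) / 4 = 11.0
mean_alcohol = df["alcohol"].mean()

# Comparing a Series with a scalar yields a boolean Series.
# Assumption: True means "strictly below the mean".
df["deviation_alcohol"] = df["alcohol"] < mean_alcohol

print(df)
```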

codevalidated

Convert the wine quality scores into categorical labels: `low`, `medium`, `high`

Convert the wine quality scores into categorical labels. Classify a score as low if it is 5 or below, medium if it is above 5 and up to 7, and high if it is greater than 7. Store your result in a new column: quality_label.
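
One way to bin scores into labels is `pd.cut`. The sketch below assumes that a score of exactly 7 counts as medium, which follows from high being defined as greater than 7; the scores themselves are made up.

```python
import pandas as pd

# Toy quality scores, made up for illustration.
df = pd.DataFrame({"quality": [3, 5, 6, 7, 8]})

# Bin edges: (-inf, 5] -> low, (5, 7] -> medium, (7, inf) -> high.
# pd.cut uses right-closed intervals by default, so 5 is low and 7 is medium.
df["quality_label"] = pd.cut(
    df["quality"],
    bins=[float("-inf"), 5, 7, float("inf")],
    labels=["low", "medium", "high"],
)

print(df)
```

The resulting column has a categorical dtype with ordered categories, which keeps low, medium, and high in a meaningful order when sorting.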

codevalidated

Create a new column that calculates the ratio of free sulfur dioxide to total sulfur dioxide.

Modify the DataFrame to obtain the ratio and store your result in a new column free_total_ratio.
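
A short sketch of the ratio as an element-wise division. The values are made up, and the underscore column names assume the earlier rename step has been applied.

```python
import pandas as pd

# Toy stand-in for the wine data; values are made up for illustration.
df = pd.DataFrame({
    "free_sulfur_dioxide": [10.0, 15.0],
    "total_sulfur_dioxide": [40.0, 60.0],
})

# The / operator divides the two Series row by row.
df["free_total_ratio"] = df["free_sulfur_dioxide"] / df["total_sulfur_dioxide"]

print(df)
```

A total of zero would produce inf or NaN rather than raising an error, so it can be worth checking the denominator column first.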

Author

Bolaji Bamiro

This project is part of

Intro to Pandas for Data Analysis
