Intro to Pandas for Data Analysis

This project guides you through vectorized operations and data analysis techniques using a cocktail recipe dataset. You'll explore series manipulation, normalization, and standardization methods to analyze data about various cocktails, their ingredients, and preparation techniques. You'll learn to perform ratio and percentage calculations, use aggregation methods, and gain insights into the world of cocktails through the lens of data science.

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.


Convert all the titles in the `title` column to lowercase.

Store your results in a variable named `lowercase_titles`.
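A minimal sketch of one way to approach this, using a tiny made-up DataFrame in place of the real dataset (the rows below are invented for illustration; only the `title` column name comes from the task):

```python
import pandas as pd

# Toy stand-in for the cocktail dataset (made-up rows, not the real data).
df = pd.DataFrame({"title": ["Gin Fizz", "Old Fashioned", "Mai Tai"]})

# The .str accessor applies the string method to every element at once,
# so no explicit Python loop is needed.
lowercase_titles = df["title"].str.lower()
print(lowercase_titles)
```

The `.str` accessor is what makes this a vectorized operation: the same call works unchanged on a series of any length.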


Create a series named `first_words` that stores the first word from each cocktail `title`.
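One possible approach, sketched on invented sample titles, is to split each title on whitespace and keep the first piece. The data below is an assumption for illustration only:

```python
import pandas as pd

# Toy stand-in for the cocktail dataset (made-up rows, not the real data).
df = pd.DataFrame({"title": ["Gin Fizz", "Old Fashioned", "Mai Tai"]})

# .str.split() turns each title into a list of words;
# .str[0] then picks the first element of each list.
first_words = df["title"].str.split().str[0]
print(first_words)
```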


The variety of ingredients in cocktails can vary greatly. Create a new series `ingredient_length_ratio` by calculating the ratio of the length of each ingredient list (in characters) to the total length of all ingredient lists across all recipes.

This should be done by first calculating the total number of characters in all ingredient lists. Then, for each recipe, divide the length (number of characters) of its ingredient list by the total length.
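The two steps above can be sketched as follows. This assumes the ingredient lists live in a string column called `ingredients` (the column name and the sample rows are assumptions, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the cocktail dataset; the `ingredients` column name is assumed.
df = pd.DataFrame({
    "ingredients": [
        "gin, lemon juice, soda",
        "bourbon, sugar, bitters",
        "rum, lime",
    ]
})

# Step 1: length of each ingredient string, then the grand total.
ingredient_lengths = df["ingredients"].str.len()
total_length = ingredient_lengths.sum()

# Step 2: divide every length by the total (a scalar broadcast over the series).
ingredient_length_ratio = ingredient_lengths / total_length
print(ingredient_length_ratio)
```

Because every value is divided by the same total, the ratios add up to 1, which is a quick sanity check on the result.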


Create a new series `recipe_length_standardized` by standardizing the recipe length of each recipe.

To do this, first calculate the recipe length as the number of characters in the ingredients string for each recipe. Note that this measures characters, not the number of ingredients.

Then, standardize the length using the formula:

recipe_length_standardized = (recipe_length - mean(recipe_length)) / std(recipe_length)

where `std` stands for the standard deviation of recipe lengths.
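The formula translates almost directly into pandas. The sketch below assumes a string column named `ingredients` and uses invented rows. It also relies on the default behavior of `Series.std()`, which computes the sample standard deviation (`ddof=1`); whether the activity expects that convention or the population one is not stated in the task:

```python
import pandas as pd

# Toy stand-in for the cocktail dataset; the `ingredients` column name is assumed.
df = pd.DataFrame({
    "ingredients": [
        "gin, lemon juice, soda",
        "bourbon, sugar, bitters",
        "rum, lime",
    ]
})

# Recipe length = number of characters in the ingredients string.
recipe_length = df["ingredients"].str.len()

# Subtract the mean and divide by the standard deviation.
# Series.std() defaults to ddof=1 (sample standard deviation).
recipe_length_standardized = (recipe_length - recipe_length.mean()) / recipe_length.std()
print(recipe_length_standardized)
```

After standardization the series has mean 0 and standard deviation 1 (under the same `ddof`), so values read as "how many standard deviations above or below the average length".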


Not all glass types are equally popular.

Create a series `glass_popularity_ratio` that computes the ratio of each glass type's usage count to the total number of cocktail entries (rows) in the dataset. This will give insights into how often each `glass` type is used relative to others.

Note that we are considering all entries, even if some recipes are missing values in the `recipe` column.
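A minimal sketch of this calculation, on made-up rows. The `glass` and `recipe` column names come from the task; the values are invented, including one missing `recipe` to show that the row still counts toward the total:

```python
import pandas as pd

# Toy stand-in for the cocktail dataset (made-up rows, not the real data).
df = pd.DataFrame({
    "glass": ["coupe", "rocks", "coupe", "highball"],
    "recipe": ["shake and strain", None, "stir", "build over ice"],
})

# Count how often each glass type appears.
glass_counts = df["glass"].value_counts()

# Divide by the number of rows; len(df) counts every entry,
# including the one whose `recipe` is missing.
glass_popularity_ratio = glass_counts / len(df)
print(glass_popularity_ratio)
```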


Assume that each garnish adds a certain value to the drink, based on its frequency of use.

Create a new series `garnish_effectiveness_index` by dividing the frequency of each garnish by the total number of garnishes across all recipes, then multiplying by 100 to create a percentage.
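One reading of this task, sketched on invented data: treat each non-missing value in a `garnish` column as one garnish, count how often each value occurs, and express that as a percentage of all counted garnishes. The column name, the sample rows, and this interpretation of "total number of garnishes" are all assumptions:

```python
import pandas as pd

# Toy stand-in for the cocktail dataset; the `garnish` column name is assumed.
df = pd.DataFrame({
    "garnish": ["lemon twist", "cherry", "lemon twist", "mint sprig"],
})

# Frequency of each distinct garnish value.
garnish_counts = df["garnish"].value_counts()

# Share of each garnish among all counted garnishes, as a percentage.
garnish_effectiveness_index = garnish_counts / garnish_counts.sum() * 100
print(garnish_effectiveness_index)
```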


Create a series `ingredient_to_garnish_ratio` by dividing the number of ingredients used in each cocktail by the number of garnishes.

The number of ingredients is determined by counting the commas in the ingredients string and adding 1.

Similarly, the number of garnishes is calculated by counting the commas in the garnish string and adding 1.

To handle missing garnishes, fill the missing comma counts with -1 before adding 1, so that a cocktail without a garnish ends up with 0 garnishes. To avoid division by zero, add a small value (0.1) to the denominator.
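The steps above can be sketched as follows, assuming string columns named `ingredients` and `garnish` (both names and all rows are assumptions made for illustration):

```python
import pandas as pd

# Toy stand-in for the cocktail dataset; column names are assumed.
df = pd.DataFrame({
    "ingredients": ["gin, lemon juice, soda", "bourbon, sugar, bitters", "rum, lime"],
    "garnish": ["lemon wheel, cherry", None, "mint"],
})

# Number of ingredients = commas + 1.
ingredient_count = df["ingredients"].str.count(",") + 1

# Missing garnish strings give a missing comma count; filling with -1
# and then adding 1 yields 0 garnishes for those rows.
garnish_count = df["garnish"].str.count(",").fillna(-1) + 1

# The 0.1 offset keeps the denominator away from zero.
ingredient_to_garnish_ratio = ingredient_count / (garnish_count + 0.1)
print(ingredient_to_garnish_ratio)
```

Note that the offset is applied to every row, not only to the rows with zero garnishes, so all ratios are slightly smaller than the plain quotient would be.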


Standardize the usage of each `glass` type by calculating the difference between the glass usage count and the mean glass usage count, then dividing by the standard deviation of the glass usage count.

Store the result in a new series `glass_usage_standardized`.
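A minimal sketch on invented rows: the standardization is applied to the per-glass counts, not to the original rows. As before, `Series.std()` uses the sample standard deviation (`ddof=1`) by default, which may or may not be the convention the activity expects:

```python
import pandas as pd

# Toy stand-in for the cocktail dataset (made-up rows, not the real data).
df = pd.DataFrame({
    "glass": ["coupe", "coupe", "coupe", "rocks", "rocks", "highball"],
})

# Usage count per glass type: coupe 3, rocks 2, highball 1.
glass_counts = df["glass"].value_counts()

# Standardize the counts: subtract their mean, divide by their std (ddof=1).
glass_usage_standardized = (glass_counts - glass_counts.mean()) / glass_counts.std()
print(glass_usage_standardized)
```

With these toy counts the mean is 2 and the sample standard deviation is 1, so the standardized values come out as 1, 0, and -1.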


The complexity of a cocktail often reflects its ingredient intensity. Create a new series `ingredient_intensity_index` by calculating the square of the number of characters in the ingredients string for each recipe.
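A short sketch of this calculation, assuming a string column named `ingredients` and using invented rows:

```python
import pandas as pd

# Toy stand-in for the cocktail dataset; the `ingredients` column name is assumed.
df = pd.DataFrame({
    "ingredients": [
        "gin, lemon juice, soda",
        "bourbon, sugar, bitters",
        "rum, lime",
    ]
})

# Character count of each ingredients string, squared element-wise.
ingredient_intensity_index = df["ingredients"].str.len() ** 2
print(ingredient_intensity_index)
```

The `**` operator is applied to the whole series at once, so each length is squared independently.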
