All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Convert all the titles in the title column to lowercase. Store your results in a variable named lowercase_titles.
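One possible approach, sketched on a tiny made-up DataFrame: the name df and the sample titles are assumptions for illustration, and only the title column name comes from the activity itself.

```python
import pandas as pd

# Hypothetical stand-in for the cocktail dataset (sample rows are invented).
df = pd.DataFrame({"title": ["Gin Fizz", "Old Fashioned", "Mai Tai"]})

# The vectorized .str accessor lowercases every title in one pass.
lowercase_titles = df["title"].str.lower()

print(lowercase_titles.tolist())  # ['gin fizz', 'old fashioned', 'mai tai']
```

Using the .str accessor rather than a Python loop keeps missing values as missing instead of raising an error.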
Your result should look something like this:
Create a series named first_words that stores the first word from each cocktail title.
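A minimal sketch of this activity, assuming titles are separated by whitespace and the data lives in a DataFrame called df (both the name and the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; only the `title` column name comes from the activity.
df = pd.DataFrame({"title": ["Gin Fizz", "Old Fashioned", "Mai Tai"]})

# Split each title on whitespace, then take the first piece of each list.
first_words = df["title"].str.split().str[0]

print(first_words.tolist())  # ['Gin', 'Old', 'Mai']
```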
Your result should look something like this:
The ingredients in cocktails can vary greatly. Create a new series ingredient_length_ratio by calculating the ratio of the length of each ingredient list (in characters) to the total length of all ingredient lists across all recipes.
This should be done by first calculating the total number of characters in all ingredient lists. Then, for each recipe, divide the length (number of characters) of its ingredient list by the total length.
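The two steps described above could be sketched as follows. The column name ingredients, the DataFrame name df, and the sample rows are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; the `ingredients` column name is an assumption.
df = pd.DataFrame({
    "ingredients": ["gin, lemon juice, sugar", "bourbon, bitters", "rum, lime"]
})

# Step 1: character length of each ingredient list, and the grand total.
ingredient_lengths = df["ingredients"].str.len()
total_length = ingredient_lengths.sum()

# Step 2: each recipe's share of the total length.
ingredient_length_ratio = ingredient_lengths / total_length

print(ingredient_length_ratio)
```

Because every value is divided by the same total, the ratios sum to 1 when no ingredient list is missing.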
Your result would look something like this:
Create a new series recipe_length_standardized by standardizing the length of each recipe's ingredient list (i.e., the recipe length).
To do this, first calculate the recipe length as the number of characters in the ingredients string for each recipe.
Then, standardize the length using the formula:
recipe_length_standardized = (recipe_length - mean(recipe_length)) / std(recipe_length)
Here, std stands for the standard deviation of the recipe lengths.
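As a sketch, the formula translates almost directly into pandas. The column name ingredients and the sample rows are assumptions. Note also that Series.std() computes the sample standard deviation (ddof=1) by default; the activity does not say which variant it expects, so treat that choice as an assumption too.

```python
import pandas as pd

# Hypothetical sample data; the `ingredients` column name is an assumption.
df = pd.DataFrame({
    "ingredients": ["gin, lemon juice, sugar", "bourbon, bitters", "rum, lime"]
})

# Recipe length = number of characters in the ingredients string.
recipe_length = df["ingredients"].str.len()

# (x - mean) / std, using pandas' default sample standard deviation (ddof=1).
recipe_length_standardized = (
    recipe_length - recipe_length.mean()
) / recipe_length.std()

print(recipe_length_standardized)
```

After standardizing, the series has mean 0 and standard deviation 1 (up to floating-point error).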
Your result would look something like this:
Not all glass types are equally popular.
Create a series glass_popularity_ratio
that computes the ratio of each glass type's usage count to the total number of cocktail entries (rows) in the dataset. This will give insights into how often each glass
type is used relative to others.
Note that we are considering all entries, even if some recipes are missing values in the
recipe
column.
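One way this could look, assuming the glass type is stored in a column named glass (the column name, df, and the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; one row has a missing `recipe` value on purpose.
df = pd.DataFrame({
    "glass": ["coupe", "rocks", "coupe", "highball"],
    "recipe": ["Shake and strain.", None, "Stir with ice.", "Build in glass."],
})

# value_counts() gives the usage count per glass type;
# len(df) counts every row, including rows with a missing recipe.
glass_popularity_ratio = df["glass"].value_counts() / len(df)

print(glass_popularity_ratio)
```

Dividing by len(df) rather than by a count of non-missing recipes is what keeps all entries in the denominator.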
Your result would look something like this:
Assume that each garnish adds a certain value to the drink, based on its frequency of use.
Create a new series garnish_effectiveness_index by dividing the frequency of each garnish by the total number of garnishes across all recipes, then multiplying by 100 to create a percentage.
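A possible reading of this activity, sketched below, takes "total number of garnishes" to mean the number of non-missing entries in the garnish column. That interpretation, the column name garnish, and the sample rows are all assumptions.

```python
import pandas as pd

# Hypothetical sample data; the `garnish` column name is an assumption.
df = pd.DataFrame({
    "garnish": ["lemon twist", "orange peel", "lemon twist", None, "mint"]
})

# Frequency of each garnish (missing values are skipped by default).
garnish_counts = df["garnish"].value_counts()

# Share of all recorded garnishes, expressed as a percentage.
garnish_effectiveness_index = garnish_counts / garnish_counts.sum() * 100

print(garnish_effectiveness_index)
```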
Your result would look something like this:
Create a series ingredient_to_garnish_ratio by dividing the number of ingredients used in each cocktail by the number of garnishes.
The number of ingredients is determined by counting the commas in the ingredients string and adding 1.
Similarly, the number of garnishes is calculated by counting the commas in the garnish string and adding 1.
To handle missing garnishes, fill their comma counts with -1 before adding 1, so that a missing garnish counts as zero garnishes. To avoid division by zero, add a small value (0.1) to the denominator.
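The counting rules above can be sketched like this. The column names ingredients and garnish, the name df, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the second cocktail has no garnish.
df = pd.DataFrame({
    "ingredients": ["gin, lemon juice, sugar", "bourbon, bitters", "rum, lime"],
    "garnish": ["lemon twist", None, "mint, lime wheel"],
})

# Number of ingredients = number of commas + 1.
num_ingredients = df["ingredients"].str.count(",") + 1

# Missing garnishes yield NaN comma counts; filling with -1 and then
# adding 1 makes them count as 0 garnishes.
num_garnishes = df["garnish"].str.count(",").fillna(-1) + 1

# Adding 0.1 to the denominator avoids division by zero.
ingredient_to_garnish_ratio = num_ingredients / (num_garnishes + 0.1)

print(ingredient_to_garnish_ratio)
```

With this scheme, a cocktail without a garnish is divided by 0.1, so its ratio is ten times its ingredient count.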
Your result should look something like this:
Standardize the usage of each glass type by subtracting the mean glass usage count from each glass type's usage count, then dividing by the standard deviation of the glass usage counts.
Store the result in a new series glass_usage_standardized.
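A short sketch under the assumption that glass types sit in a column named glass (hypothetical, as are df and the sample rows). As with any call to Series.std(), the default is the sample standard deviation (ddof=1).

```python
import pandas as pd

# Hypothetical sample data: coupe appears 3 times, rocks 2, highball 1.
df = pd.DataFrame({
    "glass": ["coupe", "rocks", "coupe", "highball", "coupe", "rocks"]
})

# Usage count per glass type.
glass_usage = df["glass"].value_counts()

# Standardize the counts: subtract their mean, divide by their std (ddof=1).
glass_usage_standardized = (glass_usage - glass_usage.mean()) / glass_usage.std()

print(glass_usage_standardized)
```

Here the statistics are taken over the per-type counts (one value per glass type), not over the rows of the dataset.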
Your result would look something like this:
The complexity of a cocktail often reflects its ingredient intensity. Create a new series ingredient_intensity_index by calculating the square of the number of characters in the ingredients string for each recipe.
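For illustration, a minimal version of this calculation, again assuming a column named ingredients in a DataFrame called df (both hypothetical, along with the sample rows):

```python
import pandas as pd

# Hypothetical sample data; the `ingredients` column name is an assumption.
df = pd.DataFrame({
    "ingredients": ["gin, lemon juice, sugar", "bourbon, bitters", "rum, lime"]
})

# Character count of each ingredients string, squared element-wise.
ingredient_intensity_index = df["ingredients"].str.len() ** 2

print(ingredient_intensity_index.tolist())  # [529, 256, 81]
```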
Your result would look something like this: