Data Wrangling with Pandas

This project takes you on a nostalgic journey through the iconic episodes of **F.R.I.E.N.D.S**, brimming with witty dialogues and heartfelt moments. Dive into a series of activities designed to enhance your skills in data wrangling with pandas, more specifically 'Aggregations with `groupby()`'. Let’s grab a coffee at Central Perk and unravel the data behind the laughter and tears!

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

input

Who's the chattiest among the F.R.I.E.N.D.S group? Let’s start simple: Identify the character who speaks the most throughout the series.

codevalidated

Want to know how talkative the group was each season? Using the `word_count` column, calculate the total number of words spoken by all characters in each season. Use the `.groupby()` and `.sum()` methods.

For this activity, the code to calculate `word_count` is already provided in the notebook.

Store the result in the `seasonal_word_sum` variable.
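
As a minimal sketch of the `.groupby()` and `.sum()` pattern described above, the example below uses a tiny made-up DataFrame. The `season` and `word_count` column names follow the activity text, but the values are invented for illustration and the real notebook data will differ.

```python
import pandas as pd

# Toy stand-in for the dialogue data (values are invented).
df = pd.DataFrame({
    "season": [1, 1, 2, 2, 2],
    "word_count": [5, 7, 3, 4, 10],
})

# Group rows by season, then add up word_count within each group.
seasonal_word_sum = df.groupby("season")["word_count"].sum()
print(seasonal_word_sum)
```

The result is a Series indexed by season, with one total per season.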

The result should match the following output:

input

Curious about the pace of the show? Calculate the average number of scenes per episode across all seasons. Use `nunique()` to avoid counting repeated scenes.
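
One way to approach this, sketched on invented data: count the distinct scenes in each episode with `nunique()`, then average those counts. The column names `season`, `episode`, and `scene` are assumptions here and may not match the actual dataset.

```python
import pandas as pd

# Toy data: one row per dialogue line (hypothetical column names and values).
df = pd.DataFrame({
    "season":  [1, 1, 1, 1, 1, 2, 2],
    "episode": [1, 1, 1, 2, 2, 1, 1],
    "scene":   [1, 1, 2, 1, 1, 1, 2],
})

# Distinct scenes within each (season, episode) pair; repeated rows for the
# same scene are counted only once.
scenes_per_episode = df.groupby(["season", "episode"])["scene"].nunique()

# Average of those per-episode counts across all episodes.
avg_scenes = scenes_per_episode.mean()
print(avg_scenes)
```

Grouping by both `season` and `episode` matters if episode numbers restart every season, as they do in this toy data.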

codevalidated

Ever wondered who had the briefest or the most extended things to say? Find the shortest and longest dialogues spoken by any character using `.min()` and `.max()`.

Store the result in the `dialogue_lengths` variable.

The result should match the following output:

codevalidated

Dive deeper into dialogue details! Calculate multiple statistics (mean, standard deviation, minimum, maximum, and median) for the dialogue lengths of each character using `.agg()`.

Remember to use the previously calculated `dialogue_length` column.

Store the result in the `char_stats` variable.
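
The following sketch shows how `.agg()` can take a list of aggregation names and return one column per statistic. The `character` column name and all values are assumptions made for illustration; only `dialogue_length` comes from the activity text.

```python
import pandas as pd

# Toy data (invented values; "character" is a hypothetical column name).
df = pd.DataFrame({
    "character": ["Ross", "Ross", "Ross", "Joey", "Joey"],
    "dialogue_length": [10, 20, 30, 4, 8],
})

# Passing a list of function names yields one output column per statistic.
char_stats = df.groupby("character")["dialogue_length"].agg(
    ["mean", "std", "min", "max", "median"]
)
print(char_stats)
```

Note that pandas computes `std` as the sample standard deviation (dividing by n - 1) by default.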

multiplechoice

Why would you use the `.groupby()` function when analyzing dialogue data from "F.R.I.E.N.D.S"?

codevalidated

Explore the vocabulary range of the characters. Define and use a custom aggregation function to count unique words spoken by each character using `agg()`.

Store the result in the `unique_words_per_character` variable.
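
A custom aggregation function receives all of a group's values as a Series and returns a single value. The sketch below assumes hypothetical `character` and `dialogue` columns and a deliberately simple definition of a "word" (lowercased, whitespace-separated tokens); the activity's reference solution may tokenize differently.

```python
import pandas as pd

# Toy data (invented lines; column names are assumptions).
df = pd.DataFrame({
    "character": ["Phoebe", "Phoebe", "Joey"],
    "dialogue": [
        "Smelly cat smelly cat",
        "What are they feeding you",
        "How you doin",
    ],
})

def count_unique_words(dialogues: pd.Series) -> int:
    """Count distinct lowercased, whitespace-separated tokens in a group."""
    words = set()
    for line in dialogues:
        words.update(line.lower().split())
    return len(words)

# agg() calls the function once per character with that character's lines.
unique_words_per_character = df.groupby("character")["dialogue"].agg(count_unique_words)
print(unique_words_per_character)
```

Because this sketch does not strip punctuation, tokens such as `cat` and `cat,` would count as different words on real dialogue text.
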
The result should match the following output:

codevalidated

Phoebe's family stories are as complex as they are entertaining. Use a custom aggregation function to summarize the most frequently mentioned family members.

Store the result in the `family_mentions` variable.

The result should match the following output:

multiplechoice

What is the advantage of using `.agg()` with multiple aggregation functions in a pandas DataFrame?

codevalidated

How much does Joey talk each season? Let's find out by counting the number of lines Joey speaks each season.

Store the result in the variable `joey_lines`.
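
Counting lines per season for one character can be sketched as a filter followed by `groupby()` and `size()`. The data below is invented, and the `character` column name is an assumption.

```python
import pandas as pd

# Toy data: one row per spoken line (hypothetical values).
df = pd.DataFrame({
    "season": [1, 1, 1, 2, 2],
    "character": ["Joey", "Ross", "Joey", "Joey", "Monica"],
})

# Keep only Joey's rows, then count the rows in each season.
joey_lines = df[df["character"] == "Joey"].groupby("season").size()
print(joey_lines)
```

`size()` counts rows per group regardless of missing values, whereas `count()` skips NaN entries in each column.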

The result should match the following output:

codevalidated

Throughout the series, Chandler's job remained a subject of confusion and humor. Your task is to explore dialogues where Chandler or others try to describe his profession, highlighting how the confusion about his job role builds throughout the series.

Filter for words like `job` or `work` using the `str.contains` method.

Don't forget to use `size()` at the end to count the number of rows.

Store the result in the `chandler_job_explanations` variable.
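
A rough sketch of the keyword filter: `str.contains` accepts a regular expression, so `job|work` matches either word. The example groups the matching rows by season, which is an assumption; the activity may expect a different grouping key. The `dialogue` column name and the sample lines are also invented.

```python
import pandas as pd

# Toy data (invented lines; column names are assumptions).
df = pd.DataFrame({
    "season": [1, 1, 2, 2],
    "character": ["Chandler", "Rachel", "Chandler", "Monica"],
    "dialogue": [
        "I have to get back to work",
        "What is Chandler's job?",
        "Nobody knows what I do",
        "He is a transponster!",
    ],
})

# Boolean mask: True where the line mentions "job" or "work" (case-insensitive).
# na=False treats missing dialogue as a non-match instead of propagating NaN.
mask = df["dialogue"].str.contains(r"job|work", case=False, na=False)

# Count the matching rows per season.
chandler_job_explanations = df[mask].groupby("season").size()
print(chandler_job_explanations)
```

This is substring matching, so `work` would also match words like "network". Seasons with no matching lines do not appear in the result at all.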

The result should match the following output:

codevalidated

F.R.I.E.N.D.S often took us down memory lane with flashbacks. Identify episodes with the most references to past events.

Store the result in the variable `flashback_mentions`.

Don't forget to use `size()` at the end to count the number of rows.

The result should match the following output:

codevalidated

"How you doin'?" Joey's catchphrase is legendary. Find the number of times Joey uses his famous line compared to others.

Don't forget to use `size()` at the end to count the number of rows.

Store the result in the `joey_catchphrases` variable.

The result should match the following output:

input

Ah, Ross and his weddings—always a spectacle! Dive into the data to find out which wedding episode had the most dialogue. Was it Emily’s, Rachel’s, or perhaps Carol’s? Enter the Episode number.

multiplechoice

How does the `.filter()` method differ from `.loc[]` in the context of pandas group operations?
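
To make the contrast concrete, here is a small sketch on invented data: `GroupBy.filter()` keeps or drops entire groups based on a group-level condition, while `.loc[]` selects individual rows by label or boolean mask. The column names and thresholds are illustrative assumptions.

```python
import pandas as pd

# Toy data (invented values).
df = pd.DataFrame({
    "character": ["Ross", "Ross", "Gunther", "Rachel", "Rachel"],
    "word_count": [12, 3, 2, 8, 9],
})

# GroupBy.filter: the function sees each group as a DataFrame and returns
# True/False; all rows of a group are kept or dropped together.
frequent_speakers = df.groupby("character").filter(lambda g: len(g) >= 2)

# .loc with a boolean mask: each row is kept or dropped on its own.
long_lines = df.loc[df["word_count"] > 5]

print(frequent_speakers)
print(long_lines)
```

Here Ross's 3-word line survives the group filter because Ross as a whole has at least two lines, but it is dropped by the row-level mask.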

codevalidated

Monica’s cleanliness is legendary. Calculate the number of times Monica mentions "clean," "dust," or "soap" in each season. Who knew cleaning could be this fun to analyze?

Store the result in the `cleaning_mentions` variable.

The result should match the following output:

codevalidated

Identify the longest dialogue in each season. Was it during one of Ross's scientific explanations or Monica's detailed anecdotes?

Store the result in the `max_dialogue_length` variable.
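
A per-season maximum can be sketched with `groupby()` and `max()`. The optional `idxmax()` step, which looks up who spoke the longest line, goes beyond what the activity asks for and is included only as an illustration. The data and the `character` column name are invented.

```python
import pandas as pd

# Toy data (invented values; dialogue_length is assumed to be precomputed).
df = pd.DataFrame({
    "season": [1, 1, 2, 2],
    "character": ["Ross", "Monica", "Ross", "Monica"],
    "dialogue_length": [40, 25, 18, 60],
})

# Longest dialogue length within each season.
max_dialogue_length = df.groupby("season")["dialogue_length"].max()

# Optional: idxmax() gives the row label of each season's maximum, which
# lets us pull the full rows and see who spoke them.
longest_rows = df.loc[df.groupby("season")["dialogue_length"].idxmax()]

print(max_dialogue_length)
print(longest_rows)
```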

The result should match the following output:

codevalidated

Ross and Monica’s dance routine is unforgettable. Find out which season had the most dance- or music-related dialogues using the `apply()` method.

Store the results in the `dance_music_dialogues` variable.

For this activity, use the `friends_info_df` dataset.
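
The sketch below illustrates the general shape of a grouped `apply()`: a function runs once per season and returns that season's count of matching lines. It uses an invented DataFrame with a placeholder `text` column, not the real `friends_info_df`, whose columns are not described here.

```python
import pandas as pd

# Toy data (invented; "text" is a placeholder column name).
df = pd.DataFrame({
    "season": [1, 1, 2, 2, 2],
    "text": [
        "Let's dance the routine!",
        "Pass the coffee",
        "Turn up the music",
        "I love this song and this dance",
        "Pivot!",
    ],
})

def count_dance_music(lines: pd.Series) -> int:
    """Number of lines in the group that mention 'dance' or 'music'."""
    return int(lines.str.contains(r"dance|music", case=False, na=False).sum())

# apply() calls the function once per season with that season's lines.
dance_music_dialogues = df.groupby("season")["text"].apply(count_dance_music)

# Season with the highest count.
top_season = dance_music_dialogues.idxmax()
print(dance_music_dialogues)
print(top_season)
```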

The result should match the following output:

codevalidated

When Chandler spends time in a box as penance, the conversation around him varies dramatically. Evaluate the average number of words spoken by each character in this episode to see who talks most while he's boxed up.

Note: To find `chandler_box`, use the `friends_info_df` dataset.

Filter with the exact `season` and `episode` numbers that you get from `chandler_box`.

Store the result in the `avg_words_per_character` variable.

The result should match the following output:

multiplechoice

Why is the `.transform()` method important in data processing with pandas?

codevalidated

Normalize dialogue lengths for each character by subtracting the mean and dividing by the standard deviation using `.transform()`.

Remember to use the previously calculated `dialogue_length` column.
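
Unlike `.agg()`, `.transform()` returns a result aligned with the original rows, which makes it a natural fit for per-group normalization. A minimal sketch on invented data follows; the `character` and `normalized_length` names are assumptions.

```python
import pandas as pd

# Toy data (invented values).
df = pd.DataFrame({
    "character": ["Ross", "Ross", "Ross", "Joey", "Joey"],
    "dialogue_length": [10, 20, 30, 4, 8],
})

# For each character, subtract that character's mean and divide by that
# character's (sample) standard deviation. The output has one value per
# original row, so it can be assigned straight back as a new column.
df["normalized_length"] = df.groupby("character")["dialogue_length"].transform(
    lambda x: (x - x.mean()) / x.std()
)
print(df)
```

A character with a single line has an undefined sample standard deviation, so the normalized value for that row would be NaN.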

The result should match the following output:

multiplechoice

Why is the `.pivot_table()` function beneficial when analyzing dialogue frequency by characters and episodes in "F.R.I.E.N.D.S"?

codevalidated

Let's take a seat at Central Perk! Analyze how the dynamic of conversations at the famous coffee shop changes from season to season. Using a `pivot table`, we'll summarize the number of dialogues each character has in Central Perk across all seasons.

Store the result in the `central_perk_pivot` variable.

Note: `Scene Directions` is included as a character in the dataset and is therefore considered in the solution, as you can see in the output image.
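
As an illustrative sketch of counting dialogues with a pivot table, the example below filters to one location and then counts rows per character and season. The `location` and `dialogue` column names, and how Central Perk scenes are identified in the real dataset, are assumptions.

```python
import pandas as pd

# Toy data (invented; column names are hypothetical).
df = pd.DataFrame({
    "season": [1, 1, 1, 2, 2],
    "character": ["Rachel", "Rachel", "Gunther", "Rachel", "Gunther"],
    "location": [
        "Central Perk", "Central Perk", "Central Perk",
        "Monica's Apartment", "Central Perk",
    ],
    "dialogue": ["a", "b", "c", "d", "e"],
})

# Keep only lines spoken at Central Perk.
central_perk = df[df["location"] == "Central Perk"]

# Rows are characters, columns are seasons, cells are dialogue counts.
# fill_value=0 replaces missing character/season combinations with 0.
central_perk_pivot = central_perk.pivot_table(
    index="character",
    columns="season",
    values="dialogue",
    aggfunc="count",
    fill_value=0,
)
print(central_perk_pivot)
```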

The result should match the following output:

codevalidated

F.R.I.E.N.D.S gave us some memorable Thanksgiving episodes. Dive into these special episodes to see how dialogue contributions vary among the characters. Create a `pivot table` to display the count of dialogues per character for each Thanksgiving episode by season.

Store the result in the `thanksgiving_pivot` variable.

The result should match the following output:
