Unscripted Insights: Data Wrangling with F.R.I.E.N.D.S

This project takes you on a nostalgic journey through the iconic episodes of F.R.I.E.N.D.S, brimming with witty dialogues and heartfelt moments. Dive into a series of activities designed to enhance your skills in data wrangling with pandas, more specifically 'Aggregations with `groupby()`'. Let’s grab a coffee at Central Perk and unravel the data behind the laughter and tears!
Project Created by

Vidhi Shah

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

input

Who Talks the Most?

Who's the chattiest among the F.R.I.E.N.D.S group? Let’s start simple: Identify the character who speaks the most throughout the series.

codevalidated

Seasonal Dialogue Sum

Want to know how talkative the group was each season? Using the word_count column, calculate the total number of words spoken by all characters in each season. Use the .groupby() and .sum() methods.

For this activity, the code to calculate word_count is already provided in the notebook.

Store the result in seasonal_word_sum variable.

The result should match the following output:

Activity_2
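As a rough illustration of the groupby-then-sum pattern, here is a minimal sketch on a toy DataFrame. The column names season, character and dialogue are assumptions made for this example, and the word_count line is only a stand-in for the code the notebook already provides.

```python
import pandas as pd

# Toy stand-in for the dialogue data; column names are assumptions.
df = pd.DataFrame({
    "season": [1, 1, 2, 2],
    "character": ["Ross", "Rachel", "Joey", "Monica"],
    "dialogue": ["We were on a break", "No we were not", "How you doin", "I know"],
})

# Stand-in word count: split each line on whitespace and count the tokens.
df["word_count"] = df["dialogue"].str.split().str.len()

# One total per season: group the rows by season, then sum word_count.
seasonal_word_sum = df.groupby("season")["word_count"].sum()
print(seasonal_word_sum)
```

Selecting the word_count column before calling .sum() keeps the result a single Series indexed by season, rather than summing every numeric column.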

input

Average Episode Length

Curious about the pace of the show? Calculate the average number of scenes per episode across all seasons. Use nunique() to avoid counting repeated scenes.
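One possible way to approach this is sketched below on toy data: count the distinct scenes in every (season, episode) pair with nunique(), then average those counts. The column names season, episode and scene are assumptions, not necessarily those of the project dataset.

```python
import pandas as pd

# Toy data: one row per line of dialogue; column names are assumptions.
df = pd.DataFrame({
    "season":  [1, 1, 1, 1, 1, 2, 2],
    "episode": [1, 1, 1, 2, 2, 1, 1],
    "scene":   [1, 1, 2, 1, 1, 1, 2],
})

# Distinct scenes per episode; nunique() ignores the repeated scene ids
# that appear because a scene spans several dialogue rows.
scenes_per_episode = df.groupby(["season", "episode"])["scene"].nunique()

# Average of those per-episode counts across the whole series.
avg_scenes = scenes_per_episode.mean()
print(scenes_per_episode)
print(avg_scenes)
```

Grouping by both season and episode matters here, because episode numbers restart every season.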

codevalidated

Shortest and Longest Dialogues

Ever wondered who had the briefest or the most extended things to say? Find the shortest and longest dialogues spoken by any character using .min() and .max().

Store the result in dialogue_lengths variable.

The result should match the following output:

Activity_4
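A small sketch of the idea, assuming dialogue length is measured in characters (the project may define dialogue_length differently) and assuming a dialogue column holding the spoken text:

```python
import pandas as pd

# Toy data; the column names and the length definition are assumptions.
df = pd.DataFrame({
    "character": ["Joey", "Ross", "Janice"],
    "dialogue": ["Hi", "We were on a break!", "Oh. My. God."],
})

# Length of each line in characters.
df["dialogue_length"] = df["dialogue"].str.len()

# Smallest and largest length in one call; the result is a Series
# indexed by "min" and "max".
dialogue_lengths = df["dialogue_length"].agg(["min", "max"])
print(dialogue_lengths)
```

Calling .min() and .max() separately gives the same two numbers; .agg() simply collects them into one object.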

codevalidated

Comprehensive Character Stats

Dive deeper into dialogue details! Calculate multiple statistics (mean, standard deviation, minimum, maximum, and median) for dialogue lengths of each character using .agg(). Remember to use the previously calculated dialogue_length column.

Store the result in char_stats variable.
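The multi-statistic pattern can be sketched as follows. The toy values and the character column name are assumptions; only the .agg() call with a list of function names is the point of the example.

```python
import pandas as pd

# Toy data; dialogue_length is assumed to be already computed.
df = pd.DataFrame({
    "character": ["Ross", "Ross", "Ross", "Joey", "Joey"],
    "dialogue_length": [10, 20, 30, 4, 8],
})

# Passing a list of function names to .agg() returns one column per statistic.
char_stats = df.groupby("character")["dialogue_length"].agg(
    ["mean", "std", "min", "max", "median"]
)
print(char_stats)
```

Note that pandas computes "std" as the sample standard deviation (ddof=1) by default, so a character with a single line would get NaN in that column.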

multiplechoice

Understanding .groupby() Function

Why would you use the .groupby() function when analyzing dialogue data from "F.R.I.E.N.D.S"?

codevalidated

Custom Aggregation: Unique Words

Explore the vocabulary range of the characters. Define and use a custom aggregation function to count unique words spoken by each character using agg().

Store the result in unique_words_per_character variable. The result should match the following output:

Activity_7
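A custom aggregation is just a function that receives all of a group's values and returns one number. The sketch below uses a hypothetical helper, count_unique_words, with a deliberately simple definition of "word" (lowercased, whitespace-separated tokens, punctuation not stripped); the project's expected output may rely on a different tokenization.

```python
import pandas as pd

# Toy data; column names are assumptions.
df = pd.DataFrame({
    "character": ["Joey", "Joey", "Phoebe"],
    "dialogue": ["How you doin", "you doin fine", "Smelly cat smelly cat"],
})

def count_unique_words(lines):
    """Count distinct lowercase, whitespace-separated tokens in a group's lines."""
    words = set()
    for line in lines:
        words.update(line.lower().split())
    return len(words)

# .agg() calls the function once per character with that character's lines.
unique_words_per_character = df.groupby("character")["dialogue"].agg(count_unique_words)
print(unique_words_per_character)
```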

codevalidated

Phoebe’s Family History

Phoebe's family stories are as complex as they are entertaining. Use a custom aggregation function to summarize the most frequently mentioned family members. Store the result in family_mentions variable.

The result should match the following output:

Activity_8

multiplechoice

Using Aggregation Functions

What is the advantage of using .agg() with multiple aggregation functions in a pandas DataFrame?

codevalidated

The One Where Joey Speaks

How much does Joey talk each season? Let's find out by counting the number of lines Joey speaks each season. Store the result in the variable joey_lines.

The result should match the following output:

Activity_10

codevalidated

Chandler's Job Mystery

Throughout the series, Chandler's job remained a subject of confusion and humor. Your task is to explore dialogues where Chandler or others try to describe his profession, highlighting how the confusion about his job role builds throughout the series.

Filter for dialogues that contain words like "job" or "work" using the str.contains method.

Don't forget to use size() at the end to count the number of rows.

Store the result in chandler_job_explanations variable.

The result should match the following output:

Activity_11
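The filter-then-count steps above can be sketched like this. Grouping by season is an assumption made for the example (the activity text does not name the grouping key), as are the column names and the toy lines.

```python
import pandas as pd

# Toy data; column names are assumptions.
df = pd.DataFrame({
    "season": [1, 1, 2, 3],
    "character": ["Chandler", "Monica", "Rachel", "Chandler"],
    "dialogue": [
        "I hate my job",
        "What does Chandler do at work?",
        "Pass the coffee",
        "Nobody knows what my JOB is",
    ],
})

# Boolean mask: True where the line mentions "job" or "work" (any case).
# na=False treats missing dialogue as "no match" instead of propagating NaN.
mask = df["dialogue"].str.contains(r"job|work", case=False, na=False)

# size() counts the rows left in each group after filtering.
chandler_job_explanations = df[mask].groupby("season").size()
print(chandler_job_explanations)
```

A plain substring pattern also matches words such as "homework"; adding word boundaries (for example r"\bjob\b") would be stricter. Seasons with no matching rows do not appear in the result at all.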

codevalidated

Flashback Flashes

F.R.I.E.N.D.S often took us down memory lane with flashbacks. Identify episodes with the most references to past events. Store the result in the variable flashback_mentions.

Don't forget to use size() at the end to count the number of rows.

The result should match the following output:

Activity_12

codevalidated

The One with the Catchphrases

"How you doin'?" Joey's catchphrase is legendary. Find the number of times Joey uses his famous line compared to others.

Don't forget to use size() at the end to count the number of rows.

Store the result in joey_catchphrases variable.

The result should match the following output:

Activity_13

input

Ross's Weddings

Ah, Ross and his weddings—always a spectacle! Dive into the data to find out which wedding episode had the most dialogue. Was it Emily’s, Rachel’s, or perhaps Carol’s? Enter the Episode number.

multiplechoice

Filtering with filter()

How does the .filter() method differ from .loc[] in the context of pandas group operations?
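To make the contrast concrete, here is a small sketch on assumed toy data: the groupby filter() keeps or drops entire groups based on a test applied to each group, while .loc[] with a boolean mask tests every row individually.

```python
import pandas as pd

# Toy data; column names are assumptions.
df = pd.DataFrame({
    "character": ["Ross", "Ross", "Ross", "Gunther", "Joey", "Joey"],
    "word_count": [5, 7, 3, 2, 4, 6],
})

# Group-level test: keep every row of characters who speak at least twice.
frequent_speakers = df.groupby("character").filter(lambda g: len(g) >= 2)

# Row-level test: keep only the rows whose own word_count is at least 5.
long_lines = df.loc[df["word_count"] >= 5]

print(frequent_speakers)
print(long_lines)
```

In the first result Ross's 3-word line survives because his group passes the test as a whole; in the second it is dropped because that row fails on its own.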

codevalidated

Monica's Cleaning Episodes

Monica’s cleanliness is legendary. Calculate the number of times Monica mentions "clean," "dust," or "soap" in each season. Who knew cleaning could be this fun to analyze? Store the result in cleaning_mentions variable.

The result should match the following output:

Activity_16

codevalidated

The One with the Longest Monologue

Identify the longest dialogue in each season. Was it during one of Ross's scientific explanations or Monica's detailed anecdotes?

Store the result in max_dialogue_length variable.

The result should match the following output:

Activity_17

codevalidated

The One with the Routine

Ross and Monica’s routine dance is unforgettable. Find out which season had the most dance or music-related dialogues using the apply() method. Store the results in dance_music_dialogues variable.

For this activity use the friends_info_df dataset.

The result should match the following output:

Activity_18
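A hedged sketch of the apply() pattern follows. The real columns of friends_info_df are not shown here, so the season and text columns, the keyword list and the helper count_matches are all assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for friends_info_df; the column names are assumptions.
df = pd.DataFrame({
    "season": [1, 1, 2, 2, 2],
    "text": [
        "Let's dance",
        "Pass the salt",
        "I love this song",
        "The routine dance!",
        "Hello",
    ],
})

def count_matches(texts):
    """Count entries in one group that mention a dance or music keyword."""
    return texts.str.contains(r"dance|music|song", case=False, na=False).sum()

# apply() runs the helper once per season on that season's text values.
dance_music_dialogues = df.groupby("season")["text"].apply(count_matches)

# Label of the season with the highest count.
top_season = dance_music_dialogues.idxmax()
print(dance_music_dialogues)
print(top_season)
```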

codevalidated

Chandler in a Box

When Chandler spends time in a box as penance, the conversation around him varies dramatically. Evaluate the average number of words spoken by each character in this episode to see who talks most while he's boxed up.

Note: To find chandler_box, use the friends_info_df dataset.

Filter with the exact season and episode number that you get from chandler_box.

Store the result in avg_words_per_character variable.

The result should match the following output:

Activity_19

multiplechoice

Transformations with .transform()

Why is the .transform() method important in data processing with pandas?

codevalidated

Dialogue Transformation

Normalize dialogue lengths for each character by subtracting the mean and dividing by the standard deviation using .transform().

Remember to use the previously calculated dialogue_length column.

The result should match the following output:

Activity_21
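The normalization described above is a per-group z-score. A minimal sketch, assuming a character column and an existing dialogue_length column (the new column name normalized_length is made up for this example):

```python
import pandas as pd

# Toy data; dialogue_length is assumed to be already computed.
df = pd.DataFrame({
    "character": ["Ross", "Ross", "Ross", "Joey", "Joey", "Joey"],
    "dialogue_length": [10, 20, 30, 2, 4, 6],
})

# transform() returns a result aligned with the original rows, so each line
# is scaled by the mean and standard deviation of its own character.
df["normalized_length"] = df.groupby("character")["dialogue_length"].transform(
    lambda s: (s - s.mean()) / s.std()
)
print(df)
```

Unlike .agg(), which collapses each group to one row, .transform() keeps the original shape, which is why the result can be assigned straight back as a new column. Characters with only one line would get NaN, because the sample standard deviation is undefined for a single value.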

multiplechoice

Pivoting Data with pivot_table()

Why is the .pivot_table() function beneficial when analyzing dialogue frequency by characters and episodes in "F.R.I.E.N.D.S"?

codevalidated

Central Perk Coffee Talks

Let's take a seat at Central Perk! Analyze how the dynamic of conversations at the famous coffee shop changes from season to season. Using a pivot table, we'll summarize the number of dialogues each character has in Central Perk across all seasons.

Store the result in central_perk_pivot variable.

Note: Scene Directions is included as a character in the dataset and is therefore included in the solution, as you can see in the output image.

The result should match the following output:

Activity_23
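A pivot table of dialogue counts might look like the sketch below. It assumes a location column that identifies Central Perk scenes; the project dataset may mark locations differently (for example inside scene directions), so treat the filtering step in particular as hypothetical.

```python
import pandas as pd

# Toy data; all column names, including "location", are assumptions.
df = pd.DataFrame({
    "season": [1, 1, 1, 2, 2],
    "character": ["Ross", "Rachel", "Ross", "Rachel", "Scene Directions"],
    "location": [
        "Central Perk",
        "Central Perk",
        "Monica's apartment",
        "Central Perk",
        "Central Perk",
    ],
    "dialogue": ["Hi", "Hey", "Pivot!", "Coffee?", "[They sit.]"],
})

# Keep only the rows set in the coffee shop.
central_perk = df[df["location"] == "Central Perk"]

# Rows are characters, columns are seasons, cells count the dialogues.
# fill_value=0 replaces the NaN of character/season pairs with no lines.
central_perk_pivot = central_perk.pivot_table(
    index="character",
    columns="season",
    values="dialogue",
    aggfunc="count",
    fill_value=0,
)
print(central_perk_pivot)
```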

codevalidated

The One with All the Thanksgivings

F.R.I.E.N.D.S gave us some memorable Thanksgiving episodes. Dive into these special episodes to see how dialogue contributions vary among the characters. Create a pivot table to display the count of dialogues per character for each Thanksgiving episode by season. Store the result in thanksgiving_pivot variable.

The result should match the following output:

Activity_24


This project is part of

Data Wrangling with Pandas
