Data at Sea: Series Operations on the Titanic Dataset

multiplechoice

What is the primary advantage of using vectorized operations in pandas compared to traditional loops?
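To make the comparison concrete, here is a minimal sketch that computes the same result with a Python loop and with a vectorized expression. The `fares` values are hypothetical sample data, not taken from the Titanic dataset.

```python
import pandas as pd

# Hypothetical sample data; the real Titanic df has many more rows.
fares = pd.Series([7.25, 71.28, 7.92, 53.10])

# Loop version: one Python-level multiplication per element.
doubled_loop = pd.Series([fare * 2 for fare in fares])

# Vectorized version: a single expression applied to the whole Series.
doubled_vectorized = fares * 2

print(doubled_loop.equals(doubled_vectorized))
```

Both versions produce the same values; the vectorized form is shorter and typically runs faster on large Series because the work happens in optimized native code rather than in the Python interpreter.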

multiplechoice

In the Titanic dataset, which of the following operations would `NOT` be considered a vectorized operation?

input

What is the average age of passengers?

Use a vectorized operation to compute the mean age of all passengers in df.

Enter the exact value returned.
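A minimal sketch of the idea on a small hypothetical DataFrame (the ages below are made up, and `average_age` is just an illustrative name); on the real data the same call would be applied to the project's `df`.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic df; only the Age column is needed here.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0]})

# Series.mean() is vectorized: no explicit loop over passengers.
average_age = df["Age"].mean()

print(average_age)
```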

codevalidated

Who were the survivors?

Create a boolean series named is_survivor from the Survived column, indicating whether each passenger survived.

Your result should look something like this:
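One possible approach, sketched on hypothetical sample data and assuming that Survived holds 1 for passengers who survived and 0 otherwise:

```python
import pandas as pd

# Hypothetical sample; assumes Survived is coded as 1 (survived) / 0 (did not).
df = pd.DataFrame({"Survived": [0, 1, 1, 0]})

# An element-wise comparison returns a boolean Series aligned with df's index.
is_survivor = df["Survived"] == 1

print(is_survivor)
```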

multiplechoice

What does the following code do?

df['Age'].fillna(df['Age'].mean(), inplace=True)
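For reference, the same intent can be written as an explicit assignment. This is a sketch on hypothetical data; it is offered because recent pandas releases may warn when an in-place method is called on a column selection, not because the statement above is the only way to express it.

```python
import pandas as pd

# Hypothetical sample with one missing age.
df = pd.DataFrame({"Age": [22.0, None, 26.0, 36.0]})

# mean() skips missing values by default, so the mean here is (22 + 26 + 36) / 3.
# fillna() returns a new Series, which is assigned back to the column.
df["Age"] = df["Age"].fillna(df["Age"].mean())

print(df["Age"].tolist())
```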

codevalidated

Calculating the age difference

Calculate the mean age from the Age column and store it in a variable named mean_age.

Create a series named age_difference that stores the difference between each passenger's age and mean_age.

Your result would look something like this:
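A minimal sketch on hypothetical ages. It assumes the difference is taken as age minus mean, so passengers younger than average get negative values; the opposite sign convention would simply swap the operands.

```python
import pandas as pd

# Hypothetical sample ages.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 34.0]})

mean_age = df["Age"].mean()

# Subtracting a scalar from a Series broadcasts it to every element.
age_difference = df["Age"] - mean_age

print(age_difference.tolist())
```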

codevalidated

Normalize the `Fare` column

Calculate the minimum and maximum values from the Fare column and store them in variables named fare_min and fare_max.

Next, create a series named normalized_fare to store the normalized fare values, calculated using the formula:

(Fare - fare_min) / (fare_max - fare_min).

Your result would look something like this:
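The min-max formula can be sketched as follows on hypothetical fares. Note that this simple version assumes fare_max differs from fare_min; otherwise the denominator would be zero.

```python
import pandas as pd

# Hypothetical sample fares.
df = pd.DataFrame({"Fare": [0.0, 10.0, 25.0, 50.0]})

fare_min = df["Fare"].min()
fare_max = df["Fare"].max()

# Min-max scaling: the smallest fare maps to 0 and the largest to 1.
normalized_fare = (df["Fare"] - fare_min) / (fare_max - fare_min)

print(normalized_fare.tolist())
```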

codevalidated

Calculate family size

Create a series named family_size by calculating the total family size for each passenger.

This is done by summing the values from the Siblings/Spouses Aboard and Parents/Children Aboard columns, and adding 1 (to include the passenger themselves).

Your result would look something like this:
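A small sketch of the column sum, using hypothetical counts and the column names given in the task description:

```python
import pandas as pd

# Hypothetical sample using the column names from the task description.
df = pd.DataFrame({
    "Siblings/Spouses Aboard": [1, 0, 3],
    "Parents/Children Aboard": [0, 0, 2],
})

# Element-wise sum of the two columns, plus 1 for the passenger themselves.
family_size = df["Siblings/Spouses Aboard"] + df["Parents/Children Aboard"] + 1

print(family_size.tolist())
```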

codevalidated

Calculate Fare Per Family Member

Calculate the fare per family member by dividing the Fare column by the family_size.

Store the result in a series named fare_per_family_member.

Your result would look something like this:
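Sketched on hypothetical data, with family_size rebuilt here so the example runs on its own. Because family_size is always at least 1, the division cannot hit zero.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Fare": [20.0, 7.5, 60.0],
    "Siblings/Spouses Aboard": [1, 0, 3],
    "Parents/Children Aboard": [0, 0, 2],
})

family_size = df["Siblings/Spouses Aboard"] + df["Parents/Children Aboard"] + 1

# Series / Series divides row by row, matching values by index label.
fare_per_family_member = df["Fare"] / family_size

print(fare_per_family_member.tolist())
```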

codevalidated

Calculate Weighted Age Using Fare Weight

Create a series named fare_weight by dividing the Fare values by the maximum fare value.

Then, calculate the weighted age by multiplying the Age column by the fare_weight and store the result in a series named weighted_age.

Your result would look something like this:
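A minimal sketch with hypothetical fares and ages, chosen so the weights are easy to check by hand:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Fare": [12.5, 25.0, 50.0],
    "Age": [20.0, 40.0, 30.0],
})

# Each fare as a fraction of the largest fare (values between 0 and 1).
fare_weight = df["Fare"] / df["Fare"].max()

# Element-wise product of two aligned Series.
weighted_age = df["Age"] * fare_weight

print(weighted_age.tolist())
```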

codevalidated

Calculate Cumulative Fare Percentage

Sort the Fare column in ascending order and store it in a series named sorted_fares.

Then, calculate the cumulative fare percentage by taking the cumulative sum of the sorted_fares and dividing it by the total fare sum.

Multiply the result by 100 to express it as a percentage.

Store the final result in a series named cumulative_fare_percentage.

Your result would look something like this:
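The steps above can be sketched as follows on three hypothetical fares that sum to 100, so the percentages are easy to verify:

```python
import pandas as pd

# Hypothetical sample fares (they sum to 100).
df = pd.DataFrame({"Fare": [50.0, 12.5, 37.5]})

# sort_values() sorts ascending by default and keeps the original index labels.
sorted_fares = df["Fare"].sort_values()

# Running total divided by the grand total, expressed as a percentage.
cumulative_fare_percentage = sorted_fares.cumsum() / sorted_fares.sum() * 100

print(cumulative_fare_percentage.tolist())
```

The last value is always 100, since the running total at that point equals the total fare sum.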

codevalidated

Identify Fare Outliers Using IQR

Calculate the first (Q1) and third (Q3) quartiles of the Fare column.

Then, compute the interquartile range (IQR) as Q3 - Q1.

Identify any outliers where the fare is less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR.

Store the result as a boolean series named is_fare_outlier.

Your result would look something like this:
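A sketch of the IQR rule on hypothetical fares with one obviously extreme value. It relies on the default linear interpolation of `Series.quantile`; `lower_bound` and `upper_bound` are illustrative helper names.

```python
import pandas as pd

# Hypothetical sample with one extreme fare.
df = pd.DataFrame({"Fare": [10.0, 12.0, 14.0, 16.0, 18.0, 200.0]})

q1 = df["Fare"].quantile(0.25)
q3 = df["Fare"].quantile(0.75)
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Combine the two element-wise comparisons with | (logical OR).
is_fare_outlier = (df["Fare"] < lower_bound) | (df["Fare"] > upper_bound)

print(is_fare_outlier.tolist())
```

The parentheses around each comparison are needed because `|` binds more tightly than `<` and `>` in Python.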

codevalidated

Calculate Rolling Average of Fare

Sort the dataset by its index and then calculate the rolling average of the Fare column over a window of 10 rows.

The minimum number of periods required for calculation is 1.

Store the result in a series named rolling_average_fare.

Your result would look something like this:
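A minimal sketch using twelve hypothetical fares whose index is deliberately out of order, so the effect of sorting is visible:

```python
import pandas as pd

# Hypothetical sample: fares 12.0 down to 1.0 with index labels 11 down to 0.
df = pd.DataFrame(
    {"Fare": [float(v) for v in range(12, 0, -1)]},
    index=range(11, -1, -1),
)

# After sorting by index, the fares run 1.0, 2.0, ..., 12.0.
df_sorted = df.sort_index()

# min_periods=1 yields a value even before 10 rows are available,
# so the first rows average over fewer observations instead of being NaN.
rolling_average_fare = df_sorted["Fare"].rolling(window=10, min_periods=1).mean()

print(rolling_average_fare.tolist())
```

With min_periods left at its default (equal to the window size), the first nine results would be NaN instead.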

Vidhi Shah
