Data at Sea: Series Operations on the Titanic Dataset

This project guides you through vectorized operations and data analysis techniques using the Titanic passenger dataset. You'll practice series manipulation, normalization, and standardization on data about the passengers, their demographics, and their survival outcomes. Along the way, you'll look at factors that influenced survival rates and uncover patterns in the data.
Project Created by

Vidhi Shah

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

multiplechoice

What is the primary advantage of using vectorized operations in pandas compared to traditional loops?

multiplechoice

In the Titanic dataset, which of the following operations would NOT be considered a vectorized operation?

input

What is the average age of passengers?

Use a vectorized operation to compute the mean age of all passengers in the df.

Enter the exact value returned.
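
As a rough illustration of the kind of call this activity has in mind, the sketch below computes a mean with a single vectorized method. The small DataFrame is hypothetical toy data, not the real Titanic values, so the number it prints is not the answer to enter.

```python
import pandas as pd

# Hypothetical toy data (assumption): NOT the real Titanic ages.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0]})

# Series.mean() works on the whole column at once, with no explicit loop.
average_age = df["Age"].mean()
print(average_age)
```

Note that `Series.mean()` skips missing values by default, which matters for a column such as Age when it contains gaps.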

codevalidated

Who were the survivors?

Create a boolean series named is_survivor using the Survived column, indicating whether each passenger survived.

Your result should look something like this:

img2
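
One possible way to build such a boolean series is a vectorized comparison. This is a minimal sketch that assumes Survived is coded as 1 for survivors and 0 otherwise; the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; assumes Survived uses 1 = survived, 0 = did not.
df = pd.DataFrame({"Survived": [0, 1, 1, 0]})

# The comparison is applied element-wise and returns a boolean Series.
is_survivor = df["Survived"] == 1
print(is_survivor)
```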

multiplechoice

What does the following code do?

df['Age'].fillna(df['Age'].mean(), inplace=True)

codevalidated

Calculating the age difference

Calculate the mean age from the Age column and store it in a variable named mean_age.

Create a series named age_difference that stores the difference between each passenger's age and mean_age.

Your result would look something like this:

img6
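
The two steps above could be sketched as follows. The direction of the subtraction (age minus mean) is an assumption, and the ages are hypothetical toy values.

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic ages.
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0]})

mean_age = df["Age"].mean()

# Subtracting a scalar from a Series is broadcast to every row.
# Assumption: the difference is taken as age minus mean_age.
age_difference = df["Age"] - mean_age
print(age_difference)
```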

codevalidated

Normalize the `Fare` column

Calculate the minimum and maximum values from the Fare column and store them in variables named fare_min and fare_max.

Next, create a series named normalized_fare to store the normalized fare values, calculated using the formula:

(fare - fare_min) / (fare_max - fare_min).

Your result would look something like this:

img7
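
A minimal sketch of min-max normalization on a hypothetical toy Fare column, using the variable names from the activity:

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic fares.
df = pd.DataFrame({"Fare": [10.0, 20.0, 30.0, 50.0]})

fare_min = df["Fare"].min()
fare_max = df["Fare"].max()

# Min-max scaling maps the smallest fare to 0 and the largest to 1.
normalized_fare = (df["Fare"] - fare_min) / (fare_max - fare_min)
print(normalized_fare)
```

If every fare were identical, fare_max - fare_min would be zero and the division would produce NaN values, so a real analysis may want to check for that case first.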

codevalidated

Calculate family size

Create a series named family_size by calculating the total family size for each passenger.

This is done by summing the values from the Siblings/Spouses Aboard and Parents/Children Aboard columns, and adding 1 (to include the passenger themselves).

Your result would look something like this:

img8
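
The sum described above can be sketched as an element-wise addition of the two columns. The column names come from the activity text; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the activity description.
df = pd.DataFrame({
    "Siblings/Spouses Aboard": [1, 0, 3],
    "Parents/Children Aboard": [0, 0, 2],
})

# Add the two columns row by row, then add 1 for the passenger themselves.
family_size = df["Siblings/Spouses Aboard"] + df["Parents/Children Aboard"] + 1
print(family_size)
```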

codevalidated

Calculate Fare Per Family Member

Calculate the fare per family member by dividing the Fare column by the family_size.

Store the result in a series named fare_per_family_member.

Your result would look something like this:

img9
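
As a sketch of the division step, the example below rebuilds family_size on hypothetical toy rows so that it runs on its own, then divides Fare by it element-wise.

```python
import pandas as pd

# Hypothetical toy rows, not the real Titanic data.
df = pd.DataFrame({
    "Fare": [20.0, 8.0, 60.0],
    "Siblings/Spouses Aboard": [1, 0, 3],
    "Parents/Children Aboard": [0, 0, 2],
})

family_size = df["Siblings/Spouses Aboard"] + df["Parents/Children Aboard"] + 1

# Dividing one Series by another pairs values up by index label.
fare_per_family_member = df["Fare"] / family_size
print(fare_per_family_member)
```

Because family_size is always at least 1, this division never hits a zero denominator.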

codevalidated

Calculate Weighted Age Using Fare Weight

Create a series fare_weight by dividing the Fare values by the maximum fare value.

Then, calculate the weighted age by multiplying the Age column by the fare_weight and store the result in a series named weighted_age.

Your result would look something like this:

img10
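
One way to read the two steps above, shown on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic values.
df = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0],
    "Fare": [10.0, 20.0, 40.0],
})

# Each fare as a fraction of the largest fare, so weights fall in [0, 1].
fare_weight = df["Fare"] / df["Fare"].max()

# Element-wise product of two Series aligned on the same index.
weighted_age = df["Age"] * fare_weight
print(weighted_age)
```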

codevalidated

Calculate Cumulative Fare Percentage

Sort the Fare column in ascending order and store it in a series named sorted_fares.

Then, calculate the cumulative fare percentage by taking the cumulative sum of the sorted_fares and dividing it by the total fare sum.

Multiply the result by 100 to express it as a percentage.

Store the final result in a series named cumulative_fare_percentage.

Your result would look something like this:

img11
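
The steps above might be sketched like this, again with hypothetical fares chosen only to keep the arithmetic easy to follow:

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic fares.
df = pd.DataFrame({"Fare": [32.0, 16.0, 80.0]})

# sort_values() sorts ascending by default and keeps the original index labels.
sorted_fares = df["Fare"].sort_values()

# Running total divided by the grand total, expressed as a percentage.
cumulative_fare_percentage = sorted_fares.cumsum() / sorted_fares.sum() * 100
print(cumulative_fare_percentage)
```

The last value is always 100, since the final running total equals the overall sum.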

codevalidated

Identify Fare Outliers Using IQR

Calculate the first (Q1) and third (Q3) quartiles of the Fare column.

Then, compute the interquartile range (IQR) as Q3 - Q1.

Identify any outliers where the fare is less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR.

Store the result as a boolean series named is_fare_outlier.

Your result would look something like this:

img12
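
A minimal sketch of the IQR rule described above. The fares are hypothetical, and the names q1, q3, iqr, lower_bound, and upper_bound are illustrative choices rather than names the activity requires.

```python
import pandas as pd

# Hypothetical toy data with one obviously large fare.
df = pd.DataFrame({"Fare": [10.0, 12.0, 14.0, 16.0, 100.0]})

# quantile() uses linear interpolation by default.
q1 = df["Fare"].quantile(0.25)
q3 = df["Fare"].quantile(0.75)
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Combine the two element-wise comparisons with | (logical OR).
is_fare_outlier = (df["Fare"] < lower_bound) | (df["Fare"] > upper_bound)
print(is_fare_outlier)
```

The parentheses around each comparison are needed because `|` binds more tightly than `<` and `>` in Python.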

codevalidated

Calculate Rolling Average of Fare

Sort the dataset by its index and then calculate the rolling average of the Fare over a window of 10 rows.

The minimum number of periods required for calculation is 1.

Store the result in a series named rolling_average_fare.

Your result would look something like this:

img13
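
The rolling calculation could be sketched as below. The toy DataFrame is hypothetical and deliberately has an unsorted index so that the sort step has a visible effect.

```python
import pandas as pd

# Hypothetical toy data with a shuffled index.
df = pd.DataFrame({"Fare": [30.0, 10.0, 20.0]}, index=[2, 0, 1])

# Sort by index first, then average over a trailing window of up to 10 rows.
# min_periods=1 lets the first rows produce a value before the window is full.
rolling_average_fare = (
    df.sort_index()["Fare"].rolling(window=10, min_periods=1).mean()
)
print(rolling_average_fare)
```

Without `min_periods=1`, the first nine rows of a 10-row window would be NaN.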

This project is part of

Intro to Pandas for Data Analysis