Intro to Pandas for Data Analysis

This project guides you through vectorized operations and data analysis techniques in pandas using the Titanic passenger dataset. You'll use series manipulation, normalization, and standardization to analyze passenger demographics and survival outcomes, uncover patterns in the data, and see which factors influenced survival rates.

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.


input

Use a vectorized operation to compute the `mean` age of all passengers in `df`.

Enter the exact value returned.
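As a minimal sketch of what a vectorized mean looks like, the block below builds a tiny stand-in DataFrame. The three ages are made up for illustration and are not taken from the Titanic dataset; only the `Age` column name comes from the activity.

```python
import pandas as pd

# Toy stand-in for the Titanic data; these ages are invented for illustration.
df = pd.DataFrame({"Age": [22.0, 38.0, 30.0]})

# Series.mean() operates on the whole column at once, with no explicit loop.
mean_age_value = df["Age"].mean()
print(mean_age_value)
```

On the real dataset the same call applies unchanged; by default, `mean()` skips missing values.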

codevalidated

Create a boolean series named `is_survivor` using the `Survived` column, indicating whether each passenger survived.
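One possible approach is a vectorized comparison. The sketch below assumes `Survived` is coded as 0 (did not survive) and 1 (survived), which is the usual encoding but is not stated in the activity. The four rows are made up.

```python
import pandas as pd

# Toy data; assumes Survived uses 0 = no, 1 = yes (an assumption, not stated above).
df = pd.DataFrame({"Survived": [0, 1, 1, 0]})

# Comparing a whole column to a scalar yields a boolean Series of the same length.
is_survivor = df["Survived"] == 1
print(is_survivor)
```

A boolean series like this can be used directly as a row filter, for example `df[is_survivor]`.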


multiplechoice

```
df['Age'].fillna(df['Age'].mean(), inplace=True)
```

codevalidated

Calculate the `mean` age from the `Age` column and store it in a variable named `mean_age`.

Create a series named `age_difference` to store each passenger's age difference from `mean_age`.
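A minimal sketch of the two steps, using made-up ages rather than the real dataset. Subtracting a scalar from a series is broadcast across every row.

```python
import pandas as pd

# Toy ages, invented for illustration.
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0]})

# Scalar mean of the column.
mean_age = df["Age"].mean()

# Series minus scalar: the scalar is broadcast to every element.
age_difference = df["Age"] - mean_age
print(age_difference)
```

Negative values mark passengers younger than the mean, positive values mark older ones.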


codevalidated

Calculate the minimum and maximum values from the `Fare` column and store them in variables named `fare_min` and `fare_max`.

Next, create a series named `normalized_fare` to store the normalized fare values, calculated using the formula:

(fare - fare_min) / (fare_max - fare_min)
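The min-max formula above can be sketched as follows. The fares are made up, chosen so that the scaled values are easy to check by hand.

```python
import pandas as pd

# Toy fares, invented for illustration.
df = pd.DataFrame({"Fare": [10.0, 20.0, 50.0]})

fare_min = df["Fare"].min()
fare_max = df["Fare"].max()

# Min-max normalization: maps the smallest fare to 0 and the largest to 1.
normalized_fare = (df["Fare"] - fare_min) / (fare_max - fare_min)
print(normalized_fare)
```

If every fare were identical, `fare_max - fare_min` would be zero and the division would produce `NaN` values, so that case may need separate handling.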


codevalidated

Create a series named `family_size` by calculating the total family size for each passenger.

This is done by summing the values from the `Siblings/Spouses Aboard` and `Parents/Children Aboard` columns, and adding 1 (to include the passenger themselves).
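A minimal sketch of the element-wise sum, on made-up rows. Only the two column names come from the activity.

```python
import pandas as pd

# Toy rows, invented for illustration.
df = pd.DataFrame({
    "Siblings/Spouses Aboard": [1, 0, 3],
    "Parents/Children Aboard": [0, 0, 2],
})

# Columns are added row by row; the + 1 counts the passenger themselves.
family_size = df["Siblings/Spouses Aboard"] + df["Parents/Children Aboard"] + 1
print(family_size)
```

A passenger travelling alone ends up with a family size of 1, which keeps later divisions by `family_size` well defined.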


codevalidated

Calculate the fare per family member by dividing the `Fare` column by `family_size`.

Store the result in a series named `fare_per_family_member`.
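As a sketch of series-by-series division, the block below recomputes `family_size` on made-up rows so that it runs on its own. Pandas aligns the two series by index before dividing.

```python
import pandas as pd

# Toy rows, invented for illustration.
df = pd.DataFrame({
    "Fare": [20.0, 8.0, 60.0],
    "Siblings/Spouses Aboard": [1, 0, 3],
    "Parents/Children Aboard": [0, 0, 2],
})

# Family size as defined earlier: relatives aboard plus the passenger.
family_size = df["Siblings/Spouses Aboard"] + df["Parents/Children Aboard"] + 1

# Element-wise division of one Series by another (aligned on the index).
fare_per_family_member = df["Fare"] / family_size
print(fare_per_family_member)
```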


codevalidated

Create a series `fare_weight` by dividing the `Fare` values by the maximum fare value.

Then, calculate the weighted age by multiplying the `Age` column by `fare_weight` and store the result in a series named `weighted_age`.
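A minimal sketch of the weighting step on made-up values. The highest fare gets a weight of 1 and every other fare a proportionally smaller weight.

```python
import pandas as pd

# Toy rows, invented for illustration.
df = pd.DataFrame({
    "Fare": [10.0, 20.0, 40.0],
    "Age": [20.0, 30.0, 40.0],
})

# Each fare as a fraction of the largest fare.
fare_weight = df["Fare"] / df["Fare"].max()

# Element-wise product of two Series.
weighted_age = df["Age"] * fare_weight
print(weighted_age)
```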


codevalidated

Sort the `Fare` column in `ascending` order and store it in a series named `sorted_fares`.

Then, calculate the cumulative fare percentage by taking the cumulative sum of `sorted_fares` and dividing it by the total fare sum.

Multiply the result by 100 to express it as a percentage.

Store the final result in a series named `cumulative_fare_percentage`.
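The steps above can be sketched as follows, using three made-up fares. `sort_values()` sorts in ascending order by default and keeps each row's original index label.

```python
import pandas as pd

# Toy fares, invented for illustration.
df = pd.DataFrame({"Fare": [4.0, 1.0, 3.0]})

# Ascending sort; original index labels travel with the values.
sorted_fares = df["Fare"].sort_values(ascending=True)

# Running total divided by the grand total, expressed as a percentage.
cumulative_fare_percentage = sorted_fares.cumsum() / sorted_fares.sum() * 100
print(cumulative_fare_percentage)
```

The last value is always 100, since the running total reaches the grand total at the final row.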


codevalidated

Calculate the first (Q1) and third (Q3) quartiles of the `Fare` column.

Then, compute the interquartile range (IQR) as Q3 - Q1.

Identify any outliers where the fare is less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR.

Store the result as a boolean series named `is_fare_outlier`.
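A minimal sketch of the IQR rule on made-up fares, where one value is deliberately extreme. It assumes pandas' default linear interpolation for `quantile`, since the activity does not name a method. The names `q1`, `q3`, `iqr`, `lower_bound`, and `upper_bound` are illustrative.

```python
import pandas as pd

# Toy fares, invented for illustration; 100.0 is a deliberate outlier.
df = pd.DataFrame({"Fare": [1.0, 2.0, 3.0, 4.0, 100.0]})

# Quartiles (default linear interpolation) and interquartile range.
q1 = df["Fare"].quantile(0.25)
q3 = df["Fare"].quantile(0.75)
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Combine two boolean Series with | (element-wise OR); parentheses are required.
is_fare_outlier = (df["Fare"] < lower_bound) | (df["Fare"] > upper_bound)
print(is_fare_outlier)
```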


codevalidated

Sort the dataset by its index and then calculate the rolling average of the `Fare` column over a window of 10 rows.

The minimum number of periods required for calculation is 1.

Store the result in a series named `rolling_average_fare`.
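As a sketch of the rolling calculation, the block below uses three made-up rows with a shuffled index. With `min_periods=1`, the early rows are averaged over however many values are available instead of returning `NaN`.

```python
import pandas as pd

# Toy rows with an out-of-order index, invented for illustration.
df = pd.DataFrame({"Fare": [30.0, 10.0, 20.0]}, index=[2, 0, 1])

# Restore index order before computing a window over consecutive rows.
df_sorted = df.sort_index()

# Window of 10 rows; min_periods=1 allows partial windows at the start.
rolling_average_fare = df_sorted["Fare"].rolling(window=10, min_periods=1).mean()
print(rolling_average_fare)
```

Here `df_sorted` is an illustrative name. Because the toy data has fewer than 10 rows, every value is an expanding mean of the fares seen so far.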

