Pandas Capstone Project: Working with custom data + titanic bonus

In this project you'll apply the techniques you've learned so far for data analysis with Pandas, including answering statistical questions, filtering (using boolean and comparison operators), creating new columns, plotting, and much more. You'll practice first on a small custom dataset and then on a dataset containing information about the Titanic disaster.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Creating dataframe from dictionary

In this activity, you will create a custom DataFrame from a dictionary containing the data below:

Name      Age  Sex
Alice     25   F
Bob       30   M
Charlie   45   M
Diana     20   F
Emma      28   F
Frank     50   M
Grace     32   F
Henry     37   M
Isabella  23   F
Jack      42   M
Karen     29   F
Liam      31   M
Maria     48   F
Nathan    27   M
Olivia    36   F
Peter     41   M

Don't worry, you don't need to type all the data by hand. Just copy the dictionary below and create a new DataFrame named df from it.

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry', 'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M']
}
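
As a minimal sketch of this step, the dictionary can be passed straight to the `pd.DataFrame` constructor. The `print` calls are only there for inspection and are not part of the activity.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry',
             'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
}

# Each dictionary key becomes a column; each list supplies that column's values.
df = pd.DataFrame(data)

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first five rows
```
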
multiplechoice

Answering Basic Statistical Questions

Select all the correct answers from the options below.
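
The exact questions are not listed here, so the following is only a sketch of the kind of summary values that usually help with them. It rebuilds the Age and Sex columns from the table above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
})

mean_age = df['Age'].mean()          # arithmetic mean of the ages
median_age = df['Age'].median()      # middle value of the sorted ages
youngest = df['Age'].min()
oldest = df['Age'].max()
sex_counts = df['Sex'].value_counts()  # number of rows per category

print(mean_age, median_age, youngest, oldest)
print(sex_counts)
```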

codevalidated

Creating New Columns

Create a new column named Status in the above DataFrame, with the following data.

Name      Age  Sex  Status
Alice     25   F    Student
Bob       30   M    Worker
Charlie   45   M    Worker
Diana     20   F    Student
Emma      28   F    Student
Frank     50   M    Retiree
Grace     32   F    Worker
Henry     37   M    Worker
Isabella  23   F    Student
Jack      42   M    Worker
Karen     29   F    Student
Liam      31   M    Worker
Maria     48   F    Retiree
Nathan    27   M    Student
Olivia    36   F    Worker
Peter     41   M    Worker

Below is the list of values for the Status column:

['Student', 'Worker', 'Worker', 'Student', 'Student', 'Retiree', 'Worker', 'Worker', 'Student', 'Worker', 'Student', 'Worker', 'Retiree', 'Student', 'Worker', 'Worker']
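
One way to add the column, sketched here on a DataFrame that holds only the Name column to keep the example short, is to assign the list to a new column label. The list must have exactly one value per row, otherwise pandas raises a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry',
             'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
})

status = ['Student', 'Worker', 'Worker', 'Student', 'Student', 'Retiree', 'Worker', 'Worker',
          'Student', 'Worker', 'Student', 'Worker', 'Retiree', 'Student', 'Worker', 'Worker']

# Assigning to a label that does not exist yet creates the column.
df['Status'] = status

print(df.head())
```
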
codevalidated

Renaming Columns

In this activity, you will rename columns in a DataFrame. Rename them as follows:

  • Name: Full Name
  • Age: Years Old
  • Sex: Gender
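
A short sketch of the renaming, using only the first three rows of the table: `DataFrame.rename` accepts a dictionary that maps old column names to new ones and returns a new DataFrame, so the result is assigned back to `df`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 45],
    'Sex': ['F', 'M', 'M'],
})

# Keys are the current names, values are the new names.
df = df.rename(columns={'Name': 'Full Name', 'Age': 'Years Old', 'Sex': 'Gender'})

print(df.columns.tolist())
```
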
codevalidated

Drop single row

Drop the row with index 4 from the DataFrame.
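
A minimal sketch, assuming the DataFrame still has its default integer index and using only the first six rows of the table:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank'],
    'Years Old': [25, 30, 45, 20, 28, 50],
    'Gender': ['F', 'M', 'M', 'F', 'F', 'M'],
})

# drop() returns a new DataFrame without the row labelled 4;
# the remaining rows keep their original index labels.
df = df.drop(index=4)

print(df)
```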

codevalidated

Drop multiple rows

Drop the rows with index 7 and 9 from the DataFrame.
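
The same method accepts a list of labels. The sketch below uses the first ten rows of the table (two columns only) so that both labels exist:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma',
                  'Frank', 'Grace', 'Henry', 'Isabella', 'Jack'],
    'Years Old': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42],
})

# Passing a list removes every listed index label in one call.
df = df.drop(index=[7, 9])

print(df)
```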

codevalidated

Drop row with condition

Drop the rows whose Full Name is 'Bob' from the DataFrame.
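
Two common ways to do this are sketched below on the first four rows of the table. Which form the activity's validator expects is not stated, so treat both as options.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Years Old': [25, 30, 45, 20],
})

# Option 1: look up the index labels of the matching rows and drop them.
dropped = df.drop(df[df['Full Name'] == 'Bob'].index)

# Option 2: keep only the rows that do NOT match the condition.
kept = df[df['Full Name'] != 'Bob']

print(dropped)
```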

codevalidated

Add new row to dataframe

Add a new row at the end of the DataFrame with the values below:

'Full Name': 'Emma',
'Years Old': 28,
'Gender': 'F',
'Status': 'Student'
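
A possible sketch using `pd.concat`, shown on the first three rows of the table. `DataFrame.append` was removed in pandas 2.0, so this example avoids it; whether the activity expects a particular method is an assumption left open here.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie'],
    'Years Old': [25, 30, 45],
    'Gender': ['F', 'M', 'M'],
    'Status': ['Student', 'Worker', 'Worker'],
})

# Wrap the dictionary in a list so it becomes a one-row DataFrame.
new_row = pd.DataFrame([{
    'Full Name': 'Emma',
    'Years Old': 28,
    'Gender': 'F',
    'Status': 'Student',
}])

# ignore_index=True renumbers the combined rows 0..n-1.
df = pd.concat([df, new_row], ignore_index=True)

print(df.tail(1))
```
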
codevalidated

Add multiple rows

Add the following rows to the DataFrame:

Full Name  Years Old  Gender  Status
Bob        30         M       Worker
Emma       28         F       Student
Henry      37         M       Worker
Jack       42         M       Worker

Make sure to use ignore_index=True when appending the rows; otherwise you will not pass the activity.
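
The effect of `ignore_index=True` can be sketched with `pd.concat`, which accepts that parameter. The starting DataFrame below is a two-row stand-in with a gap in its index, as it would have after earlier drops.

```python
import pandas as pd

# Stand-in for the current DataFrame: index labels 0 and 2 (label 1 was dropped).
df = pd.DataFrame(
    {
        'Full Name': ['Alice', 'Charlie'],
        'Years Old': [25, 45],
        'Gender': ['F', 'M'],
        'Status': ['Student', 'Worker'],
    },
    index=[0, 2],
)

new_rows = pd.DataFrame({
    'Full Name': ['Bob', 'Emma', 'Henry', 'Jack'],
    'Years Old': [30, 28, 37, 42],
    'Gender': ['M', 'F', 'M', 'M'],
    'Status': ['Worker', 'Student', 'Worker', 'Worker'],
})

# Without ignore_index=True the result would carry labels 0, 2, 0, 1, 2, 3
# (duplicates). With it, the rows are renumbered 0..5.
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```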

codevalidated

Filter the data

Filter the data to keep rows where Gender is F, Status is Student, and Years Old is greater than 20. After filtering, store the data in a new DataFrame named Filter_Data.
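
A sketch of the filter on the first ten rows of the table, so the result here is smaller than in the activity. Each condition needs its own parentheses, and the conditions are combined with `&` rather than the `and` keyword.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma',
                  'Frank', 'Grace', 'Henry', 'Isabella', 'Jack'],
    'Years Old': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42],
    'Gender': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M'],
    'Status': ['Student', 'Worker', 'Worker', 'Student', 'Student',
               'Retiree', 'Worker', 'Worker', 'Student', 'Worker'],
})

# Build one boolean mask from three element-wise comparisons.
mask = (df['Gender'] == 'F') & (df['Status'] == 'Student') & (df['Years Old'] > 20)
Filter_Data = df[mask]

print(Filter_Data)
```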

multiplechoice

Select all correct options

Analyse the previously created DataFrame Filter_Data and check all correct options.

multiplechoice

Select all correct options

Analyse the previously created DataFrame Filter_Data and check all correct options.

multiplechoice

Select all correct options

Analyse the previously created DataFrame Filter_Data and check all correct options.

multiplechoice

Create Basic Plots: Line Chart

  • Using the df DataFrame, create a line chart of Years Old against Full Name, with Full Name on the x-axis and Years Old on the y-axis.

  • Based on the line chart, check all correct options.
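
A minimal sketch of such a chart with the pandas plotting interface (which needs matplotlib installed), using the first six rows of the table:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank'],
    'Years Old': [25, 30, 45, 20, 28, 50],
})

# x and y name the columns to place on each axis; plot() returns a matplotlib Axes.
ax = df.plot(x='Full Name', y='Years Old', kind='line')
ax.set_xlabel('Full Name')
ax.set_ylabel('Years Old')

# In a notebook the figure renders automatically. In a script, import
# matplotlib.pyplot as plt and call plt.show() to display it.
```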

multiplechoice

Create Basic Plots: Bar Chart

Create a bar chart of the Gender column of the df DataFrame and, based on the chart, check all correct options.
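
A bar chart of a categorical column is usually built from its value counts. The sketch below does that on the first six rows of the table; the variable name `counts` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank'],
    'Gender': ['F', 'M', 'M', 'F', 'F', 'M'],
})

# Count how many rows fall in each category, then draw one bar per category.
counts = df['Gender'].value_counts()
ax = counts.plot(kind='bar')
ax.set_ylabel('Number of people')
```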

multiplechoice

Create Basic Plots: Pie Chart

Create a pie chart for the 'Gender' column of the df DataFrame and choose all correct options. Also add autopct='%1.1f%%' to show percentages on the pie chart.
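
As a sketch on the first six rows of the table: the pandas pie plot forwards `autopct` to matplotlib, where the format string `'%1.1f%%'` prints each slice's share with one decimal place followed by a percent sign.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank'],
    'Gender': ['F', 'M', 'M', 'F', 'F', 'M'],
})

counts = df['Gender'].value_counts()

# One wedge per category; autopct labels each wedge with its percentage.
ax = counts.plot(kind='pie', autopct='%1.1f%%')
```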

codevalidated

Reading the Titanic Dataset

Read the Titanic dataset from a CSV file into a pandas DataFrame. Store the result in a DataFrame named df.
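
The real file is not available here, so the sketch below feeds `pd.read_csv` an in-memory stand-in. The three rows are made up for illustration, and the column order is an assumption; only the column names come from this project. With the actual dataset the call would take the file path instead, for example `pd.read_csv('titanic.csv')`.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV file (not real passenger records).
sample_csv = io.StringIO(
    "Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare\n"
    "0,3,Passenger A,male,22,1,0,7.25\n"
    "1,1,Passenger B,female,38,1,0,71.28\n"
    "1,3,Passenger C,female,26,0,0,7.92\n"
)

# read_csv accepts a path or any file-like object and uses the first line as the header.
df = pd.read_csv(sample_csv)

print(df.head())
```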

multiplechoice

Getting Information About the Dataset

Use the info() method to get information about the data types in the dataset. Check all the correct answers.

multiplechoice

Getting Basic Statistical Information

Use the describe() method to get basic statistical information about the numeric columns in the dataset. Select the minimum and maximum Fare from the output.
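
`describe()` returns a DataFrame whose rows are the statistics (count, mean, std, min, quartiles, max), so single values can be read with `.loc`. The sketch uses made-up numbers, and the names `fare_min` and `fare_max` are illustrative.

```python
import pandas as pd

# Made-up values standing in for the Titanic columns.
df = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0, 35.0],
    'Fare': [7.25, 71.28, 7.92, 53.10],
})

summary = df.describe()

# Rows of the summary are statistic names; columns are the numeric columns.
fare_min = summary.loc['min', 'Fare']
fare_max = summary.loc['max', 'Fare']

print(summary)
print(fare_min, fare_max)
```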

codevalidated

Calculating Basic Statistics for a Column

Calculate the mean, median, and standard deviation of the 'Age' column.

  • Store the mean in a variable named age_mean
  • Store the median in a variable named age_median
  • Store the standard deviation in a variable named age_std_deviation
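
A sketch with made-up ages in place of the real column. By default these pandas methods skip missing values, and `std()` computes the sample standard deviation (dividing by n - 1).

```python
import pandas as pd

# Made-up ages standing in for the Titanic 'Age' column.
df = pd.DataFrame({'Age': [22, 38, 26, 35]})

age_mean = df['Age'].mean()
age_median = df['Age'].median()
age_std_deviation = df['Age'].std()  # sample standard deviation (ddof=1)

print(age_mean, age_median, age_std_deviation)
```
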
codevalidated

Creating a New Column

Create a new column called 'Family Size' that combines the 'Siblings/Spouses Aboard' and 'Parents/Children Aboard' columns.
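
The activity does not spell out how the two columns are combined. The sketch below assumes a plain element-wise sum; if the passenger is meant to be counted as well, the expression would need `+ 1`. The rows are made up.

```python
import pandas as pd

# Made-up values standing in for the two Titanic columns.
df = pd.DataFrame({
    'Siblings/Spouses Aboard': [1, 1, 0, 3],
    'Parents/Children Aboard': [0, 0, 0, 1],
})

# Assumption: 'Family Size' is the sum of the two columns, row by row.
df['Family Size'] = df['Siblings/Spouses Aboard'] + df['Parents/Children Aboard']

print(df)
```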

codevalidated

Renaming a Column

Rename the 'Fare' column to 'Ticket Price'.

codevalidated

Renaming Multiple Columns

Rename the columns of the DataFrame as follows:

'Pclass' -> 'Passenger Class'
'Name' -> 'Full Name'
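
Both this activity and the previous one can be sketched with `DataFrame.rename`: one dictionary entry per column to rename. The single row is made up.

```python
import pandas as pd

# Made-up row using the original Titanic column names.
df = pd.DataFrame({
    'Pclass': [3],
    'Name': ['Passenger A'],
    'Fare': [7.25],
})

# Renaming a single column.
df = df.rename(columns={'Fare': 'Ticket Price'})

# Renaming several columns in one call.
df = df.rename(columns={'Pclass': 'Passenger Class', 'Name': 'Full Name'})

print(df.columns.tolist())
```
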
codevalidated

Dropping Rows

Drop the rows in the dataset where 'Age' is less than 18.
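
One way to sketch this is to select the rows that match the condition and pass their index labels to `drop()`. The ages are made up. Unlike keeping rows with `df['Age'] >= 18`, this form leaves rows with a missing Age in place.

```python
import pandas as pd

# Made-up ages standing in for the Titanic 'Age' column.
df = pd.DataFrame({
    'Full Name': ['A', 'B', 'C', 'D', 'E'],
    'Age': [22, 4, 38, 17, 18],
})

# Index labels of the rows where Age is below 18.
under_18 = df[df['Age'] < 18].index

df = df.drop(under_18)

print(df)
```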

codevalidated

Adding a New Row

Create a new row in the dataset. Add the following values in respective columns:

  • Survived: 0
  • Passenger Class: 3
  • Full Name: 'Harry'
  • Sex: 'male'
  • Age: 30
  • Siblings/Spouses Aboard: 0
  • Parents/Children Aboard: 2
  • Ticket Price: 50.00
  • Family Size: 3

Use the data below to create the new row:

df2 = {'Survived': 0,
       'Passenger Class': 3,
       'Full Name': 'Harry',
       'Sex': 'male',
       'Age': 30,
       'Siblings/Spouses Aboard': 0,
       'Parents/Children Aboard': 2,
       'Ticket Price': 50.00,
       'Family Size': 3
       }
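
A possible sketch of adding the row with `pd.concat` (chosen here because `DataFrame.append` was removed in pandas 2.0). The existing DataFrame is a made-up one-row stand-in with the renamed columns; whether the activity expects `ignore_index=True` is an assumption.

```python
import pandas as pd

# Made-up stand-in for the current Titanic DataFrame.
df = pd.DataFrame([{
    'Survived': 1, 'Passenger Class': 1, 'Full Name': 'Passenger B', 'Sex': 'female',
    'Age': 38, 'Siblings/Spouses Aboard': 1, 'Parents/Children Aboard': 0,
    'Ticket Price': 71.28, 'Family Size': 1,
}])

df2 = {'Survived': 0, 'Passenger Class': 3, 'Full Name': 'Harry', 'Sex': 'male',
       'Age': 30, 'Siblings/Spouses Aboard': 0, 'Parents/Children Aboard': 2,
       'Ticket Price': 50.00, 'Family Size': 3}

# Turn the dictionary into a one-row DataFrame, then stack it under df.
df = pd.concat([df, pd.DataFrame([df2])], ignore_index=True)

print(df.tail(1))
```
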
codevalidated

Filtering the Dataset

  • Filter the dataset to only include passengers who were in first class and paid more than $100 for their ticket.

  • Store the filtered dataset in filtered_df variable.
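
A sketch of the filter, assuming the columns still carry the renamed labels 'Passenger Class' and 'Ticket Price' from the earlier activities. The rows are made up.

```python
import pandas as pd

# Made-up values standing in for the Titanic columns.
df = pd.DataFrame({
    'Passenger Class': [1, 1, 3, 2],
    'Ticket Price': [151.55, 71.28, 7.25, 120.00],
})

# Both conditions must hold, so they are combined with & (each in parentheses).
filtered_df = df[(df['Passenger Class'] == 1) & (df['Ticket Price'] > 100)]

print(filtered_df)
```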

codevalidated

Creating a Bar Chart

Create a bar chart showing the number of passengers in each class (Pclass).

Note:

  • Read the titanic.csv file again and store it in the new_df variable.
  • Store the counts of the different classes in the counts variable.
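
The steps in the note can be sketched as follows. An in-memory stand-in with made-up values replaces the file here; with the real data the first call would be `pd.read_csv('titanic.csv')` (path assumed).

```python
import io

import pandas as pd

# Made-up stand-in for titanic.csv containing only the Pclass column.
sample_csv = io.StringIO("Pclass\n3\n1\n3\n2\n3\n1\n")

new_df = pd.read_csv(sample_csv)

# Number of passengers per class, most frequent class first.
counts = new_df['Pclass'].value_counts()

ax = counts.plot(kind='bar')
ax.set_xlabel('Pclass')
ax.set_ylabel('Number of passengers')
```
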
codevalidated

Creating a Pie Chart

Create a pie chart showing the percentage of male and female passengers in the dataset.

Note:

  • Read the titanic.csv file again and store it in the new_df variable.
  • Store the counts of male and female passengers in the gender_counts variable.
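
A short sketch of these steps, again with a made-up in-memory stand-in for the file. The `autopct` argument is an addition carried over from the earlier pie chart activity so that the percentages appear on the chart.

```python
import io

import pandas as pd

# Made-up stand-in for titanic.csv containing only the Sex column.
sample_csv = io.StringIO("Sex\nmale\nfemale\nmale\nmale\n")

new_df = pd.read_csv(sample_csv)

# Number of passengers per value of 'Sex'.
gender_counts = new_df['Sex'].value_counts()

# One wedge per category, labelled with its percentage of the total.
ax = gender_counts.plot(kind='pie', autopct='%1.1f%%')
```
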
Author

Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥 When I'm not geeking out over AI 🤖 with my classmates or building neural networks, 🧠 you can find me buried in statistics textbooks. 📚 I know, what a nerd! 🤓 I'm always down to learn new ways to speak human 🫂 and computer 💻. Making tech more fun is my jam! 🍇 If you want a cheery data buddy 😎 who can make difficult things easy-peasy 🥝 and learning a party 🎉, I'm your guy! 🙋‍♂️ Let's chat codes 👨‍💻, numbers 🧮, and machines 🤖 over coffee! ☕ I'd love to meet more techy humans. 💁‍♂️ Can't wait to talk! 🗣️

This project is part of

Intro to Pandas for Data Analysis