Intro to Pandas for Data Analysis

# Pandas Capstone Project: Working with Custom Data + Titanic Bonus

In this project you'll apply the Pandas for Data Analysis techniques you have learned so far: statistical analysis and answering questions from data, filtering with boolean and comparison operators, creating new columns, plotting, and more. You'll practice first on a small custom dataset and then on a dataset about the Titanic disaster.

## Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

### Creating dataframe from dictionary

In this activity, you will create a custom DataFrame from a dictionary holding the data below:

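| Name | Age | Sex |
| --- | --- | --- |
| Alice | 25 | F |
| Bob | 30 | M |
| Charlie | 45 | M |
| Diana | 20 | F |
| Emma | 28 | F |
| Frank | 50 | M |
| Grace | 32 | F |
| Henry | 37 | M |
| Isabella | 23 | F |
| Jack | 42 | M |
| Karen | 29 | F |
| Liam | 31 | M |
| Maria | 48 | F |
| Nathan | 27 | M |
| Olivia | 36 | F |
| Peter | 41 | M |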

Don't worry, you don't need to type all the data by hand. Just copy the dictionary below and create a new DataFrame named `df` from it.

```python
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry', 'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M']
}
```
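
One way to turn this dictionary into a DataFrame is to pass it straight to the `pd.DataFrame` constructor, which treats each key as a column name and each list as that column's values. A minimal sketch, assuming pandas is imported as `pd`:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank', 'Grace', 'Henry',
             'Isabella', 'Jack', 'Karen', 'Liam', 'Maria', 'Nathan', 'Olivia', 'Peter'],
    'Age': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31, 48, 27, 36, 41],
    'Sex': ['F', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
}

# Each dictionary key becomes a column; all lists must have the same length.
df = pd.DataFrame(data)

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first five rows
```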

Select all the correct answers from the options below.

### Creating New Columns

Create a new column named `Status` in the DataFrame above, with the following data.

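| Name | Age | Sex | Status |
| --- | --- | --- | --- |
| Alice | 25 | F | Student |
| Bob | 30 | M | Worker |
| Charlie | 45 | M | Worker |
| Diana | 20 | F | Student |
| Emma | 28 | F | Student |
| Frank | 50 | M | Retiree |
| Grace | 32 | F | Worker |
| Henry | 37 | M | Worker |
| Isabella | 23 | F | Student |
| Jack | 42 | M | Worker |
| Karen | 29 | F | Student |
| Liam | 31 | M | Worker |
| Maria | 48 | F | Retiree |
| Nathan | 27 | M | Student |
| Olivia | 36 | F | Worker |
| Peter | 41 | M | Worker |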

Below is the list of values for the `Status` column:

```python
['Student', 'Worker', 'Worker', 'Student', 'Student', 'Retiree', 'Worker', 'Worker', 'Student', 'Worker', 'Student', 'Worker', 'Retiree', 'Student', 'Worker', 'Worker']
```
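
Assigning a list to a column label that does not exist yet is one common way to add a column. The sketch below uses only the first four rows to stay short; the variable name `status` is an assumption made for illustration.

```python
import pandas as pd

# First four rows only, to keep the sketch short.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 45, 20],
    'Sex': ['F', 'M', 'M', 'F'],
})

status = ['Student', 'Worker', 'Worker', 'Student']

# Assigning to a new label adds the column; the list length
# must match the number of rows in the DataFrame.
df['Status'] = status

print(df)
```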

### Renaming Columns

In this activity, you will rename columns in a DataFrame. Rename them as follows:

• Name: Full Name
• Age: Years Old
• Sex: Gender
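
A mapping like the one above fits the `columns` argument of `DataFrame.rename`. A minimal sketch on a two-row frame (the sample rows are taken from the table earlier in the project):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Sex': ['F', 'M'],
})

# rename() returns a new DataFrame, so the result is assigned back to df.
# Columns not listed in the mapping keep their names.
df = df.rename(columns={'Name': 'Full Name', 'Age': 'Years Old', 'Sex': 'Gender'})

print(df.columns.tolist())
```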

### Drop single row

Drop the row with index 4 from the DataFrame.

### Drop multiple rows

Drop the rows with index 7 and 9 from the DataFrame.

### Drop row with condition

Drop the rows whose `Full Name` is 'Bob' from the DataFrame.
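
The three drop activities can be sketched together. The example below keeps only two columns and the first twelve rows, which is enough to show dropping by a single label, by a list of labels, and by a condition. It is one possible approach, not necessarily the one the activity checker expects.

```python
import pandas as pd

# Two columns and twelve rows are enough to show the row operations.
df = pd.DataFrame({
    'Full Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Emma', 'Frank',
                  'Grace', 'Henry', 'Isabella', 'Jack', 'Karen', 'Liam'],
    'Years Old': [25, 30, 45, 20, 28, 50, 32, 37, 23, 42, 29, 31],
})

# Drop a single row by its index label.
df = df.drop(index=4)

# Drop several rows at once by passing a list of labels.
df = df.drop(index=[7, 9])

# Drop by condition: keep every row whose name is not 'Bob'.
df = df[df['Full Name'] != 'Bob']

# The remaining labels keep their original values, so the index now has gaps.
print(df.index.tolist())
```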

### Add new row to dataframe

Add a new row at the end of the DataFrame with the values below:

```
'Full Name': 'Emma',
'Years Old': 28,
'Gender': 'F',
'Status': 'Student'
```

Add new rows to the DataFrame using the data below:

| Full Name | Years Old | Gender | Status |
| --- | --- | --- | --- |
| Bob | 30 | M | Worker |
| Emma | 28 | F | Student |
| Henry | 37 | M | Worker |
| Jack | 42 | M | Worker |

Make sure to use `ignore_index=True` when appending the rows; otherwise you will not pass the activity.
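
The activity does not say which function to append with. `DataFrame.append` was removed in pandas 2.0, so the sketch below assumes `pd.concat`, which accepts the same `ignore_index=True` argument. The starting frame and the name `new_rows` are made up for illustration.

```python
import pandas as pd

# A small frame with a gap in its index, as left behind by earlier drops.
df = pd.DataFrame(
    {'Full Name': ['Alice', 'Charlie'], 'Years Old': [25, 45],
     'Gender': ['F', 'M'], 'Status': ['Student', 'Worker']},
    index=[0, 2],
)

# Each dictionary is one row; keys must match the existing column names.
new_rows = pd.DataFrame([
    {'Full Name': 'Bob', 'Years Old': 30, 'Gender': 'M', 'Status': 'Worker'},
    {'Full Name': 'Emma', 'Years Old': 28, 'Gender': 'F', 'Status': 'Student'},
])

# ignore_index=True discards the old labels and renumbers the rows 0..n-1.
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```

Without `ignore_index=True` the result would carry the labels 0, 2, 0, 1, so label-based lookups such as `df.loc[0]` would match two rows.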

### Filter the data

Filter the data to rows where `Gender` is `F`, `Status` is `Student`, and `Years Old` is greater than 20. Store the filtered data in a new DataFrame named `Filter_Data`.
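
Several conditions can be combined into one boolean mask with `&`, with each comparison wrapped in parentheses. A minimal sketch on five rows taken from the table earlier in the project:

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Diana', 'Grace', 'Nathan', 'Karen'],
    'Years Old': [25, 20, 32, 27, 29],
    'Gender': ['F', 'F', 'F', 'M', 'F'],
    'Status': ['Student', 'Student', 'Worker', 'Student', 'Student'],
})

# Parentheses are required because & binds tighter than == and >.
mask = (
    (df['Gender'] == 'F')
    & (df['Status'] == 'Student')
    & (df['Years Old'] > 20)
)

# Indexing with the mask keeps only the rows where it is True.
Filter_Data = df[mask]

print(Filter_Data)
```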

### Select all correct options

Analyse the previously created DataFrame `Filter_Data` and check all correct options.

### Create Basic Plots: Line Chart

• Create a line chart of `Years Old` against `Full Name` for the `df` DataFrame, with `Full Name` on the x-axis and `Years Old` on the y-axis.

• Based on the line chart, check all correct options.
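
A line chart like this can be drawn with `DataFrame.plot`, which requires matplotlib to be installed. The sketch below uses three sample rows; in a notebook the figure is displayed automatically, while a script would need `matplotlib.pyplot.show()`.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Alice', 'Charlie', 'Diana'],
    'Years Old': [25, 45, 20],
})

# x and y name the columns to put on each axis; plot() returns a matplotlib Axes.
ax = df.plot(x='Full Name', y='Years Old', kind='line')
ax.set_ylabel('Years Old')
```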

### Create Basic Plots: Bar Chart

Create a bar chart of `Gender` for the `df` DataFrame and, based on the bar chart, check all correct options.

### Create Basic Plots: Pie Chart

Create a pie chart of `Gender` for the `df` DataFrame and choose all correct options. Also pass `autopct='%1.1f%%'` to show percentages on the pie chart.
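
A pie chart needs one number per slice, so a reasonable first step is to count the values with `value_counts()`. A minimal sketch on five sample rows, assuming matplotlib is installed; the name `gender_counts` is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['F', 'M', 'M', 'F', 'F']})

# Count how many rows fall into each category.
gender_counts = df['Gender'].value_counts()

# autopct is forwarded to matplotlib and formats each slice's percentage
# with one decimal place followed by a literal percent sign.
ax = gender_counts.plot(kind='pie', autopct='%1.1f%%')
ax.set_ylabel('')  # hide the default y-axis label
```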

Read the Titanic dataset from a CSV file into a pandas DataFrame. Store the result in a DataFrame named `df`.
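
Reading a CSV file is usually a single call to `pd.read_csv`. The sketch below substitutes an in-memory string for the file so that it runs anywhere. The rows are made up and are not real passenger records; the column names are the ones this project refers to.

```python
import io
import pandas as pd

# Stand-in for the CSV file: made-up rows, not real passenger records.
csv_text = """Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,3,Passenger A,male,22,1,0,7.25
1,1,Passenger B,female,38,1,0,120.50
1,3,Passenger C,female,15,0,2,11.00
"""

# With the real file this would be: df = pd.read_csv('titanic.csv')
df = pd.read_csv(io.StringIO(csv_text))

# info() prints each column's name, non-null count, and data type.
df.info()
```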

### Getting Information About the Dataset

Use the `info()` method to get information about the data types in the dataset. Check all the correct answers.

### Getting Basic Statistical Information

Use the `describe()` method to get basic statistical information about the numeric columns in the dataset. Select the minimum and maximum `Fare` from the output.

### Calculating Basic Statistics for a Column

Calculate the mean, median, and standard deviation of the 'Age' column.

• Store the mean in a variable named `age_mean`
• Store the median in a variable named `age_median`
• Store the standard deviation in a variable named `age_std_deviation`
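
The statistics from this activity and from `describe()` can be sketched on a tiny frame. The values below are made up; `fare_min`, `fare_max`, and `summary` are names chosen for illustration.

```python
import pandas as pd

# Made-up values standing in for the Titanic columns.
df = pd.DataFrame({'Age': [22, 38, 15, 30], 'Fare': [7.25, 120.5, 11.0, 50.0]})

# describe() returns a DataFrame indexed by statistic name
# (count, mean, std, min, 25%, 50%, 75%, max).
summary = df.describe()
fare_min = summary.loc['min', 'Fare']
fare_max = summary.loc['max', 'Fare']

# Per-column statistics; missing values are skipped by default.
age_mean = df['Age'].mean()
age_median = df['Age'].median()
age_std_deviation = df['Age'].std()  # sample standard deviation (ddof=1)

print(fare_min, fare_max, age_mean, age_median, age_std_deviation)
```

Note that pandas computes the sample standard deviation by default, which differs from NumPy's `np.std` default (population, `ddof=0`).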

### Creating a New Column

Create a new column called 'Family Size' that combines the 'Siblings/Spouses Aboard' and 'Parents/Children Aboard' columns.
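
The activity does not say how the two columns should be combined. A minimal sketch, assuming a plain element-wise sum that does not count the passenger themselves (the sample values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    'Siblings/Spouses Aboard': [1, 0, 3],
    'Parents/Children Aboard': [0, 2, 1],
})

# Assumption: "combines" means adding the two counts row by row.
df['Family Size'] = df['Siblings/Spouses Aboard'] + df['Parents/Children Aboard']

print(df)
```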

### Renaming a Column

Rename the 'Fare' column to 'Ticket Price'.

### Renaming Multiple Columns

Rename the columns of the DataFrame as follows:

```
'Pclass' -> 'Passenger Class'
'Name' -> 'Full Name'
```

### Dropping Rows

Drop the rows in the dataset where the 'Age' is less than 18 years old.
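
One way to drop rows by condition is to collect the index labels of the matching rows and pass them to `drop`. The sketch below uses made-up rows, including one with a missing age, to show how missing values behave.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['Passenger A', 'Passenger B', 'Passenger C', 'Passenger D'],
    'Age': [22.0, 15.0, None, 40.0],
})

# Index labels of the rows to remove. A missing age compares as False,
# so rows without an age are not selected here.
minors = df[df['Age'] < 18].index

df = df.drop(index=minors)

print(df)
```

The alternative `df[df['Age'] >= 18]` gives a different result when ages are missing, because it would also discard the rows with no age.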

Create a new row in the dataset with the following values in the respective columns:

• Survived: 0
• Passenger Class: 3
• Full Name: 'Harry'
• Sex: 'male'
• Age: 30
• Siblings/Spouses Aboard: 0
• Parents/Children Aboard: 2
• Ticket Price: 50.00
• Family Size: 3

Use the data below to create the new row:

```python
df2 = {
    'Survived': 0,
    'Passenger Class': 3,
    'Full Name': 'Harry',
    'Sex': 'male',
    'Age': 30,
    'Siblings/Spouses Aboard': 0,
    'Parents/Children Aboard': 2,
    'Ticket Price': 50.00,
    'Family Size': 3
}
```

### Filtering the Dataset

• Filter the dataset to only include passengers who were in first class and paid more than \$100 for their ticket.

• Store the filtered dataset in `filtered_df` variable.
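
This filter combines two conditions with `&`. The sketch assumes the columns have already been renamed to `Passenger Class` and `Ticket Price` in the earlier activities; the sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    'Passenger Class': [1, 1, 3, 2],
    'Ticket Price': [120.5, 80.0, 150.0, 101.0],
})

# Both conditions must hold: first class AND a ticket price above 100.
filtered_df = df[(df['Passenger Class'] == 1) & (df['Ticket Price'] > 100)]

print(filtered_df)
```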

### Creating a Bar Chart

Create a bar chart showing the number of passengers in each class (`Pclass`).

Note:

• Read the `titanic.csv` file again and store it in the `new_df` variable.
• Store the counts of the different classes in the `counts` variable.
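
The two notes above can be sketched as follows. An in-memory string with made-up rows stands in for `titanic.csv`, and matplotlib is assumed to be installed. Sorting the counts by class with `sort_index()` is an addition made here for readability, not something the activity asks for.

```python
import io
import pandas as pd

# Stand-in for titanic.csv: made-up rows, not real passenger records.
csv_text = """Pclass,Sex
3,male
1,female
3,female
2,male
3,male
"""

# With the real file this would be: new_df = pd.read_csv('titanic.csv')
new_df = pd.read_csv(io.StringIO(csv_text))

# Number of passengers per class, ordered by class number.
counts = new_df['Pclass'].value_counts().sort_index()

# One bar per class; plot() returns a matplotlib Axes.
ax = counts.plot(kind='bar')
ax.set_xlabel('Pclass')
ax.set_ylabel('Number of passengers')
```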

### Creating a Pie Chart

Create a pie chart showing the percentage of male and female passengers in the dataset.

Note:

• Read the `titanic.csv` file again and store it in the `new_df` variable.
• Store the counts of male and female passengers in the `gender_counts` variable.
Project Created by

#### Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥
