Pandas Data Science by Example - FreeCodeCamp video series

Santiago Basulto

To become a real Data Scientist and take the next step in your career, watching videos or reading books and articles is not enough. You have to put your skills to the test.

That's what this series is about: By Example is a new collaboration between DataWars and FreeCodeCamp focused on solving real-life Data Science projects interactively and encouraging viewers to solve the projects by themselves.

Create a free account and Solve the projects by yourself

The heart of this Pandas by Example series is to help YOU practice your skills. Before watching the video solutions, we encourage you to try to solve the projects by yourself first.

You can create a FREE account just by following this link: https://beta.datawars.io/register

List of projects covered in Pandas By Example

Here's a quick summary of everything covered in this By Example series, grouped by specialty (Introductory, Data Cleaning, Data Wrangling), with a difficulty/expertise rating and an estimated completion time for each project.

Introductory Projects

DataFrames practice: working with English Words
[Easy / Beginners] - Estimated: 20 mins

This project introduces the concept of DataFrames and Pandas, but with a real life twist: analyzing English words. We'll do some Q&A around the most interesting words in the language, including calculating new columns with Vectorized Operations.
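To give a flavor of what "calculating new columns with Vectorized Operations" means, here is a minimal sketch. The tiny `words` DataFrame and its column names are made up for illustration; the actual project ships its own dataset and questions.

```python
import pandas as pd

# Hypothetical toy data, not the project's real dataset.
words = pd.DataFrame({"word": ["apple", "banana", "kiwi", "strawberry"]})

# Vectorized string operations run over the whole column at once,
# so no explicit Python loop is needed.
words["length"] = words["word"].str.len()
words["vowels"] = words["word"].str.count(r"[aeiou]")

# Vectorized arithmetic between two columns creates a third one.
words["vowel_ratio"] = words["vowels"] / words["length"]

print(words)
```

The same pattern (derive a column from existing columns without looping) applies to numeric data as well.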

Click here to try it out by yourself.

Filtering and Sorting Pokemon Data
[Easy / Beginners] - Estimated: 30 mins

This project focuses on Data Analysis and question answering. To do so, you'll have to put your dataframe Filtering and Sorting skills to the test.
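As a rough idea of the kind of filtering and sorting involved, here is a small sketch. The `pokemon` DataFrame, its columns, and its values are illustrative assumptions; the project's real dataset has its own schema.

```python
import pandas as pd

# Illustrative sample only; column names and values are assumptions.
pokemon = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"],
    "type": ["Grass", "Fire", "Water", "Electric"],
    "attack": [49, 52, 48, 55],
    "speed": [45, 65, 43, 90],
})

# Boolean filtering: combine conditions with & and wrap each one in parentheses.
fast_attackers = pokemon[(pokemon["attack"] > 48) & (pokemon["speed"] > 50)]

# Sorting: fastest first.
by_speed = fast_attackers.sort_values("speed", ascending=False)

print(by_speed[["name", "speed"]])
```

Most questions in this style reduce to building the right boolean mask and then sorting or taking the top rows of the result.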

Click here to try it out by yourself.

The Birthday Paradox in the NBA
[Medium / Beginners] - Estimated: 30 mins

The birthday problem asks: if you put N people in the same room, what is the probability that at least two of them share a birthday? The birthday paradox, on the other hand, asks: how many people do we need to put in the same room for that probability to exceed 50%? The answer is surprising: only 23 people are enough (N=23).
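The N=23 figure can be checked with a few lines of plain Python. This is only a sketch of the textbook calculation, assuming 365 equally likely birthdays and no leap years; it is not the approach used in the project, which works with NBA data. The function name is our own.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday,
    assuming uniformly distributed birthdays over `days` days."""
    if n > days:
        return 1.0  # pigeonhole principle: a shared birthday is guaranteed
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


# Smallest group size whose probability exceeds 50%.
smallest_n = next(n for n in range(1, 366) if shared_birthday_probability(n) > 0.5)

print(smallest_n, round(shared_birthday_probability(23), 4))  # 23 0.5073
```

The trick is to compute the complement (everyone has a distinct birthday) and subtract it from 1.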

Even though this project still covers the "basic" aspects of Pandas, its resolution involves a more "original" approach, based on combinatorics.

As usual, we encourage you to try to solve it by yourself first.

Click here to try it out by yourself.

Data Cleaning Projects

Matching Strings by Similarity using Levenshtein distance
[Easy / Intermediate] - Estimated: 25 minutes

This project deals with one of the most problematic aspects of data cleaning: working with Strings. You'll be given two dataframes in which the same companies appear under slightly different names, and your task will be to use Levenshtein distance to match and align them.
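For readers who haven't met Levenshtein distance before, here is a minimal sketch of the idea: count the fewest single-character insertions, deletions, and substitutions needed to turn one string into another, then pick the closest candidate. The pure-Python `levenshtein` and `closest` helpers and the sample company names are our own assumptions for illustration; the project may rely on a dedicated library instead.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest(name: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest case-insensitive distance."""
    return min(candidates, key=lambda c: levenshtein(name.lower(), c.lower()))


# Hypothetical sample data: the same companies, written differently.
left = pd.DataFrame({"company": ["Apple Inc", "Microsft Corp", "Alphabet"]})
right = pd.DataFrame({"company": ["Microsoft Corp", "Alphabet Inc", "Apple Inc."]})

candidates = right["company"].tolist()
left["match"] = [closest(name, candidates) for name in left["company"]]

print(left)
```

Comparing every name against every candidate is quadratic, which is fine for small tables but worth keeping in mind for larger ones.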

Click here to try it out by yourself.

Cleaning Google Playstore data
[Medium / Intermediate] - Estimated: 40 mins

This project covers pretty much all aspects of Data Cleaning: finding null/missing values and discussing strategies to fix them (data imputation, removal, etc.), finding duplicate values, finding outliers, and more.
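The cleaning steps listed above can be sketched on a toy table. The `apps` DataFrame below is invented for illustration, and median imputation plus the 1.5 × IQR rule are just one reasonable set of choices, not necessarily the ones made in the video.

```python
import pandas as pd

# Hypothetical scraped data with a duplicate row, a missing value and an outlier.
apps = pd.DataFrame({
    "app": ["A", "B", "B", "C", "D", "E"],
    "rating": [4.5, 3.8, 3.8, None, 4.1, 19.0],
})

# 1. Find missing values per column.
missing_per_column = apps.isna().sum()

# 2. Remove exact duplicate rows.
apps = apps.drop_duplicates()

# 3. Impute missing ratings with the median (one strategy among several).
apps["rating"] = apps["rating"].fillna(apps["rating"].median())

# 4. Flag outliers with the 1.5 * IQR rule.
q1, q3 = apps["rating"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = apps[(apps["rating"] < q1 - 1.5 * iqr) | (apps["rating"] > q3 + 1.5 * iqr)]

print(missing_per_column)
print(outliers)
```

Whether to impute, drop, or correct a suspicious value depends on the column and the question being asked, which is exactly the kind of discussion this project walks through.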

The dataset comes as the result of Scraping the Google Playstore. After scraping sites, the result is usually messy data. Here's your chance to sort it out. The project finishes with some Data Analysis and question answering.

Click here to try it out by yourself.

Data Wrangling Projects

Analyzing Premier League match results by combining DataFrames
[Easy / Advanced] - Estimated: 35 minutes

This project focuses on analyzing Premier League match results. To do so, you'll have to put your Data Wrangling skills to the test: merging and joining dataframes, and performing analysis using Group By operations and Pivot Tables.
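Here is a small sketch of how merging, Group By, and Pivot Tables fit together. The `matches` and `teams` tables, their columns, the team names, and the scores are all made up; they are not real Premier League results or the project's actual schema.

```python
import pandas as pd

# Hypothetical data: match results reference teams by id only.
matches = pd.DataFrame({
    "home_id": [1, 2, 3, 1],
    "away_id": [2, 3, 1, 3],
    "home_goals": [2, 0, 1, 3],
    "away_goals": [1, 0, 2, 1],
    "season": ["2016/17", "2016/17", "2017/18", "2017/18"],
})
teams = pd.DataFrame({
    "team_id": [1, 2, 3],
    "team": ["Team A", "Team B", "Team C"],
})

# Merge: attach the home team's name to every match.
merged = matches.merge(teams, left_on="home_id", right_on="team_id", how="left")

# Group By: total goals scored at home per team.
home_goals = merged.groupby("team")["home_goals"].sum()

# Pivot Table: the same totals, broken down by season.
pivot = merged.pivot_table(
    index="team",
    columns="season",
    values="home_goals",
    aggfunc="sum",
    fill_value=0,
)

print(home_goals)
print(pivot)
```

A second merge on `away_id` would attach the away team's name in the same way.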

Click here to try it out by yourself.

Capstone Project: NBA 2017 season analysis
[Hard / Advanced] - Estimated: 45 minutes

This project combines skills from all the other projects, including merging data, cleaning it, and analyzing it. The data comes from the 2017 NBA season, and the project finishes with a thorough analysis of the results and the players.

Click here to try it out by yourself.
