To become a real Data Scientist and take the next step in your career, watching videos or reading books and articles is not enough. You have to put your skills to the test.
That's what this series is about: By Example is a new collaboration between DataWars and FreeCodeCamp focused on solving real-life Data Science projects interactively, and on encouraging viewers to solve the projects by themselves.
The heart of this Pandas by Example series is to help YOU practice your skills. Before watching the video solutions, we encourage you to try to solve the projects by yourself first.
You can create a FREE account just by following this link: https://beta.datawars.io/register
Here's a quick summary of everything covered in this By Example series, divided by speciality (Introductory, Data Cleaning, Data Wrangling) and including a sense of difficulty/expertise and the estimated time to solve each project.
This project introduces the concept of DataFrames and Pandas, but with a real-life twist: analyzing English words. We'll do some Q&A around the most interesting words in the language, including calculating new columns with Vectorized Operations.
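To give a feel for what "calculating new columns with Vectorized Operations" can look like, here is a minimal sketch. The words and the column names (Word, Length, Vowels, Vowel ratio) are made up for illustration; the project's actual dataset and columns may differ.

```python
import pandas as pd

# Hypothetical sample data, not the project's real dataset.
words = pd.DataFrame({"Word": ["apple", "banana", "kiwi", "strawberry"]})

# Vectorized string operations work on the whole column at once,
# so no explicit Python loop is needed.
words["Length"] = words["Word"].str.len()
words["Vowels"] = words["Word"].str.count(r"[aeiou]")

# Arithmetic between columns is vectorized too (element by element).
words["Vowel ratio"] = words["Vowels"] / words["Length"]

print(words)
```

The same pattern applies to any derived column: express it as an operation on whole columns rather than on individual rows.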
Click here to try it out by yourself.
This project focuses on Data Analysis and question answering. To do so, you'll have to put your DataFrame Filtering and Sorting skills to the test.
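As a rough idea of the two skills involved, the sketch below filters a DataFrame with a boolean mask and then sorts it. The products, categories and prices are invented for illustration and have nothing to do with the project's real data.

```python
import pandas as pd

# Made-up rows, for illustration only.
df = pd.DataFrame({
    "product": ["pen", "notebook", "lamp", "desk"],
    "category": ["stationery", "stationery", "furniture", "furniture"],
    "price": [1.5, 4.0, 25.0, 120.0],
})

# Filtering: combine conditions with & (and) or | (or),
# wrapping each condition in parentheses.
pricey_stationery = df[(df["category"] == "stationery") & (df["price"] > 2)]

# Sorting: most expensive products first.
by_price = df.sort_values("price", ascending=False)

print(pricey_stationery)
print(by_price)
```

Most "question answering" tasks boil down to a filter, a sort, and reading off the first few rows.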
Click here to try it out by yourself.
The birthday problem answers the question: if you put N people in the same room, what is the probability that at least two of them share a birthday? The birthday paradox, on the other hand, asks: how many people do we need to put in the same room for that probability to reach 50%? The answer is surprising: only 23 people are enough (N=23).
Even though this project still covers the "basic" aspects of Pandas, its resolution involves a more "original" approach, based on combinatorics.
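If you want to check the N=23 claim yourself, here is a small plain-Python sketch. It is not the project's own solution; it assumes 365 equally likely birthdays (no leap years), and the function name birthday_probability is just an illustrative choice. It computes the probability that everyone has a different birthday and subtracts it from 1.

```python
def birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes every day of the year is equally likely (an idealization).
    """
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


# Smallest group size for which the probability reaches 50%.
smallest_n = next(n for n in range(1, 366) if birthday_probability(n) >= 0.5)

print(smallest_n, round(birthday_probability(smallest_n), 4))
```

With 22 people the probability is still below 50%, which is why 23 is the threshold.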
As usual, we encourage you to try to solve it by yourself first.
Click here to try it out by yourself.
This project deals with one of the most problematic aspects of data cleaning: dealing with Strings. You'll be given two dataframes that list companies under different spellings of their names, and your task will be to use Levenshtein distance to match and align them.
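For readers new to Levenshtein distance: it counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below implements it in plain Python and uses it to pick the closest name for each row. The company names and the helper best_match are hypothetical, and the project itself may rely on a dedicated library instead.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def best_match(name: str, candidates: list[str]) -> str:
    """Hypothetical helper: the candidate with the smallest edit distance."""
    return min(candidates, key=lambda c: levenshtein(name.lower(), c.lower()))


# Made-up company names, for illustration only.
left = pd.DataFrame({"company": ["Apple Inc.", "Microsoft Corp", "Alphabet"]})
right = pd.DataFrame({"company": ["Microsoft Corporation", "Alphabet Inc", "Apple Inc"]})

candidates = right["company"].tolist()
left["match"] = left["company"].apply(lambda name: best_match(name, candidates))

print(left)
```

Comparing every name against every candidate is fine for small tables; for large ones you would want to narrow the candidates first.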
Click here to try it out by yourself.
This project covers pretty much all the aspects of Data Cleaning, including finding null/missing values and discussing the strategies to fix them (data imputation, removing them, etc.), finding duplicate values, and finding outliers.
The dataset is the result of scraping the Google Play Store. Scraped data is usually messy, and here's your chance to sort it out. The project finishes with some Data Analysis and question answering.
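To make those cleaning steps concrete, here is a minimal sketch on a tiny made-up table. The column names (App, Rating, Reviews) and the values are assumptions for illustration, not the real scraped dataset, and median imputation is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Made-up rows: one exact duplicate, one missing rating, one impossible rating.
apps = pd.DataFrame({
    "App": ["A", "B", "B", "C", "D", "E"],
    "Rating": [4.5, 4.0, 4.0, np.nan, 19.0, 4.1],
    "Reviews": [100, 250, 250, 80, 40, 60],
})

# 1. Find missing values: count of nulls per column.
missing_per_column = apps.isna().sum()

# 2. Remove exact duplicate rows.
clean = apps.drop_duplicates().reset_index(drop=True)

# 3. Handle outliers: store ratings go from 1 to 5, so anything outside
#    that range is treated as invalid and turned into a missing value.
clean = clean.assign(Rating=clean["Rating"].where(clean["Rating"].between(1, 5)))

# 4. Impute the remaining missing ratings with the median rating.
clean["Rating"] = clean["Rating"].fillna(clean["Rating"].median())

print(missing_per_column)
print(clean)
```

Whether to impute or drop rows depends on the question you want to answer, which is exactly the kind of trade-off the project discusses.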
Click here to try it out by yourself.
This project focuses on analyzing Premier League match results. To do so, you'll have to put your Data Wrangling skills to the test, including merging and joining dataframes and performing analysis using Group By operations and Pivot Tables.
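Here is a small sketch of the three operations mentioned above: a merge, a Group By and a Pivot Table. The teams, seasons and scores are invented placeholders, and the column names are assumptions; the real match data is structured differently.

```python
import pandas as pd

# Invented placeholder data, not real match results.
matches = pd.DataFrame({
    "season": ["2020/21", "2020/21", "2021/22", "2021/22"],
    "home_team_id": [1, 2, 1, 3],
    "home_goals": [2, 0, 3, 1],
})
teams = pd.DataFrame({
    "team_id": [1, 2, 3],
    "team": ["Team A", "Team B", "Team C"],
})

# Merge: attach the team name to every match via the team id.
merged = matches.merge(teams, left_on="home_team_id", right_on="team_id", how="left")

# Group By: total home goals per team.
home_goals_by_team = merged.groupby("team")["home_goals"].sum()

# Pivot Table: teams as rows, seasons as columns, summed home goals as values.
pivot = merged.pivot_table(
    index="team",
    columns="season",
    values="home_goals",
    aggfunc="sum",
    fill_value=0,  # a team with no home match in a season gets 0
)

print(home_goals_by_team)
print(pivot)
```

A Pivot Table is essentially a Group By on two keys, reshaped so that one key becomes the columns.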
Click here to try it out by yourself.
This project combines skills from all the other projects, including: merging and solving data, cleaning it, and analyzing it. The data comes from the 2017 NBA season, and the project finishes with a thorough analysis of the results and the players.