Identify invalid values in the season column and replace them with the string "Unknown season" (data imputation).
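One possible approach, as a minimal sketch: it assumes a valid season looks like 2005-2006, which may not match the real dataset. The sample rows and the validity pattern are illustrative assumptions.

```python
import pandas as pd

# Made-up sample rows; the real dataframe comes from the project's dataset.
df = pd.DataFrame({"season": ["2005-2006", "??", None, "2006-2007"]})

# Assumption: a valid season is two 4-digit years joined by a hyphen.
# na=False treats missing values as invalid.
is_valid = (
    df["season"].astype("string").str.fullmatch(r"\d{4}-\d{4}", na=False).astype(bool)
)

# Overwrite every value that fails the check with the placeholder string.
df.loc[~is_valid, "season"] = "Unknown season"
```

Inspecting df["season"].value_counts() first is a reasonable way to decide what "invalid" actually means for this column.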
IMPORTANT: If for any reason you think you have incorrectly modified the original dataframe, just go ahead and read it again.
Analyze the goal columns (including away_goals) and answer: how many invalid values does each contain?
Hint: Use a visualization to help you in the process!
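A rough sketch of the counting step, assuming the second goals column is named home_goals and that a plausible score lies between 0 and 20. Both the column name and the range are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up sample rows; home_goals is an assumed column name.
df = pd.DataFrame({"home_goals": [2, -1, 0, 45], "away_goals": [1, 3, -2, 0]})
goal_cols = ["home_goals", "away_goals"]

# Assumption: a score is invalid if it is negative or above 20.
invalid = (df[goal_cols] < 0) | (df[goal_cols] > 20)

# Summing a boolean frame counts the True values in each column.
invalid_counts = invalid.sum()
print(invalid_counts)
```

For the visual check the hint suggests, a box plot or histogram of the same columns (for example df[goal_cols].plot.box()) should make outliers stand out before you settle on a threshold.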
Replace all the invalid goals with 0 (data imputation).
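The imputation could be sketched as follows. The home_goals column name and the 0 to 20 validity range are assumptions for illustration, and the rows are made up.

```python
import pandas as pd

# Made-up sample rows; home_goals is an assumed column name.
df = pd.DataFrame({"home_goals": [2, -1, 0, 45], "away_goals": [1, 3, -2, 0]})
goal_cols = ["home_goals", "away_goals"]

# Assumption: valid scores lie between 0 and 20 inclusive.
valid = (df[goal_cols] >= 0) & (df[goal_cols] <= 20)

# DataFrame.where keeps values where the condition holds and
# substitutes 0 everywhere else.
df[goal_cols] = df[goal_cols].where(valid, 0)
```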
The result column contains a "summary" of the result of the match:
H indicates a home win;
A indicates an away win;
D indicates a draw.
Identify the invalid values and clean them by assigning the correct result.
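One way to do this is to recompute the letter from the score and overwrite only the rows whose stored value is not H, A, or D. This is a sketch under the assumption that the goals columns are already clean and that the home side's column is called home_goals. The rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; home_goals is an assumed column name.
df = pd.DataFrame({
    "home_goals": [2, 0, 1, 3],
    "away_goals": [1, 2, 1, 0],
    "result": ["H", "A", "X", "home"],
})

# Derive the expected code from the score: H, A, or D for a draw.
expected = np.select(
    [df["home_goals"] > df["away_goals"], df["home_goals"] < df["away_goals"]],
    ["H", "A"],
    default="D",
)

# Only touch rows whose stored value is not one of the three valid codes.
bad = ~df["result"].isin(["H", "A", "D"])
df.loc[bad, "result"] = expected[bad.to_numpy()]
```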
Calculate the average number of goals per match. Enter the value with up to 2 decimals. For example, if you find the value to be 1.8857, enter it with only 2 decimals.
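A small sketch of the calculation, assuming the two goals columns are home_goals and away_goals (the first name is an assumption) and using made-up rows. Rounding with round() is also an assumption; the activity may expect truncation instead.

```python
import pandas as pd

# Made-up sample rows; home_goals is an assumed column name.
df = pd.DataFrame({"home_goals": [2, 0, 1], "away_goals": [1, 2, 2]})

# Goals per match are the row-wise sum; the mean of that is the average.
avg_goals = (df["home_goals"] + df["away_goals"]).mean()

# Keep two decimals for the answer.
print(round(avg_goals, 2))
```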
For the previous activity, it would have been convenient to have a total_goals column with the sum of the goals scored in each match. Create the column now.
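Creating the column is a single vectorized addition. The sketch below assumes the home side's goals live in a column named home_goals and uses made-up rows.

```python
import pandas as pd

# Made-up sample rows; home_goals is an assumed column name.
df = pd.DataFrame({"home_goals": [2, 0, 1], "away_goals": [1, 2, 2]})

# Element-wise sum of the two columns, stored as a new column.
df["total_goals"] = df["home_goals"] + df["away_goals"]
```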
Calculate the average number of goals per season. The result should be a Series ordered by season. Store the value in the variable goals_per_season.
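A groupby followed by a mean gives the per-season average, and sort_index() makes the ordering by season explicit. The sketch uses made-up rows and assumes total_goals has already been created.

```python
import pandas as pd

# Made-up sample rows with the total_goals column already in place.
df = pd.DataFrame({
    "season": ["2006-2007", "2005-2006", "2005-2006", "2006-2007"],
    "total_goals": [4, 2, 3, 2],
})

# Average total goals for each season, ordered by the season label.
goals_per_season = df.groupby("season")["total_goals"].mean().sort_index()
```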
What was the biggest goal difference in a match found in the dataset?
Note: The goal difference can come from either a home win or an away win. For example, a 10-1 result and a 1-10 result have the same difference: 9 goals for the winning team.
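Because the direction of the win does not matter, taking the absolute value of the difference before the maximum is enough. A minimal sketch with made-up rows, assuming the home side's column is named home_goals:

```python
import pandas as pd

# Made-up sample rows; home_goals is an assumed column name.
df = pd.DataFrame({"home_goals": [10, 1, 2], "away_goals": [1, 10, 2]})

# abs() makes a 10-1 and a 1-10 result count as the same difference.
biggest_diff = (df["home_goals"] - df["away_goals"]).abs().max()
```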
Find the team that has won the most matches away from home.
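One way to sketch this: keep only the matches marked as away wins and count how often each visiting team appears. The away_team column name is an assumption, and the rows are made up.

```python
import pandas as pd

# Made-up sample rows; away_team is an assumed column name.
df = pd.DataFrame({
    "away_team": ["Team A", "Team B", "Team A", "Team B", "Team A"],
    "result": ["A", "H", "A", "A", "D"],
})

# Keep away wins only, then count wins per visiting team.
away_wins = df.loc[df["result"] == "A", "away_team"].value_counts()

# idxmax returns the label (team name) with the highest count.
best_away_team = away_wins.idxmax()
```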
This is a tricky activity because we're not looking for the "total" of goals received, but for the "ratio" of goals received to games played.
For example, Charlton Athletic is the team with literally the fewest goals received at home, with only 20, but that's because they played only 38 matches in total, and only 19 of them at home.
Which team has the lowest ratio of goals received to matches played at home? The ratio is defined as goals_received / home_games.
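The ratio can be sketched with a named aggregation: goals received at home are the visitors' goals, and home games are the number of rows per home team. The home_team column name is an assumption, and the rows are made up so that the team with the fewest goals received is not the one with the best ratio.

```python
import pandas as pd

# Made-up sample rows; home_team is an assumed column name.
df = pd.DataFrame({
    "home_team": ["Team A", "Team B", "Team B", "Team B", "Team B"],
    "away_goals": [1, 0, 1, 0, 1],
})

# Per home team: goals conceded (visitors' goals) and matches hosted.
home = df.groupby("home_team").agg(
    goals_received=("away_goals", "sum"),
    home_games=("away_goals", "size"),
)
home["ratio"] = home["goals_received"] / home["home_games"]

# Team A concedes fewer goals in total (1 vs 2), but Team B has the
# lower ratio (0.5 vs 1.0).
best_home_defense = home["ratio"].idxmin()
```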
Which team scored the most goals playing away from home?
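A minimal sketch: sum away_goals per visiting team and take the label of the largest total. The away_team column name is an assumption, and the rows are made up.

```python
import pandas as pd

# Made-up sample rows; away_team is an assumed column name.
df = pd.DataFrame({
    "away_team": ["Team A", "Team B", "Team A", "Team B"],
    "away_goals": [1, 3, 1, 2],
})

# Total goals scored away from home by each team.
away_goals_by_team = df.groupby("away_team")["away_goals"].sum()

# idxmax returns the team with the largest total.
top_away_scorer = away_goals_by_team.idxmax()
```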
Data Wrangling with Pandas