Premier League Match Analysis

In this project, you'll use your data cleaning and data analysis skills to answer questions about the performance of different teams in the Premier League. The analysis requires a good understanding of `groupby` operations and pivot tables.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.


Replace invalid values in the `season` column

Identify invalid values in the `season` column and replace them with the string `Unknown season` (data imputation).

IMPORTANT: If for any reason you think you have incorrectly modified the original dataframe, just go ahead and read it again.
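A minimal sketch of this kind of replacement, on invented toy data. The project does not say what makes a season invalid, so the `YYYY-YYYY` pattern below is only an assumption; inspect the real column (for example with `value_counts()`) before settling on a rule.

```python
import pandas as pd

# Toy data, invented for illustration; the real season format may differ.
df = pd.DataFrame({"season": ["2006-2007", "2007-2008", "20xx", None, "2008-2009"]})

# Assumed rule: a valid season looks like "YYYY-YYYY". Missing values count as invalid.
is_valid = df["season"].str.fullmatch(r"\d{4}-\d{4}", na=False)

# Work on a copy so the original dataframe stays untouched.
clean = df.copy()
clean.loc[~is_valid, "season"] = "Unknown season"

print(clean["season"].value_counts())
```

Working on a copy means a wrong rule never damages the original dataframe, so you don't need to read the file again.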


Identify invalid values in goals scored

Analyze the columns `home_goals` and `away_goals` and answer: how many invalid values does each contain?

Hint: Use a visualization to help you in the process!
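One possible way to count the invalid values, sketched on invented toy data. The rule used here (negative, missing, or above a hypothetical cutoff `MAX_PLAUSIBLE_GOALS`) is an assumption, not the project's definition; a histogram or box plot of each column should tell you which values actually look wrong.

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "home_goals": [2, -1, 0, 99, 3],
    "away_goals": [1, 1, -5, 2, 0],
})

MAX_PLAUSIBLE_GOALS = 20  # hypothetical cutoff, adjust after looking at the data


def invalid_mask(goals: pd.Series) -> pd.Series:
    """Flag values that are missing, negative, or implausibly large (assumed rule)."""
    return goals.isna() | (goals < 0) | (goals > MAX_PLAUSIBLE_GOALS)


# A text-only alternative to a plot: how often does each value occur?
print(df["home_goals"].value_counts().sort_index())

# Number of invalid values per column.
invalid_counts = df[["home_goals", "away_goals"]].apply(lambda col: invalid_mask(col).sum())
print(invalid_counts)
```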


Replace invalid goals with 0

Replace all the invalid goals in `home_goals` and `away_goals` with `0` (data imputation).
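The imputation itself can be sketched with `Series.mask`, which replaces values wherever a condition is true. The toy data and the validity rule (negative or above a hypothetical cutoff) are assumptions made for the example.

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "home_goals": [2, -1, 0, 99, 3],
    "away_goals": [1, 1, -5, 2, 0],
})

MAX_PLAUSIBLE_GOALS = 20  # hypothetical cutoff

for col in ["home_goals", "away_goals"]:
    # Assumed rule: negative or implausibly large values are invalid.
    invalid = (df[col] < 0) | (df[col] > MAX_PLAUSIBLE_GOALS)
    # mask() swaps in 0 where the condition holds and keeps the rest.
    df[col] = df[col].mask(invalid, 0)

print(df)
```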


Identify and clean invalid results in the `result` column

The `result` column contains a "summary" of the result of the match: `H` indicates a home win, `A` indicates an away win, and `D` indicates a draw.

Identify the invalid values and replace them with the correct result.
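One way to approach this, assuming the goal columns have already been cleaned and can be trusted, is to recompute the expected result from the score and compare it with what is stored. The toy data is invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; rows 1 and 3 hold wrong results.
df = pd.DataFrame({
    "home_goals": [2, 0, 1, 3],
    "away_goals": [1, 2, 1, 0],
    "result": ["H", "X", "D", "A"],
})

home_win = (df["home_goals"] > df["away_goals"]).to_numpy()
away_win = (df["home_goals"] < df["away_goals"]).to_numpy()

# H for a home win, A for an away win, D otherwise (a draw).
expected = np.select([home_win, away_win], ["H", "A"], default="D")

# Rows where the stored result disagrees with the score.
mismatch = df["result"] != expected
print(df[mismatch])

# Overwrite the column with the recomputed values.
df["result"] = expected
```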


What's the average number of goals per match?

Calculate the average number of goals per match. Enter the value truncated to 2 decimals. For example, if you find the value to be 1.8857, enter just 1.88.
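A small sketch of the calculation on invented toy data. Note that the example above cuts the value off after two decimals rather than rounding it, so the sketch uses `math.floor` instead of `round`.

```python
import math

import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({"home_goals": [2, 0, 1], "away_goals": [1, 2, 2]})

# Goals per match = home goals + away goals, averaged over all matches.
avg_goals = (df["home_goals"] + df["away_goals"]).mean()

# Keep two decimals without rounding up.
truncated = math.floor(avg_goals * 100) / 100

print(avg_goals, truncated, round(avg_goals, 2))
```

With this toy data the average is 8/3, so truncating and rounding give different answers (2.66 versus 2.67).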


Create a new column `total_goals`

For the previous activity, it would have been convenient to have a `total_goals` column with the sum of `home_goals` and `away_goals`.

Create the column now.
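Creating such a column is a plain element-wise addition of the two series. A sketch on invented toy data:

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({"home_goals": [2, 0, 1], "away_goals": [1, 2, 2]})

# Row by row: total goals in the match.
df["total_goals"] = df["home_goals"] + df["away_goals"]

print(df)
```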


Calculate average goals per season

Calculate the average number of goals per season. The result should be a series ordered by season. Store the value in the variable `goals_per_season`.
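A minimal sketch of the aggregation with `groupby`, on invented toy data with made-up season labels:

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "season": ["2007-2008", "2006-2007", "2006-2007", "2007-2008"],
    "total_goals": [3, 2, 4, 1],
})

# Mean of total_goals within each season; sort_index() makes the ordering explicit.
goals_per_season = df.groupby("season")["total_goals"].mean().sort_index()

print(goals_per_season)
```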


What's the biggest goal difference in a match?

What was the biggest goal difference in a match found in the dataset?

Note: The goal difference can come from either a home win or an away win. For example, a 10-1 result and a 1-10 result have the same difference: 9 goals for the winning team.
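Because the direction of the win doesn't matter, the absolute value of the difference does the job. A sketch using the two scores from the note plus one invented match:

```python
import pandas as pd

# The 10-1 and 1-10 scores come from the note above; the 2-2 row is invented.
df = pd.DataFrame({"home_goals": [10, 1, 2], "away_goals": [1, 10, 2]})

# abs() treats a home win and an away win by the same margin identically.
goal_diff = (df["home_goals"] - df["away_goals"]).abs()

print(goal_diff.max())
```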


What's the team with the most away wins?

Find the team that has won the most matches away from home.
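One way to sketch this: keep only the matches won by the visitor, then count how often each away team appears. The column name `away_team` is hypothetical (the project description does not name the team columns), and the teams and results are invented.

```python
import pandas as pd

# Toy data, invented for illustration; `away_team` is an assumed column name.
df = pd.DataFrame({
    "away_team": ["Team A", "Team B", "Team A", "Team C", "Team A"],
    "result": ["A", "A", "A", "H", "D"],
})

# Matches won by the away side.
away_wins = df[df["result"] == "A"]

# Count wins per away team and pick the team with the highest count.
wins_per_team = away_wins["away_team"].value_counts()
top_away_team = wins_per_team.idxmax()

print(wins_per_team)
print(top_away_team)
```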


What's the team with the most goals scored at home?


What's the team that received the fewest goals while playing at home?

This is a tricky activity, because we're not looking for the "total" of goals received, but the "ratio" of goals received to matches played.

For example, Charlton Athletic is the team with LITERALLY the fewest goals received at home, with only 20, but that's because they only played 38 matches in total, and only 19 at home.

What's the team with the lowest ratio of goals received to matches played at home? Defined as: `goals_received / home_games`.
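The ratio can be sketched with a named aggregation: for each home team, sum the goals scored by the visitors and count the home matches. The column name `home_team` is hypothetical, and the toy data is invented so that the team with the lowest total is not the team with the lowest ratio.

```python
import pandas as pd

# Toy data, invented for illustration; `home_team` is an assumed column name.
df = pd.DataFrame({
    "home_team": ["Team A", "Team A", "Team A", "Team B"],
    "away_goals": [1, 0, 1, 1],  # goals received by the home team
})

stats = df.groupby("home_team").agg(
    goals_received=("away_goals", "sum"),
    home_games=("away_goals", "size"),
)
stats["ratio"] = stats["goals_received"] / stats["home_games"]

print(stats)
print(stats["ratio"].idxmin())
```

Here Team B has the lowest total (1 goal in 1 match) but Team A has the lowest ratio (2 goals in 3 matches), which is the distinction the activity is after.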


What's the team with most goals scored playing as a visitor (away from home)?

Which team scored the most goals while playing away from home?
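A possible sketch: group by the visiting team and sum its goals. The column name `away_team` is hypothetical and the data is invented. The same pattern, applied to the home columns, answers the equivalent question for goals scored at home.

```python
import pandas as pd

# Toy data, invented for illustration; `away_team` is an assumed column name.
df = pd.DataFrame({
    "away_team": ["Team A", "Team B", "Team A", "Team C"],
    "away_goals": [2, 3, 2, 1],
})

# Total goals scored by each team as a visitor, highest first.
away_goals_per_team = (
    df.groupby("away_team")["away_goals"].sum().sort_values(ascending=False)
)

print(away_goals_per_team)
print(away_goals_per_team.idxmax())
```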

Santiago Basulto

This project is part of

Data Wrangling with Pandas
