Clean, Aggregate and Analyze Cricket dataset using Pandas and Pivot Tables
Clean, Aggregate and Analyze Cricket dataset using Pandas and Pivot Tables Data Science Project
Data Wrangling with Pandas

Clean, Aggregate and Analyze Cricket dataset using Pandas and Pivot Tables

Explore Asia Cup cricket data from 1984 to 2022 in 14 Python and Pandas activities. Clean, analyze, and gain insights into match outcomes, player performances, and team strategies. Enhance Python and Pandas skills, learn data manipulation, statistical analysis, sports analytics, and different ways to group the data.
Start this project
Clean, Aggregate and Analyze Cricket dataset using Pandas and Pivot TablesClean, Aggregate and Analyze Cricket dataset using Pandas and Pivot Tables
Project Created by

Manoj Kumar

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Standardize Match Outcomes

If you perform a quick value_counts() method on the Result column, you will realize that the column contains incoherent values. Make corrections as follows:

  • Transform win to Win.
  • Change ' Win D/L' to Win.
  • Change ' Lose D/L' to Lose.

Note: You should notice that there is space in ' Win D/L' and ' Lose D/L'.

codevalidated

Remove matches with `No Result` from the `Result` column

After cleaning the Win and Lose entries, you will realize that there are a few rows with No Result in the Result column. Remove them.

codevalidated

Create a `Victorious_Team` Column based on Match Outcome

  • Use the Team value if the Result is Win.
  • Use the Opponent if the Result is Lose.

​Your solution should look like the image given below. ​ Activity_03

codevalidated

Create a Pivot Table to Aggregate Team Victories per Year

Calculate and rank the number of victories each team achieves annually. Group the cricket data by Year and Victorious_Team, then sort the results in descending order of both Victories and Year. Subsequently, index the sorted results by both Year and Victorious_Team and store them in a variable named victories_count_per_year.

Your solution should look like the image given below.

Activity_04

codevalidated

Find the team with Most Victories per Year

Determine the team that has the most victories each year from the victories_count_per_year DataFrame, which we generated in the 4th activity. Arrange the data in descending order and store these in the most_victories_per_year variable. Be sure to reset the index.

Your solution should look like the image given below.

Activity_05

codevalidated

Calculate Total Runs Scored by Each Team

Calculate the total runs scored by each team in the dataset by grouping the data by the Team column and summing the Run Scored for each group. Store it in the variable total_runs_by_team.

Your solution should be like this image which is given below.

Activity_06

input

Identify the Team with the Highest Average Batting Strike Rate

Identify the team with the highest average batting strike rate. To do this, First remove rows with missing data in Avg Bat Strike Rate column.Then, group the data by the Team column, calculate the mean of the Avg Bat Strike Rate column for each team, and enter your answer in the input box below as a string.

codevalidated

Count the Number of Toss Wins by Each Team

Find out how many times each team has won the toss. Filter the dataset to include only rows where the Toss column indicates a successful win, then group the data by team to determine the count of toss wins for each team. Store this information in the variable toss_wins_by_team.

Your solution should be like this image which is given below.

Activity_08

codevalidated

Analyze Team's Dominance Over Opponents Using Historical Match Data

Analyze the dominance of each team over its opponents using the historical match data. Create a variable named team_wins_sorted to store and display the count of wins for each team against every opponent. Group the dataset by Team and Opponent columns, then reset the index by the Wins column and sort the dataframe by Wins in descending order and Team in ascending order, Your solution should look like the image given below.

Activity_09

input

Identify the Top performer in `Mirpur` ground

Find the player who has received the most Player of the Match awards at the Mirpur ground, along with the total number of awards they have won there.

Write the answer in this way: player_name, number of awards. For example, Rohit Sharma, 6.

input

Determine Most Frequent Winners of 'Player of the Match' in ODI and T20i Formats

Identify the players who have won the most Player of the Match awards in both the ODI and T20i game formats. Your answer should be written in this format: Player of the Match in ODI, Player of the Match in T20i. For example, if Rohit Sharma has won the most awards in ODI format and Virat Kohli has won the most awards in T20i format, your answer would be: Rohit Sharma, Virat Kohli.

input

Identify Players with Fewest 'Player of the Match' Awards in ODI and T20i Formats

Identify the players who have won the fewest Player of the Match awards in both the ODI and T20i game formats Your answer should be written in this format: Player of the Match in ODI, Player of the Match in T20i. For example, if Karn Sharma has won the fewest awards in ODI format and Virat Kohli has won the fewest awards in T20i format, your answer would be: Karn Sharma, Virat Kohli.

codevalidated

Analyze the Winners by Ground

Determine which team has won the most matches at each cricket ground. Store it in the variable most_wins_per_ground.

Your solution should look like the image given below.

Activity_12

input

Find the Player with the Most 'Player of the Match' Awards in Losing Matches

Find the player who has won the most Player of the Match awards in matches where their team lost. Write the Player's name in input box.

Clean, Aggregate and Analyze Cricket dataset using Pandas and Pivot TablesClean, Aggregate and Analyze Cricket dataset using Pandas and Pivot Tables
Project Created by

Manoj Kumar

I am working in datawars as a content author

I am working in datawars as a content author

This project is part of

Data Wrangling with Pandas

Explore other projects