Data Wrangling with Pandas

In this hands-on project, we'll explore a comprehensive dataset of football players from around the world. You'll learn how to use Pandas GroupBy operations to group data by attributes such as club, division, and nationality, and how to aggregate each group with both built-in and custom functions. Get ready to dive into the world of data manipulation with football player statistics!

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Enter the name of the club with the highest number of players whose preferred foot is `Either`. If multiple clubs have the same number, choose the one that comes first alphabetically.
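
One possible way to approach this count, sketched on toy data. The column names `Club` and `Preferred Foot` are assumptions about the dataset, and the rows are made up purely for illustration.

```python
import pandas as pd

# Toy data; column names "Club" and "Preferred Foot" are assumed, not confirmed.
df = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Benfica", "Benfica", "Celtic"],
    "Preferred Foot": ["Either", "Either", "Either", "Either", "Left"],
})

# Count players per club among those whose preferred foot is "Either".
either_counts = (
    df[df["Preferred Foot"] == "Either"]
    .groupby("Club")
    .size()
    .reset_index(name="count")
)

# Sort by count (descending), then by club name (ascending) to break ties.
top_club = (
    either_counts
    .sort_values(["count", "Club"], ascending=[False, True])
    .iloc[0]["Club"]
)
print(top_club)
```

Sorting on both keys makes the alphabetical tie-break explicit instead of relying on the default group order.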

Calculate the average age of players for each club. Store the results in a dataframe named `avg_age_per_club`.
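
A minimal sketch of a grouped mean, assuming the dataset has `Club` and `Age` columns (both names are assumptions, and the rows are toy values):

```python
import pandas as pd

# Toy data; "Club" and "Age" are assumed column names.
df = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Benfica"],
    "Age": [20, 30, 27],
})

# Group by club, average the ages, and turn the result back into a DataFrame.
avg_age_per_club = df.groupby("Club")["Age"].mean().reset_index()
print(avg_age_per_club)
```

`reset_index()` is one way to get a dataframe rather than a Series; the exact shape the activity expects is not stated here.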

Compute the total value of players within each division. Store the result in a dataframe named `total_value_per_division`.
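
The same pattern works for sums. The sketch below uses `as_index=False` to keep the grouping key as a regular column; the column names `Division` and `Values` and the toy rows are assumptions.

```python
import pandas as pd

# Toy data; "Division" and "Values" are assumed column names.
df = pd.DataFrame({
    "Division": ["League A", "League A", "League B"],
    "Values": [10_000_000, 5_000_000, 7_000_000],
})

# as_index=False returns a DataFrame with "Division" as an ordinary column.
total_value_per_division = df.groupby("Division", as_index=False)["Values"].sum()
print(total_value_per_division)
```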

Find the maximum wage of players from each nation. Store the result in a dataframe named `max_wage_per_nation`.
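
A short sketch of a grouped maximum, here converted with `to_frame()` so the nation stays in the index. The column names `Nation` and `Wages` are assumptions, and the data is illustrative only.

```python
import pandas as pd

# Toy data; "Nation" and "Wages" are assumed column names.
df = pd.DataFrame({
    "Nation": ["FRA", "FRA", "ARG"],
    "Wages": [90_000, 40_000, 75_000],
})

# The grouped max is a Series; to_frame() converts it to a one-column DataFrame.
max_wage_per_nation = df.groupby("Nation")["Wages"].max().to_frame()
print(max_wage_per_nation)
```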

Enter the three-letter country code (e.g., `FRA` for France).

Provide the answer in the following format: Nation with Lowest Height, Nation with Highest Weight (e.g., ALB, ZIM).

Enter the name of the club whose players have the most stamina. If the answer is `Vélez`, enter `Velez`.

Enter the value rounded to two decimal places.

Store the result in the variable `avg_market_value`

Store the result in `player_counts_nation_pf`

Store the result in a dataframe named `club_aggregations`

Create a custom function called `age_range` that computes the difference between the maximum and minimum ages. Apply this custom function using the `agg()` function to calculate the age range for each nation. Save the results in a dataframe named `age_range_per_nation`.
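
The custom aggregation described above might look like the following sketch. The column names `Nation` and `Age` are assumptions, and the data is a toy example.

```python
import pandas as pd

# Toy data; "Nation" and "Age" are assumed column names.
df = pd.DataFrame({
    "Nation": ["ARG", "ARG", "BRA"],
    "Age": [19, 34, 25],
})

def age_range(ages):
    """Return the spread between the oldest and youngest age in a group."""
    return ages.max() - ages.min()

# agg() calls age_range once per nation, passing that nation's ages as a Series.
age_range_per_nation = df.groupby("Nation")["Age"].agg(age_range).reset_index()
print(age_range_per_nation)
```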

Find the answers to the questions above and select the correct option from those given below.

Create a custom function called `variance()` that computes the variance of a series. Then, calculate the mean value and the variance of current ability for players within each club. Store the result in a dataframe named `club_statistics`.
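
A sketch of mixing a built-in aggregation with a custom one in a single `agg()` call. The column name `CA` for current ability is an assumption, and the choice of sample variance (pandas' default, `ddof=1`) is also an assumption since the text does not say which variance is meant.

```python
import pandas as pd

# Toy data; "Club" and "CA" (current ability) are assumed column names.
df = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Ajax", "Benfica", "Benfica"],
    "CA": [100, 120, 140, 150, 170],
})

def variance(series):
    """Sample variance (ddof=1); a club with a single player yields NaN."""
    return series.var()

# Passing a list produces one output column per aggregation,
# named after the string or the function's __name__.
club_statistics = df.groupby("Club")["CA"].agg(["mean", variance])
print(club_statistics)
```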

Define a function `player_type` that classifies players as `Star` if their current ability exceeds `180` and their potential ability exceeds `190`; otherwise, classify them as `Regular`. Then create a new column, `Player Type`, to store these classifications.
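
A row-wise classification along these lines could be sketched as follows. The column names `CA` and `PA` for current and potential ability are assumptions; the thresholds come from the task text.

```python
import pandas as pd

# Toy data; "CA" and "PA" are assumed names for current and potential ability.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "CA": [185, 185, 150],
    "PA": [195, 190, 199],
})

def player_type(row):
    """Both thresholds must be strictly exceeded for a player to be a Star."""
    if row["CA"] > 180 and row["PA"] > 190:
        return "Star"
    return "Regular"

# axis=1 passes each row to the function as a Series.
df["Player Type"] = df.apply(player_type, axis=1)
print(df)
```

Player B shows the boundary case: a potential ability of exactly 190 does not exceed the threshold.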

Create a function called `categorize_by_value` that categorizes players based on their market value into three categories:
- `High` for values greater than `50,000,000`
- `Medium` for values between `20,000,001` and `50,000,000`
- `Low` for values of `20,000,000` or below

Then, create a new column named `Value Type` to store these categories.
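
The three value bands above can be sketched as a plain function applied to a single column. The column name `Values` is an assumption, and the sketch treats market values as whole numbers, so "greater than 20,000,000" matches the stated lower bound of 20,000,001.

```python
import pandas as pd

# Toy data; "Values" is an assumed column name for market value.
df = pd.DataFrame({
    "Values": [60_000_000, 50_000_000, 20_000_001, 20_000_000],
})

def categorize_by_value(value):
    """Map a market value to High, Medium, or Low using the stated cut-offs."""
    if value > 50_000_000:
        return "High"
    if value > 20_000_000:
        return "Medium"
    return "Low"

# Series.apply calls the function once per value.
df["Value Type"] = df["Values"].apply(categorize_by_value)
print(df)
```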

Create a function called `categorize_by_age` that classifies players into three age groups:
- `Young` for ages below `25`
- `Mid-age` for ages between `25` and `29`
- `Senior` for ages `30` and above

Then, create a new column named `Age Group` to store these classifications.
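
A comparable sketch for the age groups, assuming an `Age` column holding whole-number ages (the column name and the toy rows are assumptions):

```python
import pandas as pd

# Toy data; "Age" is an assumed column name.
df = pd.DataFrame({"Age": [24, 25, 29, 30]})

def categorize_by_age(age):
    """Return Young (<25), Mid-age (25 to 29), or Senior (30 and above)."""
    if age < 25:
        return "Young"
    if age <= 29:
        return "Mid-age"
    return "Senior"

df["Age Group"] = df["Age"].apply(categorize_by_age)
print(df)
```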

Provide your answer in the format: `Player1, Player2` (e.g., `Lionel Messi, Cristiano Ronaldo`).

Define a function `calculate_bmi` that computes the Body Mass Index (BMI) of a player using their height and weight. First, convert the player's height from centimeters to meters. Then, apply the BMI formula: weight (kg) divided by height (m) squared. Create a new column `BMI` to store the calculated BMI values.
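
The two steps above, sketched on toy data. The column names `Height` (in centimeters) and `Weight` (in kilograms) are assumptions about the dataset.

```python
import pandas as pd

# Toy data; "Height" (cm) and "Weight" (kg) are assumed column names and units.
df = pd.DataFrame({
    "Height": [180, 200],
    "Weight": [81, 80],
})

def calculate_bmi(row):
    """BMI = weight in kg divided by the square of height in meters."""
    height_m = row["Height"] / 100  # centimeters to meters
    return row["Weight"] / height_m ** 2

df["BMI"] = df.apply(calculate_bmi, axis=1)
print(df)
```

For a large dataframe, the vectorized form `df["Weight"] / (df["Height"] / 100) ** 2` gives the same values without a row-wise `apply`.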

Use the `groupby` method to group players by their nation, then apply the `transform` method with a ranking function to assign a rank to each player's market value within their nation. The ranking is done in descending order, so the player with the highest value gets rank 1. Create a new column `Value Rank` to store these ranks.
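
A sketch of a within-group ranking with `transform`, which returns one value per original row so the result can be assigned straight back as a column. The column names `Nation` and `Values` are assumptions; ties would receive the average rank under pandas' default `method`.

```python
import pandas as pd

# Toy data; "Nation" and "Values" are assumed column names.
df = pd.DataFrame({
    "Nation": ["ARG", "ARG", "ARG", "BRA", "BRA"],
    "Values": [100, 300, 200, 50, 70],
})

# ascending=False gives rank 1 to the highest value inside each nation.
df["Value Rank"] = (
    df.groupby("Nation")["Values"]
    .transform(lambda values: values.rank(ascending=False))
)
print(df)
```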

Create a function named `standardize` that standardizes a series by subtracting the mean and dividing by the standard deviation. Apply this function to standardize the vision ratings within each column. Finally, add a new column named `Standardized Vision` to store the result.
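
A minimal sketch of group-wise standardization. The text does not name the grouping key clearly, so grouping by `Club` is an assumption made here by analogy with the neighbouring activities; the column name `Vision` and the use of the sample standard deviation are assumptions as well.

```python
import pandas as pd

# Toy data; "Club" as the grouping key and "Vision" as the column are assumptions.
df = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Ajax", "Benfica", "Benfica"],
    "Vision": [10, 12, 14, 8, 12],
})

def standardize(series):
    """Z-score: subtract the group mean, divide by the group sample std."""
    return (series - series.mean()) / series.std()

# transform() applies standardize per group and keeps the original row order.
df["Standardized Vision"] = df.groupby("Club")["Vision"].transform(standardize)
print(df)
```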

Define a function named `calculate_percentile` to compute the percentile rank of each value in a series. Utilize this function to calculate the age percentile values within each club. Then, add a new column named `Age Percentile` to store these percentile ranks.
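
One reading of "percentile rank" is pandas' `rank(pct=True)`, which divides each rank by the group size. The sketch below uses that definition as an assumption; the column names `Club` and `Age` are assumed too.

```python
import pandas as pd

# Toy data; "Club" and "Age" are assumed column names.
df = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Ajax", "Ajax", "Benfica", "Benfica"],
    "Age": [20, 25, 30, 35, 22, 28],
})

def calculate_percentile(series):
    """Percentile rank as rank / group size, a value between 0 and 1."""
    return series.rank(pct=True)

df["Age Percentile"] = df.groupby("Club")["Age"].transform(calculate_percentile)
print(df)
```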

Create a function named `deviation_from_mean` to compute the deviation of each value from the mean. Use this function to calculate how far each player's pace deviates from the mean pace of their club. Finally, add a new column titled `Pace Deviation` to store the result.
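
A short sketch of the deviation calculation, assuming `Club` and `Pace` columns (both names are assumptions) and toy values:

```python
import pandas as pd

# Toy data; "Club" and "Pace" are assumed column names.
df = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Benfica"],
    "Pace": [10, 14, 15],
})

def deviation_from_mean(series):
    """Signed distance of each value from its group's mean."""
    return series - series.mean()

df["Pace Deviation"] = df.groupby("Club")["Pace"].transform(deviation_from_mean)
print(df)
```

A positive deviation means the player is faster than the club average, a negative one slower.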

Create a function called `rank_wage` that ranks player wages in descending order. Apply this function to calculate the ranked wages within each club. Introduce a new column named `Wage Rank` to store these rankings.
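
A possible shape for `rank_wage`, sketched with dense ranking so tied wages share a rank and no rank numbers are skipped. The tie-handling method is an assumption, as are the column names `Club` and `Wages`.

```python
import pandas as pd

# Toy data; "Club" and "Wages" are assumed column names.
df = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Ajax", "Benfica"],
    "Wages": [100, 300, 300, 50],
})

def rank_wage(wages):
    """Rank wages from highest (rank 1) to lowest; ties share the same rank."""
    return wages.rank(ascending=False, method="dense")

df["Wage Rank"] = df.groupby("Club")["Wages"].transform(rank_wage)
print(df)
```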
