Data Wrangling with Pandas

In this hands-on project, we'll explore a comprehensive dataset of football players from around the world. You'll learn how to use Pandas GroupBy operations to group data by attributes such as club, division, and nationality, and how to aggregate those groups with both built-in and custom functions. Get ready to dive into the world of data manipulation with football player statistics!

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Enter the name of the club with the highest number of players whose preferred foot is `Either`. If multiple clubs have the same number, choose the one that comes first alphabetically.
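
One possible way to approach this is sketched below. The column names `Club` and `Preferred Foot` and the tiny sample data are assumptions made for illustration; the real dataset may use different names.

```python
import pandas as pd

# Made-up sample; "Club" and "Preferred Foot" are assumed column names.
players = pd.DataFrame({
    "Club": ["Benfica", "Benfica", "Ajax", "Ajax", "Celtic", "Celtic"],
    "Preferred Foot": ["Either", "Either", "Either", "Either", "Either", "Right"],
})

# Count the players per club whose preferred foot is "Either".
either_counts = (
    players[players["Preferred Foot"] == "Either"]
    .groupby("Club")
    .size()
)

# Keep only the clubs tied for the highest count, then pick the first alphabetically.
top_clubs = either_counts[either_counts == either_counts.max()]
answer = sorted(top_clubs.index)[0]
print(answer)
```

Filtering to the tied clubs before sorting makes the alphabetical tie-break explicit instead of relying on the order in which groups happen to appear.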

Calculate the average age of players for each club. Store the results in a dataframe named `avg_age_per_club`.
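
A minimal sketch of this aggregation, assuming columns named `Club` and `Age` (hypothetical names) and a made-up sample:

```python
import pandas as pd

# Made-up sample; "Club" and "Age" are assumed column names.
players = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Benfica"],
    "Age": [20, 30, 27],
})

# as_index=False keeps "Club" as a regular column, so the result is a
# DataFrame with one row per club rather than a Series.
avg_age_per_club = players.groupby("Club", as_index=False)["Age"].mean()
print(avg_age_per_club)
```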

Compute the total value of players within each division. Store the result in a dataframe named `total_value_per_division`.

Find the maximum wage of players from each nation. Store the result in a dataframe named `max_wage_per_nation`.
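
The total-value and maximum-wage tasks above follow the same pattern with a different built-in aggregation. The sketch below assumes columns named `Division`, `Nation`, `Value`, and `Wage`, which are hypothetical names, and uses made-up numbers.

```python
import pandas as pd

# Made-up sample; all column names are assumptions.
players = pd.DataFrame({
    "Division": ["Premier", "Premier", "La Liga"],
    "Nation": ["FRA", "ESP", "ESP"],
    "Value": [10_000_000, 5_000_000, 8_000_000],
    "Wage": [100_000, 50_000, 80_000],
})

# Selecting with double brackets keeps the result as a DataFrame, not a Series.
total_value_per_division = players.groupby("Division")[["Value"]].sum()
max_wage_per_nation = players.groupby("Nation")[["Wage"]].max()

print(total_value_per_division)
print(max_wage_per_nation)
```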

Enter the country's three-letter country code (e.g., `FRA` for France).

Provide the answer in the following format: Nation with Lowest Height, Nation with Highest Weight (e.g., ALB, ZIM).

Enter the name of the club whose players have the most stamina. If the answer is `Vélez`, enter `Velez`.

Enter the value rounded to two decimal places.

Store the result in the variable `avg_market_value`

Store the result in `player_counts_nation_pf`

Store the result in a dataframe named `club_aggregations`

Create a custom function called `age_range` that computes the difference between the maximum and minimum ages. Apply this custom function using the `agg()` function to calculate the age range for each nation. Save the results in a dataframe named `age_range_per_nation`.
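
As an illustration of passing a custom function to `agg()`, here is a small sketch. The column names `Nation` and `Age`, the output column name `Age Range`, and the sample data are assumptions.

```python
import pandas as pd

# Made-up sample; "Nation" and "Age" are assumed column names.
players = pd.DataFrame({
    "Nation": ["FRA", "FRA", "ESP", "ESP"],
    "Age": [20, 30, 25, 28],
})

def age_range(ages):
    """Return the difference between the oldest and youngest age in a group."""
    return ages.max() - ages.min()

# agg() calls age_range once per nation, passing that nation's ages as a Series.
age_range_per_nation = (
    players.groupby("Nation")["Age"]
    .agg(age_range)
    .reset_index(name="Age Range")  # "Age Range" is a hypothetical column name
)
print(age_range_per_nation)
```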

Find the answers to the questions above and select the correct answer from the options given below.

Create a custom function called `variance()` that computes the variance of a series. Then, calculate the mean value and the variance of current ability for players within each club. Store the result in a dataframe named `club_statistics`.
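
The sketch below shows how a built-in aggregation and a custom one can be combined in a single `agg()` call. The column names `Club` and `Current Ability` are assumptions, and because the task does not say whether sample or population variance is expected, this version uses the sample variance (pandas' default).

```python
import pandas as pd

# Made-up sample; "Club" and "Current Ability" are assumed column names.
players = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Benfica", "Benfica", "Benfica"],
    "Current Ability": [150, 170, 140, 160, 180],
})

def variance(series):
    """Sample variance (ddof=1); an assumption, since the task does not specify."""
    return series.var()

# A list passed to agg() produces one output column per entry:
# "mean" for the built-in, and "variance" taken from the function's name.
club_statistics = players.groupby("Club")["Current Ability"].agg(["mean", variance])
print(club_statistics)
```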

Define a function `player_type` that classifies players as `Star` if their current ability exceeds `180` and their potential ability exceeds `190`; otherwise, classify them as `Regular`. Then create a new column `Player Type` to store these classifications.
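
A row-wise classification like this could be sketched as follows, assuming columns named `Current Ability` and `Potential Ability` (hypothetical names) and a made-up sample:

```python
import pandas as pd

# Made-up sample; the column names are assumptions.
players = pd.DataFrame({
    "Current Ability": [185, 181, 170],
    "Potential Ability": [195, 190, 195],
})

def player_type(row):
    """Classify one player (one row) as 'Star' or 'Regular'."""
    if row["Current Ability"] > 180 and row["Potential Ability"] > 190:
        return "Star"
    return "Regular"

# axis=1 passes each row to the function, so it can read both columns.
players["Player Type"] = players.apply(player_type, axis=1)
print(players)
```

Note that both comparisons are strict, so a player with a potential ability of exactly 190 is classified as `Regular`.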

Create a function called `categorize_by_value` that categorizes players based on their market value into three categories:
- `High` for values greater than `50,000,000`
- `Medium` for values between `20,000,001` and `50,000,000`
- `Low` for values of `20,000,000` or below

Then, create a new column named `Value Type` to store these categories.
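
The three categories above can be sketched as a simple chain of comparisons. The column name `Value` and the sample figures are assumptions.

```python
import pandas as pd

# Made-up sample; "Value" is an assumed column name.
players = pd.DataFrame({
    "Value": [60_000_000, 50_000_000, 20_000_001, 20_000_000],
})

def categorize_by_value(value):
    """Map a market value to 'High', 'Medium', or 'Low'."""
    if value > 50_000_000:
        return "High"
    if value > 20_000_000:  # 20,000,001 up to and including 50,000,000
        return "Medium"
    return "Low"

# Series.apply calls the function once per value.
players["Value Type"] = players["Value"].apply(categorize_by_value)
print(players)
```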

Create a function called `categorize_by_age` that classifies players into three age groups:
- `Young` for ages below `25`
- `Mid-age` for ages between `25` and `29`
- `Senior` for ages `30` and above

Then, create a new column named `Age Group` to store these classifications.
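
Here is a small sketch of the age grouping described above, together with a vectorized cross-check using `pd.cut`. The column name `Age` is an assumption, and the `pd.cut` bins only match the function when ages are whole numbers.

```python
import pandas as pd

# Made-up sample; "Age" is an assumed column name.
players = pd.DataFrame({"Age": [19, 24, 25, 29, 30, 36]})

def categorize_by_age(age):
    """Map an age to 'Young', 'Mid-age', or 'Senior'."""
    if age < 25:
        return "Young"
    if age <= 29:
        return "Mid-age"
    return "Senior"

players["Age Group"] = players["Age"].apply(categorize_by_age)

# Alternative for whole-number ages: bins are (-inf, 24], (24, 29], (29, inf).
age_group_cut = pd.cut(
    players["Age"],
    bins=[float("-inf"), 24, 29, float("inf")],
    labels=["Young", "Mid-age", "Senior"],
)
print(players)
```

The function version is what the task asks for; the `pd.cut` version is only shown as a way to verify the boundaries.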

Provide your answer in the format: `Player1, Player2` (e.g., `Lionel Messi, Cristiano Ronaldo`).

Define a function `calculate_bmi` that computes the Body Mass Index (BMI) of a player using their height and weight. First, convert the player's height from centimeters to meters. Then, apply the BMI formula: weight (kg) divided by height (m) squared. Create a new column `BMI` to store the calculated BMI values.
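
A minimal sketch of the BMI calculation, assuming the height is stored in centimeters in a column named `Height` and the weight in kilograms in a column named `Weight` (both names are assumptions):

```python
import pandas as pd

# Made-up sample; "Height" (cm) and "Weight" (kg) are assumed column names.
players = pd.DataFrame({
    "Height": [180, 200],
    "Weight": [81, 100],
})

def calculate_bmi(row):
    """BMI = weight in kg divided by the square of height in meters."""
    height_m = row["Height"] / 100  # centimeters to meters
    return row["Weight"] / height_m ** 2

players["BMI"] = players.apply(calculate_bmi, axis=1)
print(players)
```

For the first player, 81 / 1.8² = 81 / 3.24, which works out to a BMI of about 25.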

Use the `groupby` method to group players by their nation and then apply the `transform` method with a ranking function to assign a rank to each player's market value within their nation. The ranking is done in descending order, so the player with the highest value gets rank 1. Create a new column `Value Rank` to store these ranks.
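
The key difference from aggregation is that `transform` returns one value per original row. A sketch under the assumption that the columns are named `Nation` and `Value`:

```python
import pandas as pd

# Made-up sample; "Nation" and "Value" are assumed column names.
players = pd.DataFrame({
    "Nation": ["FRA", "FRA", "FRA", "ESP", "ESP"],
    "Value": [30, 50, 10, 20, 40],
})

# transform returns a Series aligned with the original rows, so it can be
# assigned directly as a new column. ascending=False gives rank 1 to the
# highest value in each nation. Ties (none here) would share an average
# rank under pandas' default method.
players["Value Rank"] = players.groupby("Nation")["Value"].transform(
    lambda values: values.rank(ascending=False)
)
print(players)
```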

Create a function named `standardize` that standardizes a series by subtracting the mean and dividing by the standard deviation. Apply this function to standardize the vision ratings within each column. Finally, add a new column named `Standardized Vision` to store the result.
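
The task text does not make the grouping key explicit, so the sketch below assumes the standardization is done per club, like the neighboring tasks. The column names `Club` and `Vision` are also assumptions.

```python
import pandas as pd

# Made-up sample; "Club" and "Vision" are assumed column names.
players = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Ajax", "Benfica", "Benfica"],
    "Vision": [10, 12, 14, 8, 16],
})

def standardize(series):
    """Subtract the mean and divide by the sample standard deviation."""
    return (series - series.mean()) / series.std()

# Assumption: standardize within each club. transform keeps one value per row.
players["Standardized Vision"] = (
    players.groupby("Club")["Vision"].transform(standardize)
)
print(players)
```

After standardizing, each group's values have a mean of zero, which is a quick sanity check on the result.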

Define a function named `calculate_percentile` to compute the percentile rank of each value in a series. Utilize this function to calculate the age percentile values within each club. Then, add a new column named `Age Percentile` to store these percentile ranks.
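
Percentile rank has more than one common definition. The sketch below uses `Series.rank(pct=True)`, which expresses each value's rank as a fraction of the group size; this choice, like the column names `Club` and `Age`, is an assumption rather than the required solution.

```python
import pandas as pd

# Made-up sample; "Club" and "Age" are assumed column names.
players = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Ajax", "Ajax", "Benfica", "Benfica"],
    "Age": [20, 25, 30, 35, 22, 28],
})

def calculate_percentile(series):
    """Rank each value and scale the rank by the number of values in the group."""
    return series.rank(pct=True)

players["Age Percentile"] = (
    players.groupby("Club")["Age"].transform(calculate_percentile)
)
print(players)
```

With this definition, the oldest player in every club gets a percentile of 1.0.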

Create a function named `deviation_from_mean` to compute the deviation of each value from the mean. Utilize this function to calculate how far each player's pace deviates from the mean pace of their club. Finally, add a new column titled `Pace Deviation` to store the result.
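
A short sketch of this group-wise deviation, assuming columns named `Club` and `Pace` (hypothetical names):

```python
import pandas as pd

# Made-up sample; "Club" and "Pace" are assumed column names.
players = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Ajax", "Benfica", "Benfica"],
    "Pace": [70, 80, 90, 60, 64],
})

def deviation_from_mean(series):
    """Return each value minus the mean of the series it belongs to."""
    return series - series.mean()

# Inside transform, series.mean() is the mean of one club, not of the whole column.
players["Pace Deviation"] = (
    players.groupby("Club")["Pace"].transform(deviation_from_mean)
)
print(players)
```

Positive values mark players who are faster than their club's average, and negative values mark slower ones.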

Create a function called `rank_wage` that ranks player wages in descending order. Apply this function to calculate the ranked wages within each club. Introduce a new column named `Wage Rank` to store these rankings.
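
The task does not say how tied wages should be ranked. The sketch below assumes dense ranking, where tied players share a rank and the next rank follows without a gap. The column names `Club` and `Wage` are also assumptions.

```python
import pandas as pd

# Made-up sample; "Club" and "Wage" are assumed column names.
players = pd.DataFrame({
    "Club": ["Ajax", "Ajax", "Ajax", "Benfica", "Benfica"],
    "Wage": [100, 100, 50, 30, 70],
})

def rank_wage(wages):
    """Rank wages from highest (rank 1) to lowest; ties share a rank (assumption)."""
    return wages.rank(ascending=False, method="dense")

players["Wage Rank"] = players.groupby("Club")["Wage"].transform(rank_wage)
print(players)
```

Other `method` options such as `"min"` or `"first"` handle ties differently, so it is worth checking which one the expected answer uses.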

