All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Rename the columns in the dataset for consistency. Change type_race to race_type and has_immortality to is_immortal.
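A minimal sketch of the rename, assuming df is a pandas dataframe. The toy rows below are invented for illustration and stand in for the real dataset.

```python
import pandas as pd

# Toy stand-in for the dataset; the rows are invented for illustration.
df = pd.DataFrame({
    "name": ["Hero A", "Hero B"],
    "type_race": ["Human", "Mutant"],
    "has_immortality": [0, 1],
})

# rename returns a new dataframe, so assign the result back to df.
df = df.rename(columns={"type_race": "race_type", "has_immortality": "is_immortal"})
print(df.columns.tolist())
```

Columns not listed in the mapping keep their names.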
Transform the binary values in the ability-related columns has_shapeshifting, has_telepathy, has_regeneration, is_immortal, has_teleportation into descriptive labels. Replace 1 with Yes and 0 with No.
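One possible way to relabel the values, assuming the five columns hold only the integers 1 and 0. The toy rows are invented for illustration.

```python
import pandas as pd

ability_cols = [
    "has_shapeshifting", "has_telepathy", "has_regeneration",
    "is_immortal", "has_teleportation",
]

# Toy stand-in for the dataset; the values are invented for illustration.
df = pd.DataFrame({
    "has_shapeshifting": [1, 0],
    "has_telepathy": [0, 0],
    "has_regeneration": [1, 1],
    "is_immortal": [0, 1],
    "has_teleportation": [1, 0],
})

labels = {1: "Yes", 0: "No"}
for col in ability_cols:
    # map turns any value missing from the dictionary into NaN,
    # so this assumes the columns contain nothing but 1 and 0.
    df[col] = df[col].map(labels)

print(df)
```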
Select the first 10 rows of the dataframe df and store them in a variable named first_ten_rows.
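A short sketch using head, with an invented 25-row dataframe standing in for df.

```python
import pandas as pd

# Toy stand-in for the dataset: 25 invented rows.
df = pd.DataFrame({"overall_score": range(25)})

# head(10) returns the first 10 rows in their current order.
first_ten_rows = df.head(10)
print(len(first_ten_rows))
```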
Drop columns that have more than 30% of their values missing.
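A sketch of the 30% rule, assuming missing values are stored as NaN. The column names mostly_missing and slightly_missing are hypothetical and exist only in this toy example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset with hypothetical column names.
df = pd.DataFrame({
    "name": list("abcdefghij"),
    # 4 of 10 values missing (40%): above the threshold.
    "mostly_missing": [1.0, np.nan, np.nan, np.nan, np.nan, 6.0, 7.0, 8.0, 9.0, 10.0],
    # 2 of 10 values missing (20%): below the threshold.
    "slightly_missing": [1.0, 2.0, np.nan, 4.0, np.nan, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# isna().mean() gives the fraction of missing values per column.
missing_share = df.isna().mean()

# Keep only the columns whose missing share does not exceed 30%.
df = df.loc[:, missing_share <= 0.30]
print(df.columns.tolist())
```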
Clean the dataset by dropping all rows that contain any NaN values. After removing these rows, reset df's index so that it remains sequential.
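A minimal sketch of the two steps chained together. The toy rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset; rows B and C each contain a missing value.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "overall_score": [10.0, np.nan, 30.0, 40.0],
    "creator": ["X", "Y", None, "Z"],
})

# dropna() removes every row with at least one missing value;
# reset_index(drop=True) renumbers the rows 0, 1, 2, ... and discards the old index.
df = df.dropna().reset_index(drop=True)
print(df)
```

Without drop=True, the old index would be kept as an extra column.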
Extract numeric height values in centimeters from mixed string formats and convert them into numeric (int) data type.
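The exact string formats are not shown here, so the sketch below is only an assumption: it supposes a height column whose strings always contain the centimeter value followed by "cm", sometimes after a feet-and-inches value. The sample strings are invented.

```python
import pandas as pd

# Invented examples of mixed formats; the real dataset may look different.
df = pd.DataFrame({"height": ["6'2 / 188 cm", "175 cm", "5'5 / 165cm"]})

# Capture the run of digits that sits directly before "cm" (optional space allowed).
cm_values = df["height"].str.extract(r"(\d+)\s*cm", expand=False)

# astype(int) fails if any row had no match (NaN), so real data may need
# those rows handled first.
df["height"] = cm_values.astype(int)
print(df["height"].tolist())
```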
Extract all superheroes who belong to the human race from the race_type column. Store the resultant dataframe in df_main_race.
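A sketch using a boolean mask. The label "Human" is an assumption about how the race is spelled in race_type; the toy rows are invented.

```python
import pandas as pd

# Toy stand-in for the dataset; the label "Human" is an assumed spelling.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "race_type": ["Human", "Mutant", "Human", "Alien"],
})

# The comparison yields a True/False Series that selects the matching rows.
df_main_race = df[df["race_type"] == "Human"]
print(df_main_race)
```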
Filter the dataset to include only those superheroes who have a Good alignment and are part of Marvel Comics. Store the resultant dataframe in marvel_good_alignment.
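A sketch combining two conditions. The column names alignment and creator, and the value "Marvel Comics", are assumptions about how the dataset encodes this information; the toy rows are invented.

```python
import pandas as pd

# Toy stand-in for the dataset; column names and labels are assumptions.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "alignment": ["Good", "Bad", "Good", "Good"],
    "creator": ["Marvel Comics", "Marvel Comics", "DC Comics", "Marvel Comics"],
})

# Each condition needs its own parentheses; & requires both to be True.
marvel_good_alignment = df[
    (df["alignment"] == "Good") & (df["creator"] == "Marvel Comics")
]
print(marvel_good_alignment)
```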
Sort df first by overall_score in descending order, then by intelligence_score. Store the sorted dataframe in df_sorted.
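A sketch with sort_values. The direction for intelligence_score is not stated, so descending is assumed here as well; the toy rows are invented.

```python
import pandas as pd

# Toy stand-in for the dataset; rows A and C tie on overall_score.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "overall_score": [80, 95, 80, 60],
    "intelligence_score": [70, 50, 90, 40],
})

# Ties on overall_score are broken by intelligence_score.
# ascending takes one flag per sort key; the second False is an assumption.
df_sorted = df.sort_values(
    by=["overall_score", "intelligence_score"],
    ascending=[False, False],
)
print(df_sorted)
```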
Filter the dataset to find superheroes who have both super speed and super strength. Store the filtered dataframe in df_superpowers.
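A sketch of the filter. The column names has_super_speed and has_super_strength are hypothetical, and the columns are assumed to hold 1 and 0; adjust both to match the real dataset.

```python
import pandas as pd

# Toy stand-in for the dataset; column names and 1/0 encoding are assumptions.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "has_super_speed": [1, 1, 0],
    "has_super_strength": [1, 0, 1],
})

# Keep only the rows where both abilities are present.
df_superpowers = df[
    (df["has_super_speed"] == 1) & (df["has_super_strength"] == 1)
]
print(df_superpowers)
```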
Group the dataset by creator and calculate the average overall_score for each creator. Store the result in a variable named average_scores_by_creator.
Group the dataset by race and calculate the average intelligence_score for each race. Store the result in race_intelligence.
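Both grouping tasks follow the same pattern, sketched below. The sketch assumes the race is stored in race_type and that creator holds the publisher name; the toy rows are invented.

```python
import pandas as pd

# Toy stand-in for the dataset; the rows are invented for illustration.
df = pd.DataFrame({
    "creator": ["Marvel Comics", "Marvel Comics", "DC Comics", "DC Comics"],
    "race_type": ["Human", "Mutant", "Human", "Alien"],
    "overall_score": [80, 60, 70, 90],
    "intelligence_score": [100, 50, 80, 90],
})

# groupby splits the rows by key, then mean() averages the selected column per group.
average_scores_by_creator = df.groupby("creator")["overall_score"].mean()
race_intelligence = df.groupby("race_type")["intelligence_score"].mean()

print(average_scores_by_creator)
print(race_intelligence)
```

Each result is a Series indexed by the group key.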
Create a function called categorize_score that categorizes superheroes based on their overall_score into five categories:
Then, create a new column named score_category to store these categories.
Use the apply function to create a new column power_index, which is the sum of intelligence_score, speed_score, and durability_score.
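A minimal sketch of a row-wise apply. The toy scores are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the dataset; the scores are invented.
df = pd.DataFrame({
    "intelligence_score": [90, 50],
    "speed_score": [30, 80],
    "durability_score": [60, 70],
})

# axis=1 passes each row to the function, so row[...] reads one hero's values.
df["power_index"] = df.apply(
    lambda row: row["intelligence_score"] + row["speed_score"] + row["durability_score"],
    axis=1,
)
print(df)
```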
Use apply to count the number of superpowers each superhero has and add it as a new column num_superpowers.
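A sketch of the count, assuming each superpower column holds 1 when the hero has the power and 0 otherwise. The list of power columns is hypothetical (has_flight and has_super_strength are invented names); replace it with the real superpower columns.

```python
import pandas as pd

# Toy stand-in for the dataset; column list and 1/0 encoding are assumptions.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "has_flight": [1, 0, 1],
    "has_telepathy": [0, 0, 1],
    "has_super_strength": [1, 0, 1],
})

power_cols = ["has_flight", "has_telepathy", "has_super_strength"]

# For each row, count how many of the power columns equal 1.
df["num_superpowers"] = df[power_cols].apply(
    lambda row: int((row == 1).sum()),
    axis=1,
)
print(df)
```

If the columns hold Yes and No instead, compare against "Yes" in the lambda.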
Create a histogram to visualize the distribution of overall_score. Store the plot in a variable named overall_score_histogram.
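A sketch using the pandas plotting interface, which relies on matplotlib. It assumes that "the plot" means the Axes object returned by the call; the bin count and the toy scores are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Toy stand-in for the dataset; the scores are invented.
df = pd.DataFrame({"overall_score": [12, 25, 31, 44, 47, 52, 58, 63, 77, 90]})

# plot(kind="hist") returns a matplotlib Axes; bins=5 is an arbitrary choice.
overall_score_histogram = df["overall_score"].plot(
    kind="hist",
    bins=5,
    title="Distribution of overall_score",
)
overall_score_histogram.set_xlabel("overall_score")
```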
Generate a correlation matrix to explore the relationships between different scores like intelligence_score, speed_score, and durability_score. Store the correlation matrix in correlation_matrix.
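A minimal sketch with corr, which computes pairwise Pearson correlations by default. The toy scores are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the dataset; the scores are invented.
df = pd.DataFrame({
    "intelligence_score": [10, 20, 30, 40],
    "speed_score": [40, 30, 20, 10],
    "durability_score": [5, 15, 10, 30],
})

score_cols = ["intelligence_score", "speed_score", "durability_score"]

# The result is a square dataframe with one row and one column per score.
correlation_matrix = df[score_cols].corr()
print(correlation_matrix)
```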
Create a scatter plot of height against power_score to explore any potential relationship. Store the plot in height_vs_power_plot.
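A sketch of the scatter plot, assuming height is already numeric and that a power_score column exists under that name. The toy rows are invented, and the returned Axes object is assumed to be what should be stored.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Toy stand-in for the dataset; the values are invented.
df = pd.DataFrame({
    "height": [165, 175, 188, 201],
    "power_score": [40, 55, 70, 65],
})

# plot.scatter returns a matplotlib Axes with one point per row.
height_vs_power_plot = df.plot.scatter(x="height", y="power_score")
```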
Group the dataset by gender and analyze the average number of superpowers each gender possesses. Store the result in gender_superpowers.
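A sketch of the grouping, assuming a gender column exists and that num_superpowers has already been created. The toy rows are invented.

```python
import pandas as pd

# Toy stand-in for the dataset; the gender column name is an assumption.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female"],
    "num_superpowers": [2, 4, 4, 6],
})

# Average number of superpowers per gender, returned as a Series.
gender_superpowers = df.groupby("gender")["num_superpowers"].mean()
print(gender_superpowers)
```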
Create a pie chart showing the distribution of alignments (Good, Bad, etc.) across the dataset. Store the chart in alignment_pie_chart.
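A sketch of the pie chart, assuming the alignments live in a column named alignment and that the returned Axes object is what should be stored. The toy rows and the Neutral label are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Toy stand-in for the dataset; the alignment column name is an assumption.
df = pd.DataFrame({
    "alignment": ["Good", "Good", "Bad", "Neutral", "Good", "Bad"],
})

# value_counts() tallies each alignment; the pie plot draws one wedge per label.
alignment_counts = df["alignment"].value_counts()
alignment_pie_chart = alignment_counts.plot(kind="pie", autopct="%1.1f%%")
alignment_pie_chart.set_ylabel("")  # hide the default axis label
```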
Answer this question by looking at the pie chart plotted in the 25th activity.
Visualize the distribution of height using a boxplot. Store the result in height_boxplot.
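A sketch of the boxplot. The variable name height_boxplot suggests the height column, so that is assumed here, along with height already being numeric. The toy values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Toy stand-in for the dataset; 250 is an invented outlier.
df = pd.DataFrame({"height": [160, 165, 170, 175, 180, 185, 250]})

# plot(kind="box") returns a matplotlib Axes showing median, quartiles, and outliers.
height_boxplot = df["height"].plot(kind="box", title="Distribution of height")
```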