Practicing Discretization and Binning with Music Data

Replace Missing Values in Columns 'Views', 'Likes', and 'Comments' with Their Respective Medians

Modify the tracks_df DataFrame directly to replace missing values.
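
One possible way to do this in place is sketched below. The small `tracks_df` is a hypothetical stand-in for the real dataset, and the helper name `median_cols` is introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tracks_df; the real dataset has many more rows and columns.
tracks_df = pd.DataFrame({
    "Views": [100.0, np.nan, 300.0, 500.0],
    "Likes": [10.0, 20.0, np.nan, 40.0],
    "Comments": [np.nan, 2.0, 4.0, 6.0],
})

median_cols = ["Views", "Likes", "Comments"]  # illustrative helper name

# DataFrame.median() skips NaN by default; fillna() aligns the resulting
# Series with the DataFrame by column name, so each column gets its own median.
tracks_df[median_cols] = tracks_df[median_cols].fillna(tracks_df[median_cols].median())
```

Assigning back to `tracks_df[median_cols]` updates the existing DataFrame rather than creating a copy under a new name.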

Replace Missing Values in Specified Columns with Their Respective Means

Modify the tracks_df DataFrame by replacing the missing values in the Duration_ms, Loudness, Speechiness, Energy, and Tempo columns with their respective means.
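
A minimal sketch of a column-by-column mean fill, written as an explicit loop so each step is visible. The sample values and the name `mean_cols` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the activity text.
tracks_df = pd.DataFrame({
    "Duration_ms": [200000.0, np.nan, 400000.0],
    "Loudness": [-6.0, -10.0, np.nan],
    "Speechiness": [0.05, np.nan, 0.15],
    "Energy": [np.nan, 0.5, 0.7],
    "Tempo": [120.0, 100.0, np.nan],
})

mean_cols = ["Duration_ms", "Loudness", "Speechiness", "Energy", "Tempo"]

for col in mean_cols:
    # Series.mean() ignores NaN, so the mean is computed from observed values only.
    tracks_df[col] = tracks_df[col].fillna(tracks_df[col].mean())
```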

Classify Track Durations into `Short`, `Medium`, and `Long`

Using the Duration_ms column, we want to discretize the durations of our tracks into three categories:

Short tracks: between 0 and 180000 ms.
Medium tracks: between 180000 and 300000 ms.
Long tracks: above 300000 ms.

Create the categories and store them in a new column, Duration_Category. It should look similar to:

[Image: Duration_Category column]
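
The three ranges above can be sketched with `pd.cut`. The sample durations are made up. The activity does not say which side of each boundary is inclusive, so this sketch assumes pandas' default of right-closed intervals, where a track of exactly 180000 ms counts as Short.

```python
import pandas as pd

# Hypothetical durations chosen to land on and around the bin edges.
tracks_df = pd.DataFrame({"Duration_ms": [150000, 180000, 240000, 300000, 420000]})

bins = [0, 180000, 300000, float("inf")]  # open-ended upper bin for "above 300000 ms"
labels = ["Short", "Medium", "Long"]

# Default right=True gives the intervals (0, 180000], (180000, 300000], (300000, inf].
tracks_df["Duration_Category"] = pd.cut(tracks_df["Duration_ms"], bins=bins, labels=labels)
```

With these defaults a duration of exactly 0 falls outside the first interval and becomes NaN; passing `include_lowest=True` would close the first interval on the left.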

Categorize Track Tempo into Appropriate Bins

Using the Tempo column, we want to discretize the tempo of our tracks into three categories:

Slow tracks: between 0 and 100 bpm.
Medium tracks: between 100 and 140 bpm.
Fast tracks: above 140 bpm.

Create the categories and store them in a new column, Tempo_Category. It should look similar to:

[Image: Tempo_Category]
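
A sketch of the tempo bins, this time with `include_lowest=True` so that a tempo of exactly 0 is labeled Slow instead of being left as NaN. Whether the activity expects that option is not stated; it is an assumption here, as are the sample values.

```python
import pandas as pd

# Hypothetical tempos, including 0 and values that sit exactly on a bin edge.
tracks_df = pd.DataFrame({"Tempo": [0.0, 90.0, 100.0, 128.0, 140.0, 174.0]})

tracks_df["Tempo_Category"] = pd.cut(
    tracks_df["Tempo"],
    bins=[0, 100, 140, float("inf")],
    labels=["Slow", "Medium", "Fast"],
    include_lowest=True,  # closes the first interval on the left: [0, 100]
)
```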

Categorize Tracks as 'Viral' or 'Non-Viral' Based on Views Using Quantiles

Using the Views column, we want to discretize the views of our tracks into two categories:

Non-Viral tracks: between 0 and 1,000,000 views.
Viral tracks: above 1,000,000 views.

Create the categories and store them in a new column, Viral_Category. It should look similar to:

[Image: Viral_Category]
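
The activity title mentions quantiles, but the ranges above describe a fixed cutoff at 1,000,000 views. The sketch below follows the fixed cutoff with `pd.cut`; a quantile-based split would use `pd.qcut` instead. The view counts are invented for the example.

```python
import pandas as pd

# Hypothetical view counts around the 1,000,000 threshold.
tracks_df = pd.DataFrame({"Views": [0, 500_000, 1_000_000, 1_000_001, 25_000_000]})

tracks_df["Viral_Category"] = pd.cut(
    tracks_df["Views"],
    bins=[0, 1_000_000, float("inf")],
    labels=["Non-Viral", "Viral"],
    include_lowest=True,  # assumption: a track with 0 views counts as Non-Viral
)
```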

Create a Grouped Bar Chart to Visualize the Relationship Between `Viral_Category` and `Tempo_Category`

Store the generated chart in a variable named viral_tempo_bar_chart and the organized data in another variable named viral_tempo_counts.

Notes:

Please complete the previous activities before attempting this one.
Your chart should be a stacked bar chart and have a figure size of (10, 6).
Your chart should resemble the following example:

[Image: activity6-answer]
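
One way to build the counts table and the chart is sketched below, following the note that asks for a stacked bar chart. The sample rows are hypothetical, and storing the matplotlib Axes returned by `.plot()` in `viral_tempo_bar_chart` is an assumption about what the activity expects.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical rows with plain string categories.
tracks_df = pd.DataFrame({
    "Viral_Category": ["Viral", "Viral", "Non-Viral", "Non-Viral", "Viral"],
    "Tempo_Category": ["Slow", "Fast", "Slow", "Medium", "Fast"],
})

# Count tracks per (Viral_Category, Tempo_Category) pair, then pivot tempo into columns.
viral_tempo_counts = (
    tracks_df.groupby(["Viral_Category", "Tempo_Category"])
    .size()
    .unstack(fill_value=0)
)

# One bar per Viral_Category, with tempo counts stacked inside each bar.
viral_tempo_bar_chart = viral_tempo_counts.plot(kind="bar", stacked=True, figsize=(10, 6))
viral_tempo_bar_chart.set_ylabel("Number of tracks")
```

When the two columns are categoricals produced by `pd.cut`, `groupby` also accepts an `observed` argument that controls whether unused categories appear in the result.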

Create a Grouped Bar Chart Visualizing `Viral_Category` & `Duration_Category` Relation

Store the resulting chart in the variable duration_viral_bar_chart and the grouped data in the variable duration_viral_counts.

Notes:

Ensure completion of previous activities before attempting this one.
Construct a stacked bar chart with a figure size of (10, 6).
Your resulting chart should resemble this example:

[Image: activity7-answer]
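
As an alternative to `groupby`, the counts can be sketched with `pd.crosstab`. Which category goes on the x-axis is not specified in the activity; putting Duration_Category on the index is an assumption, and the sample rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical rows with plain string categories.
tracks_df = pd.DataFrame({
    "Duration_Category": ["Short", "Medium", "Medium", "Long", "Medium"],
    "Viral_Category": ["Viral", "Non-Viral", "Viral", "Non-Viral", "Viral"],
})

# crosstab counts co-occurrences: rows are durations, columns are viral categories.
duration_viral_counts = pd.crosstab(
    tracks_df["Duration_Category"], tracks_df["Viral_Category"]
)

duration_viral_bar_chart = duration_viral_counts.plot(
    kind="bar", stacked=True, figsize=(10, 6)
)
```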

Generate Dummy Variables for the `Album_type` Column with `Track` Prefix

Store the resulting dummy variables in the album_type_dummies variable. Be sure to prefix each variable with Track. Don't forget to convert the dtype of each column in the dummy variables to bool.
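
A minimal sketch with `pd.get_dummies`. The album type values are hypothetical, and because this activity does not name a separator, the default underscore is assumed.

```python
import pandas as pd

# Hypothetical album types; the real column may contain other values.
tracks_df = pd.DataFrame({"Album_type": ["album", "single", "compilation", "album"]})

# prefix="Track" yields columns such as Track_album; astype(bool) forces a bool
# dtype regardless of which dtype this pandas version returns by default.
album_type_dummies = pd.get_dummies(tracks_df["Album_type"], prefix="Track").astype(bool)
```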

Categorize 'Loudness' Column into Predefined Bins

Using the Loudness column, we want to discretize the loudness of our tracks into five categories:

Very Low tracks: between -50 and -35 dB.
Low tracks: between -35 and -20 dB.
Moderate tracks: between -20 and -5 dB.
High tracks: between -5 and 10 dB.
Very High tracks: above 10 dB.

Create the categories and store them in a new column, Loudness_Category. It should look similar to:

[Image: Loudness_Category]
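
The five loudness ranges can be sketched as below. The sample includes one value below -50 dB to show that `pd.cut` leaves out-of-range values as NaN. All sample values are invented.

```python
import pandas as pd

# Hypothetical loudness values; -60.0 lies below the lowest bin edge on purpose.
tracks_df = pd.DataFrame({"Loudness": [-42.0, -25.0, -7.5, -3.2, -60.0]})

loudness_bins = [-50, -35, -20, -5, 10, float("inf")]
loudness_labels = ["Very Low", "Low", "Moderate", "High", "Very High"]

tracks_df["Loudness_Category"] = pd.cut(
    tracks_df["Loudness"], bins=loudness_bins, labels=loudness_labels
)
```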

Calculate the Number of Tracks with a 'Loudness_Category' of 'High'.

Write your answer in the below input box as an integer.
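
Counting one category comes down to a boolean comparison followed by a sum. The sketch below uses a hand-built categorical column, so the resulting number is illustrative and is not the answer for the real dataset.

```python
import pandas as pd

levels = ["Very Low", "Low", "Moderate", "High", "Very High"]

# Hypothetical, hand-built column standing in for the real Loudness_Category.
tracks_df = pd.DataFrame({
    "Loudness_Category": pd.Categorical(
        ["High", "Low", "High", "Moderate"], categories=levels, ordered=True
    )
})

# The comparison yields True/False per row; summing counts the True values.
high_count = int((tracks_df["Loudness_Category"] == "High").sum())
```

`tracks_df["Loudness_Category"].value_counts()` gives the counts for every category at once, which can serve as a cross-check.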

Generate Dummy Variables for 'Artist' Column with 'Genre' Prefix

Generate dummy variables for the 'Artist' column. Each of these variables must be prefixed with 'Genre', using a colon ':' as the separator. Ensure the resultant data is stored in a new variable named genres_dummies. Remember to convert the dtype of each column in the dummy variables to bool.
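
A sketch of the colon-separated variant, using the `prefix_sep` argument of `pd.get_dummies`. The artist names are placeholders.

```python
import pandas as pd

# Placeholder artist names, not taken from the real dataset.
tracks_df = pd.DataFrame({"Artist": ["Artist A", "Artist B", "Artist A"]})

# prefix_sep=":" produces column names like "Genre:Artist A".
genres_dummies = pd.get_dummies(
    tracks_df["Artist"], prefix="Genre", prefix_sep=":"
).astype(bool)
```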

Categorize Tracks into Five Quantiles Based on Speechiness

Store the categorized results in a new column titled Speechiness_Quantile. The quantiles should be labeled Q1, Q2, Q3, Q4, and Q5.
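
Quantile-based binning can be sketched with `pd.qcut`, which picks bin edges so that each bin holds roughly the same number of rows. The ten sample values are invented and deliberately distinct.

```python
import pandas as pd

# Hypothetical speechiness values, all distinct so the quantile edges are unique.
tracks_df = pd.DataFrame({
    "Speechiness": [0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.15, 0.20, 0.30, 0.45]
})

tracks_df["Speechiness_Quantile"] = pd.qcut(
    tracks_df["Speechiness"], q=5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"]
)
```

On real data with many repeated values, two quantile edges can coincide and `pd.qcut` then raises an error about non-unique bin edges. The `duplicates="drop"` option merges such edges, but the number of labels then has to match the reduced number of bins.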

Categorize the 'Energy' Column into Given Bins and Ranges

Using the Energy column, we want to discretize the energy of our tracks into five categories:

Very Low tracks: between 0 and 0.2.
Low tracks: between 0.2 and 0.4.
Moderate tracks: between 0.4 and 0.6.
High tracks: between 0.6 and 0.8.
Very High tracks: above 0.8.

Create the categories and store them in a new column, Energy_Category. It should look similar to:

[Image: Energy_Category]
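
A sketch of the energy bins that also shows a side benefit of `pd.cut`: the result is an ordered categorical, so comparisons such as "High or above" work directly. The sample values, the use of `include_lowest=True`, and the name `high_or_above` are assumptions for the example.

```python
import pandas as pd

# Hypothetical energy values, including 0.0 and values on the bin edges.
tracks_df = pd.DataFrame({"Energy": [0.0, 0.15, 0.4, 0.55, 0.8, 0.93]})

tracks_df["Energy_Category"] = pd.cut(
    tracks_df["Energy"],
    bins=[0, 0.2, 0.4, 0.6, 0.8, float("inf")],
    labels=["Very Low", "Low", "Moderate", "High", "Very High"],
    include_lowest=True,  # assumption: an energy of exactly 0 counts as Very Low
)

# Labels are ordered from lowest to highest bin, so >= compares by bin position.
high_or_above = int((tracks_df["Energy_Category"] >= "High").sum())
```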

Anurag Verma
