Practicing Discretization and Binning with Music data

In this lab, "Transforming Data: Discretization, Binning, and Dummies", you'll practice data transformation techniques on the Spotify and YouTube dataset. You'll explore relationships between variables such as loudness, duration, and genre, and apply discretization, binning, and dummy variables to turn continuous and categorical columns into forms that are easier to analyze.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Replace Missing Values in Columns 'Views', 'Likes', and 'Comments' with Their Respective Medians

Modify the tracks_df DataFrame directly to replace missing values.
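One possible approach is sketched below. The small tracks_df built here is an invented stand-in for the real dataset, so only the column names come from the activity.

```python
import numpy as np
import pandas as pd

# Toy stand-in for tracks_df; the values are invented for illustration.
tracks_df = pd.DataFrame({
    "Views": [100.0, np.nan, 300.0],
    "Likes": [10.0, 20.0, np.nan],
    "Comments": [np.nan, 4.0, 6.0],
})

cols = ["Views", "Likes", "Comments"]

# DataFrame.median() skips NaN by default and returns one median per column.
# Passing that Series to fillna fills each column with its own median.
tracks_df[cols] = tracks_df[cols].fillna(tracks_df[cols].median())
```

Assigning back to tracks_df[cols] changes the original DataFrame rather than a copy, which is what the activity asks for.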

Replace Missing Values in Specified Columns with Their Respective Means

Modify the tracks_df DataFrame by replacing the missing values in the Duration_ms, Loudness, Speechiness, Energy, and Tempo columns with their respective means.
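A minimal sketch of mean imputation, written as a per-column loop. The data is an invented stand-in for the real tracks_df.

```python
import numpy as np
import pandas as pd

# Toy stand-in for tracks_df; the values are invented for illustration.
tracks_df = pd.DataFrame({
    "Duration_ms": [200000.0, np.nan, 240000.0],
    "Loudness": [-6.0, -8.0, np.nan],
    "Speechiness": [np.nan, 0.05, 0.15],
    "Energy": [0.5, np.nan, 0.7],
    "Tempo": [120.0, 100.0, np.nan],
})

mean_cols = ["Duration_ms", "Loudness", "Speechiness", "Energy", "Tempo"]

for col in mean_cols:
    # Series.mean() ignores NaN, so the mean uses only the observed values.
    tracks_df[col] = tracks_df[col].fillna(tracks_df[col].mean())
```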

Classify Track Durations into `Short`, `Medium`, and `Long`

Using the Duration_ms column, we want to discretize the durations of our tracks into three categories:

  • Short tracks: between 0 and 180000 ms
  • Medium tracks: between 180000 and 300000 ms
  • Long tracks: above 300000 ms

Create the categories and store them in a new column Duration_Category. It should look similar to:

[Image: Duration_Category column]
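The binning described above could be sketched with pd.cut as follows. The sample durations are invented, and the treatment of boundary values is an assumption: pd.cut uses right-closed intervals by default, so a track of exactly 180000 ms lands in Short here.

```python
import pandas as pd

# Invented sample durations, including the two boundary values.
tracks_df = pd.DataFrame({"Duration_ms": [95000, 180000, 215000, 300000, 420000]})

# Edges give the intervals (0, 180000], (180000, 300000], (300000, inf].
bins = [0, 180000, 300000, float("inf")]
labels = ["Short", "Medium", "Long"]

tracks_df["Duration_Category"] = pd.cut(
    tracks_df["Duration_ms"], bins=bins, labels=labels
)
```

The new column has a categorical dtype, with the labels ordered from Short to Long.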

Categorize Track Tempo into Appropriate Bins

Using the Tempo column, we want to discretize the tempo of our tracks into three categories:

  • Slow tracks: between 0 and 100 bpm
  • Medium tracks: between 100 and 140 bpm
  • Fast tracks: above 140 bpm

Create the categories and store them in a new column Tempo_Category. It should look similar to:

[Image: Tempo_Category column]
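A sketch of the tempo bins, again with invented sample values. It assumes that a tempo of exactly 0 should count as Slow, which is why include_lowest=True is passed. Without it, pd.cut leaves the lowest edge out and returns NaN for that row.

```python
import pandas as pd

# Invented sample tempos, including 0 and the boundary value 100.
tracks_df = pd.DataFrame({"Tempo": [0.0, 88.0, 100.0, 128.0, 174.0]})

tracks_df["Tempo_Category"] = pd.cut(
    tracks_df["Tempo"],
    bins=[0, 100, 140, float("inf")],
    labels=["Slow", "Medium", "Fast"],
    include_lowest=True,  # assumption: a tempo of 0 belongs in the first bin
)
```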

Categorize Tracks as 'Viral' or 'Non-Viral' Based on Views Using Quantiles

Using the Views column, we want to discretize the views of our tracks into two categories:

  • Non-Viral tracks: between 0 and 1,000,000 views
  • Viral tracks: above 1,000,000 views

Create the categories and store them in a new column Viral_Category. It should look similar to:

[Image: Viral_Category column]
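The sketch below follows the fixed 1,000,000-view threshold given in the list, not quantiles computed from the data. That choice is an assumption about which of the two the activity intends. The view counts are invented.

```python
import pandas as pd

# Invented view counts, including 0 and the threshold itself.
tracks_df = pd.DataFrame({"Views": [0, 250_000, 1_000_000, 5_400_000]})

tracks_df["Viral_Category"] = pd.cut(
    tracks_df["Views"],
    bins=[0, 1_000_000, float("inf")],
    labels=["Non-Viral", "Viral"],
    include_lowest=True,  # keep tracks with 0 views in the Non-Viral bin
)
```

With right-closed intervals, a track with exactly 1,000,000 views is labeled Non-Viral.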

Create a Grouped Bar Chart to Visualize the Relationship Between `Viral_Category` and `Tempo_Category`

Store the generated chart in a variable named viral_tempo_bar_chart and the organized data in another variable named viral_tempo_counts.

Notes:

  • Please complete the previous activities before attempting this one.

  • Your chart should be a stacked bar chart and have a figure size of (10, 6).

  • Your chart should resemble the following example:

[Image: example chart for this activity]
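One way to build the counts and the chart is sketched below. It follows the notes and draws a stacked chart. Putting Viral_Category on the x-axis is an assumption, since the example image is not reproduced here, and the category values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Invented categories standing in for the columns built in earlier activities.
tracks_df = pd.DataFrame({
    "Viral_Category": ["Viral", "Viral", "Non-Viral", "Viral", "Non-Viral"],
    "Tempo_Category": ["Slow", "Fast", "Fast", "Slow", "Medium"],
})

# Count tracks per (Viral_Category, Tempo_Category) pair, then pivot the
# tempo level into columns so each row becomes one stacked bar.
viral_tempo_counts = (
    tracks_df.groupby(["Viral_Category", "Tempo_Category"])
    .size()
    .unstack(fill_value=0)
)

# DataFrame.plot returns a matplotlib Axes object.
viral_tempo_bar_chart = viral_tempo_counts.plot(
    kind="bar", stacked=True, figsize=(10, 6)
)
viral_tempo_bar_chart.set_ylabel("Number of tracks")
```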

Create a Grouped Bar Chart Visualizing `Viral_Category` & `Duration_Category` Relation

Store the resulting chart in the variable duration_viral_bar_chart and the grouped data in the variable duration_viral_counts.

Notes:

  • Ensure completion of previous activities before attempting this one.

  • Construct a stacked bar chart with a figure size of (10, 6).

  • Your resulting chart should resemble this example:

[Image: example chart for this activity]
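As an alternative to grouping by hand, pd.crosstab produces the count table in one call. This is a sketch with invented category values, and placing Duration_Category on the x-axis is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Invented categories standing in for the columns built in earlier activities.
tracks_df = pd.DataFrame({
    "Duration_Category": ["Short", "Medium", "Medium", "Long", "Medium"],
    "Viral_Category": ["Viral", "Viral", "Non-Viral", "Non-Viral", "Viral"],
})

# Rows are duration categories, columns are viral categories, cells are counts.
duration_viral_counts = pd.crosstab(
    tracks_df["Duration_Category"], tracks_df["Viral_Category"]
)

duration_viral_bar_chart = duration_viral_counts.plot(
    kind="bar", stacked=True, figsize=(10, 6)
)
duration_viral_bar_chart.set_ylabel("Number of tracks")
```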

Generate Dummy Variables for the `Album_type` Column with `Track` Prefix

Store the resulting dummy variables in the album_type_dummies variable. Be sure to prefix each variable with Track. Don't forget to convert the dtype of each column in the dummy variables to bool.
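A minimal sketch using pd.get_dummies. The Album_type values are invented sample data, and the default underscore separator is assumed because the activity does not name one.

```python
import pandas as pd

# Invented sample values for the Album_type column.
tracks_df = pd.DataFrame({"Album_type": ["album", "single", "compilation", "album"]})

# One column per distinct value, named Track_<value>. The explicit astype(bool)
# makes the result boolean regardless of the pandas version's default dtype.
album_type_dummies = pd.get_dummies(
    tracks_df["Album_type"], prefix="Track"
).astype(bool)
```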

Categorize 'Loudness' Column into Predefined Bins

Using the Loudness column, we want to discretize the loudness of our tracks into five categories:

  • Very Low tracks: between -50 and -35 dB
  • Low tracks: between -35 and -20 dB
  • Moderate tracks: between -20 and -5 dB
  • High tracks: between -5 and 10 dB
  • Very High tracks: above 10 dB

Create the categories and store them in a new column Loudness_Category. It should look similar to:

[Image: Loudness_Category column]
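The five loudness bins could be expressed as below. The sample values are invented. One of them, -60 dB, sits below the lowest edge on purpose, to show that pd.cut returns NaN for values outside every bin.

```python
import pandas as pd

# Invented loudness values in dB; -60.0 lies below the lowest bin edge.
tracks_df = pd.DataFrame({"Loudness": [-60.0, -42.0, -27.5, -11.3, -4.2]})

loudness_bins = [-50, -35, -20, -5, 10, float("inf")]
loudness_labels = ["Very Low", "Low", "Moderate", "High", "Very High"]

tracks_df["Loudness_Category"] = pd.cut(
    tracks_df["Loudness"], bins=loudness_bins, labels=loudness_labels
)
```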

Calculate the Number of Tracks with a 'Loudness_Category' of 'High'.

Enter your answer as an integer in the input box below.
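Counting one category comes down to a boolean comparison followed by a sum. The sketch below uses invented labels, so the number it produces is not the answer for the real dataset.

```python
import pandas as pd

# Invented labels standing in for the real Loudness_Category column.
tracks_df = pd.DataFrame(
    {"Loudness_Category": ["Moderate", "High", "High", "Low", "High"]}
)

# The comparison yields True/False per row, and summing counts the True values.
high_count = int((tracks_df["Loudness_Category"] == "High").sum())
```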

Generate Dummy Variables for 'Artist' Column with 'Genre' Prefix

Generate dummy variables for the 'Artist' column. Each of these variables must be prefixed with 'Genre', using a colon ':' as the separator. Ensure the resultant data is stored in a new variable named genres_dummies. Remember to convert the dtype of each column in the dummy variables to bool.
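A sketch of the same get_dummies call with a custom separator. The artist names are placeholders, not values from the dataset.

```python
import pandas as pd

# Placeholder artist names for illustration.
tracks_df = pd.DataFrame({"Artist": ["Artist A", "Artist B", "Artist A"]})

# prefix_sep replaces the default "_" between the prefix and each value.
genres_dummies = pd.get_dummies(
    tracks_df["Artist"], prefix="Genre", prefix_sep=":"
).astype(bool)
```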

Categorize Tracks into Five Quantiles Based on Speechiness

Store the categorized results into a new column titled Speechiness_Quantile. The quantiles should be labeled Q1, Q2, Q3, Q4, Q5.
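Quantile-based binning can be sketched with pd.qcut, which picks the bin edges so that each bin holds roughly the same number of rows. The speechiness values below are invented and all distinct.

```python
import pandas as pd

# Ten invented, distinct speechiness values.
tracks_df = pd.DataFrame({
    "Speechiness": [0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.15, 0.20, 0.30, 0.45]
})

# q=5 splits the values at the 20th, 40th, 60th, and 80th percentiles.
tracks_df["Speechiness_Quantile"] = pd.qcut(
    tracks_df["Speechiness"], q=5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"]
)
```

On real data with many repeated values, two quantile edges can coincide, and pd.qcut then raises an error about non-unique bin edges. How to handle that case depends on the dataset and is not covered by this sketch.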

Categorize the 'Energy' Column into Given Bins and Ranges

Using the Energy column, we want to discretize the energy of our tracks into five categories:

  • Very Low tracks: between 0 and 0.2
  • Low tracks: between 0.2 and 0.4
  • Moderate tracks: between 0.4 and 0.6
  • High tracks: between 0.6 and 0.8
  • Very High tracks: above 0.8

Create the categories and store them in a new column Energy_Category. It should look similar to:

[Image: Energy_Category column]
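A sketch of the energy bins, followed by a quick look at how many tracks fall into each one. The sample values are invented, and include_lowest=True reflects an assumption that an energy of exactly 0 belongs in Very Low.

```python
import pandas as pd

# Invented energy values, including 0 and the boundary value 0.6.
tracks_df = pd.DataFrame({"Energy": [0.0, 0.35, 0.6, 0.72, 0.95]})

energy_labels = ["Very Low", "Low", "Moderate", "High", "Very High"]

tracks_df["Energy_Category"] = pd.cut(
    tracks_df["Energy"],
    bins=[0, 0.2, 0.4, 0.6, 0.8, float("inf")],
    labels=energy_labels,
    include_lowest=True,  # assumption: energy 0 counts as Very Low
)

# sort=False keeps the counts in category order instead of by frequency.
energy_counts = tracks_df["Energy_Category"].value_counts(sort=False)
```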

Author

Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥 When I'm not geeking out over AI 🤖 with my classmates or building neural networks, 🧠 you can find me buried in statistics textbooks. 📚 I know, what a nerd! 🤓 I'm always down to learn new ways to speak human 🫂 and computer 💻. Making tech more fun is my jam! 🍇 If you want a cheery data buddy 😎 who can make difficult things easy-peasy 🥝 and learning a party 🎉, I'm your guy! 🙋‍♂️ Let's chat codes 👨‍💻, numbers 🧮, and machines 🤖 over coffee! ☕ I'd love to meet more techy humans. 💁‍♂️ Can't wait to talk! 🗣️

This project is part of

Data Wrangling with Pandas
