Spotify Data Explorer: Honing DataFrame Mutation Techniques


Rename the `acousticness` column to `acoustic_level`

The `acousticness` column holds the acoustic level of each song. To make the name more descriptive, use the `rename()` function to change it from `acousticness` to `acoustic_level`. Pass `inplace=True` so that the rename is applied to `df` itself instead of returning a new DataFrame.
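
A minimal sketch of this rename follows. The two sample rows are made up for illustration and stand in for the real Spotify dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Spotify dataset
df = pd.DataFrame({"name": ["Song A", "Song B"], "acousticness": [0.12, 0.85]})

# inplace=True changes df itself; rename() then returns None
df.rename(columns={"acousticness": "acoustic_level"}, inplace=True)

print(df.columns.tolist())
```

Because `rename()` returns `None` when `inplace=True`, the result should not be assigned back to `df` in this form.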


Rename multiple columns using the `rename()` function

Rename several columns in `df` to make them more descriptive and easier to understand. Change `danceability` to `dance_score`, `duration_ms` to `duration_milliseconds`, `instrumentalness` to `instrumental`, `liveness` to `live_performance`, and `speechiness` to `speech_presence`. Assign the DataFrame returned by `rename()` back to the variable `df` so that the original is updated.
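
One way to do this is to pass a dictionary of old and new names to `rename()`. The sketch below uses a single made-up row; only the column names come from the task description.

```python
import pandas as pd

# Hypothetical one-row frame with the original column names
df = pd.DataFrame({
    "danceability": [0.70],
    "duration_ms": [210000],
    "instrumentalness": [0.0],
    "liveness": [0.10],
    "speechiness": [0.05],
})

# Old name -> new name, as listed in the task
new_names = {
    "danceability": "dance_score",
    "duration_ms": "duration_milliseconds",
    "instrumentalness": "instrumental",
    "liveness": "live_performance",
    "speechiness": "speech_presence",
}

# Without inplace=True, rename() returns a new DataFrame, so reassign it
df = df.rename(columns=new_names)

print(df.columns.tolist())
```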


Add a new column called `duration_seconds` that converts the `duration_milliseconds` column from milliseconds to seconds

Convert the `duration_milliseconds` column values to seconds by dividing them by 1000, and store the result in a new column named `duration_seconds`.

Note: The new column is added at the end of `df`.
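
As a rough illustration, the conversion is a single division. The sample durations are invented.

```python
import pandas as pd

# Hypothetical durations in milliseconds
df = pd.DataFrame({"duration_milliseconds": [210000, 185500]})

# 1 second = 1000 milliseconds; assigning to a new name appends the column
df["duration_seconds"] = df["duration_milliseconds"] / 1000

print(df)
```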


Add a new column called `popularity_score` that multiplies the `popularity` column by 0.01

Rescale the values in the `popularity` column by multiplying them by 0.01, and store the rescaled values in a new column named `popularity_score`.

Note: The new column is added at the end of `df`.
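
A small sketch of the rescaling, with assumed sample popularity values:

```python
import pandas as pd

# Hypothetical popularity values on a 0-100 scale
df = pd.DataFrame({"popularity": [55, 82]})

# Multiplying by 0.01 maps the 0-100 scale onto 0-1
df["popularity_score"] = df["popularity"] * 0.01

print(df)
```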


Add a new column called `is_popular` that assigns 1 to songs with `popularity` greater than 70 and 0 otherwise

Create a new column `is_popular` that contains 1 for rows where the `popularity` value is greater than 70, and 0 otherwise. The comparison produces booleans; convert them to integers so that True becomes 1 and False becomes 0. The new column then flags popular songs with 1 and all other songs with 0.

Note: The new column is added at the end of `df`.
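
The boolean-to-integer step could look like the sketch below. The sample values are assumptions, chosen so that one of them sits exactly on the threshold.

```python
import pandas as pd

# Hypothetical popularity values, including the boundary value 70
df = pd.DataFrame({"popularity": [55, 70, 82]})

# The comparison yields True/False; astype(int) turns that into 1/0
df["is_popular"] = (df["popularity"] > 70).astype(int)

print(df)
```

A popularity of exactly 70 is not greater than 70, so that row gets 0.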


Add a new column called `artist_count` that counts the number of artists in the `artists` column

Calculate the number of artists for each row by counting the number of commas in the artists column and adding 1, then store the result in a new column named artist_count.

Note: The new column is added at the end of `df`.
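
The comma-counting approach might be sketched as follows. It assumes the `artists` column stores each artist list as a single string; the exact string format and the artist names here are hypothetical.

```python
import pandas as pd

# Hypothetical artists strings; the real format is an assumption
df = pd.DataFrame({
    "artists": ["['Artist A']", "['Artist A', 'Artist B']", "['A', 'B', 'C']"],
})

# n commas separate n + 1 names
df["artist_count"] = df["artists"].str.count(",") + 1

print(df)
```

This count is only correct when no artist name contains a comma itself.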


Add a new column called `duration_minutes` that calculates the duration in minutes from the `duration_seconds` column

Convert the duration_seconds column from seconds to minutes and store the result in a new column named duration_minutes.

Note: The new column is added at the end of `df`.
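
For illustration, with made-up durations, the minutes column is the seconds column divided by 60:

```python
import pandas as pd

# Hypothetical durations in seconds
df = pd.DataFrame({"duration_seconds": [210.0, 90.0]})

# 1 minute = 60 seconds
df["duration_minutes"] = df["duration_seconds"] / 60

print(df)
```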


Update the `popularity` column by adding 10 to each value

Increase the values in the popularity column by adding 10 to each value.
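
An in-place update reuses the existing column name on the left-hand side. A brief sketch with assumed values:

```python
import pandas as pd

# Hypothetical popularity values
df = pd.DataFrame({"popularity": [55, 82]})

# Assigning back to the same column overwrites it instead of adding a new one
df["popularity"] = df["popularity"] + 10

print(df)
```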


Update the `speech_presence` column by multiplying each value by 0.8

Reduce the values in the `speech_presence` column by multiplying them by 0.8.

Note: Use `df.head().T` to view your DataFrame. It shows the first five rows as columns, which makes a wide DataFrame easier to scan.
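
A possible sketch of the scaling step, followed by the transposed preview mentioned in the note. The sample values are invented.

```python
import pandas as pd

# Hypothetical rows; only speech_presence is changed
df = pd.DataFrame({"name": ["Song A", "Song B"], "speech_presence": [0.05, 0.5]})

# Scale every value down to 80% of its original size
df["speech_presence"] = df["speech_presence"] * 0.8

# Transposed preview: each original column becomes a row
print(df.head().T)
```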


Update the `dance_score` column by subtracting 0.1 from each value

Decrease the values in the dance_score column by subtracting 0.1 from each value.
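
The subtraction follows the same pattern as the other in-place updates. The values below are assumptions.

```python
import pandas as pd

# Hypothetical dance scores
df = pd.DataFrame({"dance_score": [0.7, 0.45]})

# The scalar 0.1 is subtracted from every row
df["dance_score"] = df["dance_score"] - 0.1

print(df)
```

Note that nothing here stops a small score from going below 0; the task does not ask for a lower bound.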


Update the `mode` column by replacing 0 with 'Minor' and 1 with 'Major'

Replace the numerical values in the mode column with textual representations, where 0 is replaced with 'Minor' and 1 is replaced with 'Major'.
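
One option is `Series.replace()` with a dictionary, sketched here on a few assumed rows:

```python
import pandas as pd

# Hypothetical mode values: 0 = minor, 1 = major
df = pd.DataFrame({"mode": [1, 0, 1]})

# Each key in the dictionary is replaced by its value
df["mode"] = df["mode"].replace({0: "Minor", 1: "Major"})

print(df)
```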


Update the `tempo` column by setting values greater than 150 to 150

Limit the maximum value in the tempo column to 150 by clipping any values above 150 to 150.

Note: Use `df.head().T` to view the first rows of `df` transposed, which makes a wide DataFrame easier to scan.
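
Capping at an upper bound can be done with `clip(upper=...)`. A short sketch on invented tempo values:

```python
import pandas as pd

# Hypothetical tempo values in beats per minute
df = pd.DataFrame({"tempo": [120.5, 180.0, 150.0]})

# Values above 150 become 150; everything else is left unchanged
df["tempo"] = df["tempo"].clip(upper=150)

print(df)
```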


Replace all numerical values in the `key` column with their corresponding note names

Replace the numerical values in the `key` column with their corresponding note names. The mapping is:

0 → 'C', 1 → 'C#', 2 → 'D', 3 → 'D#', 4 → 'E', 5 → 'F', 6 → 'F#', 7 → 'G', 8 → 'G#', 9 → 'A', 10 → 'A#', 11 → 'B'
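
The mapping above could be applied with `Series.map()`, as in this sketch. The sample key values are assumptions; `replace()` with the same dictionary would be another option.

```python
import pandas as pd

# Hypothetical key values (pitch classes 0-11)
df = pd.DataFrame({"key": [0, 6, 11]})

# Mapping from the task description
key_names = {
    0: "C", 1: "C#", 2: "D", 3: "D#", 4: "E", 5: "F",
    6: "F#", 7: "G", 8: "G#", 9: "A", 10: "A#", 11: "B",
}

# map() looks each value up in the dictionary; unmapped values would become NaN
df["key"] = df["key"].map(key_names)

print(df)
```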


Replace the `explicit` column values 0 and 1 with `Not Explicit` and `Explicit`, respectively

Replace the numerical values in the explicit column with textual representations, where 0 is replaced with 'Not Explicit' and 1 is replaced with 'Explicit'.

Note: Use `df.head().T` to view the first rows of `df` transposed, which makes a wide DataFrame easier to scan.
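
A sketch of this replacement on made-up rows, again using a dictionary with `replace()`:

```python
import pandas as pd

# Hypothetical explicit flags
df = pd.DataFrame({"explicit": [0, 1, 0]})

# 0 -> 'Not Explicit', 1 -> 'Explicit'
df["explicit"] = df["explicit"].replace({0: "Not Explicit", 1: "Explicit"})

print(df)
```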


Replace the `year` column values before 1950 with 1950

For rows where the year value is less than 1950, replace the year value with 1950.
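
Conditional assignment with `.loc` is one way to express this. The years below are invented for illustration.

```python
import pandas as pd

# Hypothetical release years
df = pd.DataFrame({"year": [1921, 1950, 2005]})

# Select rows where year < 1950 and overwrite only those values
df.loc[df["year"] < 1950, "year"] = 1950

print(df)
```

`df["year"].clip(lower=1950)` would give the same result here.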


Replace the `tempo` column values above 150 with 150 and values below 50 with 50

Limit the `tempo` column values to the range 50 to 150. Replace values above 150 with 150 and values below 50 with 50.

Note: Use `df.head().T` to view the first rows of `df` transposed, which makes a wide DataFrame easier to scan.
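
Both bounds can be applied in one call to `clip()`. A minimal sketch with assumed tempo values:

```python
import pandas as pd

# Hypothetical tempo values, one below and one above the allowed range
df = pd.DataFrame({"tempo": [30.0, 120.5, 180.0]})

# Values below 50 become 50, values above 150 become 150
df["tempo"] = df["tempo"].clip(lower=50, upper=150)

print(df)
```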

Dhrubaraj Roy
