Data Wrangling in Action: Analyzing Kindle Books Data

Explore a vast dataset of thousands of books in this exciting project. Challenge yourself with activities that hone your data filtering, transformation, and visualization skills using Pandas. Ready to dive into the world of literature through data? Let's begin your wrangling adventure!
Project Created by

Lohith Unnam

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Perform a Left Join on Two DataFrames Using a Common Column

Merge DataFrames df1 and df2 based on the common column ASIN using a left join. Store the result in the df variable.
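As a minimal sketch of this kind of join, the example below builds two toy tables and merges them with `pd.merge`. Only the ASIN column comes from the task; the Title and Price columns and all values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the project's tables (values and extra columns are hypothetical).
df1 = pd.DataFrame({
    "ASIN": ["B001", "B002", "B003"],
    "Title": ["Book A", "Book B", "Book C"],
})
df2 = pd.DataFrame({
    "ASIN": ["B001", "B003"],
    "Price": [9.99, 4.50],
})

# how="left" keeps every row of df1; rows with no match in df2 get NaN
# in the columns that come from df2.
df = pd.merge(df1, df2, on="ASIN", how="left")
print(df)
```

A left join is a reasonable default when df1 is the table whose rows must all survive the merge.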

multiplechoice

Which of the following columns uniquely identifies each row in our data?

input

What is the number of unique book categories present in our dataset?

Enter the number of categories of books present in the DataFrame df.

codevalidated

Filter and Sort High-Rated Books by Price

Filter the DataFrame df to include only books with a rating higher than 4.5. After filtering, sort the books by price in descending order and store the top five rows in top_rated_books.
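One way to chain the filter, sort, and truncation steps is sketched below on toy data. The column names Rating and Title are assumptions, since the task does not name them; Price matches the column used elsewhere in the project.

```python
import pandas as pd

# Toy data; "Rating" and "Title" are assumed column names.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E", "F", "G"],
    "Rating": [4.8, 4.6, 4.5, 4.9, 4.7, 3.9, 4.6],
    "Price": [12.0, 30.0, 99.0, 5.0, 18.0, 50.0, 7.5],
})

top_rated_books = (
    df[df["Rating"] > 4.5]                  # strictly higher than 4.5
    .sort_values("Price", ascending=False)  # most expensive first
    .head(5)                                # keep the top five rows
)
print(top_rated_books)
```

Note that book C is excluded even though it is the most expensive, because its rating is exactly 4.5 and the filter is strict.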

codevalidated

Filter and Sort Recent Bestsellers by Rating

Filter the DataFrame df to include only bestseller books published from 2020 onwards. After filtering, sort the books by rating in descending order and store the top ten rows in recent_bestsellers.
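A possible shape for this filter is sketched below. It assumes a boolean Is_Bestseller column, a Rating column, and a Published_Date column stored as parseable date strings; only Published_Date is named in the project, so the other names are hypothetical.

```python
import pandas as pd

# Toy data; "Is_Bestseller", "Rating" and "Title" are assumed column names.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Is_Bestseller": [True, True, False, True],
    "Published_Date": ["2019-06-01", "2020-01-01", "2021-03-15", "2022-08-09"],
    "Rating": [4.9, 4.2, 4.8, 4.6],
})

# Parse the dates once; unparseable values become NaT and fail the year test.
published = pd.to_datetime(df["Published_Date"], errors="coerce")
mask = df["Is_Bestseller"] & (published.dt.year >= 2020)

recent_bestsellers = (
    df[mask]
    .sort_values("Rating", ascending=False)
    .head(10)
)
print(recent_bestsellers)
```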

multiplechoice

Which of the following is NOT a valid aggregation function in pandas?

multiplechoice

What does the following code do?

   df.groupby('Category_Name')['Price'].mean()

input

Find the Seller with the Highest Average Price

Enter the name of the seller with the highest average book price.
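A question like this can be answered by grouping, averaging, and taking the label of the maximum. The sketch below uses a hypothetical Sold_By column, because the task does not say which column holds the seller.

```python
import pandas as pd

# Toy data; "Sold_By" is an assumed column name for the seller.
df = pd.DataFrame({
    "Sold_By": ["Seller X", "Seller X", "Seller Y", "Seller Z"],
    "Price": [10.0, 20.0, 18.0, 5.0],
})

# Average price per seller, then the index label of the largest average.
avg_price_by_seller = df.groupby("Sold_By")["Price"].mean()
top_seller = avg_price_by_seller.idxmax()
print(top_seller)
```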

codevalidated

Group and Analyze Reviews by Author

Group the DataFrame df by Author and calculate the total number of reviews and the average rating for each author. After grouping, sort the result by the total number of reviews in descending order and store the top five authors in top_authors_by_reviews.
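Named aggregation is one convenient way to compute two statistics per group in a single pass. In the sketch below, Author comes from the task, while Reviews, Rating, and the output column names are assumptions.

```python
import pandas as pd

# Toy data; "Reviews" and "Rating" are assumed column names.
df = pd.DataFrame({
    "Author": ["Ann", "Ann", "Bob", "Cy"],
    "Reviews": [100, 50, 400, 10],
    "Rating": [4.0, 5.0, 4.5, 3.0],
})

top_authors_by_reviews = (
    df.groupby("Author")
    .agg(
        Total_Reviews=("Reviews", "sum"),   # total number of reviews
        Average_Rating=("Rating", "mean"),  # mean rating per author
    )
    .sort_values("Total_Reviews", ascending=False)
    .head(5)
)
print(top_authors_by_reviews)
```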

codevalidated

Create a summary DataFrame that shows the following for each book category:

  • The number of books
  • The average rating
  • The percentage of books that are bestsellers

Store the resulting DataFrame in category_summary_df.
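The three statistics listed above could be computed with a single groupby, as in this sketch. Category_Name and ASIN appear in the project; Rating, Is_Bestseller, and the summary column names are assumptions. The percentage relies on the fact that the mean of a boolean column is the fraction of True values.

```python
import pandas as pd

# Toy data; "Rating" and "Is_Bestseller" are assumed column names.
df = pd.DataFrame({
    "ASIN": ["B1", "B2", "B3", "B4", "B5"],
    "Category_Name": ["Fiction", "Fiction", "History", "History", "History"],
    "Rating": [4.0, 5.0, 3.0, 4.0, 5.0],
    "Is_Bestseller": [True, False, False, False, True],
})

category_summary_df = df.groupby("Category_Name").agg(
    Number_of_Books=("ASIN", "count"),
    Average_Rating=("Rating", "mean"),
    # Mean of a boolean column is the share of True values; scale to percent.
    Bestseller_Percentage=("Is_Bestseller", lambda s: s.mean() * 100),
)
print(category_summary_df)
```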

multiplechoice

What is the primary purpose of binning in data analysis?

codevalidated

Create Dummy Variables for Book Categories

Create a new DataFrame containing dummy variables for the Category_Name column. Convert the categorical values in Category_Name into binary dummy variables. Store the resultant DataFrame in the variable category_dummies.
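A small sketch of dummy encoding with `pd.get_dummies` follows; the category values are invented. Each distinct category becomes its own indicator column, and every row has exactly one indicator set.

```python
import pandas as pd

# Toy data; the category values are made up.
df = pd.DataFrame({
    "Category_Name": ["Fiction", "History", "Fiction", "Science"],
})

# One indicator column per distinct category.
category_dummies = pd.get_dummies(df["Category_Name"])
print(category_dummies)
```

Depending on the pandas version, the indicator columns hold booleans or 0/1 integers; both behave the same when summed.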

multiplechoice

Which method would you use to create bins with an equal number of items in each bin?

codevalidated

Categorize Prices into Bins

Create a new column Price_Category by binning the Price column into 5 equal-width bins.

Use the following bins and labels:

  • 0-140: Very Low
  • 140-280: Low
  • 280-420: Medium
  • 420-560: High
  • 560-700: Very High

Make sure to set include_lowest=True so that a price of exactly 0 falls into the first bin.
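The bins and labels listed above map directly onto `pd.cut`, as this sketch on made-up prices shows. By default the intervals are closed on the right, so a price of exactly 140 lands in Very Low rather than Low.

```python
import pandas as pd

# Toy prices chosen to sit on and around the bin edges.
df = pd.DataFrame({"Price": [0.0, 139.99, 140.0, 300.0, 560.0, 700.0]})

bins = [0, 140, 280, 420, 560, 700]
labels = ["Very Low", "Low", "Medium", "High", "Very High"]

# include_lowest=True makes the first interval include its left edge (0).
df["Price_Category"] = pd.cut(
    df["Price"], bins=bins, labels=labels, include_lowest=True
)
print(df)
```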

multiplechoice

What's the main difference between `apply()` and `applymap()`?

codevalidated

Extract Year from `Published_Date`

Use a lambda function with the apply() method to extract the year from the Published_Date column. Store the extracted years in the published_year_series variable.
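A sketch of this extraction is shown below, assuming Published_Date holds date strings that `pd.to_datetime` can parse. The real column may already be a datetime type, in which case the parsing step is unnecessary.

```python
import pandas as pd

# Toy data; the dates are made up and assumed to be clean strings.
df = pd.DataFrame({
    "Published_Date": ["2015-06-01", "2020-12-31", "2023-01-15"],
})

# Parse to datetimes first, then pull the year out of each timestamp.
dates = pd.to_datetime(df["Published_Date"])
published_year_series = dates.apply(lambda d: d.year)
print(published_year_series)
```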

codevalidated

Classify Titles by Length

Calculate the length of each book title as a number of words and store it in a new column Title_Length. Determine the median title length. Then classify each title as Long Title if its Title_Length is greater than or equal to the median, and as Short Title if it is less than the median. Store this classification in a new column Title_Category.
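The word count, median, and classification steps could look like the sketch below. The Title column name and the titles themselves are assumptions; words are taken to be whitespace-separated tokens.

```python
import numpy as np
import pandas as pd

# Toy data; "Title" is an assumed column name.
df = pd.DataFrame({
    "Title": [
        "Dune",
        "The Old Man and the Sea",
        "Pride and Prejudice",
        "A Brief History of Time",
        "Emma",
    ],
})

# Count whitespace-separated words in each title.
df["Title_Length"] = df["Title"].str.split().str.len()
median_length = df["Title_Length"].median()

# Titles at or above the median count as long.
df["Title_Category"] = np.where(
    df["Title_Length"] >= median_length, "Long Title", "Short Title"
)
print(df)
```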

codevalidated

Normalize Author and Category Names to Lowercase

Create a new DataFrame df_author_category containing only the Author and Category_Name columns from the original DataFrame df. Convert all text in these columns to lowercase using applymap() and store the result in df_lower_case.
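An element-wise lowercase conversion might be sketched as follows. Since pandas 2.1, `DataFrame.applymap` is deprecated in favor of `DataFrame.map`, so this sketch uses `map` when it is available and falls back to `applymap` otherwise. The toy values are made up, and the code assumes both columns contain only strings.

```python
import pandas as pd

# Toy data; the values are made up and contain no missing entries.
df = pd.DataFrame({
    "Author": ["Jane AUSTEN", "Frank Herbert"],
    "Category_Name": ["Classic Fiction", "SCIENCE Fiction"],
    "Price": [3.99, 7.99],
})

df_author_category = df[["Author", "Category_Name"]]

# Apply str.lower to every cell; pick whichever element-wise method exists.
if hasattr(df_author_category, "map"):
    df_lower_case = df_author_category.map(str.lower)
else:
    df_lower_case = df_author_category.applymap(str.lower)
print(df_lower_case)
```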

codevalidated

Categorize Books Based on Ratings

Use the apply() method with a custom function to categorize books based on their ratings as follows:

  • 5 stars: Excellent
  • 4 to 4.9 stars: Very Good
  • 3 to 3.9 stars: Good
  • Below 3 stars: Average

Store the resulting category in a new Rating_Category column. Then use groupby() to count how many books fall into each category, and store the result as a Series in the rating_counts_series variable.
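The rating thresholds above can be expressed as a small function passed to `apply()`, as in this sketch. The Rating column name and the helper name `categorize_rating` are hypothetical.

```python
import pandas as pd

# Toy data; "Rating" is an assumed column name.
df = pd.DataFrame({"Rating": [5.0, 4.9, 4.0, 3.5, 2.0, 4.3]})

def categorize_rating(rating):
    """Map a numeric rating to one of the four labels (hypothetical helper)."""
    if rating >= 5:
        return "Excellent"
    if rating >= 4:
        return "Very Good"
    if rating >= 3:
        return "Good"
    return "Average"

df["Rating_Category"] = df["Rating"].apply(categorize_rating)

# size() counts the rows in each group and returns a Series.
rating_counts_series = df.groupby("Rating_Category").size()
print(rating_counts_series)
```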

multiplechoice

Which of the following code snippets would you use to create a histogram with `10` bins for the `Price` column?

codevalidated

Bar Plot of Top 10 Authors by Number of Appearances

Create a bar plot displaying the top 10 authors who appear most frequently in our dataset.
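One way to draw such a chart is to count appearances with `value_counts()` and plot the first ten entries. The sketch below uses invented author names and assumes matplotlib is installed; the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; the author names are made up.
df = pd.DataFrame({"Author": ["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"]})

# value_counts() sorts by frequency, so head(10) gives the ten most frequent.
top_authors = df["Author"].value_counts().head(10)

ax = top_authors.plot(kind="bar", title="Top 10 Authors by Number of Appearances")
ax.set_xlabel("Author")
ax.set_ylabel("Number of books")
plt.tight_layout()
```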

codevalidated

Pie Chart of Bestseller Distribution

Create a pie chart to visualize the distribution of books that are bestsellers versus those that are not.
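A pie chart of this split could be sketched as below, assuming a boolean Is_Bestseller column (a hypothetical name) and an installed matplotlib. The booleans are mapped to readable labels before counting so the wedges are self-explanatory.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; "Is_Bestseller" is an assumed column name.
df = pd.DataFrame({"Is_Bestseller": [True, False, False, False]})

# Replace True/False with readable labels, then count each label.
bestseller_counts = (
    df["Is_Bestseller"]
    .map({True: "Bestseller", False: "Not a bestseller"})
    .value_counts()
)

ax = bestseller_counts.plot(kind="pie", autopct="%1.1f%%",
                            title="Bestseller Distribution")
ax.set_ylabel("")  # drop the default y-axis label
plt.tight_layout()
```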

This project is part of

Data Wrangling with Pandas