LEGO Database: A Pandas Joining and Merging Project
LEGO Database: A Pandas Joining and Merging Project Data Science Project
Data Wrangling with Pandas

LEGO Database: A Pandas Joining and Merging Project

This lab is for more practicing and working with dataframes and performing merging using the Lego Database. In this activity, we will dive into the Lego Database, a real-world dataset, and explore how we can combine and analyze data from different tables to answer specific queries. Throughout this activity, you will gain hands-on experience working with Python dataframes, learn how to merge tables based on common columns.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

List Spare Parts in Inventory

Create a list of unique part names that are spare in the inventory. Store this list, sorted in ascending lexicographical order, in the variable spare_part_names.

Your list should look like this:

['1/4 CIRCLE TILE 1X1',
 '1/4 CIRCLE TILE 1X1 with Lattice Pie Print',
 '1/4 CIRCLE TILE 1X1 with Pizza Print',
 '1/4 CIRCLE TILE 1X1 with Watermelon Print',
 '10240stk01',
 '10240stk02',
 '10240stk03',
 '3.2 shaft w/ knob',
 '3D Glasses Atlantis',
 'Adventurers Mini Comic Book 1',
 'Adventurers Mini Comic Book 2',
 ...
]
codevalidated

Count Parts with Quantity Over 5 for Each Transparent Color

Calculate, for each transparent color, the number of parts in the inventory with a quantity greater than 5. Store the resulting series of counts in the variable trans_colors.

Your series should look like this:

Activity 2

codevalidated

Identify Theme(s) with the Highest Number of Sets (1970-2000)

Determine which theme or themes had the most sets released between the years 1970 and 2000. Store the list of themes with the highest set counts in the variable themes_with_max_count_names.

codevalidated

Count the Number of Parts in Each Category

Calculate the number of parts belonging to each category and store the resulting series of counts in the variable categories.

codevalidated

Count the Total Parts by Color

Utilizing the colors_df and inventory_parts_df dataframes, calculate the total number of parts in the inventory for each color_id. Store the resulting series of counts in the variable colors.

codevalidated

Calculate Average Parts per Set for Each Theme and Find the Theme(s) with the Highest Average

Compute the average number of parts per set for each theme and identify the theme or themes with the highest average. Store the resulting list of theme names in the variable highest_average_themes.

codevalidated

Identify Inventory with the Most Sets and Highest Version

Determine the inventory with the maximum number of sets and the highest version simultaneously. Store the resulting inventory name in the variable inventory_name and the version in the variable version. The set_num column in the inventories_df dataframe represents the name of the inventory.

codevalidated

Identify Inventory with the Highest Total Quantity Across Part Categories

Determine the inventory or inventories with the highest total quantity of parts across all part categories. Store the resulting list of inventory names in the variable inventory_names. The set_num column in the inventories_df dataframe represents the name of the inventory.

Hint: Use thegroupby() operation to group the data by inventory_id and part_cat_id. Calculate the total quantity of parts for each inventory and part category.

codevalidated

Determine the parent theme with the highest number of child themes.

Store the resulting list in the variable parent_themes_with_highest_count.

Hint: Think of merging themes_df with itself using suitable suffixes. Count the number of unique child themes for each parent theme. Finally, find the parent theme with the maximum number of child themes.

codevalidated

Identify the set numbers of the inventory with the highest total quantity of spare parts

Store the resulting list of unique set numbers in the variable set_nums_inventory_with_highest_spare_parts.

codevalidated

Determine the inventory that contains sets with the highest total number of unique colors

Store the resulting list in the variable inventory_names.

codevalidated

Calculate the total number of sets per theme category and year.

Store the resulting series in the variable total_sets_per_theme_year.

LEGO Database: A Pandas Joining and Merging ProjectLEGO Database: A Pandas Joining and Merging Project
Author

Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥 When I'm not geeking out over AI 🤖 with my classmates or building neural networks, 🧠 you can find me buried in statistics textbooks. 📚 I know, what a nerd! 🤓 I'm always down to learn new ways to speak human 🫂 and computer 💻. Making tech more fun is my jam! 🍇 If you want a cheery data buddy 😎 who can make difficult things easy-peasy 🥝 and learning a party 🎉, I'm your guy! 🙋‍♂️ Let's chat codes 👨‍💻, numbers 🧮, and machines 🤖 over coffee! ☕ I'd love to meet more techy humans. 💁‍♂️ Can't wait to talk! 🗣️

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥 When I'm not geeking out over AI 🤖 with my classmates or building neural networks, 🧠 you can find me buried in statistics textbooks. 📚 I know, what a nerd! 🤓 I'm always down to learn new ways to speak human 🫂 and computer 💻. Making tech more fun is my jam! 🍇 If you want a cheery data buddy 😎 who can make difficult things easy-peasy 🥝 and learning a party 🎉, I'm your guy! 🙋‍♂️ Let's chat codes 👨‍💻, numbers 🧮, and machines 🤖 over coffee! ☕ I'd love to meet more techy humans. 💁‍♂️ Can't wait to talk! 🗣️

This project is part of

Data Wrangling with Pandas

Explore other projects