Practicing Python Collections using Video Game sales data
Practicing Python Collections using Video Game sales data Data Science Project
Python Collections

Practicing Python Collections using Video Game sales data

Practice your basics of Data Analysis using just nesting and combining (for loops with if-else statements) using the dataset of Video Game Sales. Discover the best-selling video game platforms, popular games among 1990s teens, and dominant publishers using the Video Game Sales dataset. Dive into this exciting project to unleash your data analysis expertise!

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

input

Calculate the Total number of games published by `Nintendo`

Determine and input the aggregate number of games that Nintendo has published. This should be presented as an integer value.

Please note that you should count the games by their unique name, rather than by their rank.

input

Game with Highest Global Sales

Enter the name of a game with the highest global sales as a string.

input

Total number of games released in year 2012

Enter the total number of games released in year 2012 as an integer.

Count unique games by name and not by rank.

codevalidated

What are the top 10 best-selling video games globally, and their corresponding sales figures?

Create a dictionary top_10_games with the name of the game as key and the global sales as value.

If there are multiple games with the same name, then the game with the highest global sales should be considered.

Example expected output:

{'Wii Sports': 82.74,
 'Super Mario Bros.': 40.24,
 'Mario Kart Wii': 35.82,
 'Wii Sports Resort': 33.0,
  ...
}
input

What is the minimum value of the `global_sales` and which `platform` sold such low sales

Your input should be formatted as platform: global_sales e.g Duck Hunt: 28.31.

There is a space after the colon.

codevalidated

Creating Nested Dictionaries for Total Sales and Game Count

Create a dictionary called platform_sales, which will consist of nested dictionaries. Each nested dictionary will have two key-value pairs: one with the key total_sales and its corresponding value, and another with the key game_count and its corresponding value."

{'2600': {'total_sales': 97.08000000000003, 'game_count': 133},
 'PSP': {'total_sales': 296.2799999999948, 'game_count': 1213},
 'XOne': {'total_sales': 141.05999999999995, 'game_count': 213},
 'GC': {'total_sales': 199.3600000000007, 'game_count': 556},
 'WiiU': {'total_sales': 81.86000000000006, 'game_count': 143},
 'GEN': {'total_sales': 28.360000000000003, 'game_count': 27},
 ...
 }
codevalidated

What is the distribution of `global_sales` across different platforms?

Create a list distribution which will have a tuple with the Name of the Platform and its percentage.

Percentage will be the Platform's Global_Sales / Total Global_Sales

If one platform has multiple games, then the platform's global sales will be the sum of all the games' global sales.

Example expected output:

[('Wii', 10.38861311773706),
 ('NES', 2.8145472644843057),
 ('GB', 2.8636479814892892),
 ('DS', 9.220285098043025),
 ('X360', 10.985556766256583),
 ('PS3', 10.737586935172041),...]

input

Which genre has the highest total sales in North America i.e `na_Sales`?

Enter the name of the genre with highest total sales in North America as a string.

codevalidated

Select Publishers with `global_sales` Over 1 Million Copies

Create a dictionary publisher_game_count where the keys are the publishers and the values are the counts of global_sales greater than 1.0 (million).

Example expected output:

{ 'Take-Two Interactive': 91,
 'Sony Computer Entertainment': 147,
 'Activision': 161,
 'Ubisoft': 114,
 'Bethesda Softworks': 22,
 'Electronic Arts': 338,
 'Sega': 74,
 'SquareSoft': 19,
 'Atari': 42,
 '505 Games': 7,...
}

There can be multiple game with same publisher, you have to take the sum of all the games' global sales for that publisher.

codevalidated

Name of the game and its release year

Create a list game_year which will have a tuple with the Name of the Game and its Release Year.

Example expected output:

[('Wii Sports', 2006),
 ('Super Mario Bros.', 1985),
 ('Mario Kart Wii', 2008),
 ('Wii Sports Resort', 2009),
 ('Pokemon Red/Pokemon Blue', 1996),...]
codevalidated

Calculate the total sales (sum of `na_sales`, `eu_sales`, `jp_sales`, `other_sales`) for each platform

Create a new variable named total_sales_by_platform that contains the platform names as keys and their corresponding total sales as values.

Example of expected output:

{'Wii': 0.01,
 'NES': 0.12,
 'GB': 0.12,
 'DS': 0.01,
 'X360': 0.01,
 'PS3': 0.02,
 'PS2': 0.01,
 'SNES': 0.02,
 'GBA': 0.02,
 '3DS': 0.02,..}
codevalidated

Map Game Titles to their Release Years

Create a dictionary year_game where the keys will be the year and the values will be the list of games released in that year.

Example of expected output:

{2006: ['Wii Sports',
  'Gears of War',
  'Pokemon Diamond/Pokemon Pearl',
  'New Super Mario Bros.',
  'Wii Play',
  'Final Fantasy XII',
  'Brain Age: Train Your Brain in Minutes a Day',
   ...],
 2008: ['Mario Kart Wii',
  'Grand Theft Auto IV',
  'Wii Fit',
   ...],
   ...
}
codevalidated

Determine the platforms associated with each publisher

Create a dictionary publisher_platform where the keys will be the publisher and the values will be the list of platforms associated with that publisher.

Example of expected output:

{   'Nintendo': ['Wii', 'NES', 'GB', 'DS', 'SNES', ... ],
    'Microsoft Game Studios': ['X360', 'PC', 'XOne', 'XB', 'XBL', ... ],
    'Take-Two Interactive': ['PS2', 'PS3', 'X360', 'PS4', 'PC', ... ],
    'Sony Computer Entertainment': ['PS', 'PS2', 'PS3', 'PS4', 'PSP', ... ],
    'Activision': ['PS2', 'PS3', 'X360', 'PS', 'Wii', ... ],
     ...
}

If a publisher has multiple games on the same platform, then the platform should only be listed once.

codevalidated

Normalize the `games` collection

we have a variable called games that contains details of various video games, such as their names, platforms, release years, genres, publishers, and sales figures in different regions. However, we noticed that some games share the same year, platform, genre, and publisher, leading to duplicate entries in the dataset.

To address this issue, you will create a new variable called normalized_games in a way that it contains all the unique games' details without any duplications based on the combination of year, platform, genre, and publisher.

Expected structure of the normalized_games variable:

normalized_games = {
    year: {
        platform: {
            genre: {
                publisher: {
                    games: [
                        {
                            'name': name,
                            'na_sales': na_sales,
                            'eu_sales': eu_sales,
                            'jp_sales': jp_sales,
                            'other_sales': other_sales,
                            'global_sales': global_sales
                        },
                        ...
                    ]
                }
            }
        }
    }
}
codevalidated

Identify the top 3 platforms with the highest average sales per game, considering only platforms that have released at least 50 games.

Create a sorted list named top_3_platforms with the top 3 platforms with the highest average sales per game, considering only platforms that have released at least 50 games.

input

Determine the genre that has the highest total sales in North America and Europe combined.

Enter the name of the genre with highest total sales in North America and Europe as a string.

codevalidated

Calculate the average sales for each publisher

Create a dictionary publisher_avg_sales where the keys will be the publisher and the values will be the average sales for each publisher. Average sales will be the Publisher's Global_Sales / Total Global_Sales.

Example of expected output:

{'Nintendo': 0.196,
 'Microsoft Game Studios': 0.104,
 'Take-Two Interactive': 0.055,
 'Sony Computer Entertainment': 0.081,
  ...
}
codevalidated

Find 5 games which have the highest rank among all games, and store the rank, name and genre as tuples in the list `top_ranked`

Create a list top_ranked which will have a tuple containing the rank, names, and the genre of the selected games.

Example of expected output:

[(1, 'Wii Sports', 'Sports'),
 (2, 'Super Mario Bros.', 'Platform'),
 (3, 'Mario Kart Wii', 'Racing'),...]

codevalidated

Determine the Top Genre with the Highest Global Sales

Construct a sorted list named top_genre, consisting of genres with the highest average sales per game. Only consider games that have been released in the last 10 years.

Note: 'last 10 year' corresponds to the last decade based on the latest year present in the dataset.

Expected output:

['Shooter', 
 'Platform',
  ...
]
codevalidated

Calculate the percentage contribution of each genre to the total global sales

Create a dictionary genre_percentage_sales which contains the genre name as keys and the percentage contribution to the global sales as values.

Example of expected output:

{'Platform': 9.319831757177944,
 'Racing': 8.206321661263361,
 'Role-Playing': 10.39601185591744,
 'Puzzle': 2.7459407831900946,
 'Misc': 9.07982117474026,...}

input

Determine How Many Publishers Published the Top 100 Games

In this activity, you are being tasked to compute the number of unique publishers who have had their games ranked in the top 100 list. The count should be returned as an integer. The ranking of the games should be considered in your selection.

codevalidated

How many games are published by each publisher

Create a list publisher_game_count which will have a tuple with the Name of the Publisher and the number of games published by that publisher.

Example expected output:

[('Nintendo', 703),
 ('Microsoft Game Studios', 189),
 ('Take-Two Interactive', 413),
 ('Sony Computer Entertainment', 409),
  ...
]
Practicing Python Collections using Video Game sales dataPracticing Python Collections using Video Game sales data
Author

Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥 When I'm not geeking out over AI 🤖 with my classmates or building neural networks, 🧠 you can find me buried in statistics textbooks. 📚 I know, what a nerd! 🤓 I'm always down to learn new ways to speak human 🫂 and computer 💻. Making tech more fun is my jam! 🍇 If you want a cheery data buddy 😎 who can make difficult things easy-peasy 🥝 and learning a party 🎉, I'm your guy! 🙋‍♂️ Let's chat codes 👨‍💻, numbers 🧮, and machines 🤖 over coffee! ☕ I'd love to meet more techy humans. 💁‍♂️ Can't wait to talk! 🗣️

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥 When I'm not geeking out over AI 🤖 with my classmates or building neural networks, 🧠 you can find me buried in statistics textbooks. 📚 I know, what a nerd! 🤓 I'm always down to learn new ways to speak human 🫂 and computer 💻. Making tech more fun is my jam! 🍇 If you want a cheery data buddy 😎 who can make difficult things easy-peasy 🥝 and learning a party 🎉, I'm your guy! 🙋‍♂️ Let's chat codes 👨‍💻, numbers 🧮, and machines 🤖 over coffee! ☕ I'd love to meet more techy humans. 💁‍♂️ Can't wait to talk! 🗣️

This project is part of

Python Collections

Explore other projects