Classifying Legendary Pokemons using Naive Bayes and Decision Trees

The project starts with exploratory data analysis (EDA) to understand the dataset, followed by feature engineering to extract relevant features from the data. You will then train a Naive Bayes classifier on the prepared dataset to identify Legendary Pokémon, a special category of Pokémon that is rare and highly desirable.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

How many columns in our data are numeric?
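
One way to count numeric columns is pandas' `select_dtypes`. The sketch below uses a made-up four-row frame; the column names (`Name`, `Attack`, `Speed`, `Legendary`) are assumptions for illustration and may not match the project dataset.

```python
import pandas as pd

# Toy stand-in for the project dataset: rows and column names are hypothetical.
df = pd.DataFrame({
    "Name": ["mon_a", "mon_b", "mon_c", "mon_d"],
    "Attack": [50, 120, 65, 110],
    "Speed": [45.0, 90.0, 60.0, 100.0],
    "Legendary": [False, True, False, True],
})

# include="number" selects integer and float columns; boolean and
# string columns are left out.
numeric_cols = df.select_dtypes(include="number").columns

print(len(numeric_cols), list(numeric_cols))
```

Note that a boolean flag column is not counted as numeric by this selector, which matters if the target column is stored as True/False.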

How many Pokémon are legendary?

Filter the legendary Pokémon from the original dataset and store them in a new variable called legendaries_df.
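
A minimal sketch of this filter, assuming the dataset has a boolean column named `Legendary` (an assumption; adjust to the real column name). The rows below are invented placeholders, not real data.

```python
import pandas as pd

# Toy stand-in for the project dataset: rows and column names are hypothetical.
df = pd.DataFrame({
    "Name": ["mon_a", "mon_b", "mon_c", "mon_d"],
    "Attack": [50, 120, 65, 110],
    "Legendary": [False, True, False, True],
})

# Boolean indexing keeps only the rows where the flag is True.
# .copy() makes legendaries_df independent of the original frame.
legendaries_df = df[df["Legendary"]].copy()

# The number of rows also answers "how many are legendary?"
print(len(legendaries_df))
```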

Pokémon stats and legendary status

If we take a quick look at the data, we can identify clear differences in Pokémon stats depending on legendary status.

What is the average Attack value of Legendary Pokémon?

Legendary Pokémon strength

Legendary Pokémon are sturdier than normal Pokémon. Can you verify that?

On average, what percentage of extra Defense points do Legendary Pokémon have compared to normal Pokémon?
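
The percentage can be computed as (legendary mean - normal mean) / normal mean × 100. The sketch below applies that formula to invented numbers; the `Defense` and `Legendary` column names are assumptions, and the result says nothing about the real dataset.

```python
import pandas as pd

# Toy values chosen so the arithmetic is easy to follow; not real stats.
df = pd.DataFrame({
    "Defense": [40, 60, 100, 120],
    "Legendary": [False, False, True, True],
})

# Mean Defense for each group, selected with boolean masks.
legendary_mean = df.loc[df["Legendary"], "Defense"].mean()   # (100 + 120) / 2
normal_mean = df.loc[~df["Legendary"], "Defense"].mean()     # (40 + 60) / 2

# Relative difference, expressed as a percentage of the normal mean.
extra_pct = (legendary_mean - normal_mean) / normal_mean * 100

print(f"{extra_pct:.1f}% extra Defense")
```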

Which are the most common types for Legendary Pokémon?
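
Counting category frequencies is a natural fit for `value_counts`. In this sketch the `Type 1` column name comes from the activities further down, while the rows themselves are placeholders whose counts are made up.

```python
import pandas as pd

# Hypothetical subset of legendary rows; the counts here are invented.
legendaries_df = pd.DataFrame({
    "Type 1": ["Psychic", "Dragon", "Psychic", "Fire"],
})

# value_counts() returns the categories sorted by frequency, highest first.
type_counts = legendaries_df["Type 1"].value_counts()
most_common = type_counts.idxmax()

print(type_counts)
```

The same call on `Type 2` would cover secondary types, where missing values are excluded by default.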

Did you find any missing values?

Did you find any duplicated values?
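
Both checks are one-liners in pandas. The sketch below runs them on a made-up frame containing one missing value and one repeated row; column names are assumptions.

```python
import pandas as pd

# Toy frame with one missing "Type 2" and one fully repeated row.
df = pd.DataFrame({
    "Name": ["mon_a", "mon_b", "mon_c", "mon_c"],
    "Type 2": ["Flying", None, "Poison", "Poison"],
    "Attack": [50, 80, 65, 65],
})

# Count of missing values in each column.
missing_per_column = df.isna().sum()

# duplicated() flags every row that repeats an earlier row exactly.
duplicate_rows = df.duplicated().sum()

print(missing_per_column)
print(duplicate_rows)
```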

Which categorical columns would you remove if you want to reduce noise and improve your model accuracy?

Which numerical columns would you remove if you want to reduce noise and improve your model accuracy?

What is the recommended split ratio for train/test data when using scikit-learn?
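
For reference, scikit-learn's `train_test_split` holds out 25% of the data when no size is given; ratios such as 80/20 are also common in practice. The sketch below shows an explicit 80/20 split on placeholder data. The `stratify=y` argument is an optional choice, added here because a rare class can otherwise end up unevenly distributed between the two sets.

```python
from sklearn.model_selection import train_test_split

# Placeholder features and an imbalanced toy target (16 zeros, 4 ones).
X = [[i, i * 2] for i in range(20)]
y = [0] * 16 + [1] * 4

# test_size=0.2 keeps 80% for training and 20% for testing.
# random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(X_train), len(X_test))
```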

Encoding labels

What kind of values do we expect to have after encoding labels with LabelEncoder?
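
A quick way to see what `LabelEncoder` produces is to run it on a handful of strings. The labels below are arbitrary examples.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["Water", "Fire", "Grass", "Fire"]

encoder = LabelEncoder()
# fit_transform learns the sorted set of classes and maps each label
# to its position in that sorted list.
encoded = encoder.fit_transform(labels)

print(list(encoder.classes_))  # ['Fire', 'Grass', 'Water']
print(list(encoded))           # [2, 0, 1, 0]
```

The output is an array of integers from 0 to n_classes - 1, assigned in sorted order of the class names.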

Categorical variables encoding done wrong?

A common mistake is to simply assign a numerical value to each category, which imposes an order the categories do not actually have. Based on previous discussions, is it correct to use the Label Encoder to transform the "Type 1" and "Type 2" variables?
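
One commonly used alternative for unordered categories is one-hot encoding, which creates a separate indicator column per category so that no ordering is implied. This is a sketch of that technique with `pandas.get_dummies`, not necessarily the approach the project's solution takes; the rows are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"Type 1": ["Water", "Fire", "Grass", "Fire"]})

# Each distinct value becomes its own indicator column,
# named "<column>_<value>".
one_hot = pd.get_dummies(df, columns=["Type 1"])

print(list(one_hot.columns))
```

Every row has exactly one active indicator, so "Fire" is no closer to "Grass" than to "Water" as far as the model can tell.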

Which sentence is not a benefit of data preprocessing?

What is the difference between normalization and standardization?
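
In the usual usage of these terms, normalization (min-max scaling) rescales a feature to a fixed range such as [0, 1], while standardization rescales it to zero mean and unit variance. The sketch below compares scikit-learn's `MinMaxScaler` and `StandardScaler` on a made-up column.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# A single toy feature column; values are arbitrary.
attack = np.array([[10.0], [20.0], [30.0], [40.0]])

# Normalization: (x - min) / (max - min), so values land in [0, 1].
normalized = MinMaxScaler().fit_transform(attack)

# Standardization: (x - mean) / std, giving mean 0 and std 1.
standardized = StandardScaler().fit_transform(attack)

print(normalized.ravel())
print(standardized.ravel())
```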

Evaluating model score

Now that you have predicted whether each Pokémon in the test data is legendary, use the accuracy_score function to check your model's score.

The score you got is:
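
The evaluation step can be sketched as follows. This example assumes the Gaussian variant of Naive Bayes (`GaussianNB`), which the page does not specify, and uses synthetic data from `make_classification` in place of the prepared dataset, so the resulting score is not the answer to this activity.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data standing in for the real features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GaussianNB()
model.fit(X_train, y_train)

# Predict on held-out data, then compare predictions with the true labels.
y_pred = model.predict(X_test)
score = accuracy_score(y_test, y_pred)

print(f"accuracy: {score:.3f}")
```

Accuracy is the fraction of correct predictions. When one class is rare, a high accuracy can hide poor performance on that class, so it is worth reading alongside other metrics.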

What does a confusion matrix measure?
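
A confusion matrix tabulates predictions against true labels. In scikit-learn's layout, rows are true classes and columns are predicted classes. The labels below are invented to keep the counts easy to check by hand.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions (1 = legendary, 0 = not).
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)

# For binary labels [0, 1], ravel() yields the four cells in this order.
tn, fp, fn, tp = cm.ravel()

print(cm)
print(tn, fp, fn, tp)  # 4 1 1 2
```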

Evaluating model score

With your second model ready, use the accuracy_score function to check its score on the test data.

The score you got is:

Which are the most important features in your model?
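
If the second model is a decision tree, as the project title suggests, its fitted `feature_importances_` attribute gives one view of which features drive the splits. The sketch below uses synthetic data and generic feature names (`stat_0` to `stat_5`, both hypothetical), so the ranking it prints carries no meaning for the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Generic placeholder names; the real dataset's columns will differ.
feature_names = [f"stat_{i}" for i in range(6)]

X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Importances are normalized to sum to 1; sort to rank the features.
importances = pd.Series(
    tree.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```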

Which is a valid reason for our model to get wrong predictions?

Author

Matias Caputti

This project is part of

Classification in Depth with Scikit-Learn