Hyperparameter Tuning for a Random Forest Classifier

This project covers what hyperparameters are, the impact they have on model performance, and how to tune them to achieve the best results. Using a Random Forest classifier trained on the Ghouls, Goblins, and Ghosts dataset, you will learn how to tune hyperparameters, so let's have fun and tune the model.
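As a quick reference, here is a minimal sketch (illustrative only, not the project's solution code) of the two Random Forest hyperparameters this project focuses on, n_estimators and max_depth:

```python
from sklearn.ensemble import RandomForestClassifier

# n_estimators: the number of trees in the forest
# max_depth: the maximum depth allowed for each tree
rf = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=42)
```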

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

multiplechoice

Based on this plot and a correlation analysis, which pair of variables presents the highest association?

pairplot
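If you want to reproduce the plot yourself, here is a minimal sketch. It assumes the dataset has already been loaded into a pandas DataFrame named `df` with the target column `type` (both names are assumptions for illustration):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Scatter plots of every pair of numeric features, colored by the target class
sns.pairplot(df, hue="type")
plt.show()

# Pairwise Pearson correlations between the numeric features
print(df.corr(numeric_only=True))
```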

codevalidated

Train and test split

Use `train_test_split` to split the data into training and testing sets, with 80% training, 20% testing, and random_state=0.

Store the results in the variables X_train, X_test, y_train, and y_test, using random_state=0.
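A minimal sketch of this split, assuming the features and target have already been stored in variables X and y (names assumed for illustration):

```python
from sklearn.model_selection import train_test_split

# 80% training / 20% testing split with a fixed random seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```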

multiplechoice

Which value of max_depth is the best hyperparameter?
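One way to inspect max_depth on its own is to cross-validate a range of values and compare the mean scores. The sketch below is illustrative only and assumes the X_train and y_train variables from the previous split:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Mean 5-fold cross-validation score for each candidate max_depth
scores = {}
for depth in range(1, 21):
    model = RandomForestClassifier(max_depth=depth, random_state=42)
    scores[depth] = cross_val_score(model, X_train, y_train, cv=5).mean()

best_depth = max(scores, key=scores.get)
print(best_depth, scores[best_depth])
```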

multiplechoice

Use GridSearchCV to search over a range of values for the max_depth (from 1 to 20) and n_estimators (from 1 to 10) hyperparameters and find the combination that yields the best performance.

For this task, use cv=5 and random_state=42, and compute the best mean score.
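A minimal sketch of how this grid search could look with scikit-learn, assuming the X_train and y_train variables from the earlier split:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_depth": list(range(1, 21)),     # 1 to 20
    "n_estimators": list(range(1, 11)),  # 1 to 10
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
)
grid.fit(X_train, y_train)

print(grid.best_params_)  # best combination of hyperparameters
print(grid.best_score_)   # best mean cross-validated score
```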

multiplechoice

True or False: For this example, the best hyperparameter obtained is max_depth = 19

multiplechoice

The best hyperparameters for a given machine learning algorithm will always depend on the specific dataset and problem being addressed.

multiplechoice

If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance.

multiplechoice

Underfitting can occur when hyperparameters are tuned too much on a small dataset, leading to poor generalization performance on new data.

Author

Verónica Barraza

This project is part of

Classification in Depth with Scikit-Learn
