In this project, you will practice diagnosing overfitting and underfitting. You will also explore techniques to mitigate both issues, strengthening your understanding of model generalization.

Project Activities

Use train_test_split to split the data into training and testing sets. Use 25% of the data for testing and set random_state=42.

Store the results in X_train, X_test, y_train, and y_test.
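As a rough illustration of this split, here is a minimal sketch. The project dataset is not shown here, so the DataFrame below is synthetic stand-in data; the column names Frequency and target are hypothetical, and only Monetary (c.c. blood) comes from the task description.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): replace with the real project dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Frequency": rng.integers(1, 50, size=200),                # hypothetical column
    "Monetary (c.c. blood)": rng.integers(250, 12500, size=200),
    "target": rng.integers(0, 2, size=200),                    # hypothetical label column
})

X = df.drop(columns="target")
y = df["target"]

# 25% of the rows go to the test set; random_state=42 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)
```

Note that train_test_split returns the pieces in the order X_train, X_test, y_train, y_test, so the unpacking order matters.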


Log normalization

For this task, apply a log transformation to the Monetary (c.c. blood) feature and store the result in a new variable called monetary_log. Store monetary_log together with the other features in X_train_normed and X_test_normed.
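One possible way to do this is sketched below. It assumes that X_train and X_test are pandas DataFrames, that the natural logarithm (np.log) is the intended transformation, and that the original Monetary column is dropped once monetary_log is added; none of these details is confirmed by the task text. The tiny DataFrames and the helper add_monetary_log are hypothetical and only keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frames standing in for the real X_train / X_test.
X_train = pd.DataFrame({
    "Frequency": [1, 4, 10],                      # hypothetical column
    "Monetary (c.c. blood)": [250, 1000, 2500],
})
X_test = pd.DataFrame({
    "Frequency": [2],
    "Monetary (c.c. blood)": [500],
})

def add_monetary_log(X):
    """Return a copy of X with monetary_log replacing the raw Monetary column."""
    X_normed = X.copy()  # avoid modifying the original frame
    # np.log requires strictly positive values (assumed true for this feature).
    X_normed["monetary_log"] = np.log(X_normed["Monetary (c.c. blood)"])
    return X_normed.drop(columns="Monetary (c.c. blood)")

X_train_normed = add_monetary_log(X_train)
X_test_normed = add_monetary_log(X_test)

print(X_train_normed.columns.tolist())
```

If the feature could contain zeros, np.log1p would be a safer choice, at the cost of a slightly different transformation.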


Impact of Variable Transformation on Decision Tree

On the previous page, we applied a logarithmic transformation to the numerical features to improve their distribution.

Based on this scenario, select the correct statement:


Model Performance Evaluation

Evaluate the model's performance on the training and testing sets. Based on this performance, select the correct statement.
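A comparison of this kind might look like the sketch below. It uses synthetic data from make_classification rather than the project dataset, an unconstrained DecisionTreeClassifier, and accuracy as the metric; all three are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data (assumption): flip_y=0.2 randomly reassigns 20% of labels.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    flip_y=0.2, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# No depth limit: the tree is free to memorize the training set.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"train accuracy: {train_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

A large gap between the two scores suggests the model has learned noise in the training data, while low scores on both sets point in the opposite direction.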


Model Fit Evaluation

Evaluate the model's fit based on its performance and select the correct term that corresponds to the given scenario.


Validation Curve Evaluation

To assess model performance and find the optimal hyperparameter value, we can plot a validation curve. Based on this concept, select the correct statement:
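The numbers behind a validation curve can be computed with scikit-learn's validation_curve, as in the sketch below. The data is synthetic, and the depth range 1 to 10, the 5-fold cross-validation, and the accuracy metric are arbitrary choices for illustration. The scores are printed rather than plotted so the example has no plotting dependency.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): not the project dataset.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    flip_y=0.2, random_state=42,
)

depths = np.arange(1, 11)  # candidate max_depth values 1..10

# Each returned array has shape (n_depths, n_folds).
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=42), X, y,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="accuracy",
)

train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for depth, tr, va in zip(depths, train_mean, val_mean):
    print(f"max_depth={depth:2d}  train={tr:.3f}  validation={va:.3f}")

# One simple reading of the curve: the depth with the highest mean validation score.
best_depth = depths[np.argmax(val_mean)]
print("best max_depth:", best_depth)
```

Plotting train_mean and val_mean against depths gives the usual picture: the training score keeps rising with depth, while the validation score peaks and then flattens or drops.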


Validation Curve

Based on the following figure, identify the best max_depth hyperparameter to train a decision tree.



Compute precision, recall and f1-score using the test dataset

Store the precision, recall, and f1-score of the positive class in the variables precision, recall, and f_1_score.
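The sketch below shows how these three metrics could be computed with scikit-learn. It assumes the positive class is labeled 1 and uses short hand-written label lists in place of the real y_test and model predictions, which in the project would come from something like model.predict on the test features.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hand-written labels (assumption) standing in for the real test labels and predictions.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# pos_label=1 selects the positive class (assumed to be encoded as 1).
precision = precision_score(y_test, y_pred, pos_label=1)
recall = recall_score(y_test, y_pred, pos_label=1)
f_1_score = f1_score(y_test, y_pred, pos_label=1)

# Here there are 3 true positives, 1 false positive, and 1 false negative,
# so precision = 3/4, recall = 3/4, and f1 = 0.75.
print(precision, recall, f_1_score)
```

The variable is named f_1_score rather than f1_score so that it does not shadow the imported function.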

