Test and train set for classification
Introduction to Supervised Learning with scikit-learn

After training a model, it is important to evaluate its performance and estimate its accuracy on new data. Holding out a test set that the model never sees during training makes that estimate possible. In this project, you will practice splitting a dataset into training and test sets.

Project Activities

Dropping Unnecessary Features

The Unnamed: 0 column holds a unique value for every row, so it carries no predictive information and can mislead the model. Remove it.

You must modify the df_t variable itself.
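One way this step might look is sketched below. The project's real data is not shown here, so the small DataFrame and its feature_a and target columns are hypothetical stand-ins; only df_t and the Unnamed: 0 column name come from the activity text.

```python
import pandas as pd

# Hypothetical stand-in for the project's df_t (real columns are not shown in the text).
df_t = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],       # leftover row index, unique per row
    "feature_a": [1.0, 2.5, 3.1, 4.8],
    "target": [0, 1, 0, 1],
})

# Reassign the result so that the df_t variable itself is modified.
df_t = df_t.drop(columns=["Unnamed: 0"])

print(df_t.columns.tolist())
```

Reassigning to df_t is used here instead of inplace=True; both change what df_t refers to, and reassignment tends to be easier to follow.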

Separate the target and the features into two variables.

Store the features in X and the target in y.
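A minimal sketch of this separation follows. The name of the target column is not given in the activity, so target_col and the toy DataFrame are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_t after the index column has been dropped.
df_t = pd.DataFrame({
    "feature_a": [1.0, 2.5, 3.1, 4.8],
    "feature_b": [10, 20, 30, 40],
    "target": [0, 1, 0, 1],
})

# Assumed name of the label column; replace with the dataset's actual target.
target_col = "target"

X = df_t.drop(columns=[target_col])  # every column except the target
y = df_t[target_col]                 # the target as a pandas Series

print(X.shape, y.shape)
```

Because drop returns a new DataFrame, df_t itself is left unchanged by this step.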

Use train_test_split to split the data into training and testing sets: 80% for training and 20% for testing, with random_state=0.

Store the results in the variables X_train, X_test, y_train, and y_test.
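The split described above can be sketched as follows. The ten-row X and y are hypothetical placeholders for the project's data; test_size=0.2 and random_state=0 follow the activity text.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical features and target standing in for the project's X and y.
X = pd.DataFrame({"feature_a": range(10), "feature_b": range(10, 20)})
y = pd.Series([0, 1] * 5, name="target")

# test_size=0.2 holds out 20% of the rows; random_state=0 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(len(X_train), len(X_test))  # 8 training rows, 2 test rows
```

Note that random_state is an argument to train_test_split, not one of its return values: the function returns four objects when given X and y.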

Author

Verónica Barraza

This project is part of

Introduction to Supervised Learning with scikit-learn