Supervised machine learning classification: Customer Churn prediction (not guided)

In this project you'll apply the techniques and models covered previously, including data cleaning, feature engineering, and hyperparameter tuning, to a dataset containing information about customer churn.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

codevalidated

Train and test split

First, separate the target and the features into two variables.

Store the features in X and the target in y.

Then, use train_test_split to split the data into training and testing sets: 80% training, 20% testing, with random_state=0.

Store the resulting splits in the variables X_train, X_test, y_train, and y_test.
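A minimal sketch of this split, assuming the cleaned churn data is loaded from a CSV and the target column is named 'Churn' (the file name and column name are assumptions that depend on your earlier preprocessing):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: the cleaned churn data lives in this CSV with a 'Churn' target column.
df = pd.read_csv('churn_clean.csv')   # hypothetical file name

X = df.drop(columns=['Churn'])        # features
y = df['Churn']                       # target

# 80% training / 20% testing, fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```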

codevalidated

Classification Model

Train an XGBoost Classification model using the training data, and store the model in model.

In a Jupyter Notebook, calculate the accuracy on both the training and testing sets, as well as the macro-averaged precision, recall, and F1-score on the testing set.

Store the results in the variables test_precision, test_recall and test_f1score.

Note: The expected evaluation metrics vary depending on the specifics of the problem and the data. For this particular problem, you are required to achieve a precision of over 70%.
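A possible way to train and evaluate the model, assuming the target has already been encoded as 0/1; the names train_accuracy and test_accuracy are illustrative, while test_precision, test_recall, and test_f1score match the required variable names:

```python
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Assumption: y_train / y_test are numeric (0/1) labels, as XGBClassifier expects.
model = XGBClassifier(random_state=0)
model.fit(X_train, y_train)

# Accuracy on both splits (variable names are illustrative)
train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))

# Macro-averaged precision, recall, and F1-score on the test set
y_pred = model.predict(X_test)
test_precision = precision_score(y_test, y_pred, average='macro')
test_recall = recall_score(y_test, y_pred, average='macro')
test_f1score = f1_score(y_test, y_pred, average='macro')

print(train_accuracy, test_accuracy, test_precision, test_recall, test_f1score)
```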

codevalidated

Submission

Now, let's submit your predictions for the test dataset to get a score from the platform.
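The exact submission format is defined by the platform, but a hypothetical sketch of generating and saving predictions could look like this (the file names test.csv and submission.csv, and the column name, are assumptions):

```python
import pandas as pd

# Assumption: the platform's test features are in this file and have been
# preprocessed the same way as the training data.
test_df = pd.read_csv('test.csv')

# Predicted churn labels for each row of the test dataset
predictions = model.predict(test_df)

# Save predictions to a file that can be submitted to the platform
pd.DataFrame({'prediction': predictions}).to_csv('submission.csv', index=False)
```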

multiplechoice

XGBoost: Tuning Parameters

RandomizedSearchCV() takes the following arguments:

  • estimator: The estimator being fit; here it's XGBoost.
  • param_distributions: Unlike a fixed parameter grid, this is the distribution (or list) of possible hyperparameter values to sample from.
  • cv: Number of cross-validation folds.
  • n_iter: Number of hyperparameter combinations to sample.
  • verbose: Prints more output.

For this assessment, load the train_tuning.csv dataset. You do not need to preprocess this dataset. After loading the dataset, follow the instructions and complete the exercise:

  1. Create a parameter grid called rs_param_grid that contains:

    • 'max_depth': list(range(3, 12))
    • 'alpha': [0,0.001, 0.01,0.1,1]
    • 'subsample': [0.5,0.75,1]
    • 'learning_rate': np.linspace(0.01,0.5, 10)
    • 'n_estimators': [10, 25, 40]
  2. Create a RandomizedSearchCV object called xgb_rs, passing in the parameter grid to param_distributions. Also, specify verbose=2, cv=3, and n_iter=5.

  3. Your objective is to maximize F1-score.

  4. Fit the RandomizedSearchCV object to X and y.

What are the best parameters?
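A minimal sketch of the tuning setup described above; the target column name 'Churn' is an assumption, and the extra random_state on the search is only there to make the sampled combinations reproducible:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# train_tuning.csv comes from the instructions; the 'Churn' column name is assumed.
tuning_df = pd.read_csv('train_tuning.csv')
X = tuning_df.drop(columns=['Churn'])
y = tuning_df['Churn']

rs_param_grid = {
    'max_depth': list(range(3, 12)),
    'alpha': [0, 0.001, 0.01, 0.1, 1],
    'subsample': [0.5, 0.75, 1],
    'learning_rate': np.linspace(0.01, 0.5, 10),
    'n_estimators': [10, 25, 40],
}

xgb_rs = RandomizedSearchCV(
    estimator=XGBClassifier(random_state=0),
    param_distributions=rs_param_grid,
    scoring='f1',       # maximize F1-score
    cv=3,
    n_iter=5,
    verbose=2,
    random_state=0,     # assumption: fix the seed for a reproducible search
)

xgb_rs.fit(X, y)
print(xgb_rs.best_params_)   # the best parameter combination found
```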

Author

Verónica Barraza

This project is part of Classification in Depth with Scikit-Learn.
