Banking classification

In this project, you will practice using an XGBoost model on a banking dataset from the direct marketing campaigns of a Portuguese banking institution to predict which clients are more likely to subscribe to a long-term deposit. The project aims to validate your skills in using XGBoost for classification problems.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

multiplechoice

Examine the train and test datasets

Examine the train and test datasets above and choose the correct statements.

codevalidated

Check the missing values in the train dataset and store your result in `training_missing`
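
A common way to count missing values per column is pandas' `isnull().sum()`. The sketch below uses a small made-up DataFrame in place of the real training data, which is not reproduced here; its columns and values are assumptions, and the exact form of result the activity expects may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real training data (columns are assumptions).
training_dataset = pd.DataFrame({
    "age": [30, np.nan, 45],
    "job": ["admin.", "technician", None],
    "y": ["no", "yes", "no"],
})

# Count the missing values in each column.
training_missing = training_dataset.isnull().sum()
print(training_missing)
```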

multiplechoice

How many columns are there in our train dataset and test dataset?

Make sure you have run the code for previewing the train dataset above and written the corresponding code for the test dataset by replacing the name training_dataset with testing_dataset.

codevalidated

Let's remove the redundant columns from the training dataset

As we can see from our result above and from the EDA, some of the columns of the train dataset are redundant and should be removed. The test dataset does not contain them either. Hence, let's remove them from our train dataset.

You can do this in several different ways, but here let's follow the steps defined below:

  1. First, get the list of the columns of the test dataset into columns_to_retain
  2. Append the column y of the train dataset to columns_to_retain
  3. Finally, filter the train dataset to only contain columns_to_retain
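
The three steps above can be sketched as follows. The two small DataFrames are made-up stand-ins for the real data, and `extra_col` is a hypothetical redundant column used only for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the real datasets (columns are assumptions).
training_dataset = pd.DataFrame({
    "age": [30, 45],
    "job": ["admin.", "technician"],
    "extra_col": [1, 2],  # redundant: not present in the test dataset
    "y": ["no", "yes"],
})
testing_dataset = pd.DataFrame({"age": [52], "job": ["services"]})

# 1. Get the list of the test dataset's columns.
columns_to_retain = testing_dataset.columns.tolist()
# 2. Append the target column y.
columns_to_retain.append("y")
# 3. Keep only those columns in the train dataset.
training_dataset = training_dataset[columns_to_retain]
print(training_dataset.columns.tolist())
```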

codevalidated

Put the last column of training_dataset, y, into the variable y_training_data

As we separated our independent variables and put them into x_training_data above, we now need the corresponding target values from our training_dataset. We will store these values in y_training_data and use them later during the training of our models.
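
A minimal sketch of this split, assuming y is the last column of training_dataset. The toy data is made up, and the way x_training_data is built here (dropping y) is an assumption, since the original step is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the filtered training data.
training_dataset = pd.DataFrame({
    "age": [30, 45, 52],
    "balance": [100, 250, 75],
    "y": ["no", "yes", "no"],
})

# Independent variables: every column except the target (assumed approach).
x_training_data = training_dataset.drop(columns=["y"])
# Target values: the last column, y, selected by position.
y_training_data = training_dataset.iloc[:, -1]
print(y_training_data.tolist())
```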

codevalidated

Define the RandomForestClassifier model and store it in classifier_random_forest

The steps are the same as for the XGBoost model above.
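
Defining the model could look like the sketch below. The `n_estimators` and `random_state` values are illustrative assumptions, not settings taken from the project.

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters here are assumptions chosen for reproducibility.
classifier_random_forest = RandomForestClassifier(n_estimators=100, random_state=42)
print(classifier_random_forest)
```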

codevalidated

**Train the RandomForestClassifier we just created on the data `(x_train,y_train)`**

The steps are the same as for the XGBoost model above.
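
As a rough illustration, training comes down to a single `fit` call. The banking data is not available here, so this sketch generates synthetic data with `make_classification` and splits it with `train_test_split`; both the data and the split ratio are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the banking data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
x_train, x_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Define the model, then train it on the training split.
classifier_random_forest = RandomForestClassifier(random_state=0)
classifier_random_forest.fit(x_train, y_train)
```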

input

What is the accuracy score for the XGBoost model, up to 4 decimal places?

We are using accuracy_score(y_pred,y_valid) to find the accuracy score in the code above.
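
To show how the score and the rounding work, here is a small sketch with made-up labels standing in for the real predictions. The name `xgb_accuracy` is hypothetical. scikit-learn documents the signature as `accuracy_score(y_true, y_pred)`, but accuracy is symmetric, so swapping the arguments gives the same value.

```python
from sklearn.metrics import accuracy_score

# Made-up labels for illustration only.
y_valid = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Fraction of predictions that match the true labels (4 of 5 here).
xgb_accuracy = accuracy_score(y_valid, y_pred)
print(round(xgb_accuracy, 4))
```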

codevalidated

Predict the output for x_valid using the RandomForestClassifier model

Store the predicted values in the variable y_pred_rnd_frst
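
A minimal sketch of the prediction step, again on synthetic data because the real validation set is not shown here. Everything except the variable names from the text is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the banking data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=8, random_state=1)
x_train, x_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=1
)

classifier_random_forest = RandomForestClassifier(random_state=1)
classifier_random_forest.fit(x_train, y_train)

# One predicted class label per row of x_valid.
y_pred_rnd_frst = classifier_random_forest.predict(x_valid)
print(y_pred_rnd_frst[:10])
```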

codevalidated

Create the confusion matrix for our RandomForestClassifier model

You have to insert the missing values where indicated:
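
The activity's code template is not reproduced here. As a rough illustration of how a confusion matrix can be built with scikit-learn, the sketch below uses made-up labels, and the name `cm_random_forest` is hypothetical. Rows correspond to true classes and columns to predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels standing in for the real validation targets and predictions.
y_valid = [0, 1, 1, 0, 1]
y_pred_rnd_frst = [0, 1, 0, 0, 1]

# Entry [i, j] counts samples whose true class is i and predicted class is j.
cm_random_forest = confusion_matrix(y_valid, y_pred_rnd_frst)
print(cm_random_forest)
```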

codevalidated

Find the accuracy_score for the RandomForestClassifier and store your answer in random_forest_accuracy

Author

Jawad haider

This project is part of

Classification in Depth with Scikit-Learn
