Spaceship Titanic

Train and test split

First, use train_test_split to split the data into training and testing sets. Use 80% of the dataset for training and 20% for testing, and set random_state=0.

Store the results in the variables X_train, X_test, y_train and y_test.
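
As a minimal sketch of the split described above: the feature matrix X and target y below are synthetic stand-ins generated with make_classification (an assumption for illustration only); in the project they would come from the Spaceship Titanic data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): replace X and y with the project's
# features and target column.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# test_size=0.2 keeps 80% of the rows for training and 20% for testing;
# random_state=0 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # (800, 10) (200, 10)
```

Note that random_state is an argument passed to train_test_split, not one of the values it returns.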

Classification Model

Train a classification model on the training data, and store the fitted model in clf.

For this task, you could use the following models:

- Logistic Regression. 
- Decision Tree Classifier. 
- Naive Bayes Classifier.
- Random Forest Classifier. 
- XGBoost.

However, do not use neural networks for this assessment.

Calculate the accuracy on both the training and testing sets, and the macro-averaged precision, recall and F1-score on the testing set. Run the code in a Jupyter Notebook.

Store the results in the variables train_accuracy, test_accuracy, test_precision, test_recall and test_f1score.
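
One possible way to train the model and compute these metrics is sketched below. It uses Logistic Regression, one of the allowed models; the synthetic data from make_classification and the max_iter=1000 setting are assumptions made only so the example runs on its own, and are not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): use the project's own X and y instead.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one of the allowed classifiers on the training set only.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Accuracy on the data the model was trained on.
train_accuracy = accuracy_score(y_train, clf.predict(X_train))

# Metrics on the held-out test set; average="macro" takes the
# unweighted mean of the per-class scores.
y_pred = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_precision = precision_score(y_test, y_pred, average="macro")
test_recall = recall_score(y_test, y_pred, average="macro")
test_f1score = f1_score(y_test, y_pred, average="macro")

print(train_accuracy, test_accuracy, test_precision, test_recall, test_f1score)
```

Any of the other listed classifiers could be swapped in for LogisticRegression without changing the metric calls. A training accuracy far above the test accuracy would suggest overfitting.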

Note: The expected evaluation metrics vary depending on the specifics of the problem and the data. However, for a well-defined, simple problem with a large and diverse training dataset, a well-trained machine learning model can achieve precision and recall above 80% in some cases.

Verónica Barraza
