Spaceship Titanic
Spaceship Titanic Data Science Project
Classification in Depth with Scikit-Learn

This project uses a fictional dataset modeled on the classic Titanic dataset. It gives you practical experience and lets you validate your skills for the final assessment.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Train and test split

First, use train_test_split to split the data into training and testing sets. Use 80% of the data for training and 20% for testing, and set random_state=0.

Store the results in the variables X_train, X_test, y_train, and y_test.
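The split described above might look like the following minimal sketch. The project does not show how the features and target are prepared, so X and y here are synthetic placeholders generated with make_classification (an assumption for illustration only); in the project they would be the preprocessed Spaceship Titanic features and target.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): stands in for the preprocessed
# Spaceship Titanic feature matrix X and target vector y.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 80% training, 20% testing, with a fixed seed so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

Fixing random_state makes the shuffle deterministic, so repeated runs produce the same training and testing rows.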

Classification Model

Train a Classification model using the training data, and store the model in clf.

For this task, you could use the following models:

- Logistic Regression. 
- Decision Tree Classifier. 
- Naive Bayes Classifier.
- Random Forest Classifier. 
- XGBoost.

However, do not use neural networks for this assessment.

Calculate the accuracy on both the training and testing sets, and the macro-averaged precision, recall, and F1-score on the testing set. Run the code in a Jupyter Notebook.

Store the results in the variables train_accuracy, test_accuracy, test_precision, test_recall and test_f1score.
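As a rough sketch of this activity, the example below trains one of the listed models and computes the requested metrics. The choice of RandomForestClassifier is arbitrary, and the data is a synthetic placeholder generated with make_classification (both are assumptions for illustration, not the project's reference solution).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): stands in for the preprocessed project dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Any of the listed classifiers could be used here; Random Forest is one option.
clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)

# Accuracy on the data the model was trained on and on held-out data.
train_accuracy = accuracy_score(y_train, clf.predict(X_train))
y_pred = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

# Macro averaging computes each metric per class, then takes the unweighted mean.
test_precision = precision_score(y_test, y_pred, average="macro")
test_recall = recall_score(y_test, y_pred, average="macro")
test_f1score = f1_score(y_test, y_pred, average="macro")

print(train_accuracy, test_accuracy, test_precision, test_recall, test_f1score)
```

Comparing train_accuracy with test_accuracy gives a quick check for overfitting: a large gap suggests the model has memorized the training data rather than learned patterns that generalize.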

Note: The expected evaluation metrics for a simple problem vary depending on the specifics of the problem and the data. However, for a well-defined, simple problem with a large and diverse training dataset, a well-trained machine learning model could achieve precision and recall of over 80% in some cases.

Author

Verónica Barraza

This project is part of

Classification in Depth with Scikit-Learn