Introduction to Feature Engineering with Simulated Dataset

In this lab, we will explore the concept of feature engineering using a simulated dataset. Feature engineering involves transforming and creating new features from existing data to improve the performance of machine learning models.
Project Created by

Verónica Barraza

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Code-validated

Encoding Categorical Variables

Use the pandas get_dummies function to convert categorical variables into numerical (one-hot encoded) representations that machine learning models can work with.

Store the resulting DataFrame, containing both the encoded categorical variables and the numerical variable, in df_encoded.
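A minimal sketch of this step follows. It assumes a small hypothetical dataset with one categorical column named Category (an assumed name) and the numerical column Numeric that later activities use. Passing dtype=int makes the dummy columns 0/1 integers instead of booleans. That is a choice made here, not a requirement of the activity.

```python
import pandas as pd

# Hypothetical simulated data: "Category" is an assumed column name;
# "Numeric" matches the column used in later activities.
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "C"],
    "Numeric": [1.5, 2.0, 3.5, 4.0, 5.5, 6.0],
})

# One-hot encode only the categorical column; "Numeric" is kept as-is.
# dtype=int yields 0/1 integers rather than booleans.
df_encoded = pd.get_dummies(df, columns=["Category"], dtype=int)

print(df_encoded.columns.tolist())
# ['Numeric', 'Category_A', 'Category_B', 'Category_C']
```

The non-encoded columns come first, followed by one indicator column per category. Each row has a 1 in exactly one of those indicator columns.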

Multiple choice

Handling Missing Values

Identify and handle missing values in the dataset df_encoded. You can either fill missing values with a summary statistic such as the mean, median, or mode, or remove the rows or columns that contain them.

Select the correct code to replace the missing values with the column mean.
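The mean-imputation option can be sketched as follows. The example assumes a small hypothetical df_encoded in which only Numeric has gaps. It fills each column with its own mean using pandas fillna and stores the result as df_filled, the name the next activity works with. This is an illustration, not necessarily the answer key for the question.

```python
import numpy as np
import pandas as pd

# Hypothetical df_encoded: a numerical column with gaps plus one-hot columns.
df_encoded = pd.DataFrame({
    "Numeric": [2.0, np.nan, 4.0, 6.0, np.nan],
    "Category_A": [1, 0, 1, 0, 0],
    "Category_B": [0, 1, 0, 1, 1],
})

# Count missing values per column.
print(df_encoded.isna().sum())

# Replace each column's missing values with that column's mean.
# mean() skips NaN by default, so the Numeric mean is (2 + 4 + 6) / 3 = 4.0.
df_filled = df_encoded.fillna(df_encoded.mean())
print(df_filled["Numeric"].tolist())  # [2.0, 4.0, 4.0, 6.0, 4.0]
```

Other strategies use the same pattern. Replacing df_encoded.mean() with df_encoded.median() or df_encoded.mode().iloc[0] gives the median or mode variants. Calling df_encoded.dropna() removes rows with missing values instead of filling them.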

Code-validated

Creating New Features

In this activity, we add three new derived features to the DataFrame df_filled. The first line squares the values in the Numeric column and stores them in a new column, Feature1_squared. The second line cubes the values of Numeric and stores them in Feature2_cubed. Finally, the third line computes the natural logarithm of Numeric with the np.log() function and stores it in Feature3_log.
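The three derived features can be sketched as follows. The example assumes a hypothetical df_filled whose Numeric column is strictly positive.

```python
import numpy as np
import pandas as pd

# Hypothetical df_filled with a strictly positive "Numeric" column.
df_filled = pd.DataFrame({"Numeric": [1.0, 2.0, 4.0]})

# Polynomial features: element-wise square and cube.
df_filled["Feature1_squared"] = df_filled["Numeric"] ** 2
df_filled["Feature2_cubed"] = df_filled["Numeric"] ** 3

# Natural logarithm, applied element-wise to the column.
df_filled["Feature3_log"] = np.log(df_filled["Numeric"])

print(df_filled)
```

np.log is only defined for positive inputs. A zero produces -inf and a negative value produces NaN, each with a RuntimeWarning, so check the range of Numeric before applying it.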

This project is part of

Introduction to Supervised Learning with scikit-learn