Feature Engineering
Feature Engineering Data Science Project
Classification in Depth with Scikit-Learn


The main objective of this third project is to validate your ability to clean and prepare a dataset appropriately for training a machine learning model.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.


Did you find any missing values?


Impute the null values of the variables Car_Owned, Bike_Owned, Active_Loan, House_Own, and Child_Count using SimpleImputer.

Replace the values in the original dataset df.


Complete the code and impute the null values using the most appropriate approach for each column's data type.

Run the completed code in your notebook.

import numpy as np
from sklearn.impute import SimpleImputer

# Replace "COMPLETE" with the imputation strategy appropriate for each group of columns.
sm = SimpleImputer(strategy="COMPLETE", missing_values=np.nan)
df.iloc[:, 1] = sm.fit_transform(df.iloc[:, 1].values.reshape(-1, 1))
df.iloc[:, 6:8] = sm.fit_transform(df.iloc[:, 6:8])
df.iloc[:, 33:36] = sm.fit_transform(df.iloc[:, 33:36])
df.iloc[:, 16:20] = np.round(sm.fit_transform(df.iloc[:, 16:20]))
df.iloc[:, 36:37] = np.round(sm.fit_transform(df.iloc[:, 36:37]))

Sometimes there is more than one possible solution. If the result is not as expected, think of another way to solve this activity.
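As a sketch of why the strategy must match the data type, here is a minimal, self-contained example on a toy DataFrame (the column names mirror the project's, but the data is invented): binary flags get "most_frequent" so imputed values stay in {0, 1}, while a count column can use a rounded median.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame standing in for df: binary ownership flags and a count column with NaNs.
df = pd.DataFrame({
    "Car_Owned": [1.0, 0.0, np.nan, 1.0],
    "Bike_Owned": [0.0, np.nan, 1.0, 0.0],
    "Child_Count": [2.0, np.nan, 1.0, 3.0],
})

# "most_frequent" keeps binary flags in {0, 1}; a "mean" strategy would
# produce fractional, meaningless flag values.
sm = SimpleImputer(strategy="most_frequent", missing_values=np.nan)
df[["Car_Owned", "Bike_Owned"]] = sm.fit_transform(df[["Car_Owned", "Bike_Owned"]])

# For an integer count, a rounded median is a reasonable choice.
sm_num = SimpleImputer(strategy="median", missing_values=np.nan)
df[["Child_Count"]] = np.round(sm_num.fit_transform(df[["Child_Count"]]))

print(df.isnull().sum().sum())  # 0 — no missing values remain
```

The same idea extends to the project's df: pick the strategy per column group before calling fit_transform.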



Apply the OneHotEncoder from scikit-learn to encode the categorical columns.

First, store the name of the categorical columns in a variable categorical_columns.

Concatenate the result with the numerical variables in a new dataframe called data_preprocessed.


Normalize the dataset

Normalize the dataset to ensure that all features are on a similar scale. This step is crucial for logistic regression, as it helps prevent certain features from dominating the others in the model's learning process.

You should use StandardScaler to standardize the features and store the results in the variables X_train_scaler and X_test_scaler.
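As a sketch with synthetic data (the real project uses its own train/test split), the key detail is to fit the scaler on the training set only and then transform both splits, so no test-set statistics leak into training:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project's feature matrix and target.
rng = np.random.RandomState(0)
X = rng.normal(loc=50, scale=10, size=(100, 4))
y = rng.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit on the training data only, then apply the same transform to both splits.
scaler = StandardScaler()
X_train_scaler = scaler.fit_transform(X_train)
X_test_scaler = scaler.transform(X_test)
```

After scaling, each training feature has mean ≈ 0 and standard deviation ≈ 1.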


Train a logistic regression model with regularization

Train a logistic regression model with regularization using the normalized dataset. Regularization helps prevent overfitting and improves the model's generalization ability.

You should use the LogisticRegression class from scikit-learn and set the regularization parameter C to control the regularization strength. Store the trained model in the variable logreg_model.
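A minimal sketch on synthetic data (the dataset and C value here are placeholders, not the project's): C is the *inverse* of regularization strength, so smaller C means stronger (L2 by default) regularization.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project's data.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train_scaler = scaler.fit_transform(X_train)
X_test_scaler = scaler.transform(X_test)

# C controls regularization strength (inverse): try smaller values for more shrinkage.
logreg_model = LogisticRegression(C=1.0, max_iter=1000)
logreg_model.fit(X_train_scaler, y_train)
```

Training on the scaled features keeps the regularization penalty comparable across features.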


Evaluate the logistic regression model

Evaluate the performance of the trained logistic regression model using the testing dataset.

You should use the predict method of the trained logreg_model to make predictions on the normalized testing data. Calculate and store the predictions in the variable y_pred. Then, utilize appropriate evaluation metrics to assess the model's performance, such as accuracy, precision, recall, and F1-score. Store the results in their respective variables: accuracy, precision, recall, and f1_score.
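The evaluation step can be sketched as follows, again on synthetic data; the scikit-learn function is imported under an alias so the variable name f1_score requested by the activity does not shadow it:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import f1_score as sk_f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project's data and fitted model.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler()
X_train_scaler = scaler.fit_transform(X_train)
X_test_scaler = scaler.transform(X_test)
logreg_model = LogisticRegression(C=1.0, max_iter=1000).fit(X_train_scaler, y_train)

# Predict on the normalized testing data, then compute the metrics.
y_pred = logreg_model.predict(X_test_scaler)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1_score = sk_f1_score(y_test, y_pred)
print(accuracy, precision, recall, f1_score)
```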


Grid Search for Logistic Regression with Regularization

Perform a grid search to find the best combination of hyperparameters for logistic regression with regularization. Grid search is a technique that exhaustively searches through a specified set of hyperparameters to find the optimal combination that yields the best model performance.

You should store the best hyperparameters in the variable best_params and the best model in the variable best_model.

Make sure to use X_train_scaler when fitting the grid search.
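A sketch of the grid search using GridSearchCV on synthetic data (the parameter grid here is illustrative; the project may search a different set of values):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project's data.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler()
X_train_scaler = scaler.fit_transform(X_train)

# Exhaustively evaluate each C value with 5-fold cross-validation.
param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l2"]}
grid = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy"
)
grid.fit(X_train_scaler, y_train)  # fit on the scaled training data

best_params = grid.best_params_
best_model = grid.best_estimator_
print(best_params)
```

best_model is already refit on the full training set with the winning hyperparameters, so it can be evaluated directly on the scaled test data.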


True or False Activity

Read the following statements and determine whether each statement is true or false.


What is feature engineering?


How can analyzing misclassified instances help improve the model's performance?


Verónica Barraza

This project is part of Classification in Depth with Scikit-Learn.
