Modifying DataFrames: creating columns and more

In this project you'll learn how to modify DataFrames by creating columns, changing their types, or deleting them. Mutating data in place is usually discouraged in data analysis, so you'll also learn its potential dangers and build the criteria to judge for yourself when it's OK to modify a DataFrame and when it's not.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Create a new column: `Revenue in $`

The column Revenue is expressed in millions of dollars. Create a new column, Revenue in $, with the revenue expressed in single US dollars.
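As a minimal sketch of this kind of column creation: the DataFrame below is a hypothetical stand-in (placeholder company names and made-up revenue figures), not the project's dataset. It assumes only that Revenue holds values in millions.

```python
import pandas as pd

# Hypothetical sample data: names and figures are placeholders,
# not values from the project's notebook.
df = pd.DataFrame(
    {"Revenue": [100.0, 50.0, 20.0], "Country": ["USA", "Japan", "USA"]},
    index=["Company A", "Company B", "Company C"],
)

# Revenue is in millions, so multiply by 1,000,000 to get single dollars.
# Assigning to a new column name adds the column to df.
df["Revenue in $"] = df["Revenue"] * 1_000_000
```

The multiplication is vectorized: it applies to every row at once, so no loop is needed.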

Create a new column: `Is American?`

Create a new boolean column Is American? that contains True for companies whose Country is USA, and False otherwise.
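One possible approach, sketched on hypothetical placeholder data: comparing a column to a value yields a boolean Series, which can be stored directly as the new column.

```python
import pandas as pd

# Hypothetical sample data (placeholder names, not the project's dataset).
df = pd.DataFrame(
    {"Country": ["USA", "Japan", "USA"]},
    index=["Company A", "Company B", "Company C"],
)

# The comparison returns a boolean Series aligned with df's index.
df["Is American?"] = df["Country"] == "USA"
```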

Create a new column with the CEOs of each company

Create a new column CEO that contains the name of each company's CEO. You'll find the list of CEOs in the associated notebook.
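A rough sketch of assigning a list as a column, using placeholder names since the real CEO list lives in the notebook. The `ceos` variable and its contents are assumptions for illustration; the list must have one entry per row, in the same order as the rows.

```python
import pandas as pd

# Hypothetical sample data (placeholder names, not the project's dataset).
df = pd.DataFrame(
    {"Revenue": [100.0, 50.0, 20.0]},
    index=["Company A", "Company B", "Company C"],
)

# Placeholder list standing in for the one provided in the notebook.
ceos = ["CEO A", "CEO B", "CEO C"]

# A plain list is assigned positionally: first item to the first row, etc.
# pandas raises a ValueError if the length does not match the row count.
df["CEO"] = ceos
```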

Delete the column `CEO`

Using the del keyword, delete the column CEO.
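For illustration, here is how `del` behaves on a small hypothetical DataFrame (placeholder data, not the project's). Note that `del` mutates the DataFrame in place and returns nothing.

```python
import pandas as pd

# Hypothetical sample data (placeholder names, not the project's dataset).
df = pd.DataFrame(
    {"Revenue": [100.0, 50.0], "CEO": ["CEO A", "CEO B"]},
    index=["Company A", "Company B"],
)

# Removes the column from df itself; there is no copy to assign.
del df["CEO"]
```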

Drop Microsoft from the `df`

Using .drop, delete the row for Microsoft and assign the result to df_no_windows. IMPORTANT: you should NOT modify df.
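A small sketch of a non-mutating drop. It assumes the company names are the DataFrame's index (an assumption about the project's data), and the revenue figures are made-up placeholders.

```python
import pandas as pd

# Hypothetical sample data: assumes company names form the index;
# the revenue figures are placeholders, not real values.
df = pd.DataFrame(
    {"Revenue": [10.0, 20.0, 30.0]},
    index=["Microsoft", "IBM", "Dell Technologies"],
)

# By default, drop returns a new DataFrame and leaves df untouched.
df_no_windows = df.drop("Microsoft")
```

If the company names were stored in a regular column instead of the index, a boolean filter on that column would be needed rather than a label-based drop.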

Delete *in place* the rows for IBM and Dell

Perform an in-place (mutating) operation that deletes the rows containing information for IBM and Dell Technologies.
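The in-place variant might look like the sketch below, again assuming company names are the index and using placeholder revenue figures. With `inplace=True`, `drop` modifies `df` directly and returns `None`, so the result should not be assigned back.

```python
import pandas as pd

# Hypothetical sample data: assumes company names form the index;
# the revenue figures are placeholders, not real values.
df = pd.DataFrame(
    {"Revenue": [10.0, 20.0, 30.0]},
    index=["Microsoft", "IBM", "Dell Technologies"],
)

# Passing a list drops several rows at once; inplace=True mutates df.
df.drop(["IBM", "Dell Technologies"], inplace=True)
```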

Delete companies with revenue lower than the mean

Drop the companies that have a value of Revenue lower than the mean (average Revenue). Do NOT modify the original DataFrame; store the new results in df_high_revenue.
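One way this could be approached, shown on made-up placeholder data: select the rows below the mean, take their index labels, and pass those labels to `drop`.

```python
import pandas as pd

# Hypothetical sample data (placeholder names and figures).
df = pd.DataFrame(
    {"Revenue": [100.0, 60.0, 30.0, 10.0]},
    index=["Company A", "Company B", "Company C", "Company D"],
)

# Index labels of the rows whose Revenue is below the mean (50.0 here).
below_mean = df[df["Revenue"] < df["Revenue"].mean()].index

# drop returns a new DataFrame, so df keeps all four rows.
df_high_revenue = df.drop(below_mean)
```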

Drop the companies that are NOT from the USA

Drop the companies whose country is NOT USA. Store the results in the variable df_usa_only.
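The same pattern works with a condition on a text column. A sketch on hypothetical placeholder data:

```python
import pandas as pd

# Hypothetical sample data (placeholder names, not the project's dataset).
df = pd.DataFrame(
    {"Country": ["USA", "Japan", "USA", "Germany"]},
    index=["Company A", "Company B", "Company C", "Company D"],
)

# Select the non-USA rows, then drop them by index label.
df_usa_only = df.drop(df[df["Country"] != "USA"].index)
```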

Japanese companies sorted by Revenue (desc)

Use method chaining to first drop all the companies that are NOT Japanese and then, in the same expression, sort the remaining ones by Revenue in descending order.
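Because `drop` returns a DataFrame, another method can be called on its result directly. The sketch below uses made-up placeholder data, and `japanese_sorted` is a hypothetical variable name chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data (placeholder names and figures).
df = pd.DataFrame(
    {
        "Revenue": [100.0, 30.0, 60.0, 80.0],
        "Country": ["USA", "Japan", "USA", "Japan"],
    },
    index=["Company A", "Company B", "Company C", "Company D"],
)

# Chain: drop the non-Japanese rows, then sort what remains.
japanese_sorted = (
    df.drop(df[df["Country"] != "Japan"].index)
      .sort_values(by="Revenue", ascending=False)
)
```

Neither step mutates `df`; each method returns a new DataFrame that feeds the next call.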

Author

Santiago Basulto

This project is part of

Intro to Pandas for Data Analysis
