Mastering DataFrame Mutations with Hollywood data

In this project, you'll practice modifying a Pandas DataFrame by mutating a dataset of Hollywood movie data. You'll practice creating columns, deleting columns, changing their types, and more. As usual, we'll also build the concepts around when it's OK to modify the data and when it's not.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.


Create a new column `revenue`

To calculate the revenue, create a new column revenue by summing the values of the budget and gross columns of the df dataframe.

  • budget refers to the total cost of making a movie.
  • revenue is the amount of income generated from the sale of a movie.
  • gross profit represents the income or profit remaining after the budget costs have been subtracted from revenue.
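
The calculation described above can be sketched as follows. This is a minimal illustration on a made-up two-row DataFrame; the sample values are hypothetical, and only the column names budget, gross, and revenue come from the activity text.

```python
import pandas as pd

# Hypothetical sample rows; the real project dataset is not shown here.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B"],
    "budget": [100_000_000, 20_000_000],
    "gross": [250_000_000, -5_000_000],
})

# Following the definitions above: revenue = budget + gross
df["revenue"] = df["budget"] + df["gross"]
print(df)
```

Assigning to `df["revenue"]` adds the column to the existing dataframe rather than creating a copy.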

Create a new column `percentage_profit`

To determine the percentage profit, create a new column percentage_profit by dividing the gross value by the revenue and multiplying the result by 100. This gives the profit made from each movie as a percentage of its revenue.

  • Percentage profit is the ratio of profit to total revenue, expressed as a percentage.
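
A minimal sketch of this formula, assuming the gross and revenue columns already exist. The sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows with gross and revenue already present.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B"],
    "gross": [50, 60],
    "revenue": [200, 80],
})

# percentage_profit = gross / revenue * 100, computed row by row
df["percentage_profit"] = df["gross"] / df["revenue"] * 100
print(df)
```

The division is element-wise, so each row gets its own percentage.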

Create a new column `high_budget_movie`

Create a new column high_budget_movie with the value True if the movie's budget is greater than 100 million and False otherwise.
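
One way to build such a flag is a vectorized comparison, which returns a boolean Series. The sketch below assumes budget is stored in whole dollars, so 100 million is written as 100_000_000; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; budget is assumed to be in whole dollars.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie C"],
    "budget": [150_000_000, 100_000_000, 40_000_000],
})

# The comparison yields True/False for every row at once.
df["high_budget_movie"] = df["budget"] > 100_000_000
print(df)
```

Note that a budget of exactly 100 million is not "greater than" the threshold, so it is flagged False.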


Create a new column `successful_movie`

Create a new column successful_movie with the value True if the movie's profit is greater than 0 and False otherwise. For this activity, profit refers to the value in the revenue column.
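
A possible sketch, taking the activity text at its word and comparing the revenue column against 0. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; revenue is assumed to exist already.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B"],
    "revenue": [350_000_000, 0],
})

# True only where revenue is strictly greater than 0
df["successful_movie"] = df["revenue"] > 0
print(df.dtypes)
```

The new column has the bool dtype, which makes it convenient for filtering rows later.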


Create a new column `is_critically_acclaimed`

Create a new column is_critically_acclaimed which is True if the value of score column is greater than 8 and False otherwise.
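
As an illustration, the same kind of flag can be written with the `Series.gt` method, which is equivalent to the `>` operator. The scores below are made up.

```python
import pandas as pd

# Hypothetical sample rows; only the score column name comes from the activity.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie C"],
    "score": [9.3, 8.0, 6.5],
})

# .gt(8) is the method form of `df["score"] > 8`
df["is_critically_acclaimed"] = df["score"].gt(8)
print(df)
```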


Create a new column `is_new_release`

Create a new column is_new_release which is True if the value of year column is greater than 2020 and False otherwise.
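
A short sketch, assuming the year column holds integers. The sample years are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; year is assumed to be stored as integers.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie C"],
    "year": [2019, 2020, 2021],
})

# Strictly greater than 2020, so a 2020 release is not flagged
df["is_new_release"] = df["year"] > 2020
print(df)
```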


Create a new column `is_long_movie`

Create a new column is_long_movie which is True if the value of runtime column is greater than 150 minutes and False otherwise.
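
A sketch along the same lines, assuming runtime is recorded in minutes. The values are made up, and the final line simply counts how many rows fall on each side of the threshold.

```python
import pandas as pd

# Hypothetical sample rows; runtime is assumed to be in minutes.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie C"],
    "runtime": [95, 150, 181],
})

df["is_long_movie"] = df["runtime"] > 150

# Quick sanity check: how many long vs. not-long movies are there?
print(df["is_long_movie"].value_counts())
```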


Drop unsuccessful movies

Drop all the rows where the successful_movie column value is False. Use the inplace parameter to make the changes permanent.
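
One way to do this is to select the index labels of the unwanted rows and pass them to `drop`. The sketch below uses a hypothetical three-row DataFrame in which successful_movie already exists.

```python
import pandas as pd

# Hypothetical sample rows with the successful_movie flag already computed.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie C"],
    "successful_movie": [True, False, True],
})

# Index labels of the rows where successful_movie is False
unsuccessful_index = df[~df["successful_movie"]].index

# inplace=True modifies df itself and returns None
df.drop(unsuccessful_index, inplace=True)
print(df)
```

Because `inplace=True` changes df directly, the dropped rows cannot be recovered without reloading the data.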


Drop high-budget movies

Drop all the rows where the value of budget is greater than 100 million and store the resulting dataframe in the variable high_budget_df. Do not modify the original dataframe.
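
A minimal sketch of a non-mutating drop, again assuming budget is in whole dollars. Without `inplace=True`, `drop` returns a new dataframe and leaves df untouched; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; budget is assumed to be in whole dollars.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie C"],
    "budget": [150_000_000, 100_000_000, 40_000_000],
})

# Index labels of rows whose budget exceeds 100 million
high_budget_index = df[df["budget"] > 100_000_000].index

# drop returns a new dataframe; df keeps all three rows
high_budget_df = df.drop(high_budget_index)
print(len(df), len(high_budget_df))
```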


Drop the column `budget`

To remove the budget column from the movie dataframe, use the drop method and specify the column name budget. Be sure to specify the axis to indicate that it's a column and not a row. Additionally, set the inplace parameter to True to make the change permanent.
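
The call described above might look like the following sketch, shown on a hypothetical two-row DataFrame.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B"],
    "budget": [100_000_000, 20_000_000],
    "score": [7.9, 6.1],
})

# axis=1 targets columns (axis=0 would target rows);
# inplace=True modifies df directly instead of returning a copy
df.drop("budget", axis=1, inplace=True)
print(df.columns.tolist())
```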


Drop the `director` and `writer` columns from the dataframe

To eliminate the director and writer columns from the movie dataframe, use the drop method and pass in the column names director and writer. Specify the axis to indicate that they are columns and not rows. Set the inplace parameter to False so that drop returns a new dataframe, and store it in new_df without modifying the original dataframe.

  • Note that in this activity you have to create a new dataframe named new_df.
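
A sketch of dropping several columns at once into a new dataframe. The sample rows and the names in them are made up; only the column names director and writer and the variable new_df come from the activity.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "name": ["Movie A", "Movie B"],
    "director": ["Director X", "Director Y"],
    "writer": ["Writer X", "Writer Y"],
})

# Passing a list drops both columns; inplace=False (the default)
# returns a new dataframe and leaves df unchanged
new_df = df.drop(["director", "writer"], axis=1, inplace=False)
print(new_df.columns.tolist())
print(df.columns.tolist())
```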
Author

Anurag Verma

What's up, friends! 👋 I'm a computer science student about to finish my last year of college. 🎓 I LOVE writing code! ❤️ It makes me so happy! 😄 Whether I'm goofing in notebooks 📓 or coding in Python 🐍, writing programs is a blast! 💥 When I'm not geeking out over AI 🤖 with my classmates or building neural networks, 🧠 you can find me buried in statistics textbooks. 📚 I know, what a nerd! 🤓 I'm always down to learn new ways to speak human 🫂 and computer 💻. Making tech more fun is my jam! 🍇 If you want a cheery data buddy 😎 who can make difficult things easy-peasy 🥝 and learning a party 🎉, I'm your guy! 🙋‍♂️ Let's chat codes 👨‍💻, numbers 🧮, and machines 🤖 over coffee! ☕ I'd love to meet more techy humans. 💁‍♂️ Can't wait to talk! 🗣️


This project is part of

Intro to Pandas for Data Analysis
