Practice Lab: Merge and Joining data with Pandas

In this lab, you'll practice merging and joining datasets with Pandas. You'll combine movie dataframes using different types of joins and use the merged results to answer questions about the movies.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Activity 1 (multiple choice)

Which parameter of `pd.merge()` handles overlapping column names?

Activity 2 (multiple choice)

Which parameter determines the type of merge to perform?

Activity 3 (multiple choice)

Which of the following is not a valid option for the `how` parameter?

Activity 4 (multiple choice)

Which of the following is the default option for the `how` parameter?

Activity 5 (multiple choice)

Which of the following describes what the `cross` option of the `how` parameter does?
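As a quick reference for the questions above, here is a minimal sketch that runs each `how` option of `pd.merge()` on two tiny hypothetical dataframes (`left` and `right` are illustrative names, not part of the lab's data). The `how="cross"` option assumes pandas 1.2 or newer.

```python
import pandas as pd

# Two tiny hypothetical dataframes that share the key column "movieId"
left = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
right = pd.DataFrame({"movieId": [2, 3, 4], "rating": [4.0, 3.5, 5.0]})

# how="inner" is the default: only keys present in both frames (2 and 3)
inner = pd.merge(left, right, on="movieId")
# how="left": every row of `left`; unmatched rows get NaN in "rating"
left_join = pd.merge(left, right, on="movieId", how="left")
# how="right": every row of `right`; unmatched rows get NaN in "title"
right_join = pd.merge(left, right, on="movieId", how="right")
# how="outer": the union of keys from both frames (1, 2, 3, 4)
outer = pd.merge(left, right, on="movieId", how="outer")
# how="cross": Cartesian product, pairing every left row with every right row;
# it uses no key, so `on` must not be passed
cross = pd.merge(left, right, how="cross")

for name, df in [("inner", inner), ("left", left_join), ("right", right_join),
                 ("outer", outer), ("cross", cross)]:
    print(name, df.shape)
```

Because `cross` pairs rows without a key, the shared `movieId` column appears twice in its result, told apart by the default suffixes `_x` and `_y`.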

Activity 6 (code)

Drop duplicate movies based on `title`, keeping the first occurrence.

Modify the original dataframe `movies_df` rather than creating a new one.
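A minimal sketch of this step, assuming `movies_df` has MovieLens-style `movieId`, `title`, and `genres` columns; the rows below are hypothetical stand-ins for the lab's data.

```python
import pandas as pd

# Hypothetical stand-in for movies_df; "Toy Story (1995)" appears twice
movies_df = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Toy Story (1995)", "Heat (1995)"],
    "genres": ["Adventure|Animation", "Adventure|Fantasy", "Animation", "Action|Crime"],
})

# Keep the first row for each title and modify movies_df itself
movies_df.drop_duplicates(subset="title", keep="first", inplace=True)
print(movies_df)
```

`keep="first"` is already the default, so spelling it out only documents the intent. Reassigning the result, `movies_df = movies_df.drop_duplicates(subset="title")`, is an equivalent alternative to `inplace=True`.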

Activity 7 (code)

Merge `movies_df` & `ratings_df` with an inner join on `movieId`.

Store the resulting dataframe in the variable `movies_ratings_df`.
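A minimal sketch of the inner join, with tiny hypothetical frames standing in for `movies_df` and `ratings_df` (the `rating` column name in `ratings_df` is an assumption):

```python
import pandas as pd

# Hypothetical stand-ins for the lab's dataframes
movies_df = pd.DataFrame({"movieId": [1, 2, 3],
                          "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"]})
ratings_df = pd.DataFrame({"userId": [10, 11, 10, 12],
                           "movieId": [1, 1, 2, 99],
                           "rating": [4.0, 5.0, 3.0, 2.5]})

# Inner join: keep only rows whose movieId appears in both frames
movies_ratings_df = pd.merge(movies_df, ratings_df, on="movieId", how="inner")
print(movies_ratings_df)
```

Movie 3 has no ratings and the rating for `movieId` 99 has no matching movie, so the inner join drops both.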

Activity 8 (code)

Use the merged `movies_ratings_df` dataframe to calculate the average rating for each movie.

Store the result in the variable `avg_ratings`.

Your result should look like this (each title is unique because duplicates were already dropped in activity 6):

[Expected output image: activity7a-answer]
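One way to compute this, sketched on a small hypothetical `movies_ratings_df` and assuming the ratings live in a column named `rating`: group by `title` and take the mean.

```python
import pandas as pd

# Hypothetical stand-in for the merged movies_ratings_df
movies_ratings_df = pd.DataFrame({
    "movieId": [1, 1, 2],
    "title": ["Toy Story (1995)", "Toy Story (1995)", "Jumanji (1995)"],
    "rating": [4.0, 5.0, 3.0],
})

# Group by title (unique after dropping duplicates) and average the ratings
avg_ratings = movies_ratings_df.groupby("title")["rating"].mean()

# Look up a single movie's average by its title label
print(avg_ratings.loc["Toy Story (1995)"])
```

The same `.loc` lookup by title answers the Toy Story question below once `avg_ratings` holds the real data.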

Activity 9 (multiple choice)

What is the average rating of the movie `Toy Story (1995)`?

Activity 10 (code)

Merge `movies_df` & `tags_df` with a left join on `movieId`.

Store the resulting dataframe in the variable `movies_tags_df`.
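A minimal sketch of the left join, with hypothetical stand-ins for both frames (the `tag` column name in `tags_df` is an assumption):

```python
import pandas as pd

# Hypothetical stand-ins for the lab's dataframes
movies_df = pd.DataFrame({"movieId": [1, 2, 3],
                          "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"]})
tags_df = pd.DataFrame({"userId": [10, 11],
                        "movieId": [1, 1],
                        "tag": ["pixar", "fun"]})

# Left join: every movie is kept; movies without tags get NaN in the tag columns
movies_tags_df = pd.merge(movies_df, tags_df, on="movieId", how="left")
print(movies_tags_df)
```

Movie 1 has two tags, so it appears twice; movies 2 and 3 appear once each with missing tag values.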

Activity 11 (code)

Use the merged dataframe `movies_tags_df` to select the movies with no tags.

Store the result in the variable `movies_with_no_tags`.
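Because a left join keeps every movie, a movie with no tags shows up with a missing value in the tag columns. A minimal sketch, assuming the tag text is in a column named `tag` (the rows are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the left-joined movies_tags_df
movies_tags_df = pd.DataFrame({
    "movieId": [1, 1, 2, 3],
    "title": ["Toy Story (1995)", "Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "tag": ["pixar", "fun", None, None],
})

# A missing tag means the movie had no match in tags_df
movies_with_no_tags = movies_tags_df[movies_tags_df["tag"].isna()]
print(movies_with_no_tags)
```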

Activity 12 (code)

Merge `tags_df` & `ratings_df` using the movie ID and the user ID

Use an outer join on `movieId` and `userId`, with the suffixes `_tags` and `_ratings` for overlapping column names.

Store the resulting dataframe in the variable `tags_ratings_df`.
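A minimal sketch of the two-key outer join on hypothetical data. It assumes both frames carry a `timestamp` column, as the MovieLens tag and rating files do; that shared non-key column is where the suffixes take effect.

```python
import pandas as pd

# Hypothetical stand-ins; "timestamp" is the assumed overlapping column
tags_df = pd.DataFrame({"userId": [10, 11], "movieId": [1, 2],
                        "tag": ["pixar", "board game"],
                        "timestamp": [100, 200]})
ratings_df = pd.DataFrame({"userId": [10, 12], "movieId": [1, 3],
                           "rating": [4.0, 2.5],
                           "timestamp": [150, 300]})

# Outer join on the (movieId, userId) pair; the suffixes tell the two
# "timestamp" columns apart
tags_ratings_df = pd.merge(tags_df, ratings_df, on=["movieId", "userId"],
                           how="outer", suffixes=("_tags", "_ratings"))
print(tags_ratings_df)
```

Key pairs found in only one frame, such as `movieId=2, userId=11`, still appear, with NaN in the other frame's columns.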


Activity 13 (code)

Merge the `movies_df` dataframe with the `tag_counts` series, matching the `genres` column of the dataframe against the index of the series.

Store the resulting dataframe in the variable `movies_tags_counts_df`.

  • Note: keep the default join type, an inner join.
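A minimal sketch of merging a dataframe column against a Series index with `left_on` and `right_index=True`. The contents of `tag_counts` and its name `tag_count` are hypothetical; the Series does need some name, because `pd.merge()` rejects an unnamed Series.

```python
import pandas as pd

# Hypothetical stand-in for movies_df
movies_df = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "genres": ["Adventure|Animation", "Adventure|Fantasy", "Action|Crime"],
})
# Hypothetical tag counts indexed by genres string, named so it can be merged
tag_counts = pd.Series({"Adventure|Animation": 5, "Action|Crime": 2}, name="tag_count")

# left_on uses a dataframe column; right_index uses the Series index.
# how stays at its default, "inner", so movies whose genres are absent drop out
movies_tags_counts_df = pd.merge(movies_df, tag_counts,
                                 left_on="genres", right_index=True)
print(movies_tags_counts_df)
```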
Activity 14 (code)

Merge the `movies_df` dataframe with the `rating_counts` series using an `outer` join, matching the `movieId` column of the dataframe against the index of the series.

Store the resulting dataframe in the variable `movies_ratings_counts_df`.
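The `left_on`/`right_index=True` pattern also works with `how="outer"`. A minimal sketch with a hypothetical `rating_counts` Series named `rating_count` (both the values and the name are illustrative):

```python
import pandas as pd

# Hypothetical stand-in for movies_df
movies_df = pd.DataFrame({"movieId": [1, 2, 3],
                          "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"]})
# Hypothetical number of ratings per movieId (the Series index)
rating_counts = pd.Series({1: 2, 2: 1}, name="rating_count")

# Outer join: keep keys from both sides, matching movieId against the index
movies_ratings_counts_df = pd.merge(movies_df, rating_counts, how="outer",
                                    left_on="movieId", right_index=True)
print(movies_ratings_counts_df)
```

Movie 3 has no entry in `rating_counts`, so its `rating_count` is NaN after the outer join.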

Activity 15 (code)

Use the `movies_ratings_counts_df` dataframe to select the movies with no ratings.

Store the resulting dataframe in the variable `movies_with_no_ratings`.
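After an outer join against the counts, movies without ratings are the rows whose count is missing. A minimal sketch, assuming the count column is named `rating_count` (a hypothetical name; use whatever name your Series carries) and using illustrative rows:

```python
import pandas as pd

# Hypothetical stand-in for the outer-joined movies_ratings_counts_df
movies_ratings_counts_df = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "rating_count": [2.0, 1.0, float("nan")],
})

# A NaN count means the movie never appeared in rating_counts
movies_with_no_ratings = movies_ratings_counts_df[
    movies_ratings_counts_df["rating_count"].isna()
]
print(movies_with_no_ratings)
```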

Author

Mohamed Rawash

This project is part of

Data Wrangling with Pandas
