Practicing Filtering and Selection with TED Talks data

In this project you'll run multiple queries to analyze the TED Talks data and answer questions about it. You'll need to write different types of queries using the `.loc` indexer or the `.query` method, combined with boolean and comparison operators. This is the foundation of Data Analysis!
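As a rough illustration of the two approaches, the sketch below applies the same filter once with a boolean mask in `.loc` and once with `.query`. The DataFrame is a tiny made-up sample and `threshold` is a hypothetical name; only the `view_count` and `speaker_name` column names come from the activities below.

```python
import pandas as pd

# Tiny made-up sample; the real TED Talks data has many more rows and columns.
df = pd.DataFrame({
    "speaker_name": ["Speaker A", "Speaker B", "Speaker C"],
    "view_count": [500_000, 1_500_000, 3_000_000],
})

threshold = 1_000_000  # hypothetical cutoff, for illustration only

# Option 1: a comparison produces a True/False Series, which .loc uses as a row mask.
mask = df["view_count"] > threshold
with_loc = df.loc[mask]

# Option 2: the same condition as a .query string; @threshold refers to the Python variable.
with_query = df.query("view_count > @threshold")

# Both approaches should select the same rows.
print(with_loc.equals(with_query))
```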

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.

Filter for talks with at least 1 million views, keeping only the "view_count" and "speaker_name" columns in the resulting DataFrame

Perform the selection and store the results in the variable df_more_than_1million
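One possible way to approach this activity is to pass `.loc` both a row mask and a list of columns. This is a sketch on made-up rows, not the official solution; it reads "a minimum of 1 million" as `>=`.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the activity text.
df = pd.DataFrame({
    "name": ["Talk 1", "Talk 2", "Talk 3"],
    "speaker_name": ["Speaker A", "Speaker B", "Speaker C"],
    "view_count": [999_999, 1_000_000, 2_400_000],
})

# Rows: at least 1 million views. Columns: only the two requested ones.
df_more_than_1million = df.loc[
    df["view_count"] >= 1_000_000,
    ["view_count", "speaker_name"],
]

print(df_more_than_1million)
```

Passing the column list as the second argument to `.loc` does the row filter and the column selection in a single step.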

Filter all talks that have a comment count greater than 2,000 and are given by speakers with the occupation of "Artist"

Perform the selection and store the results in the variable df_commented_artists

Filter all talks that have a duration of more than 60 minutes

Perform the selection and store the results in the variable df_long_duration

Filter talks where the number of comments is greater than or equal to the duration

Perform the selection and store the results in the variable df_high_comments

Filter talks where the duration is not greater than or equal to the average duration

Perform the selection and store the results in the variable df_short_talks
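The "not greater than or equal" wording suggests practicing the negation operator `~`. The sketch below assumes a numeric `duration` column (the column name and the sample values are assumptions) and is not necessarily the official solution.

```python
import pandas as pd

# Made-up durations; the unit does not matter for this comparison.
df = pd.DataFrame({
    "name": ["Talk 1", "Talk 2", "Talk 3", "Talk 4"],
    "duration": [300, 600, 900, 1200],
})

average_duration = df["duration"].mean()

# ~ flips every True/False value in the mask, so this keeps the rows
# where "duration >= average" is False.
df_short_talks = df.loc[~(df["duration"] >= average_duration)]

print(df_short_talks)
```

When no values are missing, this is equivalent to `df["duration"] < average_duration`. The two differ on missing values: a `NaN` duration fails both comparisons, so the negated mask keeps that row while the `<` version drops it.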

Filter all talks that have a view count greater than 10 million, are given in the English language, and are themed with "Culture"

Perform the selection and store the results in the variable df_popular_english_culture

Filter all talks that have a view count between 1 million and 2 million inclusive, were published after January 1, 2019, and are themed with either "Science" or "Technology"

Perform the selection and store the results in the variable df_medium_science_tech
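A filter like this combines a range check, a date comparison, and a membership test. The sketch below assumes a datetime column named `published_date` and a single-valued string column named `theme`; both names and all sample rows are hypothetical, and the real dataset may store themes differently.

```python
import pandas as pd

# Made-up rows; "published_date" and "theme" are assumed column names.
df = pd.DataFrame({
    "name": ["Talk 1", "Talk 2", "Talk 3", "Talk 4"],
    "view_count": [1_000_000, 1_500_000, 2_000_000, 2_500_000],
    "published_date": pd.to_datetime(
        ["2018-06-01", "2019-03-15", "2020-01-10", "2021-05-05"]
    ),
    "theme": ["Science", "Technology", "Culture", "Science"],
})

df_medium_science_tech = df.loc[
    # between() includes both endpoints by default
    df["view_count"].between(1_000_000, 2_000_000)
    # strictly after January 1, 2019
    & (df["published_date"] > pd.Timestamp("2019-01-01"))
    # theme is one of the listed values
    & df["theme"].isin(["Science", "Technology"])
]

print(df_medium_science_tech)
```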

Select the speaker name of the most-viewed talk that was published in January or August, whose speaker is a 'Journalist' or 'Entrepreneur', and whose duration is less than 8 minutes

  • First, filter talks that were published in either January or August and store the result in df_jan_aug.

  • Second, filter the first step's result for talks whose speakers are journalists or entrepreneurs and store the result in df_journalists_entrepreneurs.

  • Third, filter the second step's result for talks with a duration of less than 8 minutes (480 seconds) and store the result in df_below_8m_talks.

  • Finally, perform the selection and store the final result in the variable highest_view_talk_speaker
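The steps above can be sketched as a chain of filters, each applied to the previous result. The column names `published_date`, `speaker_occupation`, and `duration` (in seconds) are assumptions, as are the sample rows; this is one possible approach rather than the official solution.

```python
import pandas as pd

# Made-up rows; column names other than speaker_name and view_count are assumed.
df = pd.DataFrame({
    "speaker_name": ["Ann", "Bob", "Cy", "Di", "Eve"],
    "speaker_occupation": [
        "Journalist", "Entrepreneur", "Artist", "Journalist", "Entrepreneur",
    ],
    "published_date": pd.to_datetime(
        ["2019-01-05", "2018-08-20", "2020-01-11", "2017-03-02", "2021-08-30"]
    ),
    "duration": [400, 450, 300, 200, 900],  # seconds
    "view_count": [1_000, 5_000, 9_000, 7_000, 8_000],
})

# Step 1: published in January (1) or August (8).
df_jan_aug = df.loc[df["published_date"].dt.month.isin([1, 8])]

# Step 2: speaker is a journalist or an entrepreneur.
df_journalists_entrepreneurs = df_jan_aug.loc[
    df_jan_aug["speaker_occupation"].isin(["Journalist", "Entrepreneur"])
]

# Step 3: shorter than 8 minutes (480 seconds).
df_below_8m_talks = df_journalists_entrepreneurs.loc[
    df_journalists_entrepreneurs["duration"] < 480
]

# Step 4: idxmax() returns the index label of the row with the most views.
highest_view_talk_speaker = df_below_8m_talks.loc[
    df_below_8m_talks["view_count"].idxmax(), "speaker_name"
]

print(highest_view_talk_speaker)
```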

Select the name, speaker_name, and event columns for all talks with a view_count that is greater than 3 times the standard deviation of the view_count

Perform the selection and store the results in the variable df_talks_more_than_std
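A minimal sketch of this selection, on made-up rows, computes the threshold first and then filters on it. It follows the activity text literally (views greater than three times the standard deviation, not mean plus three standard deviations); `threshold` is a hypothetical name.

```python
import pandas as pd

# Made-up rows; column names follow the activity text.
df = pd.DataFrame({
    "name": ["Talk 1", "Talk 2", "Talk 3", "Talk 4", "Talk 5"],
    "speaker_name": ["Ann", "Bob", "Cy", "Di", "Eve"],
    "event": ["Event A", "Event A", "Event B", "Event B", "Event C"],
    "view_count": [1_000_000, 2_000_000, 2_100_000, 2_200_000, 2_300_000],
})

# Series.std() returns the sample standard deviation (ddof=1) by default.
threshold = 3 * df["view_count"].std()

df_talks_more_than_std = df.loc[
    df["view_count"] > threshold,
    ["name", "speaker_name", "event"],
]

print(df_talks_more_than_std)
```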

Select all talks that are either in English ('en') with a duration greater than 10 minutes, or in Spanish ('es') with a duration greater than 5 minutes

Perform the selection and store the results in the variable df_long_en_es_talks
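This condition is an OR of two AND groups, so naming each group can keep the expression readable. The sketch assumes a `language` column holding codes such as 'en' and 'es' and a `duration` column measured in seconds; both are assumptions about the dataset, and the rows are made up.

```python
import pandas as pd

# Made-up rows; "language" and "duration" (seconds) are assumed columns.
df = pd.DataFrame({
    "name": ["Talk 1", "Talk 2", "Talk 3", "Talk 4", "Talk 5"],
    "language": ["en", "en", "es", "es", "fr"],
    "duration": [700, 500, 400, 200, 900],
})

# Each comparison is wrapped in parentheses because & binds more tightly
# than == and > in Python.
is_long_english = (df["language"] == "en") & (df["duration"] > 10 * 60)
is_long_spanish = (df["language"] == "es") & (df["duration"] > 5 * 60)

# A talk qualifies if it matches either group.
df_long_en_es_talks = df.loc[is_long_english | is_long_spanish]

print(df_long_en_es_talks)
```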

Author

Mohamed Rawash

This project is part of

Intro to Pandas for Data Analysis
