All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.
All our activities include solutions with explanations on how they work and why we chose them.
Let's analyze the Trip Type column. As the data dictionary at the top shows, there are 5 trip types; revisit it if you need a reminder. Use the value_counts() method on the Trip Type column to count them, and store the resulting Series in trip_type_counts.
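As a minimal sketch of this step, the snippet below counts trip types on a tiny made-up DataFrame. The trip type values are invented for illustration; the real categories come from the data dictionary.

```python
import pandas as pd

# Hypothetical sample standing in for the project's DataFrame
df = pd.DataFrame({
    "Trip Type": ["Couples", "Family", "Business", "Couples",
                  "Solo", "Friends", "Couples", "Family"]
})

# Count the rows for each trip type, most frequent first
trip_type_counts = df["Trip Type"].value_counts()
print(trip_type_counts)
```

The result is a Series whose index holds the trip types and whose values hold the counts, which is the shape the plotting activities expect.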
Use the plot() method to make a pie chart of trip_type_counts. Set the title to Most Popular Trip Type and autopct to '%1.1f%%'.
Note: To assist you, a figure and axis have already been created. Use a semicolon (;) at the end to avoid garbage output in the plot.
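One way this could look is sketched below. The counts are invented, and the figure and axis are recreated here only so the snippet runs on its own; in the activity they are already provided.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented counts standing in for trip_type_counts from the previous activity
trip_type_counts = pd.Series(
    {"Couples": 40, "Family": 25, "Business": 20, "Friends": 10, "Solo": 5}
)

# Recreated here so the sketch is self-contained
fig, ax = plt.subplots(figsize=(6, 6))

# autopct='%1.1f%%' prints each slice's share with one decimal place
trip_type_counts.plot(kind="pie", ax=ax, autopct="%1.1f%%",
                      title="Most Popular Trip Type");
```

The trailing semicolon stops a notebook from echoing the returned Axes object below the cell.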
The HOTEL_CITY column lists the cities where the hotels are located. Use the value_counts() method on the HOTEL_CITY column and store the result in the city_counts variable.
Create a bar plot to see how many hotels appear in each city. Set xlabel to Hotel City, ylabel to Count, and the title to Most Popular Hotel Cities.
Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
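The two city activities can be sketched together as follows. The city names are made up, and passing xlabel and ylabel directly to plot() assumes pandas 1.1 or newer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; the real data has many more cities
df = pd.DataFrame({
    "HOTEL_CITY": ["Las Vegas", "Las Vegas", "Las Vegas",
                   "Phoenix", "Phoenix", "Austin"]
})

# Number of rows per city, largest first
city_counts = df["HOTEL_CITY"].value_counts()

# Bar plot with the labels and title the activity asks for
ax = city_counts.plot(kind="bar", xlabel="Hotel City", ylabel="Count",
                      title="Most Popular Hotel Cities");
```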
The Rating column contains the ratings given by users for their hotel stays. Use the value_counts() method on the Rating column to get the count of each rating value. Store the resulting Series in the rating_counts variable.
As in the previous plot, create the same kind of bar plot for the Rating column to see which rating is most common. My guess is 4, since most customers likely gave that rating. What do you think? Use the plot() method and set xlabel to Rating, ylabel to Count, and the title to Distribution of Customer Ratings.
Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
Get the count of each HOTEL_TIMEZONE and store it in a variable called hotel_timezone_counts.
Using hotel_timezone_counts from the previous activity, calculate the percentage of each timezone out of the total number of rows in the DataFrame. Store the result in a variable called hotel_timezone_percents.
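The count and percentage steps can be sketched like this, again on invented rows. Dividing by len(df) follows the wording "out of the total number of rows".

```python
import pandas as pd

# Hypothetical sample with 8 rows
df = pd.DataFrame({
    "HOTEL_TIMEZONE": ["Eastern", "Eastern", "Pacific", "Central",
                       "Eastern", "Pacific", "Mountain", "Central"]
})

# Rows per timezone
hotel_timezone_counts = df["HOTEL_TIMEZONE"].value_counts()

# Share of all rows in each timezone, expressed as a percentage
hotel_timezone_percents = hotel_timezone_counts / len(df) * 100
print(hotel_timezone_percents)
```

Because value_counts() skips missing values by default, the percentages add up to less than 100 if the column contains any NaN entries.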
Using hotel_timezone_percents, create a pie chart with the plot() method. Use hotel_timezone_percents.index as the labels and set autopct to '%1.1f%%'. Finally, set the title to Percentage Distribution of Hotel Timezones.
Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
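A possible version of this chart is sketched below with invented percentages. Passing labels explicitly matches the instruction, although pandas would use the index as labels by default.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented percentages standing in for hotel_timezone_percents
hotel_timezone_percents = pd.Series(
    {"Eastern": 37.5, "Pacific": 25.0, "Central": 25.0, "Mountain": 12.5}
)

# Pie chart labelled with the timezone names and one-decimal percentages
ax = hotel_timezone_percents.plot(
    kind="pie",
    labels=hotel_timezone_percents.index,
    autopct="%1.1f%%",
    title="Percentage Distribution of Hotel Timezones",
);
```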
Using timezone_ratings, create a bar chart with the HOTEL_TIMEZONE column on the x-axis and the Rating column on the y-axis. In the plot() method, set the x-axis label to Hotel Timezone and the y-axis label to Average Rating. Finally, set the title to Average Rating by Hotel Timezone.
Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
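The activity does not show how timezone_ratings is built. The sketch below assumes it is a two-column DataFrame holding the mean Rating per HOTEL_TIMEZONE, and builds one from invented rows before plotting it.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "HOTEL_TIMEZONE": ["Eastern", "Eastern", "Pacific", "Pacific", "Central"],
    "Rating": [5, 4, 3, 4, 2],
})

# Assumption: timezone_ratings is the mean rating per timezone,
# with both values kept as ordinary columns
timezone_ratings = df.groupby("HOTEL_TIMEZONE")["Rating"].mean().reset_index()

# Bar chart with explicit x and y columns
ax = timezone_ratings.plot(
    kind="bar",
    x="HOTEL_TIMEZONE",
    y="Rating",
    xlabel="Hotel Timezone",
    ylabel="Average Rating",
    title="Average Rating by Hotel Timezone",
);
```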
Using user_state_ratings, create a bar chart with the USER_STATE column on the x-axis and the Rating column on the y-axis. In the plot() method, set the x-axis label to User State and the y-axis label to Average Rating. Finally, set the title to Average Ratings by User State. Take the x-tick labels from the USER_STATE column and rotate them by 90 degrees for better readability.
Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
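A rough sketch of the tick customization follows. It assumes user_state_ratings is a DataFrame with USER_STATE and Rating columns; the states and ratings are invented.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented averages standing in for user_state_ratings
user_state_ratings = pd.DataFrame({
    "USER_STATE": ["CA", "NY", "TX"],
    "Rating": [4.2, 3.9, 4.0],
})

ax = user_state_ratings.plot(
    kind="bar",
    x="USER_STATE",
    y="Rating",
    xlabel="User State",
    ylabel="Average Rating",
    title="Average Ratings by User State",
)

# One tick per bar, labelled with the state and rotated to vertical
ax.set_xticks(range(len(user_state_ratings)))
ax.set_xticklabels(user_state_ratings["USER_STATE"], rotation=90);
```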
Filter the DataFrame to get top-rated hotels with a Rating greater than or equal to 4. Store the filtered DataFrame in a variable called top_rated_hotels.
Select data for the Eastern hotel timezone by filtering the top_rated_hotels DataFrame. Store the filtered DataFrame in a variable called timezone_data.
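Both filters can be written as boolean masks, as in this sketch on invented rows. It assumes the timezone is stored in the HOTEL_TIMEZONE column.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "HOTEL_TIMEZONE": ["Eastern", "Pacific", "Eastern", "Central", "Eastern"],
    "Rating": [5, 4, 3, 5, 4],
})

# Keep only rows rated 4 or higher
top_rated_hotels = df[df["Rating"] >= 4]

# From those, keep only hotels in the Eastern timezone
timezone_data = top_rated_hotels[top_rated_hotels["HOTEL_TIMEZONE"] == "Eastern"]
print(timezone_data)
```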
Calculate the count of each unique trip type in the timezone_data DataFrame using value_counts() on the Trip Type column. Store the result in a variable called trip_types.
Calculate the total number of trips by summing the values in trip_types. Store the result in a variable called total_trips. Next, calculate the percentage of each trip type by dividing its count by the total number of trips and multiplying by 100. Store the result as a Pandas Series that keeps the index of trip_types, in a variable called trip_type_percents.
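The counting and percentage steps might look like the sketch below. The trip type values are invented. Dividing a Series by a scalar keeps its index, so the result stays labelled by trip type.

```python
import pandas as pd

# Invented rows standing in for timezone_data
timezone_data = pd.DataFrame({
    "Trip Type": ["Couples", "Couples", "Family", "Business"]
})

# Count of each trip type
trip_types = timezone_data["Trip Type"].value_counts()

# Total number of trips across all types
total_trips = trip_types.sum()

# Percentage share of each trip type, indexed like trip_types
trip_type_percents = trip_types / total_trips * 100
print(trip_type_percents)
```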
Using the trip_type_percents variable, create a pie chart with the plot() method. Set the labels to trip_type_percents.index and autopct to '%1.1f%%'. Finally, set the title to Distribution of Trip Types for Top-Rated Hotels.
Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
Using pivoted_timezone, create a bar chart with the plot() method. Set the x-axis label to Hotel Timezone, the y-axis label to Average Rating, and the title to Average Ratings of Top User States across Hotel Timezones. Improve the readability of the x-tick labels by setting the x-tick positions based on the number of unique timezones in the pivoted_timezone DataFrame. Rotate the x-tick labels by 45 degrees and align them to the right for better visibility.
Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
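The layout of pivoted_timezone is not shown in the activity. The sketch below assumes timezones form the index, user states form the columns, and the cells hold average ratings. The table is built from invented rows.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "HOTEL_TIMEZONE": ["Eastern", "Eastern", "Pacific", "Pacific", "Central", "Central"],
    "USER_STATE": ["CA", "NY", "CA", "NY", "CA", "NY"],
    "Rating": [4, 5, 3, 4, 5, 3],
})

# Assumed layout: one row per timezone, one column per user state
pivoted_timezone = df.pivot_table(
    index="HOTEL_TIMEZONE", columns="USER_STATE", values="Rating", aggfunc="mean"
)

# Grouped bar chart: one group per timezone, one bar per user state
ax = pivoted_timezone.plot(
    kind="bar",
    xlabel="Hotel Timezone",
    ylabel="Average Rating",
    title="Average Ratings of Top User States across Hotel Timezones",
)

# One tick per timezone, rotated 45 degrees and right-aligned
ax.set_xticks(range(len(pivoted_timezone.index)))
ax.set_xticklabels(pivoted_timezone.index, rotation=45, ha="right");
```

Right alignment keeps the end of each rotated label under its group of bars, which helps when the labels are long.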
Using pivoted_state, create a bar chart with the plot() method. Set the x-axis label to Hotel State, the y-axis label to Average Rating, and the title to Average Ratings of Top User States by Hotel State. Improve the readability of the x-tick labels by setting the x-tick positions based on the number of unique states in the pivoted_state DataFrame. Rotate the x-tick labels by 45 degrees and align them to the right for better visibility.
Note: Use a semicolon (;) at the end to avoid garbage output in the plot.