Data Analysis and Plotting of Hotel Ratings and Trip Type

Analyze Trip Type Distribution

Let's analyze the Trip Type column. As the data dictionary shows, there are 5 trip types; if you have forgotten them, revisit the data dictionary at the top. Use the value_counts() method on the Trip Type column, which should return 5 trip types, and store the result in trip_type_counts.
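
As a minimal sketch of this step, the snippet below runs value_counts() on a small made-up DataFrame. The DataFrame name df and the trip type values are assumptions for illustration only; the real project data will differ.

```python
import pandas as pd

# Hypothetical sample data; `df` and these trip type values are assumptions.
df = pd.DataFrame({
    "Trip Type": ["Family", "Couples", "Business", "Solo",
                  "Friends", "Family", "Couples", "Family"]
})

# Count the rows for each trip type, most frequent first.
trip_type_counts = df["Trip Type"].value_counts()
print(trip_type_counts)
```

value_counts() returns a Series whose index holds the distinct trip types and whose values hold their counts.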

Create a pie chart to identify the most popular Trip Type.

Use the plot method to make a pie chart. Give it the title Most Popular Trip Type and set autopct to '%1.1f%%'.

Note: To assist you, a figure and axis have already been created. Use a semicolon (;) at the end to avoid garbage output in the plot.
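
One possible shape for this plot is sketched below. The counts are made up, and fig and ax are recreated here only so the snippet runs on its own; in the project they are already provided.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for trip_type_counts.
trip_type_counts = pd.Series({"Family": 3, "Couples": 2, "Business": 1})

# The project supplies fig and ax; they are recreated here as an assumption.
fig, ax = plt.subplots()

# autopct formats the percentage label drawn on each slice.
trip_type_counts.plot(kind="pie", ax=ax, autopct="%1.1f%%",
                      title="Most Popular Trip Type");
```

The trailing semicolon stops a notebook from echoing the returned Axes object under the chart.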

Analyze Hotel City Distribution

The HOTEL_CITY column contains the cities where the hotels are located. Use the value_counts() method on the HOTEL_CITY column and store the result in the city_counts variable.

Create a bar chart to determine the city with the highest number of hotels

Create a bar plot to see the number of hotels in each city. Set xlabel as Hotel City and ylabel as Count, and give the title Most Popular Hotel Cities.

Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
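
A rough sketch of the city count and bar plot follows. The DataFrame df and the city names are hypothetical, and passing xlabel and ylabel to plot() assumes pandas 1.1 or newer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; `df` and the city names are assumptions.
df = pd.DataFrame({
    "HOTEL_CITY": ["Las Vegas", "Orlando", "Las Vegas",
                   "Chicago", "Las Vegas", "Orlando"]
})
city_counts = df["HOTEL_CITY"].value_counts()

fig, ax = plt.subplots()
# One bar per city, with bar height equal to the number of rows for that city.
city_counts.plot(kind="bar", ax=ax, xlabel="Hotel City", ylabel="Count",
                 title="Most Popular Hotel Cities");
```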

Analyze Rating Distribution

The Rating column contains the ratings given by users for their hotel stays. Use the value_counts() method on the Rating column to get the count of each rating value. Store the resulting Series in the rating_counts variable.

Create a bar plot to visualize the distribution of customer ratings

As with the previous plot, create a similar bar plot for the Rating column to see which rating was given most often. I'm guessing it's 4, since most customers likely gave that rating. What do you think? Use the plot() method, set xlabel as Rating and ylabel as Count, and give the title Distribution of Customer Ratings.

Note: Use a semicolon (;) at the end to avoid garbage output in the plot.

Which rating was given by the most people?

Count the number of hotel timezones

Get the count of each HOTEL_TIMEZONE and store it in a variable called hotel_timezone_counts.

Calculate the percentage of each hotel timezone

Using hotel_timezone_counts from the previous activity, calculate the percentage of each timezone out of the total number of rows in the DataFrame. Store the result in a variable called hotel_timezone_percents.
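
The calculation can be sketched as below, assuming a DataFrame named df and made-up timezone values. Multiplying by 100 to express the share as a percentage is also an assumption about what the activity expects.

```python
import pandas as pd

# Hypothetical sample data; `df` and the timezone values are assumptions.
df = pd.DataFrame({
    "HOTEL_TIMEZONE": ["Eastern", "Eastern", "Pacific", "Central",
                       "Eastern", "Pacific", "Mountain", "Eastern"]
})
hotel_timezone_counts = df["HOTEL_TIMEZONE"].value_counts()

# Share of all rows that belong to each timezone, expressed as a percentage.
hotel_timezone_percents = hotel_timezone_counts / len(df) * 100
print(hotel_timezone_percents)
```

Because the division is applied to the whole Series at once, the timezone labels stay attached to their percentages.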

Create a pie chart to visualize the percentage distribution of hotel timezones

Using hotel_timezone_percents, create a pie chart with the plot() method. Use hotel_timezone_percents.index as the labels and set autopct to '%1.1f%%'. Finally, set the title to Percentage Distribution of Hotel Timezones.

Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
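
A minimal sketch of this pie chart is shown below. The percentages are made up, and creating fig and ax explicitly is an assumption rather than a requirement of the activity.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentages standing in for hotel_timezone_percents.
hotel_timezone_percents = pd.Series(
    {"Eastern": 50.0, "Pacific": 25.0, "Central": 12.5, "Mountain": 12.5}
)

fig, ax = plt.subplots()
# The Series index supplies the slice labels; autopct prints each share.
hotel_timezone_percents.plot(
    kind="pie", ax=ax, labels=hotel_timezone_percents.index,
    autopct="%1.1f%%", title="Percentage Distribution of Hotel Timezones");
```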

Which timezone are most of the hotels in?

Create a bar chart to visualize average ratings by Hotel Timezone using subplots

Using timezone_ratings, create a bar chart using the HOTEL_TIMEZONE column as the x-axis and the Rating column as the y-axis. In the plot() method, set the x-axis label to Hotel Timezone and the y-axis label to Average Rating. Finally, set the title to Average Rating by Hotel Timezone.

Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
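
The sketch below shows one way this chart could be built. How timezone_ratings is constructed is not stated in the activity, so the groupby step, the DataFrame df, and the sample values are all assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; `df` and its values are assumptions.
df = pd.DataFrame({
    "HOTEL_TIMEZONE": ["Eastern", "Eastern", "Pacific", "Pacific", "Central"],
    "Rating": [5, 4, 3, 4, 5],
})

# Assumed construction: mean rating per timezone, as a two-column DataFrame.
timezone_ratings = df.groupby("HOTEL_TIMEZONE")["Rating"].mean().reset_index()

fig, ax = plt.subplots()
timezone_ratings.plot(kind="bar", x="HOTEL_TIMEZONE", y="Rating", ax=ax,
                      xlabel="Hotel Timezone", ylabel="Average Rating",
                      title="Average Rating by Hotel Timezone");
```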

Your plot should look like this:

What are the two time zones with the highest average ratings?

Create a bar chart to visualize average ratings by User State using subplots

Using user_state_ratings, create a bar chart using the USER_STATE column as the x-axis and the Rating column as the y-axis. In the plot() method, set the x-axis label to User State and the y-axis label to Average Rating. Finally, set the title to Average Ratings by User State. Customize the x-tick labels using the USER_STATE column and rotate them by 90 degrees for better readability.

Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
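
A possible version of this chart, including the rotated tick labels, is sketched below. The contents of user_state_ratings are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical averages standing in for user_state_ratings.
user_state_ratings = pd.DataFrame({
    "USER_STATE": ["CA", "NY", "TX"],
    "Rating": [4.2, 3.9, 4.5],
})

fig, ax = plt.subplots()
user_state_ratings.plot(kind="bar", x="USER_STATE", y="Rating", ax=ax,
                        xlabel="User State", ylabel="Average Rating",
                        title="Average Ratings by User State")

# Relabel the ticks from the USER_STATE column and rotate them by 90 degrees.
ax.set_xticklabels(user_state_ratings["USER_STATE"], rotation=90);
```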

Which two user states have given the highest average ratings?

Filter top-rated hotels

Filter the DataFrame to get top-rated hotels with a Rating greater than or equal to 4. Store the filtered DataFrame in a variable called top_rated_hotels.
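
The filter can be sketched with a boolean mask, as below. The DataFrame name df and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample data; `df` and its values are assumptions.
df = pd.DataFrame({
    "HOTEL_CITY": ["Las Vegas", "Orlando", "Chicago", "Boston", "Miami"],
    "Rating": [5, 3, 4, 2, 4],
})

# Keep only the rows whose Rating is 4 or higher.
top_rated_hotels = df[df["Rating"] >= 4]
print(top_rated_hotels)
```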

Select data for a specific hotel timezone

Select data for the Eastern hotel timezone by filtering the top_rated_hotels DataFrame. Store the filtered DataFrame in a variable called timezone_data.
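
A short sketch of the selection follows. The sample rows are made up, and the exact string stored in HOTEL_TIMEZONE for the Eastern timezone (written here as "Eastern") is an assumption to check against the real data.

```python
import pandas as pd

# Hypothetical rows standing in for top_rated_hotels.
top_rated_hotels = pd.DataFrame({
    "HOTEL_TIMEZONE": ["Eastern", "Pacific", "Eastern", "Central"],
    "Trip Type": ["Family", "Couples", "Business", "Family"],
    "Rating": [5, 4, 4, 5],
})

# Keep only the rows for the Eastern timezone ("Eastern" is an assumed label).
timezone_data = top_rated_hotels[top_rated_hotels["HOTEL_TIMEZONE"] == "Eastern"]
print(timezone_data)
```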

Get unique trip types and their counts

Calculate the count of each unique trip type in the timezone_data DataFrame using value_counts() on the Trip Type column. Store the result in a variable called trip_types.

Calculate the percentage of each trip type

Calculate the total number of trips by summing the values in trip_types. Store the result in a variable called total_trips. Next, calculate the percentage of each trip type by dividing the count of each trip type by the total number of trips and multiplying by 100. Store the result as a Pandas Series, indexed by trip type, in a variable called trip_type_percents.
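
The arithmetic can be sketched as follows, using made-up counts in place of the real trip_types Series.

```python
import pandas as pd

# Hypothetical counts standing in for trip_types.
trip_types = pd.Series({"Family": 6, "Couples": 3, "Business": 1})

# Total number of trips across all trip types.
total_trips = trip_types.sum()

# Element-wise division keeps the trip types as the index of the result.
trip_type_percents = trip_types / total_trips * 100
print(trip_type_percents)
```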

Visualize Trip Type Distribution with a Pie Chart

Using the trip_type_percents variable, create a pie chart with the plot() method. Set the labels to trip_type_percents.index and autopct to '%1.1f%%'. Finally, set the title to Distribution of Trip Types for Top-Rated Hotels.

Note: Use a semicolon (;) at the end to avoid garbage output in the plot.

Create a bar chart to visualize regional rating trends for the top user states across hotel locations using pivoted data

Using pivoted_timezone, create a bar chart using the plot() method, set the x-axis label to Hotel Timezone, the y-axis label to Average Rating, and the title to Average Ratings of Top User States across Hotel Timezones. Improve the readability of the x-tick labels by setting the x-tick positions based on the number of unique timezones in the pivoted_timezone DataFrame. Rotate the x-tick labels by 45 degrees and align them to the right for better visibility.

Note: Use a semicolon (;) at the end to avoid garbage output in the plot.
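
The tick handling described above might look like the sketch below. The layout of pivoted_timezone (timezones as the index, user states as columns, average ratings as values) is an assumption, as are the sample numbers.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pivot table; its layout and values are assumptions.
pivoted_timezone = pd.DataFrame(
    {"CA": [4.1, 3.8, 4.4], "NY": [3.9, 4.2, 4.0]},
    index=pd.Index(["Central", "Eastern", "Pacific"], name="HOTEL_TIMEZONE"),
)

fig, ax = plt.subplots()
# Each timezone gets a group of bars, one bar per user state column.
pivoted_timezone.plot(
    kind="bar", ax=ax, xlabel="Hotel Timezone", ylabel="Average Rating",
    title="Average Ratings of Top User States across Hotel Timezones")

# One tick per timezone, labels rotated 45 degrees and right-aligned.
ax.set_xticks(range(len(pivoted_timezone.index)))
ax.set_xticklabels(pivoted_timezone.index, rotation=45, ha="right");
```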

Visualize Average Ratings with a Bar Chart

Using pivoted_state, create a bar chart using plot() method, set the x-axis label to Hotel State, the y-axis label to Average Rating, and the title to Average Ratings of Top User States by Hotel State. Improve the readability of the x-tick labels by setting the x-tick positions based on the number of unique states in the pivoted_state DataFrame. Rotate the x-tick labels by 45 degrees and align them to the right for better visibility.

Note: Use a semicolon (;) at the end to avoid garbage output in the plot.

Dhrubaraj Roy
