Group the dataset by Gender and calculate the average Age for each gender group. This will help you understand the average age of male and female customers.
Enter the average age of the Female group exactly as returned, with all decimal places.
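A minimal sketch of this grouping, assuming the columns are named Gender and Age as in the task. The sample rows and the names average_age_by_gender and female_average_age are made up for illustration, so the printed values will differ from the real dataset:

```python
import pandas as pd

# Made-up sample rows; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Age": [22, 25, 24, 27],
})

# Mean Age within each Gender group; the result is a Series indexed by Gender.
average_age_by_gender = df.groupby("Gender")["Age"].mean()

# Look up the value for the Female group (hypothetical variable name).
female_average_age = average_age_by_gender["Female"]
print(average_age_by_gender)
```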
Group the dataset by Marital Status and count the frequency of orders where Output is "Yes". This will help you identify how marital status affects ordering trends.
Store the result in the orders_by_marital_status variable.
The result should match the following output:
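A minimal sketch of one way to build this, using a small made-up sample in place of the real dataset (so the values will not match the expected output). It assumes the result may be a Series of counts; the exact shape expected by the activity is not stated here:

```python
import pandas as pd

# Made-up sample rows; only the two columns this task needs.
df = pd.DataFrame({
    "Marital Status": ["Single", "Married", "Single", "Prefer not to say", "Married"],
    "Output": ["Yes", "Yes", "Yes", "No", "No"],
})

# Keep only the rows where an order was placed.
placed_orders = df[df["Output"] == "Yes"]

# Count the remaining rows in each Marital Status group.
orders_by_marital_status = placed_orders.groupby("Marital Status").size()
print(orders_by_marital_status)
```

Groups with no "Yes" rows do not appear in the result, because they are filtered out before grouping.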

Group the dataset by Occupation and count the frequency of orders where Output is "Yes". This analysis will show which occupations order more frequently.
Store the result in the orders_by_occupation variable.
The result should match the following output:

Group the dataset by Educational Qualifications and count the number of orders where Output is "Yes".
Use groupby followed by size to count occurrences, then reset_index to convert the groupby result into a DataFrame, facilitating easier analysis and visualization.
This will help you determine if education level affects ordering behavior.
Store the result in the orders_by_education variable.
The result should match the following output:
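The groupby, size, and reset_index chain described above could look roughly like this. The sample rows are made up, and the count column name "Order Count" is an assumption, since the expected column name is not given here:

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({
    "Educational Qualifications": ["Graduate", "School", "Graduate", "Ph.D"],
    "Output": ["Yes", "Yes", "Yes", "No"],
})

orders_by_education = (
    df[df["Output"] == "Yes"]               # keep placed orders only
    .groupby("Educational Qualifications")  # one group per education level
    .size()                                 # number of rows in each group
    .reset_index(name="Order Count")        # Series -> DataFrame; column name is assumed
)
print(orders_by_education)
```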

Group the dataset by Family size and count the number of orders where Output is "Yes". This will help you understand if larger families order more frequently.
Store the result in the orders_by_family_size variable.
The result should match the following output:

Split the dataset into subsets based on Gender (Male and Female) and later concatenate them to compare findings between genders.
Store the result in the concatenated_data variable.
The result should match the following output:
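A minimal sketch of the split-then-concatenate pattern, on made-up rows. The subset names male_data and female_data are hypothetical, and placing Male first is an assumption based on the order in the task text:

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Feedback": ["Positive", "Positive", "Negative ", "Positive"],
})

# Boolean masks select one subset per gender.
male_data = df[df["Gender"] == "Male"]
female_data = df[df["Gender"] == "Female"]

# Stack the subsets vertically; the original row labels are kept.
concatenated_data = pd.concat([male_data, female_data])
print(concatenated_data)
```

Because pd.concat keeps the original index by default, the row labels in the result are no longer in ascending order.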

Split the dataset into subsets based on No Income and Below Rs.10000 from the Monthly Income column, then concatenate them.
After concatenation, reset the index using reset_index(drop=True) to ensure the index is continuous and without duplicates.
Store the result in the concatenated_data_income variable.
The result should match the following output:
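A minimal sketch of this step on made-up rows, showing the effect of reset_index(drop=True). The subset names no_income and below_10000 are hypothetical:

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({
    "Monthly Income": ["No Income", "More than 50000", "Below Rs.10000", "No Income"],
    "Output": ["Yes", "Yes", "No", "Yes"],
})

no_income = df[df["Monthly Income"] == "No Income"]
below_10000 = df[df["Monthly Income"] == "Below Rs.10000"]

# drop=True discards the old row labels instead of keeping them as a column,
# so the new index runs 0, 1, 2, ... without gaps.
concatenated_data_income = pd.concat([no_income, below_10000]).reset_index(drop=True)
print(concatenated_data_income)
```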

Split the Feedback data into Positive and Negative subsets, then merge the analyses of the two subsets on Occupation to get a comprehensive view of customer sentiment.
After grouping by Occupation, use reset_index to convert the indices into columns, and specify the column name for the count of feedback using name='Positive' for positive feedback and name='Negative' for negative feedback.
Note: The value is 'Negative ' and not 'Negative'. In the dataset there is a trailing space after the word Negative.
Store the result in the merged_feedback variable.
The result should match the following output:
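The steps above could be sketched as follows, on made-up rows. The intermediate names positive_counts and negative_counts are hypothetical, and the default inner merge is an assumption, since the join type is not stated:

```python
import pandas as pd

# Made-up sample rows; note the trailing space in "Negative ".
df = pd.DataFrame({
    "Occupation": ["Student", "Employee", "Student", "Employee", "Student"],
    "Feedback": ["Positive", "Positive", "Negative ", "Positive", "Positive"],
})

positive = df[df["Feedback"] == "Positive"]
negative = df[df["Feedback"] == "Negative "]  # trailing space matches the data

# Count rows per Occupation and name the count column explicitly.
positive_counts = positive.groupby("Occupation").size().reset_index(name="Positive")
negative_counts = negative.groupby("Occupation").size().reset_index(name="Negative")

# Default merge is an inner join on the shared Occupation column.
merged_feedback = pd.merge(positive_counts, negative_counts, on="Occupation")
print(merged_feedback)
```

With an inner join, an occupation that has only positive or only negative feedback is dropped; passing how="outer" would keep it, with NaN in the missing count.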

Split the dataset into subsets based on Post_Graduate, Graduate and Ph.D from the Educational Qualifications column, then concatenate them.
After concatenation, reset the index using reset_index(drop=True) to ensure the index is continuous and without duplicates.
Store the result in the concatenated_education_data variable.
The result should match the following output:

Split the dataset into subsets based on Family size (1-2, 3-4, and 5 or more) and later concatenate them to analyze the effect of family size on feedback.
Store the result in the concatenated_family_size_data variable.
The result should match the following output:

Use the applymap function to convert all entries in the Occupation column to uppercase to standardize the data.
Store the result in the df DataFrame with a new column Occupation Uppercase.
The result should match the following output:
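A minimal sketch on made-up rows. applymap is a DataFrame method rather than a Series method, so the sketch selects Occupation as a one-column DataFrame. pandas 2.1 renamed DataFrame.applymap to DataFrame.map and deprecated the old name, so the sketch picks whichever is available; the helper name elementwise is hypothetical:

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({"Occupation": ["Student", "Employee", "House wife"]})

# Double brackets return a one-column DataFrame, which supports applymap.
occupation_frame = df[["Occupation"]]

# Use DataFrame.map where it exists (pandas >= 2.1), otherwise fall back to applymap.
elementwise = occupation_frame.map if hasattr(occupation_frame, "map") else occupation_frame.applymap

# Apply str.upper to every cell, then take the single column back out.
df["Occupation Uppercase"] = elementwise(str.upper)["Occupation"]
print(df)
```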

Use the where function to identify orders from families with a size greater than 4. This will help in targeting larger families for marketing campaigns.
Use the notna() method to filter out the NaN values created by where.
Store the result in the large_family_orders variable.
The result should match the following output:
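One possible reading of this task, sketched on made-up rows: apply where to the Family size column, then use notna() as a row filter. Whether the activity expects where on the column or on the whole DataFrame is not stated, so treat this as an assumption; the name large_family_size is hypothetical:

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({
    "Family size": [2, 5, 6, 3],
    "Output": ["Yes", "Yes", "No", "Yes"],
})

# where keeps values that satisfy the condition and replaces the rest with NaN.
large_family_size = df["Family size"].where(df["Family size"] > 4)

# notna() is True only for the rows that survived, so it works as a row filter.
large_family_orders = df[large_family_size.notna()]
print(large_family_orders)
```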

Apply a custom function to derive geographical insights based on latitude and longitude. This will help in understanding the geographical distribution of orders.
Use the pandas apply method to apply this function across the DataFrame. The apply method should be used with axis=1, which ensures that the function is applied to each row individually.
Store the result in the df DataFrame with a new column Location Insights.
The result should match the following output:
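The custom function itself is not specified here, so the sketch below uses a purely hypothetical rule (a quadrant relative to a made-up reference point) to show the apply with axis=1 mechanics. The column names latitude and longitude, the reference coordinates, and location_insight are all assumptions:

```python
import pandas as pd

# Made-up coordinates standing in for the real dataset.
df = pd.DataFrame({
    "latitude": [12.90, 13.01],
    "longitude": [77.60, 77.50],
})

# Hypothetical reference point used only for this illustration.
REF_LATITUDE = 12.97
REF_LONGITUDE = 77.59

def location_insight(row):
    """Label a row by its quadrant relative to the reference point."""
    north_south = "North" if row["latitude"] >= REF_LATITUDE else "South"
    east_west = "East" if row["longitude"] >= REF_LONGITUDE else "West"
    return f"{north_south}-{east_west}"

# axis=1 passes each row (as a Series) to the function, one row at a time.
df["Location Insights"] = df.apply(location_insight, axis=1)
print(df)
```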

Convert the Marital Status column into dummy variables for regression or classification analysis. This will help in predictive modeling.
Store the result in the marital_status_dummies variable.
The result should match the following output:
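A minimal sketch using pd.get_dummies on made-up rows. Whether the expected output uses a column prefix or a particular dtype is not stated, so the defaults are assumed here:

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({"Marital Status": ["Single", "Married", "Single"]})

# One indicator column per distinct value, in sorted order of the values.
marital_status_dummies = pd.get_dummies(df["Marital Status"])
print(marital_status_dummies)
```

Depending on the pandas version, the indicator columns hold booleans or 0/1 integers; calling astype(int) on the result gives 0/1 in either case.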

Convert the Occupation column into dummy variables for analysis of occupational impacts on ordering habits. This will facilitate regression analysis.
Store the result in the occupation_dummies variable.
The result should match the following output:

Filter the dataset to include only the rows where Occupation is Student and Monthly Income is No Income, to analyze their ordering patterns. This will help in understanding the behavior of student customers.
Store the result in the student_orders variable.
The result should match the following output:
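A minimal sketch of combining two conditions in one filter, on made-up rows. It assumes "No Income" is a value of the Monthly Income column:

```python
import pandas as pd

# Made-up sample rows standing in for the real dataset.
df = pd.DataFrame({
    "Occupation": ["Student", "Student", "Employee", "Student"],
    "Monthly Income": ["No Income", "Below Rs.10000", "No Income", "No Income"],
    "Output": ["Yes", "Yes", "No", "No"],
})

# Each condition needs its own parentheses; & combines them row by row.
student_orders = df[(df["Occupation"] == "Student") & (df["Monthly Income"] == "No Income")]
print(student_orders)
```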

Group by Educational Qualifications and Feedback to assess if different educational qualifications correlate with specific types of feedback. This will help in understanding how education level influences customer satisfaction.
Enter the number of Positive and Negative feedback entries for the Graduate value of Educational Qualifications.
Note: Enter in comma-separated format, for example: 154, 20
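A minimal sketch of grouping by two columns at once, on made-up rows (so the counts will differ from the real dataset). The names feedback_by_education and graduate_counts are hypothetical:

```python
import pandas as pd

# Made-up sample rows; "Negative " keeps the dataset's trailing space.
df = pd.DataFrame({
    "Educational Qualifications": ["Graduate", "Graduate", "Graduate", "Ph.D"],
    "Feedback": ["Positive", "Negative ", "Positive", "Positive"],
})

# Grouping by a list of columns gives a Series with a two-level index.
feedback_by_education = df.groupby(["Educational Qualifications", "Feedback"]).size()

# Select the first index level to get the counts for one education level.
graduate_counts = feedback_by_education.loc["Graduate"]
print(graduate_counts["Positive"], graduate_counts["Negative "])
```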
Examine if there’s a direct correlation between the size of the family and the frequency of orders.
Group the dataset by Family size and count the frequency of orders where Output is "Yes".
Store the result in the family_size_order_frequency variable.
The result should match the following output:
