Capstone Project: Using Dictionaries to architect application's data

In this project, you'll work with an IMDb movies dataset, starting with initial dictionaries like `movies`, `directors`, and `actors`. Through a series of tasks, you'll merge these into a comprehensive `films` dictionary and perform analyses: identifying oldest films, counting documentaries, and more. This hands-on experience will enhance your proficiency with Python dictionaries, equipping you with skills for complex data manipulation.

Project Activities

All our Data Science projects include bite-sized activities to test your knowledge and practice in an environment with constant feedback.

All our activities include solutions with explanations on how they work and why we chose them.


Implement a Function to Define a Movie Dictionary

Start by initializing an empty dictionary movies. Your task is to write a function named define_movies(). This function should take a dictionary along with data values as arguments. The function's purpose is to insert the given data values into the dictionary and eventually return the updated dictionary.

The Python code stub below shows you how to initialize and define your function:

def define_movies(dictionary, id, name, year, rank):
    # Your code goes here
    return dictionary

The structure of the movies dictionary is as follows:

movies = {
    id: { # integer
        'name': name, # string
        'year': year, # integer
        'rank': rank # float
    }
}

Upon successfully defining your function, use the following data to test it by inserting them into the movies dictionary.

id = '0'
name = 'Carmencita'
year = 1894
rank = 5.6

Make sure to add only the data above; otherwise, the test cases will fail.
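One possible way to fill in the stub is sketched below. It is only an illustration of the nested layout described above, not the project's reference solution; the parameter name `id` is kept from the stub even though it shadows a Python built-in.

```python
def define_movies(dictionary, id, name, year, rank):
    # Store one movie under its id, using the nested layout shown above.
    # `id` shadows the built-in function, but matches the stub's signature.
    dictionary[id] = {'name': name, 'year': year, 'rank': rank}
    return dictionary


movies = {}
define_movies(movies, '0', 'Carmencita', 1894, 5.6)
print(movies)
```

Because the dictionary is mutated in place and also returned, the call works both as a statement and as an expression.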


Add data to `movies` Dictionary

In the previous activity, we created a function define_movies() that takes a dictionary and data values and inserts the data values into the dictionary. Now, we will use this function to insert the data into the movies dictionary.

We have movies_list which contains the data of the movies. We will iterate over the movies_list and insert the data into the movies dictionary.

The structure of the movies_list is as follows:

movies_list = [
    {
        'id': 1,
        'name': 'Carmencita',
        'year': 1894,
        'rank': 5.6
    },
    ...
]
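The loop described above could look roughly like this. The block repeats a minimal `define_movies()` so that it runs on its own, and the second record in `movies_list` is a made-up placeholder, not data from the real dataset.

```python
def define_movies(dictionary, id, name, year, rank):
    # Minimal version of the function from the previous activity.
    dictionary[id] = {'name': name, 'year': year, 'rank': rank}
    return dictionary


# Only the first record comes from the text; the second is a placeholder.
movies_list = [
    {'id': 1, 'name': 'Carmencita', 'year': 1894, 'rank': 5.6},
    {'id': 2, 'name': 'Placeholder Film', 'year': 1900, 'rank': 6.0},
]

movies = {}
for record in movies_list:
    # Unpack each record's fields into the function's arguments.
    define_movies(movies, record['id'], record['name'],
                  record['year'], record['rank'])

print(len(movies))
```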

Create a dictionary `directors`

We have an empty dictionary for directors. You have to create a function define_directors(). The function takes a dictionary and a list of director records, inserts each record from the directors_list into the directors dictionary, and returns the dictionary.

The definition of the define_directors() function is as follows:

def define_directors(dictionary, directors_list):
    # Your code goes here
    return dictionary

The structure of the directors dictionary is as follows:

directors = {
    id: {
        'first_name': firstName, # string
        'last_name': lastName # string
    }
}

The structure of the directors_list is as follows:

directors_list = [
    {
        'id': '1',
        'first_name': 'Fred',
        'last_name': 'Abberline'
    },
    ...
]
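A minimal sketch of such a function, assuming every record in the list has exactly the three keys shown above:

```python
def define_directors(dictionary, directors_list):
    for record in directors_list:
        # The record's 'id' becomes the key; the names become the value.
        dictionary[record['id']] = {
            'first_name': record['first_name'],
            'last_name': record['last_name'],
        }
    return dictionary


directors = define_directors(
    {}, [{'id': '1', 'first_name': 'Fred', 'last_name': 'Abberline'}]
)
print(directors)
```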

Create a dictionary `actors`

We have an empty dictionary for actors. You have to create a function define_actors(). The function takes a dictionary and a list of actor records, inserts each record from the actors_list into the actors dictionary, and returns the dictionary.

The definition of the define_actors() function is as follows:

def define_actors(dictionary, actors_list):
    # Your code goes here
    return dictionary

The structure of the actors dictionary is as follows:

actors = {
    id: {
        'first_name': firstName, # string
        'last_name': lastName, # string
        'gender': gender # single-character string: 'M' or 'F'
    }
}

The structure of the actors_list is as follows:

actors_list = [
    {
        'id': 4,
        'first_name': 'Dieguito',
        'last_name': 'El Cigala',
        'gender': 'M'
    },
    ...
]
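As a sketch, the same idea can be written with a dictionary comprehension that copies every field except `id`. This assumes the records contain no extra keys beyond those shown above; if they did, those keys would be copied too.

```python
def define_actors(dictionary, actors_list):
    for record in actors_list:
        # Copy every field except 'id', which is used as the key instead.
        dictionary[record['id']] = {
            key: value for key, value in record.items() if key != 'id'
        }
    return dictionary


actors = define_actors({}, [
    {'id': 4, 'first_name': 'Dieguito', 'last_name': 'El Cigala', 'gender': 'M'},
])
print(actors)
```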

Define a function `define_roles`

Define a function define_roles() that takes a dictionary, an actor id, and a role, adds the role to the dictionary in the required format, and returns the dictionary. The definition of the define_roles() function is as follows:

def define_roles(dictionary, actor_id, role):
    # Your code goes here
    return dictionary

The structure of the roles dictionary is as follows:

roles = {
    actor_id: {
        'role': [role1, role2, ...] # list of strings
    },
    ...
}

In the function define_roles(), we will pass a single actor id and a single role at a time. So, we will add the role to the list of roles for the actor id.

After defining the function, insert the below data into the roles dictionary using the function define_roles().

define_roles(roles, 4, 'Actor')
define_roles(roles, 4, 'Singer')
define_roles(roles, 3, 'Actor')
define_roles(roles, 3, 'Singer')
define_roles(roles, 3, 'Director')
define_roles(roles, 2, 'Actor')
define_roles(roles, 2, 'Singer')
define_roles(roles, 1, 'Actor')
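The append-to-a-list behavior described above can be sketched with `dict.setdefault()`, which creates the actor's entry on the first call and reuses it afterwards. This is one option among several, not necessarily the intended solution.

```python
def define_roles(dictionary, actor_id, role):
    # Create {'role': []} the first time an actor id is seen,
    # then append the new role to that actor's list.
    entry = dictionary.setdefault(actor_id, {'role': []})
    entry['role'].append(role)
    return dictionary


roles = {}
define_roles(roles, 4, 'Actor')
define_roles(roles, 4, 'Singer')
define_roles(roles, 1, 'Actor')
print(roles)
```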

Define a function `define_movies_genres`

Define a function define_movies_genres() that takes a dictionary, a movie id, and a genre, adds the genre to the dictionary in the required format, and returns the dictionary. The definition of the define_movies_genres() function is as follows:

def define_movies_genres(dictionary, movie_id, genre):
    # Your code goes here
    return dictionary

The structure of the movies_genres dictionary is as follows:

movies_genres = {
    movie_id: {
        'genre': [genre1, genre2, ...] # list of strings
    },
    ...
}

In the function define_movies_genres(), we will pass a single movie id and a single genre at a time. So, we will add the genre to the list of genres for the movie id.

After defining the function, insert the below data into the movies_genres dictionary using the function define_movies_genres().

define_movies_genres(movies_genres, 1, 'Short')
define_movies_genres(movies_genres, 2, 'Short')
define_movies_genres(movies_genres, 1, 'Comedy')
define_movies_genres(movies_genres, 2, 'Comedy')
define_movies_genres(movies_genres, 3, 'Crime')
define_movies_genres(movies_genres, 4, 'Drama')
define_movies_genres(movies_genres, 5, 'Romance')
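Here is a sketch that uses an explicit membership check instead of `setdefault()`. It appends every genre it is given; whether repeated genres should be skipped is not stated in the task, so no de-duplication is attempted.

```python
def define_movies_genres(dictionary, movie_id, genre):
    # Start an empty genre list the first time a movie id appears.
    if movie_id not in dictionary:
        dictionary[movie_id] = {'genre': []}
    dictionary[movie_id]['genre'].append(genre)
    return dictionary


movies_genres = {}
define_movies_genres(movies_genres, 1, 'Short')
define_movies_genres(movies_genres, 2, 'Short')
define_movies_genres(movies_genres, 1, 'Comedy')
print(movies_genres)
```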

Define a function `define_directors_genres()`

In this activity, you will define a function define_directors_genres() that takes a dictionary, a director id, a genre, and a probability, adds the data to the dictionary in the required format, and returns the dictionary. The definition of the define_directors_genres() function is as follows:

def define_directors_genres(dictionary, director_id, genre, probability):
    # Your code goes here
    return dictionary

The structure of the directors_genres dictionary is as follows:

directors_genres = {
    director_id: {
        genre1: probability1, # float
        genre2: probability2, # float
        ...
    },
    ...
}

In the function define_directors_genres(), we will pass a single director id, single genre, and single probability at a time. So, we will add the genre and probability to the dictionary for the director id.

After defining the function, insert the below data into the directors_genres dictionary using the function define_directors_genres().

define_directors_genres(directors_genres, 1, 'Short', 0.5)
define_directors_genres(directors_genres, 2, 'Short', 0.5)
define_directors_genres(directors_genres, 1, 'Comedy', 0.5)
define_directors_genres(directors_genres, 2, 'Comedy', 0.5)
define_directors_genres(directors_genres, 3, 'Crime', 1.0)
define_directors_genres(directors_genres, 4, 'Drama', 1.0)
define_directors_genres(directors_genres, 5, 'Romance', 1.0)
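Unlike the two previous structures, the inner value here is a genre-to-probability mapping rather than a list. A minimal sketch under that reading:

```python
def define_directors_genres(dictionary, director_id, genre, probability):
    # Get (or create) the director's inner dict, then set genre -> probability.
    dictionary.setdefault(director_id, {})[genre] = probability
    return dictionary


directors_genres = {}
define_directors_genres(directors_genres, 1, 'Short', 0.5)
define_directors_genres(directors_genres, 1, 'Comedy', 0.5)
define_directors_genres(directors_genres, 3, 'Crime', 1.0)
print(directors_genres)
```

Calling the function again with the same director and genre overwrites the stored probability, which follows from using the genre as a dictionary key.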

Creating a Combined Film Details Dictionary

Your task is to write a Python function that combines information from seven different dictionaries representing a movie database into a single, consolidated dictionary. This will help us organize movie data in a more efficient and structured way.

The function should be named combine_movie_data(). It takes seven dictionaries as input and returns a single dictionary, films, as output. The seven dictionaries are movies, directors, actors, roles, movies_genres, directors_genres, and movies_directors.

The definition of the function is as follows:

def combine_movie_data(movies, directors, actors, roles, movies_genres, directors_genres, movies_directors):
    # Write your code here
    return films

The structure of the films dictionary is as follows:

films = {
    "1": {
        'name': "Movie 1",
        'year': 2023,
        'rank': 8.5,
        'directors': ["Director 1"],
        'actors': [
            {
                'name': "Actor 1",
                'gender': "M",
                'roles': ["Role A", "Role B"]
            },
            {
                'name': "Actress 2",
                'gender': "F",
                'roles': ["Role C"]
            }
        ],
        'genres': ["Action", "Adventure"],
        'directors_genres': {
            'name': "Director 1",
            'genres': {
                "Action": 0.7,
                "Adventure": 0.3
            }
        }
    },
    "2": {
        'name': "Movie 2",
        'year': 2021,
        'rank': 7.8,
        'directors': ["Director 2"],
        'actors': [
            {
                'name': "Actor 3",
                'gender': "M",
                'roles': ["Role D"]
            }
        ],
        'genres': ["Drama"],
        'directors_genres': {
            'name': "Director 2",
            'genres': {
                "Drama": 0.6,
                "Romance": 0.4
            }
        }
    },
    # ... more movies
}
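The text does not describe the layout of movies_directors or how actors are linked to movies, so a complete solution cannot be sketched from what is given. The partial example below covers only the name, year, rank, directors, and genres fields. It assumes, purely for illustration, that movies_directors maps a movie id to a list of director ids; the function name `combine_basic_movie_data` and the placeholder data are hypothetical as well.

```python
def combine_basic_movie_data(movies, directors, movies_genres, movies_directors):
    # ASSUMPTION: movies_directors looks like {movie_id: [director_id, ...]}.
    films = {}
    for movie_id, info in movies.items():
        director_names = []
        for director_id in movies_directors.get(movie_id, []):
            person = directors.get(director_id)
            if person is not None:
                director_names.append(
                    f"{person['first_name']} {person['last_name']}"
                )
        films[movie_id] = {
            'name': info['name'],
            'year': info['year'],
            'rank': info['rank'],
            'directors': director_names,
            # Copy the list so later edits to films do not touch movies_genres.
            'genres': list(movies_genres.get(movie_id, {}).get('genre', [])),
        }
    return films


# Placeholder data in the style of the example structure above.
films = combine_basic_movie_data(
    movies={'1': {'name': 'Movie 1', 'year': 2023, 'rank': 8.5}},
    directors={'d1': {'first_name': 'Director', 'last_name': '1'}},
    movies_genres={'1': {'genre': ['Action', 'Adventure']}},
    movies_directors={'1': ['d1']},
)
print(films)
```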

Find the Oldest Film

In this activity, we will find the oldest film in the films dictionary that we created in the previous section. The oldest film is the film with the earliest year. If multiple films share the same earliest year, return all of their names in a list.

You have to write a function named find_oldest_film() that takes one argument, films, and returns a list of film names.

The definition of the find_oldest_film() function is given below:

def find_oldest_film(films):
    # Write your code here
    pass
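One way to approach this is with two passes: find the minimum year, then collect every film that matches it. The sample films below are placeholders, and the empty-input behavior (returning an empty list) is an assumption.

```python
def find_oldest_film(films):
    if not films:
        return []  # assumed behavior for an empty dictionary
    earliest = min(film['year'] for film in films.values())
    # Keep every film tied for the earliest year.
    return [film['name'] for film in films.values() if film['year'] == earliest]


sample_films = {
    '1': {'name': 'Film A', 'year': 1894},
    '2': {'name': 'Film B', 'year': 1900},
    '3': {'name': 'Film C', 'year': 1894},
}
print(find_oldest_film(sample_films))
```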

Count the Documentaries

In this activity, we will count the number of documentaries in the films dictionary. We will use the films dictionary that we created in the previous section. A film is a documentary if its genre is Documentary.

Input the value in the input box below as an integer.
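A count like this can be computed with a generator expression. The helper name `count_documentaries` and the sample data are hypothetical; the sketch assumes each film stores its genres in a list under the 'genres' key, as in the films structure shown earlier.

```python
def count_documentaries(films):
    # Count films whose genre list contains 'Documentary'.
    return sum(
        1 for film in films.values()
        if 'Documentary' in film.get('genres', [])
    )


sample_films = {
    '1': {'name': 'Film A', 'genres': ['Documentary', 'Short']},
    '2': {'name': 'Film B', 'genres': ['Drama']},
    '3': {'name': 'Film C', 'genres': ['Documentary']},
}
print(count_documentaries(sample_films))
```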


Average Rank of Drama Films

Calculate the average rank of films in the 'Drama' genre.

Input the value in the input box below as a float (rounded to 2 decimal places).

There are some films with unknown rank. You should not include these films in your calculation.
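The text does not say how an unknown rank is stored, so the sketch below treats any rank that is not an int or float (for example `None` or a string) as unknown. The helper name `average_rank_for_genre` and the sample data are hypothetical.

```python
def average_rank_for_genre(films, genre):
    # ASSUMPTION: a rank that is not a number means "unknown" and is skipped.
    ranks = [
        film['rank'] for film in films.values()
        if genre in film['genres'] and isinstance(film['rank'], (int, float))
    ]
    if not ranks:
        return None  # no ranked films in this genre
    return round(sum(ranks) / len(ranks), 2)


sample_films = {
    '1': {'name': 'Film A', 'rank': 7.0, 'genres': ['Drama']},
    '2': {'name': 'Film B', 'rank': 8.5, 'genres': ['Drama', 'Romance']},
    '3': {'name': 'Film C', 'rank': None, 'genres': ['Drama']},
    '4': {'name': 'Film D', 'rank': 9.0, 'genres': ['Comedy']},
}
print(average_rank_for_genre(sample_films, 'Drama'))
```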


List All Short Films

Create a list of all films that are categorized as Short.

You have to write a function named list_short_films() that takes one argument, films, and returns a list of film names.

The definition of the list_short_films() function is given below:

def list_short_films(films):
    # Write your code here
    pass
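A list comprehension is a natural fit here. This sketch again assumes genres are stored as a list under 'genres' and uses placeholder films.

```python
def list_short_films(films):
    # Keep the names of films whose genre list includes 'Short'.
    return [
        film['name'] for film in films.values()
        if 'Short' in film.get('genres', [])
    ]


sample_films = {
    '1': {'name': 'Film A', 'genres': ['Documentary', 'Short']},
    '2': {'name': 'Film B', 'genres': ['Drama']},
}
print(list_short_films(sample_films))
```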

Find the Comedy with Highest Rank

Identify the film in the Comedy genre that has the highest rank. If multiple films share the same highest rank, input the name of the film that comes first in alphabetical order.

Input the value in the below input box as a string.

There are some films with unknown rank. You should not include these films in your calculation.
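The tie-breaking rule can be expressed with a composite sort key: the negated rank first (so higher ranks win), then the name (so ties resolve alphabetically). The helper name `top_comedy` is hypothetical, and non-numeric ranks are assumed to mean "unknown".

```python
def top_comedy(films):
    # ASSUMPTION: a rank that is not a number means "unknown" and is skipped.
    comedies = [
        film for film in films.values()
        if 'Comedy' in film['genres'] and isinstance(film['rank'], (int, float))
    ]
    if not comedies:
        return None
    # Highest rank first; ties fall back to alphabetical order by name.
    best = min(comedies, key=lambda film: (-film['rank'], film['name']))
    return best['name']


sample_films = {
    '1': {'name': 'B Film', 'rank': 9.0, 'genres': ['Comedy']},
    '2': {'name': 'A Film', 'rank': 9.0, 'genres': ['Comedy', 'Short']},
    '3': {'name': 'C Film', 'rank': None, 'genres': ['Comedy']},
    '4': {'name': 'D Film', 'rank': 9.5, 'genres': ['Drama']},
}
print(top_comedy(sample_films))
```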


List Films with No Directors

Create a list of films that do not have any directors listed.

You have to write a function named list_films_with_no_directors() that takes one argument, films, and returns a list of film names.

The definition of the list_films_with_no_directors() function is given below:

def list_films_with_no_directors(films):
    # Write your code here
    pass
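A possible sketch, assuming a film without directors has an empty 'directors' list (a missing key is treated the same way):

```python
def list_films_with_no_directors(films):
    # An empty list is falsy, so `not` catches both [] and a missing key.
    return [
        film['name'] for film in films.values()
        if not film.get('directors')
    ]


sample_films = {
    '1': {'name': 'Film A', 'directors': ['Director 1']},
    '2': {'name': 'Film B', 'directors': []},
}
print(list_films_with_no_directors(sample_films))
```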

Calculate the Total Number of Films

Determine the total number of films in the 'films' dictionary.

Input the value in the input box below as an integer.


Identify Genres by Year

Create a dictionary that maps years to the genres of films released in those years.

You have to write a function named genres_by_year() that takes one argument, films, and returns a dictionary. The keys of the dictionary are years and the values are lists of genres.

The definition of the genres_by_year() function is given below:

def genres_by_year(films):
    # Write your code here
    pass
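The task does not say whether a genre should appear once per year or once per film. The sketch below assumes each genre is listed once per year, in first-seen order; the graded tests may expect something different.

```python
def genres_by_year(films):
    result = {}
    for film in films.values():
        year_genres = result.setdefault(film['year'], [])
        for genre in film['genres']:
            # ASSUMPTION: list each genre only once per year.
            if genre not in year_genres:
                year_genres.append(genre)
    return result


sample_films = {
    '1': {'name': 'Film A', 'year': 1894, 'genres': ['Documentary', 'Short']},
    '2': {'name': 'Film B', 'year': 1894, 'genres': ['Short']},
    '3': {'name': 'Film C', 'year': 1900, 'genres': ['Drama']},
}
print(genres_by_year(sample_films))
```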

Find Films with Unknown Rank

List the films with an unknown rank.

You have to write a function named list_films_with_unknown_rank() that takes one argument, films, and returns a list of film names.

The definition of the list_films_with_unknown_rank() function is given below:

def list_films_with_unknown_rank(films):
    # Write your code here
    pass
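How an unknown rank is represented is not specified here, so this sketch treats any rank that is not an int or float as unknown. Adjust the check if the dataset uses a specific marker.

```python
def list_films_with_unknown_rank(films):
    # ASSUMPTION: any non-numeric rank (None, a string, ...) counts as unknown.
    return [
        film['name'] for film in films.values()
        if not isinstance(film.get('rank'), (int, float))
    ]


sample_films = {
    '1': {'name': 'Film A', 'rank': 5.6},
    '2': {'name': 'Film B', 'rank': None},
}
print(list_films_with_unknown_rank(sample_films))
```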

Calculate Average Rank

Calculate the average rank of all films with a numeric rank.

Input the average rank of all films as a float rounded to two decimal places.

There are some films with unknown rank. You should not include these films in your calculation.

Author

Anurag Verma

What's up, friends! I'm a computer science student about to finish my last year of college. I LOVE writing code! It makes me so happy! Whether I'm goofing in notebooks or coding in Python, writing programs is a blast! When I'm not geeking out over AI with my classmates or building neural networks, you can find me buried in statistics textbooks. I know, what a nerd! I'm always down to learn new ways to speak human and computer. Making tech more fun is my jam! If you want a cheery data buddy who can make difficult things easy-peasy and learning a party, I'm your guy! Let's chat codes, numbers, and machines over coffee! I'd love to meet more techy humans. Can't wait to talk!

This project is part of

Python Collections