Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [ ]:
NAME = ""
COLLABORATORS = ""

TMDb API Exercises

In [ ]:
import requests

import os
import os.path
import sys
import importlib

if os.path.isdir(os.path.join("../../..", "modules")):
    module_dir = os.path.join("../../..", "modules")
else:
    module_dir = os.path.join("../..", "modules")

module_path = os.path.abspath(module_dir)
if module_path not in sys.path:
    sys.path.append(module_path)

import util
importlib.reload(util)

Set Up

Edit creds.json Before Starting this Homework

Record Credentials Information

  1. In the same folder as this notebook, right-click on creds.json, choose Open With, and select Editor.
  2. In the tmdb dictionary entry, put your API key, obtained for your application from TMDb, as the value of the apikey key.
  3. Save the file.
  4. Run the cell below. It calls a function in the util module that reads a particular sub-dictionary from the creds file, and the cell then assigns the global variable apikey. The signature for this function is:

    read_creds(key, folder=".", file="creds.json")

    where the second and third parameters give the folder and name of the credentials file. Since we are using a creds.json in the current directory, we only need to specify the key for the sub-dictionary we wish to read.
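To make the behavior of read_creds concrete, here is a minimal sketch of what such a helper might look like. The name read_creds_sketch is hypothetical, and the file layout (a top-level tmdb entry holding an apikey) is inferred from the steps above; the real util.read_creds may be implemented differently.

```python
import json
import os
import tempfile


def read_creds_sketch(key, folder=".", file="creds.json"):
    """Hypothetical stand-in for util.read_creds; the real helper may differ."""
    path = os.path.join(folder, file)
    try:
        with open(path, "r") as f:
            creds = json.load(f)
    except (OSError, json.JSONDecodeError):
        # Missing, unreadable, or malformed credentials file
        return None
    if not isinstance(creds, dict):
        return None
    # dict.get returns None when the requested sub-dictionary is absent
    return creds.get(key)


# Demonstration against a throwaway file (assumed layout, placeholder key)
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "creds.json"), "w") as f:
        json.dump({"tmdb": {"apikey": "PLACEHOLDER"}}, f)
    demo_creds = read_creds_sketch("tmdb", folder=tmp)
    missing_creds = read_creds_sketch("other", folder=tmp)
```

Because folder and file have defaults, a call such as read_creds_sketch("tmdb") would look for creds.json in the current directory.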

In [ ]:
tmdb_creds = util.read_creds("tmdb", ".", "creds.json")
print(tmdb_creds)

#apikey = ""
apikey = tmdb_creds['apikey']
print(apikey)

Q1 Write a function

getGenreDict(my_api_key)

that returns a Python dictionary mapping Movie genre names to id numbers, e.g. {'Action':28, 'Adventure':12,...}. Your function should only make a single call to requests.get(), with the appropriate endpoint path for TMDB, and with the given API key. As always, return None if something goes wrong. Note that this is not asking for the entire JSON result, but a dictionary mapping genre names to id numbers.

You may want to write this as a global cell to get it working and to examine the results, and then convert it into a function.
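As a starting point for the "global cell first" approach, the sketch below shows only the reshaping step, applied to a hand-written sample rather than a live response. It assumes the parsed JSON has a top-level genres list whose entries carry id and name; check this against what you actually get back. The helper name genres_to_dict is hypothetical, and the call to requests.get() is deliberately left out.

```python
# Hand-written sample in the assumed shape of the parsed JSON response
sample_payload = {
    "genres": [
        {"id": 28, "name": "Action"},
        {"id": 12, "name": "Adventure"},
    ]
}


def genres_to_dict(payload):
    """Map genre names to id numbers; return None if the shape is unexpected."""
    try:
        return {genre["name"]: genre["id"] for genre in payload["genres"]}
    except (KeyError, TypeError):
        # Missing keys or a payload that is not a dictionary of the assumed shape
        return None


genre_map = genres_to_dict(sample_payload)
```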

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing cell

d = getGenreDict(apikey)
print(d)

Q2 Write a function

search_person_id(my_api_key, name)

that uses the /search/person endpoint to conduct a search for the given name, and returns the id of the first entry in the results list. This should use only one call to requests.get(). As always, return None if something goes wrong, and return -1 if there were no results.
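The three possible outcomes (an id, None, or -1) are easy to mix up, so here is a small sketch of just that decision logic. It assumes you already have the HTTP status code and the parsed JSON, and that the JSON has a top-level results list; the helper name first_result_id and the sample payload are illustrative, not part of the assignment.

```python
def first_result_id(status_code, payload):
    """Return the first result's id, -1 for no results, None on any failure."""
    if status_code != 200 or not isinstance(payload, dict):
        # Something went wrong, for example an invalid API key
        return None
    results = payload.get("results")
    if not isinstance(results, list):
        return None
    if len(results) == 0:
        # The request succeeded but the search matched nothing
        return -1
    return results[0].get("id")


# Hand-written sample in the assumed shape of a successful search
sample_search = {"results": [{"id": 500, "name": "Tom Cruise"}]}
found = first_result_id(200, sample_search)
nothing = first_result_id(200, {"results": []})
failed = first_result_id(401, {"status_message": "Invalid API key"})
```

Keeping "no results" (-1) distinct from "request failed" (None) lets a caller tell a misspelled name apart from a bad key or a network problem.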

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing cell
assert search_person_id(apikey,'tom cruise') == 500
assert search_person_id(apikey+'z','Bill Murray') is None

Q3 The reading mentioned the TMDB endpoint /discover/movie. Many examples of what can be done with this endpoint are provided at the following link.

However, as always with APIs, it is unwise to rely solely on examples. Rather, you must read and understand the API documentation. Navigate the TMDB API documentation and find the URL of the page that explains how to use the endpoint /discover/movie (the place with the "Try it out" tab). Store this URL as a string called documentation below.

In [ ]:
documentation = "" # modify this
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing cell

Discovering Movies and/or TV Shows

The Discover Movie endpoint has many possibilities, as seen by the long list of various examples pointed to in the last question, and there is also a comparable discover endpoint for TV shows. The remainder of the questions in this notebook will be primarily self-determined and focus on versions of the discover endpoints that are interesting to the individual student.

Q4 Write Python code to issue at least four distinct requests to either the movie or the TV discover endpoints, and then examine the results. As you look at the results, think about how you would process results to build a tabular/pandas dataframe and the columns you would use in such a representation.
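When thinking about a tabular representation, it can help to see how a list of result dictionaries collapses into a LoD with a fixed set of columns. The sketch below uses made-up sample entries and a hypothetical helper, results_to_lod; the field names are ones you are likely to see in discover results, but confirm them against your own responses.

```python
# Made-up entries standing in for the "results" list of a discover response
sample_results = [
    {"id": 1, "original_title": "First Film", "popularity": 12.5, "vote_average": 7.1},
    {"id": 2, "original_title": "Second Film", "popularity": 48.0},
]


def results_to_lod(results, columns):
    """Keep only the chosen columns; fields missing from an entry become None."""
    return [{col: item.get(col) for col in columns} for item in results]


lod = results_to_lod(
    sample_results, ["id", "original_title", "popularity", "vote_average"]
)
```

Every dictionary in the resulting list has the same keys, so it can be handed to pandas.DataFrame(lod) to produce one row per result.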

In [ ]:
# Discover Example 1

# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Discover Example 2

# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Discover Example 3

# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Discover Example 4

# YOUR CODE HERE
raise NotImplementedError()

The following culminating exercise focuses on movies and uses two attributes for filtering in a call to the discover endpoint. If you wish, you can focus on TV shows instead and use different attributes, depending on what you find personally interesting. The point is to use multiple attributes and to build a tabular (LoD) representation.

Q5 Write a function

makeBigMovieLoD(my_api_key, actorList, genreList)

that returns a LoD with one dictionary per movie, with columns title (that's 'original title'), id, popularity, language, and overview. You will need your earlier functions, since actorList is a list of actor names and genreList is a list of genre names, not numeric ids.

Your function should then use a single call to requests.get() and should include movies with all of the actors on actorList and with any of the genres on genreList. If you come across a nonexistent actor or genre (e.g., "abventure"), don't include it in the search. Please sort your results by popularity, from most popular to least popular.
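One way to express "all of the actors" and "any of the genres" in a single request is through the query parameters. The sketch below assumes the discover endpoint's with_cast and with_genres parameters treat a comma as AND and a pipe as OR, that sort_by accepts popularity.desc, and that the key is passed as api_key; verify each of these against the documentation page you located in Q3. The helper name build_discover_params is hypothetical, and it expects ids that have already been looked up, with None or -1 marking lookups that failed.

```python
def build_discover_params(api_key, actor_ids, genre_ids):
    """Build query parameters for one discover request (sketch, see assumptions)."""
    params = {"api_key": api_key, "sort_by": "popularity.desc"}
    # Drop actors whose lookup failed (None) or found nothing (-1)
    valid_actors = [str(a) for a in actor_ids if a is not None and a != -1]
    # Drop genre names that were not present in the genre dictionary
    valid_genres = [str(g) for g in genre_ids if g is not None]
    if valid_actors:
        # Comma-separated: every listed actor must appear (AND)
        params["with_cast"] = ",".join(valid_actors)
    if valid_genres:
        # Pipe-separated: any listed genre is enough (OR)
        params["with_genres"] = "|".join(valid_genres)
    return params


example_params = build_discover_params("PLACEHOLDER", [500, -1, None], [28, None, 12])
```

A dictionary like this can be passed as the params argument of requests.get(), which takes care of URL encoding.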

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()