In [ ]:
 

Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [ ]:
NAME = ""
COLLABORATORS = ""

IC_3.2: school Database

In [ ]:
import pandas as pd
import os
import os.path
import json
import sys
import importlib

module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if not module_path in sys.path:
    sys.path.append(module_path)

import dbutil
importlib.reload(dbutil)

%load_ext sql

School Database Schema Reference

In [1]:
from IPython.display import Image
Image("classimages/school_schema.jpg", width=600, height=600)
Out[1]:

Set User Credentials

Edit creds.json to reflect your mysql user and password

This must be done prior to executing the following cell

In general, you will be able to choose whether you are using the remote MySQL database or the SQLite database(s) by setting the dbsource variable to "mysql" or "sqlite" respectively. The function dbutil.db_cstring function computes a connection string for the chosen dbsource using the information in the creds.json file. If the last argument to this function is present, the generated connection string uses that datbase as superceding the name of the database in creds.json.

In [ ]:
dbsource = "sqlite"
db = "school"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)

%sql $cstring

Q1 Create a list of departments, including the id of the department, the name of the department, and the last and first names of the chair of the department. Determine the number of rows in your result and compare with the number of rows total in departments? Are they the same? Why or why not?

In [ ]:
#Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head(10)

YOUR ANSWER HERE

Q2 In reference to the school database, select all course titles for classes offered during the year, their class meeting times, and their terms. Keep the default ordering (by coursetitle). It's ok to include directed studies, but don't allow any NULL course titles or meeting times.

In [ ]:
#Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.tail()
In [ ]:
# Testing cell
assert len(resultdf) == 133141
assert len(resultdf.iloc[0]) == 3
assert 'Writing Workshop' in list(resultdf['coursetitle'])

Q3 Write a query to display students (last name and first name) and instructors (first name) who have the same last name, ordered by student last name, then student first name. Don't include duplicate results.

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()

Q4 Write a query to display all the students (id, last name, first name) who took math or computer science during the fall. Please order your results by studentid (lowest to highest). If a student took multiple math or CS courses, please include them multiple times.

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

Q5 In reference to the school database, which instructors (first and last name) were teaching in the spring semester? Your result should not include duplicates.

In [ ]:
#Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()
In [ ]:
# Testing cell
assert len(resultdf) == 266
assert 'Taylor' in list(resultdf['instructorfirst'])
assert 'Fuller' in list(resultdf['instructorlast'])

Q6 Find the students (id only) who took more than 10 classes over the year. Include the number of classes they took as count.

In [ ]:
#Solution cell* 

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()