Denison CS181/DA210 Homework

Before you turn this problem in, make sure everything runs as expected. This is a combination of restarting the kernel and then running all cells (in the menubar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE".


SQL Exercises on school Database

In [ ]:
import pandas as pd
import os
import os.path
import json
import sys
import importlib

module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if not module_path in sys.path:
    sys.path.append(module_path)

import dbutil
importlib.reload(dbutil)

%load_ext sql

Instructions

Set User Credentials

Edit creds.json to reflect your mysql user and password

This must be done prior to executing the following cell

In general, you will be able to choose whether you are using the remote MySQL database or the SQLite database(s) by setting the dbsource variable to "mysql" or "sqlite" respectively. The function dbutil.db_cstring function computes a connection string for the chosen dbsource using the information in the creds.json file. If the last argument to this function is present, the generated connection string uses that datbase as superceding the name of the database in creds.json.

In [ ]:
dbsource = "mysql"
db = "school"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)
In [ ]:
print("Connection string:", cstring)
In [ ]:
%sql $cstring
In [1]:
from IPython.display import Image
Image("hwimages/school_schema.jpg", width=600, height=600)
Out[1]:

Q1 In reference to the school database, how many departments are there? Your query should result in a 1 x 1 table with just one number in it.

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
In [ ]:
# Testing cell
assert resultdf.iloc[0,0] == 36

Q2 In reference to the school database, how many departments appear to address multiple distinct areas of study? Hint: this question is asking about the field departmentname, but your answer should be general enough that it would work even if some departments changed name (e.g., "Cinema and Culture") or if new departments were added (e.g., "Agriculture/Culinary studies").

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

# look for 'and', &, /
In [ ]:
# Testing cell

assert resultdf.iloc[0,0] == 9

Q3 In reference to the school database, how many classes have at least three sections? Your query should result in a 1 x 1 table.

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
In [ ]:
# Testing cell

assert resultdf.iloc[0,0] == 160

Q4 In reference to the school database, how many students are from each state? Please order your results by state (alphabetically) and alias your new column as count. Please omit NULL entries. For a challenge, make a graph displaying this information.

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

Q5 In reference to the school database, select all distinct course titles for classes offered in the fall semester.

In [ ]:
#Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
In [ ]:
# Testing cell
assert len(resultdf) == 631
assert 'Intermediate Arabic I' in list(resultdf['coursetitle'])

Q6 Using the SQL database school, select departments (by name) and instructors in each department (first and last name). Order alphabetically by department name and then instructor last name. Only include departments that have instructors and instructors that are part of departments.

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()
In [ ]:
# Testing cell
assert len(resultdf) == 293 or len(resultdf) == 277
assert list(resultdf.columns) == ['departmentname', 'instructorfirst', 'instructorlast']

Q7 Using the SQL database school, select all student first and last names, and the class ids and terms for the classes they are taking during the year. Only include students who are actually taking classes, and order your results alphabetically by student last name and then by first name.

In [ ]:
#Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()
#resultdf.iloc[0]
In [ ]:
# Testing cell
assert len(resultdf) == 22910

Q8 In reference to the school database, what courses are offered by the econ department in the spring semester? Please list courses not classes, and don't count directed or independent studies.

In [ ]:
#Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()
In [ ]:
# Testing cell
assert len(resultdf) == 16