Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).
Make sure you fill in any place that says YOUR CODE HERE
or "YOUR ANSWER HERE", as well as your name and collaborators below:
NAME = ""
COLLABORATORS = ""
import pandas as pd
import os
import os.path
import json
import sys
import importlib
module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if not module_path in sys.path:
sys.path.append(module_path)
import dbutil
importlib.reload(dbutil)
%load_ext sql
Edit
creds.json
to reflect your mysql user and passwordThis must be done prior to executing the following cell
In general, you will be able to choose whether you are using the remote MySQL database or the SQLite database(s) by setting the dbsource
variable to "mysql"
or "sqlite"
respectively. The function dbutil.db_cstring
function computes a connection string for the chosen dbsource
using the information in the creds.json
file. If the last argument to this function is present, the generated connection string uses that datbase as superceding the name of the database in creds.json
.
dbsource = "sqlite"
db = "school"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)
print("Connection string:", cstring)
%sql $cstring
from IPython.display import Image
Image("classimages/school_schema.jpg", width=600, height=600)
Q In reference to the school
database, how many departments are there? Your query should result in a 1 x 1 table with just one number in it.
# Solution cell
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
# Testing cell
assert resultdf.iloc[0,0] == 36
Q In reference to the school
database, how many departments appear to address multiple distinct areas of study? Hint: this question is asking about the field departmentname
, but your answer should be general enough that it would work even if some departments changed name (e.g., "Cinema and Culture") or if new departments were added (e.g., "Agriculture/Culinary studies").
# Solution cell
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
# look for 'and', &, /
# Testing cell
assert resultdf.iloc[0,0] == 9
Q In reference to the school
database, how many classes have at least three sections? Your query should result in a 1 x 1 table.
# Solution cell
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
# Testing cell
assert resultdf.iloc[0,0] == 160
Q In reference to the school
database, how many students are from each state? Please order your results by state (alphabetically) and alias your new column as count
. Please omit NULL entries. For a challenge, make a graph displaying this information.
# Solution cell
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
# Testing cell
assert resultdf.loc[0,'count'] == 1
assert len(resultdf) == 53
dbsource = "mysql"
db = "book"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)
%sql $cstring
%sql SELECT * FROM pop_gdp
%sql SELECT * FROM country_land
Match Condition
country_land.code = pop_gdp.code
Q Construct a combined table that includes all six columns from the two tables, and where the rows in the result satisfy the match condition, and the matching fields are present in both tables.
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
Q Construct a combined table that includes all six columns from the two tables, and where the rows in the result satisfy the match condition, and all rows in the pop_gdp
table are present.
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
Q Construct a combined table that includes all six columns from the two tables, and where the rows in the result satisfy the match condition, and all rows in the land_country
table are present.
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
%sql SELECT * FROM pop_gdp2