Before you turn this problem in, make sure everything runs as expected. This is a combination of restarting the kernel and then running all cells (in the menubar, select Kernel$\rightarrow$Restart And Run All).
Make sure you fill in any place that says YOUR CODE HERE
or "YOUR ANSWER HERE".
import pandas as pd
import os
import os.path
import json
import sys
import importlib
module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if not module_path in sys.path:
sys.path.append(module_path)
import dbutil
importlib.reload(dbutil)
%load_ext sql
Edit
creds.json
to reflect your mysql user and passwordThis must be done prior to executing the following cell
In general, you will be able to choose whether you are using the remote MySQL database or the SQLite database(s) by setting the dbsource
variable to "mysql"
or "sqlite"
respectively. The function dbutil.db_cstring
function computes a connection string for the chosen dbsource
using the information in the creds.json
file. If the last argument to this function is present, the generated connection string uses that datbase as superceding the name of the database in creds.json
.
dbsource = "sqlite"
db = "book"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)
print("Connection string:", cstring)
The cell below uses the Python variable constructed earlier to actually establish a connection with the server.
If you cannot get a connection established, switch over to MySQL Workbench and try from that different client.
%sql $cstring
In the following cells, your only action is to, as usual, cut out the two lines, and to put a valid SQL statement as the value of string variable query
. In each case, when you execute the cell, the query will be sent to the database management system, a result obtained, and the result converted into a pandas
data frame, whose prefix is shown.
Q1 In reference to the table indicators
, write a query to project code, year, gdp, pop, and find the GDP per capita of each country, call the new column percapita
. Sort the data in descending value of percapita
.
# Solution cell
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
# Testing cell
assert len(resultdf) == 12862
Q2 In reference to the table indicators
, select all columns, plus a new column named new
that tells whether exports
are larger than imports
. Sort in descending order of gdp.
# Solution cell
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
# Testing cell
assert resultdf.shape == (12862,9)
assert resultdf.loc[2,'new'] == 0.0
school
Database¶The following cell, assuming dbsource
is defined from above, creates a new connection string and establishes a new connection to the school
database.
db = "school"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)
%sql $cstring
Q3 In reference to the instructors
table in the school
database, compute and project a single new column full_name
obtained as the last name, followed by a comma and a space, followed by the first name. Put the names in alphabetical order.
# Solution cell
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
len(resultdf)
# Testing cell
assert len(resultdf) == 292
assert 'Aguilar, Stephen' in list(resultdf['full_name'])
Q4 In reference to the courses
table in the school
database, select a new column catalog
obtained as the course subject, followed by a space, followed by the course number, followed by a colon and a space, followed by the course title. Only include distinct catalogue entries and don't include any NULL entries.
# Solution cell
%sql USE school
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
# Testing cell
assert len(resultdf) == 904
assert 'WMST 101: Issues in Feminism' in list(resultdf['catalog'])