Denison CS181/DA210 Homework

Before you turn this problem in, make sure everything runs as expected. This is a combination of restarting the kernel and then running all cells (in the menubar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE".


SQL Column Operations Exercises

In [ ]:
import pandas as pd
import os
import os.path
import json
import sys
import importlib

module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if not module_path in sys.path:
    sys.path.append(module_path)

import dbutil
importlib.reload(dbutil)

%load_ext sql

Instructions

Set User Credentials

Edit creds.json to reflect your mysql user and password

This must be done prior to executing the following cell

In general, you will be able to choose whether you are using the remote MySQL database or the SQLite database(s) by setting the dbsource variable to "mysql" or "sqlite" respectively. The function dbutil.db_cstring function computes a connection string for the chosen dbsource using the information in the creds.json file. If the last argument to this function is present, the generated connection string uses that datbase as superceding the name of the database in creds.json.

In [ ]:
dbsource = "sqlite"
db = "book"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)
In [ ]:
print("Connection string:", cstring)

Establish Connection from Client to Server

The cell below uses the Python variable constructed earlier to actually establish a connection with the server.

If you cannot get a connection established, switch over to MySQL Workbench and try from that different client.

In [ ]:
%sql $cstring

In the following cells, your only action is to, as usual, cut out the two lines, and to put a valid SQL statement as the value of string variable query. In each case, when you execute the cell, the query will be sent to the database management system, a result obtained, and the result converted into a pandas data frame, whose prefix is shown.

Problems

Q1 In reference to the table indicators, write a query to project code, year, gdp, pop, and find the GDP per capita of each country, call the new column percapita. Sort the data in descending value of percapita.

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
In [ ]:
# Testing cell

assert len(resultdf) == 12862

Q2 In reference to the table indicators, select all columns, plus a new column named new that tells whether exports are larger than imports. Sort in descending order of gdp.

In [ ]:
# Solution cell

query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
In [ ]:
# Testing cell

assert resultdf.shape == (12862,9)
assert resultdf.loc[2,'new'] == 0.0

Switching to the school Database

The following cell, assuming dbsource is defined from above, creates a new connection string and establishes a new connection to the school database.

In [ ]:
db = "school"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)
%sql $cstring

Q3 In reference to the instructors table in the school database, compute and project a single new column full_name obtained as the last name, followed by a comma and a space, followed by the first name. Put the names in alphabetical order.

In [ ]:
# Solution cell
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
In [ ]:
len(resultdf)
In [ ]:
# Testing cell
assert len(resultdf) == 292
assert 'Aguilar, Stephen' in list(resultdf['full_name'])

Q4 In reference to the courses table in the school database, select a new column catalog obtained as the course subject, followed by a space, followed by the course number, followed by a colon and a space, followed by the course title. Only include distinct catalogue entries and don't include any NULL entries.

In [ ]:
# Solution cell
%sql USE school
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()
resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()
In [ ]:
# Testing cell
assert len(resultdf) == 904
assert 'WMST 101: Issues in Feminism' in list(resultdf['catalog'])