Denison CS181/DA210 Homework

Before you turn this problem in, make sure everything runs as expected. This is a combination of restarting the kernel and then running all cells (in the menubar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE".


Database Programming Homework

No SQL Magics should be used in solving any of the following problems

In [ ]:
import pandas as pd
import os
import os.path
import json
import sys
import importlib
import sqlalchemy as sa

if os.path.isdir(os.path.join("../../..", "modules")):
    module_dir = os.path.join("../../..", "modules")
else:
    module_dir = os.path.join("../..", "modules")

module_path = os.path.abspath(module_dir)
if not module_path in sys.path:
    sys.path.append(module_path)

import dbutil
importlib.reload(dbutil)

dbfiledir = "../../dbfiles"
if not os.path.isdir(dbfiledir):
    dbfiledir =  "../" + dbfiledir
assert os.path.isdir(dbfiledir)

Connection Strings and Connecting

Q1 Write a function

sqlite_cstring(database, dbdir=".", driver="pysqlite", protocol="sqlite")

that generates a connection string suitable for use with a local SQLite database. Note how we order the parameters and give default values for the database directory, the driver, and the protocol, so that invocation of the function can omit the later arguments if the default values have the correct (common) values. So, for instance, an invocation sqlite_cstring("book", "../../dbfiles") yields the string "sqlite+pysqlite:///../../dbfiles/book.db".

If the passed value for argument driver is None, then both the + and the driver should be omitted. So the invocation sqlite_cstring("book", "../../dbfiles", driver=None) yields the string "sqlite:///../../dbfiles/book.db"

You should be able to use the steps we followed in our inclass on database programming to help write this function.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
print(sqlite_cstring("book", "../../dbfiles"))
print(sqlite_cstring("book", "../../dbfiles", driver=None))
In [ ]:
# Testing Cell

assert sqlite_cstring("book", "../../dbfiles") == "sqlite+pysqlite:///../../dbfiles/book.db"
assert sqlite_cstring("book", "../../dbfiles", driver=None) == "sqlite:///../../dbfiles/book.db"

Q2 In this question, we again have the goal of creating a function that gives us a connection string. But this time, we combine the formatting for a remote database covered in class with the convenience of getting many of the pieces of information from a JSON file. Write a function:

mysql_creds_cstring(database, credsdir=".", credsfile="creds.json")

that generates a connection string suitable for use with a remote MySQL database. The majority of the arguments for the connection string will come from the credsfile file. In that JSON-formatted file, there will be a top-level dictionary that maps from string "mysql" to a dictionary. This subordinate dictionary will have keys of "user", "pass", "server", and "scheme" (which has both the protocol and the lower-level driver). The database part of the connection string will come from the database argument, not from the creds file. We will omit the port, and let it default to the default port for mysql protocol connections.

If the creds file is not found in the credsdir, or if any of the required keys are missing, the function should return None.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
print(mysql_creds_cstring("book"))
In [ ]:
# Testing Cell

assert mysql_creds_cstring("book") == "mysql+mysqlconnector://studen_j1:studen_j1@hadoop.mathsci.denison.edu/book"

Setting up for making connections

The following cell assumes correct operation for the functions you defined in Q1 and/or Q2. As in the past, you can change the value of dbsource depending on whether you want to use local or remote database.

If your answers to Q1 and/or Q2 are not generating the correct values for connection strings, you should be able to uncomment the last line in this next cell to get the instructor-provided utility function to generate a connection string.

In [ ]:
dbsource = "sqlite"
db = "school"

if dbsource == "sqlite":
    cstring = sqlite_cstring(db, dbfiledir)
elif dbsource == "mysql":
    cstring = mysql_creds_cstring(db)


# cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)

print("Connection string:", cstring)

Q3 Write a function

db_connect(cstring)

that performs the two sqlalchemy steps of creating a database engine based on a connection string, cstring, and then establishes a connection. The function should return both the engine and the connection object. Note that the sqlalchemy import has already been done for you in the first code cell of this notebook.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing Cell

engine, connection = db_connect(cstring)
assert isinstance(engine, sa.engine.base.Engine)
assert isinstance(connection, sa.engine.base.Connection)

Q4 Write a function

db_shutdown(engine, connection)

that, under the protection of a try/except block, attempts to close the given connection and then deletes the given engine. The function has no return value.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing Cell

## Assumes Q3 testing cell has been executed and that, on entry, 
## engine and connection are defined global variables.

db_shutdown(engine, connection)
assert connection.closed

# Calling db_shutdwon a second time should **not** result in an exception
# if properly implemented

db_shutdown(engine, connection)

Preparing for Making Queries

We now use the functions defined above, along with the value of cstring to establish a connection to be used for subsequent operations. If prior questions have not been correctly answered, you may substitute other code to create a connection named connection in place of the following cell.

In [ ]:
dbsource = "sqlite"
db = "book"

if dbsource == "sqlite":
    cstring = sqlite_cstring(db, dbfiledir)
elif dbsource == "mysql":
    cstring = mysql_creds_cstring(db)


# cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)

print("Connection string:", cstring)
engine, connection = db_connect(cstring)

Query Functions

As part of good programming developement, we should develop a single function for each SQL query we want to make. Such functions should be made general and use function parameters for parts of the query where associating Python variables as part of the query makes sense.

This will be the approach taken in the remaining exercises in this homework.

Q5 Write a function

indicators0_table(conn)

that, given a connection, conn, obtains all rows and all columns of the indicators0 table, returning the result as a pandas data frame.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
resultdf = indicators0_table(connection)
resultdf.head()
In [ ]:
# Testing Cell

resultdf = indicators0_table(connection)
assert isinstance(resultdf, pd.core.frame.DataFrame)
assert resultdf.shape == (5, 5)
assert 'code' in resultdf.columns

Q6 Write a function

indicators_pop(conn, pop_threshold)

that obtains all the columns of the indicators table over the connection, conn, but limits the rows to those whose pop column is greater than or equal to pop_threshold.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
resultdf = indicators_pop(connection, 1000)
resultdf.head()
In [ ]:
# Testing Cell

resultdf = indicators_pop(connection, 1000)
assert isinstance(resultdf, pd.core.frame.DataFrame)
assert resultdf.shape == (59, 8)
assert 'code' in resultdf.columns

Q7 Write a function

topnames_names(conn, name_list)

that queries the topnames table and returns the rows in which the name field of the table matches one of the names in name_list. Hint: On the SQL side, this is a form of a set inclusion predicate. This means the need for "translating" name_list into the equivalent SQL string.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
resultdf = topnames_names(connection, ['Mary', 'John'])
resultdf.head()
In [ ]:
# Testing Cell

resultdf = topnames_names(connection, ['Mary', 'John'])
assert isinstance(resultdf, pd.core.frame.DataFrame)
assert resultdf.shape == (120, 4)