{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Denison CS181/DA210 Homework\n", "\n", "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Database Programming Homework" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**No SQL Magics should be used in solving any of the following problems**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6aa891e077d43c3069705d7a4c951055", "grade": false, "grade_id": "cell-b9abcf27cf7faf8f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "import pandas as pd\n", "import os\n", "import os.path\n", "import json\n", "import sys\n", "import importlib\n", "import sqlalchemy as sa\n", "\n", "if os.path.isdir(os.path.join(\"../../..\", \"modules\")):\n", " module_dir = os.path.join(\"../../..\", \"modules\")\n", "else:\n", " module_dir = os.path.join(\"../..\", \"modules\")\n", "\n", "module_path = os.path.abspath(module_dir)\n", "if not module_path in sys.path:\n", " sys.path.append(module_path)\n", "\n", "import dbutil\n", "importlib.reload(dbutil)\n", "\n", "dbfiledir = \"../../dbfiles\"\n", "if not os.path.isdir(dbfiledir):\n", " dbfiledir = \"../\" + dbfiledir\n", "assert os.path.isdir(dbfiledir)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connection Strings and Connecting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q1** Write a function\n", "\n", " sqlite_cstring(database, dbdir=\".\", driver=\"pysqlite\", protocol=\"sqlite\")\n", " \n", "that generates a connection string suitable for use with a local SQLite database. Note how we order the parameters and give default values for the database directory, the driver, and the protocol, so that invocation of the function can omit the later arguments if the default values have the correct (common) values. So, for instance, an invocation `sqlite_cstring(\"book\", \"../../dbfiles\")` yields the string `\"sqlite+pysqlite:///../../dbfiles/book.db\"`. \n", "\n", "If the passed value for argument `driver` is `None`, then **both** the `+` and the driver should be omitted. So the invocation `sqlite_cstring(\"book\", \"../../dbfiles\", driver=None)` yields the string `\"sqlite:///../../dbfiles/book.db\"`\n", "\n", "You should be able to use the steps we followed in our inclass on database programming to help write this function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "6f7801d0872d6099de352b0287cd8eda", "grade": false, "grade_id": "cell-47381ac7661105cd", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "print(sqlite_cstring(\"book\", \"../../dbfiles\"))\n", "print(sqlite_cstring(\"book\", \"../../dbfiles\", driver=None))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "e0c8fb0ff70212a7dadd3bb1a414cb56", "grade": true, "grade_id": "cell-3333861dad010c6c", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "assert sqlite_cstring(\"book\", \"../../dbfiles\") == \"sqlite+pysqlite:///../../dbfiles/book.db\"\n", "assert sqlite_cstring(\"book\", \"../../dbfiles\", driver=None) == \"sqlite:///../../dbfiles/book.db\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q2** In this question, we again have the goal of creating a function that gives us a connection string. But this time, we combine the formatting for a remote database covered in class with the convenience of getting many of the pieces of information from a JSON file. Write a function:\n", "\n", " mysql_creds_cstring(database, credsdir=\".\", credsfile=\"creds.json\")\n", " \n", "that generates a connection string suitable for use with a remote MySQL database. The majority of the arguments for the connection string will come from the `credsfile` file. In that JSON-formatted file, there will be a top-level dictionary that maps from string `\"mysql\"` to a dictionary. This subordinate dictionary will have keys of `\"user\"`, `\"pass\"`, `\"server\"`, and `\"scheme\"` (which has both the protocol and the lower-level driver). The database part of the connection string will come from the `database` argument, not from the creds file. We will omit the port, and let it default to the default port for mysql protocol connections.\n", "\n", "If the creds file is not found in the `credsdir`, or if any of the required keys are missing, the function should return `None`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "de607a43d15d3a69f56e8c6171a9b790", "grade": false, "grade_id": "cell-67046d5b3810ec61", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "print(mysql_creds_cstring(\"book\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "a9d4a31382695ac4e5bfa5f7c27906c1", "grade": true, "grade_id": "cell-2b4dbb0790c431b3", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "assert mysql_creds_cstring(\"book\") == \"mysql+mysqlconnector://studen_j1:studen_j1@hadoop.mathsci.denison.edu/book\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting up for making connections\n", "\n", "The following cell assumes correct operation for the functions you defined in Q1 and/or Q2. As in the past, you can change the value of `dbsource` depending on whether you want to use local or remote database. \n", "\n", "If your answers to Q1 and/or Q2 are not generating the correct values for connection strings, you should be able to uncomment the last line in this next cell to get the instructor-provided utility function to generate a connection string." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dbsource = \"sqlite\"\n", "db = \"school\"\n", "\n", "if dbsource == \"sqlite\":\n", " cstring = sqlite_cstring(db, dbfiledir)\n", "elif dbsource == \"mysql\":\n", " cstring = mysql_creds_cstring(db)\n", "\n", "\n", "# cstring = dbutil.db_cstring(dbsource, \"creds.json\", \".\", db)\n", "\n", "print(\"Connection string:\", cstring)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q3** Write a function\n", "\n", " db_connect(cstring)\n", " \n", "that performs the two `sqlalchemy` steps of creating a database engine based on a connection string, `cstring`, and then establishes a connection. The function should return **both** the engine and the connection object. Note that the `sqlalchemy` import has already been done for you in the first code cell of this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "6ef30f0067b1252b3c7ec7718c5f97d9", "grade": false, "grade_id": "cell-2b524ba413a83836", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "808b1010053c8fde1f0410008ea06c2d", "grade": true, "grade_id": "cell-6d174ea5e428e7e5", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "engine, connection = db_connect(cstring)\n", "assert isinstance(engine, sa.engine.base.Engine)\n", "assert isinstance(connection, sa.engine.base.Connection)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q4** Write a function\n", "\n", " db_shutdown(engine, connection)\n", " \n", "that, under the protection of a `try`/`except` block, attempts to close the given connection and then deletes the given engine. The function has no return value." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "4819161b48be605e2b768eb28ffb17da", "grade": false, "grade_id": "cell-1d2655383c15edec", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "bd1519964a5170e4d5b4df80305767f3", "grade": true, "grade_id": "cell-283d6b5e208a549f", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "## Assumes Q3 testing cell has been executed and that, on entry, \n", "## engine and connection are defined global variables.\n", "\n", "db_shutdown(engine, connection)\n", "assert connection.closed\n", "\n", "# Calling db_shutdwon a second time should **not** result in an exception\n", "# if properly implemented\n", "\n", "db_shutdown(engine, connection)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparing for Making Queries\n", "\n", "We now use the functions defined above, along with the value of `cstring` to establish a connection to be used for subsequent operations. If prior questions have not been correctly answered, you may substitute other code to create a connection named `connection` in place of the following cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dbsource = \"sqlite\"\n", "db = \"book\"\n", "\n", "if dbsource == \"sqlite\":\n", " cstring = sqlite_cstring(db, dbfiledir)\n", "elif dbsource == \"mysql\":\n", " cstring = mysql_creds_cstring(db)\n", "\n", "\n", "# cstring = dbutil.db_cstring(dbsource, \"creds.json\", \".\", db)\n", "\n", "print(\"Connection string:\", cstring)\n", "engine, connection = db_connect(cstring)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Query Functions\n", "\n", "As part of good programming developement, we should develop a **single function** for each SQL query we want to make. Such functions should be made general and use **function parameters** for parts of the query where associating Python variables as part of the query makes sense.\n", "\n", "This will be the approach taken in the remaining exercises in this homework." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q5** Write a function\n", "\n", " indicators0_table(conn)\n", "\n", "that, given a connection, `conn`, obtains all rows and all columns of the `indicators0` table, returning the result as a `pandas` data frame." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "137c533a6ed677f48afecc70e21823cf", "grade": false, "grade_id": "cell-98aec166dd90bf74", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultdf = indicators0_table(connection)\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "a63e7ce035839e8c861555dc0d6da0c8", "grade": true, "grade_id": "cell-a3b89ce3d295de5f", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "resultdf = indicators0_table(connection)\n", "assert isinstance(resultdf, pd.core.frame.DataFrame)\n", "assert resultdf.shape == (5, 5)\n", "assert 'code' in resultdf.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q6** Write a function\n", "\n", " indicators_pop(conn, pop_threshold)\n", " \n", "that obtains all the columns of the `indicators` table over the connection, `conn`, but limits the rows to those whose `pop` column is greater than or equal to `pop_threshold`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "fa8a8878546ffa7082855b32651c9fe4", "grade": false, "grade_id": "cell-3e7fc8a4ac92d212", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultdf = indicators_pop(connection, 1000)\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "bce1766ffe5ed153c644d02238e2999e", "grade": true, "grade_id": "cell-d719222000e1198c", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "resultdf = indicators_pop(connection, 1000)\n", "assert isinstance(resultdf, pd.core.frame.DataFrame)\n", "assert resultdf.shape == (59, 8)\n", "assert 'code' in resultdf.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q7** Write a function \n", "\n", " topnames_names(conn, name_list)\n", " \n", "that queries the `topnames` table and returns the rows in which the `name` field of the table matches one of the names in `name_list`. Hint: On the SQL side, this is a form of a **set inclusion** predicate. This means the need for \"translating\" `name_list` into the equivalent SQL string." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "569c0595968f0ef8dac9fe6ba89fcc74", "grade": false, "grade_id": "cell-b7ccaa8b57ab6e4d", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultdf = topnames_names(connection, ['Mary', 'John'])\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "de63c9314303360f4065ce12f819b394", "grade": true, "grade_id": "cell-e5538fbc92cd0b7e", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "resultdf = topnames_names(connection, ['Mary', 'John'])\n", "assert isinstance(resultdf, pd.core.frame.DataFrame)\n", "assert resultdf.shape == (120, 4)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }