{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Denison CS181/DA210 Homework\n", "\n", "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# SQL Single Table Exercises" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "2c15bf49fe13507b473a89ebc6ffdea5", "grade": false, "grade_id": "cell-b9abcf27cf7faf8f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "import pandas as pd\n", "import os\n", "import os.path\n", "import json\n", "import sys\n", "import importlib\n", "\n", "module_dir = \"../../modules\"\n", "module_path = os.path.abspath(module_dir)\n", "if not module_path in sys.path:\n", " sys.path.append(module_path)\n", "\n", "import dbutil\n", "importlib.reload(dbutil)\n", "\n", "%load_ext sql" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Instructions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set User Credentials\n", "\n", "> **Edit `creds.json` to reflect your mysql user and password**\n", "\n", "> This must be done **prior** to executing the following cell \n", "\n", "In general, you will be able to choose whether you are using the remote MySQL database or the SQLite database(s) by setting the `dbsource` variable to `\"mysql\"` or `\"sqlite\"` respectively. The function `dbutil.db_cstring` function computes a *connection string* for the chosen `dbsource` using the information in the `creds.json` file. If the last argument to this function is present, the generated connection string uses that datbase as superceding the name of the database in `creds.json`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dbsource = \"mysql\"\n", "db = \"book\"\n", "cstring = dbutil.db_cstring(dbsource, \"creds.json\", \".\", db)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Connection string:\", cstring)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Establish Connection from Client to Server\n", "\n", "The cell below uses the Python variable constructed earlier to actually establish a connection with the server. \n", "\n", "If you cannot get a connection established, switch over to MySQL Workbench and try from that different client." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%sql $cstring" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following cells, your only action is to, as usual, cut out the two lines, and to put a valid SQL statement as the **value** of string variable `query`. In each case, when you execute the cell, the query will be sent to the database management system, a result obtained, and the result converted into a `pandas` data frame, whose prefix is shown." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q1** Using the table `countries`, give the SQL query you would use to obtain a table all country names. There should be no onter columns projected." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "66971366dad89b41619217baf13d0be6", "grade": false, "grade_id": "cell-757ce202bc078bc4", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "aa64bea735e89014b2084c4581f0e9ac", "grade": true, "grade_id": "cell-7672cfefe3ddc5e3", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 217\n", "assert 'Aruba' in list(resultdf['country'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q2** Project the year, code, population, and number of cell-phones from `indicators`. In a comment in the solution cell, describe a derived column and additional processing that could be applied to obtain useful result. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "a4f717434d780c5bd34b29e1a90923e2", "grade": false, "grade_id": "cell-08c4d61671ba8bfd", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "75197a452b0171392f96fdaa1f789980", "grade": true, "grade_id": "cell-7acf0300d80ac59c", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 12862\n", "assert resultdf.shape == (12862,4)\n", "assert 'pop' in resultdf.columns\n", "assert 'cell' in resultdf.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q3** Using the SQL table `countries`, project all columns and produce a table of rows ordered by landmass, from smallest to largest. What kind of aggregation might be applied to this result? Answer this question as a comment within your solution cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "398a16f8cdc3bac234cc5d7f8d6df8ed", "grade": false, "grade_id": "cell-0567ba933b31bc32", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6a9d6ea6521d303ee0e6a4c20abb495e", "grade": true, "grade_id": "cell-5bb4b821bf6b3004", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 217\n", "assert resultdf.iloc[0,0] == 'CUW'\n", "assert resultdf.iloc[-1,0] == 'RUS'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q4** Use an SQL query to answer: what country and in what year was the greatest number of cell phones? Project year, code, and cell. An answer that yields multiple rows, but allowing easy answer to the question will get most of the points, but the ideal answer will only have a single row. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "7b6a77e3d65fcd26dc3b0bc818675791", "grade": false, "grade_id": "cell-4714110cb2b3f7c9", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "ee86e1ea49d9ced2f47a4bd5f1fc5c40", "grade": true, "grade_id": "cell-974b7f30401e9717", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert resultdf.loc[0,'cell'] == 1469.88\n", "assert resultdf.loc[0,'year'] == 2017\n", "assert resultdf.loc[0,'code'] == 'CHN'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "b22fa323768103a65eef4c05843dbf88", "grade": true, "grade_id": "cell-7f51343cdec471c5", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Secondary testing cell\n", "assert True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q5** Use SQL to find the rows with the top 20 GDP values in `indicators`. You may project all the columns." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "8dafbe4be536f6fa0cbfd9805b84eb49", "grade": false, "grade_id": "cell-2dbac366a673244e", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "dff559e785f832fbe0ee26a96c5467d0", "grade": true, "grade_id": "cell-2a07fe194b23f427", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert len(resultdf) == 20\n", "assert resultdf.loc[0, 'year'] == 2018\n", "assert resultdf.loc[0, 'code'] == 'USA'\n", "assert resultdf.loc[0, 'gdp'] == 20494.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q6** In reference to the table `indicators`, write a query to find all unique years that appear in the table." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "83bb5273140026e216e5bc7b793bdae6", "grade": false, "grade_id": "cell-f9ead8b8eaedc383", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "a7b33be2a9489eceb640fe8460d18c1f", "grade": true, "grade_id": "cell-75b551f8ba091620", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 59" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "f89008f983e63a1061307fecc2ead44b", "grade": false, "grade_id": "cell-eccff0b8eaa18444", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q7** In reference to the table `indicators`, write a query to find all rows with no missing data for `gdp`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "57d21e9dae9dc410cf0faf51dc4e5a18", "grade": false, "grade_id": "cell-8d26bc71587f0ad2", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "80f218d57fd221b2c0d6fa6d19ac5054", "grade": true, "grade_id": "cell-319bea24e24ab653", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 9660\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "537f62bfb67e5151b5f25d2e6b42f8af", "grade": false, "grade_id": "cell-56366d5929ce7c6f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q8** In reference to the table `indicators`, write a query to find all rows with no missing data for any of the numeric fields." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "319ae0f58417bbd0c69e0618253cb191", "grade": false, "grade_id": "cell-4d28f2b86a379c2a", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "ca9de682d64d7d7262450f18d5ebc81e", "grade": true, "grade_id": "cell-35b9f207c1ff90ce", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 7117\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q9** In reference to the table `indicators`, write a query to find all rows where exports is higher than imports. Select all fields for such rows." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "6b9e0590bb1361916a9ed5b9ee4641cc", "grade": false, "grade_id": "cell-581f30f531cc251e", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "c47d36b0178047a90280760b85329321", "grade": true, "grade_id": "cell-56f0921f58548c69", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 2492\n", "assert resultdf.shape == (2492,8)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "a1616c8f88ed4b8d4b388d3f28edb165", "grade": false, "grade_id": "cell-4ace07af2a13c843", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q10** In reference to the table `indicators`, write a query to find the minimum non-zero number for `cell` that appears. Since we haven't learned about `MIN` you'll have to use `ORDER BY` to arrange it so that the first row in your result has that minimum value." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "772c942f4391e6aa8818d76be5171a65", "grade": false, "grade_id": "cell-18a2fdc8df7f8ecc", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "1eeda110a1ba16b626db067f315de791", "grade": true, "grade_id": "cell-4d3932caa0c8bc0f", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert resultdf.loc[0,'cell'] == 0.01" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "359ee23e3497c68bf6fcd5adca817462", "grade": false, "grade_id": "cell-cf78a8757481fff0", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q11** Use a subquery to select the top ten entries in `indicators` with the highest population, then select the three from that group of 10 that have the lowest GDP. If you need to, review section 11.2.3 on subqueries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "3ea1847eed0e222454f257a240115ba0", "grade": false, "grade_id": "cell-2befdf0a89212338", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "f0466743924a1ed26f01bda9a69e6853", "grade": true, "grade_id": "cell-3383e5e0f4737e49", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 3\n", "assert resultdf.loc[0,'gdp'] == 2652.55\n", "assert resultdf.loc[1,'pop'] == 1352.62\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }