{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Denison CS181/DA210 Homework\n", "\n", "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# SQL Group By Exercises" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "2c15bf49fe13507b473a89ebc6ffdea5", "grade": false, "grade_id": "cell-b9abcf27cf7faf8f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "import pandas as pd\n", "import os\n", "import os.path\n", "import json\n", "import sys\n", "import importlib\n", "\n", "module_dir = \"../../modules\"\n", "module_path = os.path.abspath(module_dir)\n", "if not module_path in sys.path:\n", " sys.path.append(module_path)\n", "\n", "import dbutil\n", "importlib.reload(dbutil)\n", "\n", "%load_ext sql" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Instructions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set User Credentials\n", "\n", "> **Edit `creds.json` to reflect your mysql user and password**\n", "\n", "> This must be done **prior** to executing the following cell \n", "\n", "In general, you will be able to choose whether you are using the remote MySQL database or the SQLite database(s) by setting the `dbsource` variable to `\"mysql\"` or `\"sqlite\"` respectively. The function `dbutil.db_cstring` function computes a *connection string* for the chosen `dbsource` using the information in the `creds.json` file. If the last argument to this function is present, the generated connection string uses that datbase as superceding the name of the database in `creds.json`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dbsource = \"sqlite\"\n", "db = \"book\"\n", "cstring = dbutil.db_cstring(dbsource, \"creds.json\", \".\", db)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Connection string:\", cstring)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%sql $cstring" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following cells, your only action is to, as usual, cut out the two lines, and to put a valid SQL statement as the **value** of string variable `query`. In each case, when you execute the cell, the query will be sent to the database management system, a result obtained, and the result converted into a `pandas` data frame, whose prefix is shown." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "0b948e3990d0deb2baf351c568f2973d", "grade": false, "grade_id": "cell-79ddf82f41462dae", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q1** Using the SQL table `countries`, use a select query to answer the question: how many countries are there in each region? Alias your new column as `new`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "f6954bd0bb0207812e6b61a4d57755d6", "grade": false, "grade_id": "cell-4c8f2803414fa403", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "fd0351906050b073bf87202543e32bd3", "grade": true, "grade_id": "cell-ab373181b399dce2", "locked": true, "points": 0, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "\n", "assert len(resultdf) == 7\n", "assert 58 in list(resultdf['new'])" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "b0c542c768598fb5158f276bee37390d", "grade": false, "grade_id": "cell-2901f1296bb6e62f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q2** Use the `indicators` database to find the total world population in each year (as the sum of country populations). Use the alias `total_pop` for your new column. Sort the result in ascending year value." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "d355c0854aa1c3ab8ca136fca2ce303f", "grade": false, "grade_id": "cell-4e00b362912f0743", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "2918f794a76940eae74c4f3538c35758", "grade": true, "grade_id": "cell-9e2f0240bddcc016", "locked": true, "points": 0, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 59\n", "assert resultdf.loc[58,'total_pop'] > 7500\n", "assert resultdf.loc[58,'total_pop'] < 7600" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "2c693884809ca797739b7ec07452c0bd", "grade": false, "grade_id": "cell-6e0c5fefd03ae8ee", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q3** Treating your query above as a subquery (without the ORDER BY), find the minimum for `total_pop` over all years. Use the alias `m` for the new column." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "11c4e090899b85249e37a50a626a9227", "grade": false, "grade_id": "cell-b96de6baf5f97fad", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "66178f0691c0a46b5c618e384068d3ab", "grade": true, "grade_id": "cell-90e29bdba973d83c", "locked": true, "points": 0, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 1\n", "assert resultdf.loc[0,'m'] > 3014\n", "assert resultdf.loc[0,'m'] < 3015" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "9d92f32ccf279ce57909c16e2aac3da8", "grade": false, "grade_id": "cell-e6a17c82ecc25654", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q4** Not all countries are growing, so the largest population a country ever had might be in a previous year. For each country code in `indicators`, find the max population that country ever had. Alias your new column as `max_pop`. You should have one row per country. Don't change the order (that is, your records should still be ordered by `code`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "2e40ffb7ed84fdc7b7773e047c16c9bd", "grade": false, "grade_id": "cell-a872397ffe55566d", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "61f5782bfeca686a11ef8fa1f1c8bf9b", "grade": true, "grade_id": "cell-dfa099931224150d", "locked": true, "points": 0, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 218\n", "assert resultdf.loc[0,'max_pop'] == 0.11\n", "assert resultdf.loc[1,'max_pop'] == 37.17" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "6399963121d752fa7e43ea373bd43514", "grade": false, "grade_id": "cell-77157e0f834cb8ae", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q5** With reference to the above, find all records where the max population is less than 1 (remember, this is measured in millions of people). Use a `HAVING` clause. Keep the original ordering of the data (alphabetically, by `code`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "1e1cc63ec5603b92d14dea1eae9e20b6", "grade": false, "grade_id": "cell-6e74056737f411c0", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "91627e20b05852b54019ee4986359981", "grade": true, "grade_id": "cell-65bcd17ea22c6762", "locked": true, "points": 0, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 58\n", "assert resultdf.loc[0,'max_pop'] == 0.11\n", "assert resultdf.loc[1,'max_pop'] == 0.08" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "529df2bdbbe454f22ecd292bd508ca9e", "grade": false, "grade_id": "cell-229195dd6cf70da3", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q6** Use the `indicators` database to find the total world population in each year (as the sum of country populations), then return the rows where the total population is greater than 6000 (measured in millions of people). Use the alias `total_pop` for your new column." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "a1095adaae7eb20cb001e706760d6863", "grade": false, "grade_id": "cell-bee73edca4961f80", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "query = \"\"\"\n", "\"\"\"\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "resultset = %sql $query\n", "resultdf = resultset.DataFrame()\n", "resultdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "4f0f2e7dde340717ce16d259e60289d4", "grade": true, "grade_id": "cell-5b9767551411a51a", "locked": true, "points": 0, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(resultdf) == 20\n", "assert resultdf.loc[0,'total_pop'] < 6014\n", "assert resultdf.loc[1,'total_pop'] > 6090" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }