{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Denison CS181/DA210 Homework\n", "\n", "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menu bar, select Kernel$\\rightarrow$Restart And Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import os.path\n", "import pandas as pd\n", "\n", "datadir = \"publicdata\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q1** Assume that `path` refers to a CSV file consisting of one header line, `year,sex,name,count`, followed by $x$ data lines, each with those same four fields. Write a function\n", "\n", "    readTopNamesDoL(path)\n", "\n", "that reads the file, builds a dictionary of lists (DoL) representation that maps each column name to the list of that column's values, and returns that dictionary." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "c02bdfa684199538b536275c66ad31a4", "grade": false, "grade_id": "cell-8849b03d485bbf5d", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "tn10 = readTopNamesDoL(os.path.join(datadir, \"tn10.csv\"))\n", "print(tn10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "0f3fb6d610559cb8d452878d6cf231ec", "grade": true, "grade_id": "cell-44470aa818a11e56", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "tn10 = readTopNamesDoL(os.path.join(datadir, \"tn10.csv\"))\n", "assert isinstance(tn10, dict)\n", "assert len(tn10) == 4\n", "assert 'year' in tn10\n", "assert 'sex' in tn10\n", "assert 'count' in tn10\n", "assert len(tn10['year']) == 10" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "ff100c8b8c0ac05ba0a55ded679243d7", "grade": true, "grade_id": "cell-6b52995ee056618a", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# hiddent tests here\n", "assert True\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q2** Write a function\n", "\n", "    filterTopNamesDoL(tnDoL, threshold)\n", "\n", "that creates a **filtered** copy of a topnames DoL `tnDoL` (with columns `year`, `sex`, `name`, `count`) in which only the rows whose `count` value is greater than or equal to `threshold` are present. The function should not modify `tnDoL`; it should return the new, filtered DoL." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "09b53b39fbc116d67f9b40338344d92e", "grade": false, "grade_id": "cell-698093a1af18a3b9", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "tn10 = readTopNamesDoL(os.path.join(datadir, \"tn10.csv\"))\n", "tn = readTopNamesDoL(os.path.join(datadir, \"topnames.csv\"))\n", "tn10_filter = filterTopNamesDoL(tn10, 19000)\n", "print(tn10_filter)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "d7fc19dece4966af91d3dd7601edb13d", "grade": true, "grade_id": "cell-a61460d2e6253eef", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n", " 'sex': ['Male', 'Female', 'Male',\n", " 'Female', 'Male', 'Female'],\n", " 'name': ['Liam', 'Emma', 'Liam', 'Emma',\n", " 'Noah', 'Emma'],\n", " 'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n", "\n", "filtered = filterTopNamesDoL(topnames, 19000)\n", "assert len(filtered['count']) == 4\n", "assert len(filtered['year']) == 4\n", "assert len(filtered['name']) == 4\n", "assert len(filtered['sex']) == 4" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "a7f77619dafe0ef3b703b8bb439f23b7", "grade": true, "grade_id": "cell-f2b1c0fdb91f8089", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Tests are hidden to allow def of correct read function\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q3** Write a function\n", "\n", "    addCatColumnDoL(tnDoL, threshold1, threshold2)\n", "\n", "that adds 
a categorical column named `category` to the DoL representation given in parameter `tnDoL`. Each value in the new column is the string `\"small\"` when the row's `count` is below `threshold1`, `\"medium\"` when `count` is greater than or equal to `threshold1` and less than `threshold2`, and `\"large\"` when `count` is greater than or equal to `threshold2`. The change to `tnDoL` happens in place rather than by creating a new dictionary, so the function returns nothing." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "2f82dc0a92c47d54237b96295ac6aa31", "grade": false, "grade_id": "cell-5a913843f1fafff9", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n", " 'sex': ['Male', 'Female', 'Male',\n", " 'Female', 'Male', 'Female'],\n", " 'name': ['Liam', 'Emma', 'Liam', 'Emma',\n", " 'Noah', 'Emma'],\n", " 'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n", "addCatColumnDoL(topnames, 19000, 19500)\n", "print(topnames)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "41d78f4fa725816ef9ba33adba98d349", "grade": true, "grade_id": "cell-fec37caa6e87b5ef", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n", " 'sex': ['Male', 'Female', 'Male',\n", " 'Female', 'Male', 'Female'],\n", " 'name': ['Liam', 'Emma', 'Liam', 'Emma',\n", " 'Noah', 'Emma'],\n", " 'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n", "addCatColumnDoL(topnames, 19000, 19500)\n", "assert 'category' in topnames\n", "assert 'small' in topnames['category']\n", "assert 'medium' in topnames['category']\n", "assert 'large' in 
topnames['category']\n", "assert len(topnames['category']) == len(topnames['year'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q4** Write a function\n", "\n", "    dropColumnDoL(DoL, columnname)\n", "\n", "that drops the column named `columnname` from the dictionary of lists (DoL) representation given in `DoL`. This should be done \"in place\". If `columnname` is not one of the columns in `DoL`, the function should simply return, leaving `DoL` unchanged." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "daa14c44f6fd56a084ac7e9edce5affd", "grade": false, "grade_id": "cell-7660b2b70deeb730", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6fc80870db911e1620052c6d96ccb5d6", "grade": true, "grade_id": "cell-81d9c990ab44329d", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n", " 'sex': ['Male', 'Female', 'Male',\n", " 'Female', 'Male', 'Female'],\n", " 'name': ['Liam', 'Emma', 'Liam', 'Emma',\n", " 'Noah', 'Emma'],\n", " 'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n", "\n", "dropColumnDoL(topnames, 'sex')\n", "assert 'year' in topnames\n", "assert 'name' in topnames\n", "assert 'count' in topnames\n", "assert 'sex' not in topnames\n", "\n", "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n", " 'sex': ['Male', 'Female', 'Male',\n", " 'Female', 'Male', 'Female'],\n", " 'name': ['Liam', 'Emma', 'Liam', 'Emma',\n", " 'Noah', 'Emma'],\n", " 'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n", "\n", "dropColumnDoL(topnames, 'foo')\n", "assert 'year' in topnames\n", "assert 'name' in 
topnames\n", "assert 'count' in topnames\n", "assert 'sex' in topnames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q5** Assume, as in Q1, a CSV file with one header line followed by $x$ data lines, each with the same four fields. Write a function\n", "\n", "    readTopNamesLoL(path)\n", "\n", "that reads the file, builds a list of lists (LoL) representation with one inner list per data row, and returns two values, in this order: the list of column names and the list of lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "3a0d1c4e8d0d0c76bb4f0259b80b6440", "grade": false, "grade_id": "cell-a2f3d39ffa88a59d", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "tn10columns, tn10data = readTopNamesLoL(os.path.join(datadir, \"tn10.csv\"))\n", "print(tn10columns)\n", "print(tn10data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "85e738f6f1e0ff14f5a75d0b64dcc5d1", "grade": true, "grade_id": "cell-06e29c14178a4333", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "tn10columns, tn10data = readTopNamesLoL(os.path.join(datadir, \"tn10.csv\"))\n", "assert isinstance(tn10data, list)\n", "assert len(tn10data) == 10\n", "assert 'year' in tn10columns\n", "assert 'sex' in tn10columns\n", "assert 'count' in tn10columns\n", "assert len(tn10data[0]) == 4" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "f2e9d6bd5d8313e5fc56574643b053ce", "grade": true, "grade_id": "cell-c2dabcd118a45389", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# hiddent tests 
here\n", "assert True\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q6** Write a function\n", "\n", "    filterTopNamesLoL(tnLoL, threshold)\n", "\n", "that **filters** a topnames LoL `tnLoL` (the list of data rows, each holding the fields `year`, `sex`, `name`, `count` in that order) so that only rows whose `count` value is greater than or equal to `threshold` are present in the result. Note that you are creating a **new** LoL with the filtered data, and **not** modifying `tnLoL` in place. Your function should return the new LoL." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "1001d16f5d94a0c9870f32ebea0ca03b", "grade": false, "grade_id": "cell-a97c99c74a286d53", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "tn10columns, tn10data = readTopNamesLoL(os.path.join(datadir, \"tn10.csv\"))\n", "tncolumns,tndata = readTopNamesLoL(os.path.join(datadir, \"topnames.csv\"))\n", "tn_filter = filterTopNamesLoL(tn10data, 19000)\n", "print(tn_filter)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "9f8a8b9a44dfd4babc7d11a8b6bf0b93", "grade": true, "grade_id": "cell-e47341b9e033c921", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "topnames = [[2018, 'Male', 'Liam', 19837],\n", " [2018, 'Female', 'Emma', 18688],\n", " [2017, 'Male', 'Liam', 18798],\n", " [2017, 'Female', 'Emma', 19800],\n", " [2016, 'Male', 'Noah', 19117],\n", " [2016, 'Female', 'Emma', 19496]]\n", "columns = ['year', 'sex', 'name', 'count']\n", "\n", "filtered = filterTopNamesLoL(topnames, 19000)\n", "assert len(filtered) == 4" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", 
"checksum": "ba385c3181df2f6f6b6324295d93337c", "grade": true, "grade_id": "cell-c685e1fd5ce3d54a", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Tests are hidden to allow def of correct read function\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q (HW)** Write a function\n", "\n", "    addCatColumnLoL(tnLoL, threshold1, threshold2)\n", "\n", "that adds a categorical column named `category` to the LoL representation given in parameter `tnLoL`. The new column's value in each row is the string `\"small\"` when the row's `count` is below `threshold1`, `\"medium\"` when `count` is greater than or equal to `threshold1` and less than `threshold2`, and `\"large\"` when `count` is greater than or equal to `threshold2`. The function should perform its modifications in place." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q (HW)** Write a function\n", "\n", "    dropColumnLoL(LoL, columns, columnname)\n", "\n", "that drops the column named `columnname` from the list of lists (LoL) representation given in `LoL`. This should be done \"in place\". The function should **not** assume the topnames columns; instead, it should use the list of column names in `columns` to determine which column to drop." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }