{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Denison CS181/DA210 Homework\n",
    "\n",
    "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import os.path\n",
    "import pandas as pd\n",
    "\n",
    "datadir = \"publicdata\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q1** Assuming that `path` refers to a CSV file with a single header line, `year,sex,name,count`, followed by any number of data lines containing those same four fields, write a function\n",
    "\n",
    "    readTopNamesDoL(path)\n",
    "\n",
    "that reads the file, builds a DoL (dictionary of lists) representation in which each column name maps to the list of that column's values, and returns that dictionary."
   ]
  },
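  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, here is a small illustration (not a solution to Q1) of what a DoL looks like, using the first three rows of the sample data from the test cells below. Each column name maps to the list of that column's values, so every list has one entry per row, and row `i` is spread across all of the lists at index `i`. The sketch assumes that `year` and `count` have already been converted to integers, as in the test data.\n",
    "\n",
    "```python\n",
    "# A small DoL: column name -> list of that column's values\n",
    "dol = {'year': [2018, 2018, 2017],\n",
    "       'sex': ['Male', 'Female', 'Male'],\n",
    "       'name': ['Liam', 'Emma', 'Liam'],\n",
    "       'count': [19837, 18688, 18798]}\n",
    "\n",
    "# Every column list has the same length: the number of rows\n",
    "nrows = len(dol['year'])\n",
    "assert all(len(values) == nrows for values in dol.values())\n",
    "\n",
    "# Rebuild row 1 by taking index 1 from every column\n",
    "row1 = {col: values[1] for col, values in dol.items()}\n",
    "print(row1)  # {'year': 2018, 'sex': 'Female', 'name': 'Emma', 'count': 18688}\n",
    "```"
   ]
  },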
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c02bdfa684199538b536275c66ad31a4",
     "grade": false,
     "grade_id": "cell-8849b03d485bbf5d",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "tn10 = readTopNamesDoL(os.path.join(datadir, \"tn10.csv\"))\n",
    "print(tn10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "0f3fb6d610559cb8d452878d6cf231ec",
     "grade": true,
     "grade_id": "cell-44470aa818a11e56",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "tn10 = readTopNamesDoL(os.path.join(datadir, \"tn10.csv\"))\n",
    "assert isinstance(tn10, dict)\n",
    "assert len(tn10) == 4\n",
    "assert 'year' in tn10\n",
    "assert 'sex' in tn10\n",
    "assert 'count' in tn10\n",
    "assert len(tn10['year']) == 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ff100c8b8c0ac05ba0a55ded679243d7",
     "grade": true,
     "grade_id": "cell-6b52995ee056618a",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# hiddent tests here\n",
    "assert True\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q2** Write a function \n",
    "\n",
    "    filterTopNamesDoL(tnDoL, threshold)\n",
    "    \n",
    "to create a **filtered** copy of a topnames DoL `tnDoL` (with columns `year`, `sex`, `name`, `count`) in which only the rows whose `count` value is greater than or equal to `threshold` are present.  Your function should return the new, filtered DoL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "09b53b39fbc116d67f9b40338344d92e",
     "grade": false,
     "grade_id": "cell-698093a1af18a3b9",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "tn10 = readTopNamesDoL(os.path.join(datadir, \"tn10.csv\"))\n",
    "tn = readTopNamesDoL(os.path.join(datadir, \"topnames.csv\"))\n",
    "tn10_filter = filterTopNamesDoL(tn10, 19000)\n",
    "print(tn10_filter)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d7fc19dece4966af91d3dd7601edb13d",
     "grade": true,
     "grade_id": "cell-a61460d2e6253eef",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n",
    "          'sex': ['Male', 'Female', 'Male',\n",
    "                    'Female', 'Male', 'Female'],\n",
    "          'name': ['Liam', 'Emma', 'Liam', 'Emma',\n",
    "                     'Noah', 'Emma'],\n",
    "          'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n",
    "\n",
    "filtered = filterTopNamesDoL(topnames, 19000)\n",
    "assert len(filtered['count']) == 4\n",
    "assert len(filtered['year']) == 4\n",
    "assert len(filtered['name']) == 4\n",
    "assert len(filtered['sex']) == 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a7f77619dafe0ef3b703b8bb439f23b7",
     "grade": true,
     "grade_id": "cell-f2b1c0fdb91f8089",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Tests are hidden to allow def of correct read function\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q3** Write a function \n",
    "\n",
    "    addCatColumnDoL(tnDoL, threshold1, threshold2)\n",
    "   \n",
    "that adds a categorical column named `category` to the DoL representation given in parameter `tnDoL`.  For each row, the `category` value is the string `\"small\"` when `count` is less than `threshold1`, `\"medium\"` when `count` is greater than or equal to `threshold1` and less than `threshold2`, and `\"large\"` when `count` is greater than or equal to `threshold2`.  This change to `tnDoL` happens in place, rather than creating a new dictionary, so nothing is returned from the function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "2f82dc0a92c47d54237b96295ac6aa31",
     "grade": false,
     "grade_id": "cell-5a913843f1fafff9",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n",
    "          'sex': ['Male', 'Female', 'Male',\n",
    "                    'Female', 'Male', 'Female'],\n",
    "          'name': ['Liam', 'Emma', 'Liam', 'Emma',\n",
    "                     'Noah', 'Emma'],\n",
    "          'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n",
    "addCatColumnDoL(topnames, 19000, 19500)\n",
    "print(topnames)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "41d78f4fa725816ef9ba33adba98d349",
     "grade": true,
     "grade_id": "cell-fec37caa6e87b5ef",
     "locked": true,
     "points": 3,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n",
    "          'sex': ['Male', 'Female', 'Male',\n",
    "                    'Female', 'Male', 'Female'],\n",
    "          'name': ['Liam', 'Emma', 'Liam', 'Emma',\n",
    "                     'Noah', 'Emma'],\n",
    "          'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n",
    "addCatColumnDoL(topnames, 19000, 19500)\n",
    "assert 'category' in topnames\n",
    "assert 'small' in topnames['category']\n",
    "assert 'medium' in topnames['category']\n",
    "assert 'large' in topnames['category']\n",
    "assert len(topnames['category']) == len(topnames['year'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q4** Write a function\n",
    "\n",
    "    dropColumnDoL(DoL, columnname)\n",
    "    \n",
    "that drops the column specified by `columnname` from the dictionary-of-lists representation given in `DoL`.  This should be done \"in place\".  If `columnname` does not name one of the columns in `DoL`, the function should leave `DoL` unchanged and simply return."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "daa14c44f6fd56a084ac7e9edce5affd",
     "grade": false,
     "grade_id": "cell-7660b2b70deeb730",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "6fc80870db911e1620052c6d96ccb5d6",
     "grade": true,
     "grade_id": "cell-81d9c990ab44329d",
     "locked": true,
     "points": 3,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n",
    "          'sex': ['Male', 'Female', 'Male',\n",
    "                    'Female', 'Male', 'Female'],\n",
    "          'name': ['Liam', 'Emma', 'Liam', 'Emma',\n",
    "                     'Noah', 'Emma'],\n",
    "          'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n",
    "\n",
    "dropColumnDoL(topnames, 'sex')\n",
    "assert 'year' in topnames\n",
    "assert 'name' in topnames\n",
    "assert 'count' in topnames\n",
    "assert 'sex' not in topnames\n",
    "\n",
    "topnames = {'year': [2018, 2018, 2017, 2017, 2016, 2016],\n",
    "          'sex': ['Male', 'Female', 'Male',\n",
    "                    'Female', 'Male', 'Female'],\n",
    "          'name': ['Liam', 'Emma', 'Liam', 'Emma',\n",
    "                     'Noah', 'Emma'],\n",
    "          'count': [19837, 18688, 18798, 19800, 19117, 19496]}\n",
    "\n",
    "dropColumnDoL(topnames, 'foo')\n",
    "assert 'year' in topnames\n",
    "assert 'name' in topnames\n",
    "assert 'count' in topnames\n",
    "assert 'sex' in topnames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q5** Assuming that `path` refers to a CSV file in the same format as in Q1 (one header line followed by any number of data lines with the same four fields), write a function\n",
    "\n",
    "    readTopNamesLoL(path)\n",
    "\n",
    "that reads the file, builds a LoL (list of lists) representation with one inner list per data row, and returns two values: the list of column names, followed by the list of lists."
   ]
  },
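  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For comparison, here is the same three-row sample in LoL form (again an illustration, not a solution to Q5). The data is a list with one inner list per row, and the column names are kept in a separate list. Assuming the column order `year`, `sex`, `name`, `count`, a column's position within each row can be looked up once with `columns.index` and then used to index into any row.\n",
    "\n",
    "```python\n",
    "# Column names are kept separately from the row data\n",
    "columns = ['year', 'sex', 'name', 'count']\n",
    "lol = [[2018, 'Male', 'Liam', 19837],\n",
    "       [2018, 'Female', 'Emma', 18688],\n",
    "       [2017, 'Male', 'Liam', 18798]]\n",
    "\n",
    "# Every row has one entry per column\n",
    "assert all(len(row) == len(columns) for row in lol)\n",
    "\n",
    "# Find the position of 'count' once, then index into any row\n",
    "count_idx = columns.index('count')\n",
    "print(lol[1][count_idx])  # 18688\n",
    "```"
   ]
  },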
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "3a0d1c4e8d0d0c76bb4f0259b80b6440",
     "grade": false,
     "grade_id": "cell-a2f3d39ffa88a59d",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "tn10columns, tn10data = readTopNamesLoL(os.path.join(datadir, \"tn10.csv\"))\n",
    "print(tn10columns)\n",
    "print(tn10data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "85e738f6f1e0ff14f5a75d0b64dcc5d1",
     "grade": true,
     "grade_id": "cell-06e29c14178a4333",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "tn10columns, tn10data = readTopNamesLoL(os.path.join(datadir, \"tn10.csv\"))\n",
    "assert isinstance(tn10data, list)\n",
    "assert len(tn10data) == 10\n",
    "assert 'year' in tn10columns\n",
    "assert 'sex' in tn10columns\n",
    "assert 'count' in tn10columns\n",
    "assert len(tn10data[0]) == 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "f2e9d6bd5d8313e5fc56574643b053ce",
     "grade": true,
     "grade_id": "cell-c2dabcd118a45389",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# hiddent tests here\n",
    "assert True\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q6** Write a function \n",
    "\n",
    "    filterTopNamesLoL(tnLoL, threshold)\n",
    "    \n",
    "to **filter** a topnames LoL `tnLoL` (whose rows hold the columns `year`, `sex`, `name`, `count`, in that order) so that only rows with a `count` value greater than or equal to `threshold` are present in the newly created LoL.  Note that you are creating a **new** LoL with the filtered data and are **not** modifying `tnLoL` in place.  Your function should return the new LoL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "1001d16f5d94a0c9870f32ebea0ca03b",
     "grade": false,
     "grade_id": "cell-a97c99c74a286d53",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "tn10columns, tn10data = readTopNamesLoL(os.path.join(datadir, \"tn10.csv\"))\n",
    "tncolumns,tndata = readTopNamesLoL(os.path.join(datadir, \"topnames.csv\"))\n",
    "tn_filter = filterTopNamesLoL(tn10data, 19000)\n",
    "print(tn_filter)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "9f8a8b9a44dfd4babc7d11a8b6bf0b93",
     "grade": true,
     "grade_id": "cell-e47341b9e033c921",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "topnames = [[2018, 'Male', 'Liam', 19837],\n",
    "            [2018, 'Female', 'Emma', 18688],\n",
    "            [2017, 'Male', 'Liam', 18798],\n",
    "            [2017, 'Female', 'Emma', 19800],\n",
    "            [2016, 'Male', 'Noah', 19117],\n",
    "            [2016, 'Female', 'Emma', 19496]]\n",
    "columns = ['year', 'sex', 'name', 'count']\n",
    "\n",
    "filtered = filterTopNamesLoL(topnames, 19000)\n",
    "assert len(filtered) == 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ba385c3181df2f6f6b6324295d93337c",
     "grade": true,
     "grade_id": "cell-c685e1fd5ce3d54a",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Tests are hidden to allow def of correct read function\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q (HW)** Write a function \n",
    "\n",
    "    addCatColumnLoL(tnLoL, threshold1, threshold2)\n",
    "   \n",
    "that adds a categorical column to the LoL representation given in parameter `tnLoL`, with the new column named `category` whose values are the strings `\"small\"` when `count` is less than `threshold1`, `\"medium\"` when `count` is greater than or equal to `threshold1` and less than `threshold2`, and `\"large\"` when `count` is greater than or equal to `threshold2`.  The function should perform its modifications in place."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q (HW)** Write a function\n",
    "\n",
    "    dropColumnLoL(LoL, columns, columnname)\n",
    "    \n",
    "that drops the column specified by `columnname` from the list-of-lists representation given in `LoL`.  This should be done \"in place\" and should **not** assume the topnames column layout; instead, use the list of column names given in `columns` to determine which column to drop."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}