{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Denison CS181/DA210 Homework\n",
    "\n",
    "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import os.path\n",
    "import pandas as pd\n",
    "\n",
    "datadir = \"publicdata\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q1** Read CSV file `topnames.csv` in `datadir` into a data frame named `topnames0`, with no index.  Using individual operations on the `count` column, find the mean, the median, and the max count, assigning to `mean_counts`, `median_counts`, and `max_counts`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "56a21f4d74ca19c2fd0b3be41c1d6a0d",
     "grade": false,
     "grade_id": "cell-273768f03f8357e7",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "print(\"mean: \", mean_counts, \", median: \", median_counts, \", max: \", max_counts, sep=\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "33717e77cbc90703a46c471991939d68",
     "grade": true,
     "grade_id": "cell-057e3c90c18b71de",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q2** Using the `agg` method on the `Series` of the column vector of counts, perform the same calculation of mean, median, and max in a single step, and assign to `agg_values`.  Note in a comment in the code solution cell the **data type** of the result.  Note that this invocation may not have an exact correspondent in the book, so you may have to look up documentation of using `agg` on a `Series`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a13203da17c6ea29f0c767ea0ec98983",
     "grade": false,
     "grade_id": "cell-b34127acc546301e",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "agg_values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "0318b1296664a2fde935df60396140c4",
     "grade": true,
     "grade_id": "cell-69c4bfbbc5c3e7c6",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q3** Create a subset of `topnames0` restricted to `Female` entries between 1960 and 1969 inclusive.  ssign this to `female_subset`.  Then use the `agg` function, in one step, to determine the mean and median count and the number of unique names, assigning to `female_aggvalues`.  In a comment in the markdown solution cell, indicate the data type of the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "e99a40d1c35bb027073f74d32e7f679d",
     "grade": false,
     "grade_id": "cell-66dbb0e2aa2de096",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "female_aggvalues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "9c023894c1ade68ec83ff46c4684e4b4",
     "grade": true,
     "grade_id": "cell-a39ace3d52b87ed8",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q4** The constraints for selecting the rows from the last problem are based on `sex` and `year`.  We often use these independent variables to set an index for a data set.  Then, when we want to filter rows, our operations that use row label/`index` values for filtering are different.\n",
    "\n",
    "Start by creating dataframe `topnames` with its index drawn from the columns `year` and `sex`.  Then,\n",
    "with a goal of the same use of the `agg` function from **Q3**, use `xs` to take a cross section of `topnames` to get the Female entries and then use `loc` to get a data frame, `female_subset`.  Finally, use `agg` on this data frame to, in one step, determine the mean and median count and the number of unique names, assigning to female_aggvalues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "e4ebd7ea5c540803aee99c39ae158f75",
     "grade": false,
     "grade_id": "cell-26dd6084fbc01496",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "female_aggvalues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "2a55f1a050739815b8fe278f3edc2f5b",
     "grade": true,
     "grade_id": "cell-28e7b414f2cfbfdb",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q5** Read CSV file `indicators2016.csv` in `datadir` into a data frame named `indicators0`, with no index. Write code to add a new column `popSize` to `indicators0` which takes value 'high' if `pop > 300`, 'low' if `pop < 50`, and 'medium' otherwise. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "9fd6fb8998593a87d41d8b90bd19914e",
     "grade": false,
     "grade_id": "cell-12f1961184fc392c",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "indicators0.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "2fb7c51b66f77ef77a3eb9f3036de4a6",
     "grade": true,
     "grade_id": "cell-2b5d0d4dc0d329f3",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q6** Building on the question above, use `groupby` to partition `indicators0` by this new column `popSize`, assigning to variable `groupby_pop`.  Note in a comment the data type of `groupby_pop`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "41cf473f64d9393b8af048383d7b5ef6",
     "grade": false,
     "grade_id": "cell-023cf542221ea589",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "print(len(groupby_pop))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "80be9380098501633e920a6db849ed57",
     "grade": true,
     "grade_id": "cell-b3a522c4ffed3e54",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q7** Building on the question above, aggregate this groupby partitioning, determining the **number of non-missing elements** for each of the columns by partitiion.  Assign to `partition_counts` and also include a comment giving the data type of the result, and the row labels of the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "4f2f284bd522c5d9cf53bee23d16cd06",
     "grade": false,
     "grade_id": "cell-0c97c6a6e4ad592d",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "partition_counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "718874848940afc4004b1b565ede7c6f",
     "grade": true,
     "grade_id": "cell-66f8badba21e1cb5",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q8** In similar fashion, determine the mean, by partition, of pop, gdp, and cell, and the max of life, assigning to `partition_aggvalues`.  Use the `round()` method of DataFrames to round the numeric values to 2 decimal places."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "6fb0e630743d95db2c1a3a47fb2914c7",
     "grade": false,
     "grade_id": "cell-a188147fe195416e",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "partition_aggvalues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "96f4b68141020b6b367eddb098be4aa2",
     "grade": true,
     "grade_id": "cell-527de16856555a86",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q8** In similar fashion, determine the mean, min, and max of `gdp` and `life`, again assiging to `partition_aggvalues`.  Explain in the markdown cell after the code cell how the columns of this result differ from the last two questions, and **why**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d9af44e0deb5ebdec044311a5d94f65f",
     "grade": false,
     "grade_id": "cell-4a555edd9837afc7",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "partition_aggvalues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "3fcfff28bd3f04d2c80b1c0d5e0309fd",
     "grade": true,
     "grade_id": "cell-0eac3eb8a2736913",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Testing Cell\n",
    "\n",
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "ec32bc897cca88b7d7d32e7845f2d6ad",
     "grade": true,
     "grade_id": "cell-e305e2a6bcb52188",
     "locked": false,
     "points": 2,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "source": [
    "YOUR ANSWER HERE"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}