{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Denison CS181/DA210 Homework\n", "\n", "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import os.path\n", "import pandas as pd\n", "\n", "datadir = \"publicdata\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q1** This question deals with the tuberculosis dataset. At no point should you rearrange the order of the rows.\n", "\n", "1. Read `table6.csv` into a dataframe `df1`.\n", "2. Combine 'century' and 'yearDigits' into one column, 'year' (whose values are strings), then drop the two old columns. Use `copy()` to avoid modifying the original dataframe. Store the result as `df1a`.\n", "3. Starting from `df1a`, split the column 'rate' into two new columns 'cases' (the number before the slash) and 'population' (the number after). After you're done, drop 'rate'. Store the result as `df1b`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "eff717fe378b975de146463896a3c037", "grade": false, "grade_id": "cell-eda84885c5aa2fe1", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "57656dee5eb6c7c519bd24dc13f40310", "grade": true, "grade_id": "cell-604a9ca2dedc8fe8", "locked": true, "points": 4, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "\n", "assert df1.shape == (6,4)\n", "assert df1a.shape == (6,3)\n", "assert df1b.shape == (6,4)\n", "assert df1.iloc[2,3] == '37737/172006362'\n", "assert df1a.iloc[3,2] == '2000'\n", "assert df1b.iloc[4,3] == '1272915272'\n", "assert df1b.iloc[0,2] == \"745\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q2** Read `us_rent_income.csv` into a dataframe (with \"GEOID\" as the index), then transform as needed to make it tidy. Store the result as `df_rent`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "98f7f4a7a3892610c5622e1a8552c611", "grade": false, "grade_id": "cell-5bba2b9d91e4bf79", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "99a5c2e77454339e73a2dddf81015491", "grade": true, "grade_id": "cell-d927d8a1051f9d9d", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "\n", "assert(df_rent.shape == (52,4))\n", "assert(df_rent.iloc[0,0] == 24476.0)\n", "assert(df_rent.iloc[0,1] == 747.0)\n", "assert(df_rent.iloc[0,2] == 136.0)\n", "assert(df_rent.iloc[0,3] == 3.0)\n", "assert(df_rent.iloc[20,0] == 37147.0)\n", "assert(df_rent.iloc[31,1] == 809.0) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q3** Consider the data on religions and income, gathered by Pew Research Center and hosted at this link:\n", "\n", "https://github.com/chendaniely/pandas_for_everyone/blob/master/data/pew.csv\n", "\n", "The data is also available as `\"pew.csv\"` in the data folder. In the code cell that follows, read the data into a DataFrame assigned to `df`. In the subsequent markdown cell, answer the question: Is this data in tidy form? Explain your answer." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "dc4c0884fecbd619066e9038f415ee6b", "grade": true, "grade_id": "cell-cefd70c125c26355", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "2fb2cbeec43eaf387c568aa7a5d8f366", "grade": true, "grade_id": "cell-75bc095bb1409b07", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q4** Explore the data from the previous exercise, then **from the data** list the independent variable(s) and the dependent variable(s). Note: this data came from a survey that counted individuals by religion and income category." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "480ef83aa1ff55c3a190fbc5624c29fa", "grade": true, "grade_id": "cell-3b610ab224c05354", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q5** Transform the data from Q3 as needed to make it tidy. Store the result as `df_rel`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "f8ec55ae1b35557dd83263d5015f0785", "grade": false, "grade_id": "cell-63953f65aa520b07", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "a5acfce4ac3b370cbd5149e6d76b3b45", "grade": true, "grade_id": "cell-42c6a8c97256fdaa", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "\n", "assert(df_rel.shape == (180,3))\n", "assert(df_rel.iloc[0,0] == \"Agnostic\")\n", "assert(df_rel.iloc[0,1] == \"<$10k\")\n", "assert(df_rel.iloc[0,2] == 27)\n", "assert(df_rel.iloc[41,0] == \"Evangelical Prot\")\n", "assert(df_rel.iloc[89,1] == \"$40-50k\")\n", "assert(df_rel.iloc[104,2] == 14)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }