{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Denison CS181/DA210 Homework\n", "\n", "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n", "\n", "Make sure you fill in any place that says YOUR CODE HERE or \"YOUR ANSWER HERE\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import os.path\n", "import pandas as pd\n", "\n", "datadir = \"publicdata\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q1** This question deals with the tuberculosis dataset. At no point should you rearrange the order of the rows.\n", "\n", "1. Read table6.csv into a dataframe df1.\n", "2. Combine 'century' and 'yearDigits' into one column, 'year' (whose values are strings), then drop the two old columns. Use copy() to avoid modifying the original dataframe. Store the result as df1a.\n", "3. Starting from df1a, split the column 'rate' into two new columns 'cases' (the number before the slash) and 'population' (the number after). After you're done, drop 'rate'. Store the result as df1b." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "eff717fe378b975de146463896a3c037", "grade": false, "grade_id": "cell-eda84885c5aa2fe1", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "57656dee5eb6c7c519bd24dc13f40310", "grade": true, "grade_id": "cell-604a9ca2dedc8fe8", "locked": true, "points": 4, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "\n", "assert df1.shape == (6,4)\n", "assert df1a.shape == (6,3)\n", "assert df1b.shape == (6,4)\n", "assert df1.iloc[2,3] == '37737/172006362'\n", "assert df1a.iloc[3,2] == '2000'\n", "assert df1b.iloc[4,3] == '1272915272'\n", "assert df1b.iloc[0,2] == \"745\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q2** Read us_rent_income.csv into a dataframe (with \"GEOID\" as the index), then transform as needed to make it tidy. Store the result as df_rent." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "98f7f4a7a3892610c5622e1a8552c611", "grade": false, "grade_id": "cell-5bba2b9d91e4bf79", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "99a5c2e77454339e73a2dddf81015491", "grade": true, "grade_id": "cell-d927d8a1051f9d9d", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "\n", "assert(df_rent.shape == (52,4))\n", "assert(df_rent.iloc[0,0] == 24476.0)\n", "assert(df_rent.iloc[0,1] == 747.0)\n", "assert(df_rent.iloc[0,2] == 136.0)\n", "assert(df_rent.iloc[0,3] == 3.0)\n", "assert(df_rent.iloc[20,0] == 37147.0)\n", "assert(df_rent.iloc[31,1] == 809.0) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q3** Consider the data on religions and income, gathered by Pew Research Center and hosted at this link:\n", "\n", "https://github.com/chendaniely/pandas_for_everyone/blob/master/data/pew.csv\n", "\n", "The data is also available as \"pew.csv\" in the data folder. In the code cell that follows, read the data into a DataFrame assigned to df. In the subsequent markdown cell, answer the question: Is this data in tidy data form? Explain your answer." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "dc4c0884fecbd619066e9038f415ee6b", "grade": true, "grade_id": "cell-cefd70c125c26355", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "2fb2cbeec43eaf387c568aa7a5d8f366", "grade": true, "grade_id": "cell-75bc095bb1409b07", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q4** Explore the data from the previous exercise, then **from the data** list the independent variable(s) and the dependent variable(s). Note: this data came from a survey that counted individuals by religion and income category." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "480ef83aa1ff55c3a190fbc5624c29fa", "grade": true, "grade_id": "cell-3b610ab224c05354", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q5** Transform the data from Q3 as needed to make it tidy. Store the result as df_rel." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "f8ec55ae1b35557dd83263d5015f0785", "grade": false, "grade_id": "cell-63953f65aa520b07", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "a5acfce4ac3b370cbd5149e6d76b3b45", "grade": true, "grade_id": "cell-42c6a8c97256fdaa", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "\n", "assert(df_rel.shape == (180,3))\n", "assert(df_rel.iloc[0,0] == \"Agnostic\")\n", "assert(df_rel.iloc[0,1] == \"<$10k\")\n", "assert(df_rel.iloc[0,2] == 27)\n", "assert(df_rel.iloc[41,0] == \"Evangelical Prot\")\n", "assert(df_rel.iloc[89,1] == \"$40-50k\")\n", "assert(df_rel.iloc[104,2] == 14)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }