{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name and collaborators below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "NAME = \"\"\n", "COLLABORATORS = \"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Frame: Creation and Basic Access\n", "\n", "## Creation from Native Data Structure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import os.path\n", "import pandas as pd\n", "\n", "datadir = \"publicdata\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q1** Fill in the remaining code in the following cell to build a List of Dictionaries representation of the data in `madison_temp.csv` in the data directory, assigning it to `LoD`. Your code is a single assignment that adds exactly one field to a dictionary row."
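, "\n", "\n",
"A minimal sketch of the structure `LoD` should end up with, using the standard-library `csv.DictReader` in place of the manual loop in the following cell, and a few made-up rows (the `station`, `year`, and `rain` columns are hypothetical, not taken from `madison_temp.csv`). Each data row becomes one dictionary keyed by the header fields, and every value is still a string:\n",
"\n",
"```python\n",
"import csv\n",
"import io\n",
"\n",
"# Hypothetical CSV text standing in for the real file\n",
"text = '''station,year,rain\n",
"S1,2019,38\n",
"S2,2020,35\n",
"'''\n",
"\n",
"# DictReader takes its keys from the header line: one dictionary per data row\n",
"rows = list(csv.DictReader(io.StringIO(text)))\n",
"print(rows[0])\n",
"```"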
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "0ec767b73748338ec8bd9cbbe4a5be25", "grade": true, "grade_id": "cell-68d9f41c9730a7f8", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "path = os.path.join(datadir, \"madison_temp.csv\")\n", "\n", "LoD = []\n", "with open(path, 'r') as f:\n", "    columns = f.readline().strip().split(',')\n", "    for line in f:\n", "        D = {}\n", "        fields = line.strip().split(',')\n", "        for column_number, value in enumerate(fields):\n", "            # YOUR CODE HERE\n", "            raise NotImplementedError()\n", "        LoD.append(D)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q2** Using the same incantation as the book in Section 6.3.1 on data frame creation, create a pandas data frame named `temps` from the `LoD` created in **Q1**." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "7f54747a6d62951bd2b975f981c79264", "grade": true, "grade_id": "cell-969255558e24c607", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q3** Using the methods shown in Section 6.3.2 for obtaining the prefix and suffix of the data, show the prefix and the suffix of the `temps` data frame in the following two cells."
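, "\n", "\n",
"A minimal sketch of one way to build and inspect a data frame, using a made-up three-row list in place of `LoD` (the column names and values are hypothetical). The dictionary keys become the column names; `head` and `tail` return the first and last rows, five by default:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical list of dictionaries, shaped like the LoD from Q1\n",
"rows = [\n",
"    {'station': 'S1', 'year': '2019', 'rain': '38'},\n",
"    {'station': 'S2', 'year': '2020', 'rain': '35'},\n",
"    {'station': 'S3', 'year': '2021', 'rain': '41'},\n",
"]\n",
"\n",
"df = pd.DataFrame(rows)  # keys become columns, one row per dictionary\n",
"print(df.head(2))        # prefix: the first 2 rows\n",
"print(df.tail(2))        # suffix: the last 2 rows\n",
"```"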
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "b6b1b810900854b0beca9b0dfa9b69e6", "grade": true, "grade_id": "cell-851c7d99e842827a", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "b7c529c54164abf3498eba14fd732361", "grade": true, "grade_id": "cell-ccd18a93fe1bcef5", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q4** In a single assignment line, using an **attribute** of the data frame object, assign to `nrows` and `ncols` the number of rows and columns in the temperatures data set." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "87cb23aa2e28953f880e1a8b090254c4", "grade": true, "grade_id": "cell-95a884efb4c95c06", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "print(nrows, ncols)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q5** In the following code cell, invoke the data frame method that gives detailed information about the data frame (its `Index`, its columns, and the data type of each column). Then, in the markdown cell below that, write your observations on the `Index` and the data types. Given the data in the data set, which data types are **not** correct if we want to work with the data set? (Hint: On page 168, in the first paragraph, we see a similar set of observations for the `topnames` data set.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "956b6b31f0618770fcd704a184678f4e", "grade": true, "grade_id": "cell-0876863e052a5f0f", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "d14d75e6fce4f97b0508a5745ff7975a", "grade": true, "grade_id": "cell-1fccd52d920f2d3c", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q6** Print the `columns` attribute of the data frame and then the `index` attribute of the data frame." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "dbb14fce0a91567457123353de89decd", "grade": true, "grade_id": "cell-4c8b9a4532d9dbfa", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q7** Carefully consider the data set. In the cell that follows, identify the independent variable(s) and the dependent variable(s)."
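, "\n", "\n",
"Once you have decided which columns are independent, **Q8** moves them into the `Index`. A minimal sketch on a made-up frame (the `station`, `year`, and `rain` columns are hypothetical), assuming `station` and `year` identify a reading and `rain` is the measured value, shows how `set_index` changes `shape`, `columns`, and `index`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical readings: station and year identify a row, rain is measured\n",
"df = pd.DataFrame({\n",
"    'station': ['S1', 'S1', 'S2'],\n",
"    'year': [2019, 2020, 2019],\n",
"    'rain': [38, 35, 41],\n",
"})\n",
"print(df.shape, list(df.columns))  # (3, 3) ['station', 'year', 'rain']\n",
"\n",
"# set_index returns a new frame; the listed columns move into a MultiIndex\n",
"df2 = df.set_index(['station', 'year'])\n",
"print(df2.shape, list(df2.columns))  # (3, 1) ['rain']\n",
"print(list(df2.index.names))         # ['station', 'year']\n",
"```"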
] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "4c77c004cc402f857b56440e2fb53199", "grade": true, "grade_id": "cell-7ec5857915d8c023", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q8** As done in the book (p. 169), set the `Index` (the logical row identifiers) to the variable or variables you identified as independent. If you identified more than one independent variable, specify a list of column names. If there is just a single independent variable, you can specify either a one-element list or just the column name as a string. Assign the result to `temps2`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "3d1ece632031ef1f57b56578c69c183a", "grade": true, "grade_id": "cell-d37e2e5a9d62fd48", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q9** Use the method that gives a prefix of the data to show the first **10** rows of `temps2`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "cd437e254a7c0abeb8a40c1e81546c35", "grade": true, "grade_id": "cell-e502fa41572e8145", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q10** Invoke the information method from **Q5** again, this time on the `temps2` data frame. How does its output differ from the earlier information about `temps`?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "4da84b71cc526f681afe796c70dff8ac", "grade": true, "grade_id": "cell-d522148291b3ee4a", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "9834db579a012edf57ca7b29e3b4a0db", "grade": true, "grade_id": "cell-94f730b01fb6ba73", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q11** The last DataFrame method discussed in Section 6.3 allows us to transform an existing data frame by converting the data types of its columns. Invoke this method on either the `temps` or the `temps2` data frame to convert the `EMXT` and `EMNT` columns to integers. Assign the result to `temps3`, and show the last 10 rows of the result." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "350b04351ddc942ffd498e81c0a3ca8b", "grade": true, "grade_id": "cell-e02b0f846b2d0f0d", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q12** At the end of Section 6.3.1, the book demonstrates how to construct a data frame directly from a CSV file. Do so now to create a data frame named `earthquakes` from the CSV file named `\"earthquake_all_month.csv\"` in the data directory."
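, "\n", "\n",
"A minimal sketch for comparison with **Q1** and **Q2**, using made-up CSV text wrapped in `io.StringIO` so that no file is needed (with the real data you would pass the file path instead; the columns are hypothetical). `pd.read_csv` infers a numeric type for columns whose values all look like integers, whereas a frame built from split strings holds strings in every column:\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical CSV text standing in for a file on disk\n",
"text = '''station,year,rain\n",
"S1,2019,38\n",
"S2,2020,35\n",
"'''\n",
"\n",
"# read_csv parses the header line and infers a type for each column\n",
"df = pd.read_csv(io.StringIO(text))\n",
"print(df.dtypes)  # year and rain are integer columns\n",
"\n",
"# Splitting each line by hand, as in Q1, leaves every value a string\n",
"header, *lines = text.splitlines()\n",
"rows = [dict(zip(header.split(','), line.split(','))) for line in lines]\n",
"df_str = pd.DataFrame(rows)\n",
"print(df_str.dtypes)  # no numeric columns\n",
"```"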
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "741309540c4002452b9d37c98e39558e", "grade": true, "grade_id": "cell-cd78774fba01a30b", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "earthquakes.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q_n** In any remaining time, explore the `earthquakes` data frame, applying as many of the methods from **Q3** through **Q11** as are applicable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }