{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Denison CS181/DA210 Homework\n",
    "\n",
    "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Client Data Acquisition\n",
    "\n",
    "> Focus on obtaining data requested over the network, in CSV and XML format, and then using it.  Requires use of `StringIO` and `BytesIO`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import os.path\n",
    "import sys\n",
    "import importlib\n",
    "import io\n",
    "import pandas as pd\n",
    "from lxml import etree\n",
    "\n",
    "if os.path.isdir(os.path.join(\"../../..\", \"modules\")):\n",
    "    module_dir = os.path.join(\"../../..\", \"modules\")\n",
    "else:\n",
    "    module_dir = os.path.join(\"../..\", \"modules\")\n",
    "\n",
    "module_path = os.path.abspath(module_dir)\n",
    "if module_path not in sys.path:\n",
    "    sys.path.append(module_path)\n",
    "\n",
    "import util\n",
    "importlib.reload(util)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q1** The purpose of `io.StringIO()` is to create a file-like object from *any* string in a Python program.  The object created \"acts\" just like a file object returned by `open()` would.\n",
    "\n",
    "Consider the following single Python string, `s`, composed over multiple continued lines:\n",
    "\n",
    "    s = \"Twilight and evening bell,\\n\" \\\n",
    "        \"And after that the dark!\\n\" \\\n",
    "        \"And may there be no sadness of farewell,\\n\" \\\n",
    "        \"When I embark;\\n\"\n",
    "\n",
    "First, write some code to deal with `s` as a string:\n",
    "\n",
    "- determine the length of `s`, assign to `len_s`\n",
    "- find the integer start and end indices (inclusive) of the substring `\"dark\"` within `s`, and assign to `dark_start`/`dark_end`\n",
    "- create string `s2` by replacing `\"embark\"` with `\"disembark\"`\n",
    "\n",
    "Now, create a file-like object from `s` and perform a first `readline()`, assigning the result to variable `line1`.  Then write a `for` loop that uses the file-like object as an iterator to accumulate the remaining lines into a list called `lines`.  Make sure you omit any trailing newline from `line1` and from each of the strings in `lines`."
   ]
  },
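  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the two views described above, the following uses a short made-up string to contrast plain string operations with file-like access through `io.StringIO`.  The string `text` and every other name here are hypothetical and are not part of the exercise.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "# Hypothetical sample text, unrelated to the exercise string.\n",
    "text = 'alpha\\nbeta\\ngamma\\n'\n",
    "\n",
    "# Plain string operations: length, substring search, replacement.\n",
    "n_chars = len(text)\n",
    "start = text.find('beta')        # index of the first character of the match\n",
    "end = start + len('beta') - 1    # inclusive index of its last character\n",
    "swapped = text.replace('gamma', 'delta')\n",
    "\n",
    "# File-like operations: readline() consumes one line, iteration yields the rest.\n",
    "fobj = io.StringIO(text)\n",
    "first = fobj.readline().rstrip('\\n')\n",
    "rest = []\n",
    "for line in fobj:\n",
    "    rest.append(line.rstrip('\\n'))\n",
    "```\n",
    "\n",
    "Note that `readline()` and iteration share one position in the file-like object, so the loop picks up where `readline()` stopped."
   ]
  },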
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8c5aa5f46db2ef30705092d581601525",
     "grade": false,
     "grade_id": "cell-7af55ab11001c5ef",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "s = \"Twilight and evening bell,\\n\" \\\n",
    "        \"And after that the dark!\\n\" \\\n",
    "        \"And may there be no sadness of farewell,\\n\" \\\n",
    "        \"When I embark;\\n\"\n",
    "\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "print(len_s)\n",
    "print(\"start:\", dark_start, \"end:\", dark_end, \"substring:\", s[dark_start:dark_end+1])\n",
    "print(\"length s2:\", len(s2))\n",
    "print(line1)\n",
    "print(lines)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "f2fd93d0e89b10bade86cff95b35d739",
     "grade": true,
     "grade_id": "cell-125cbee27c447392",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "392f00d9a5031517ba860eafb22b47c8",
     "grade": false,
     "grade_id": "cell-c7302157b1568dc5",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "The next set of exercises involves a file at resource path `/data/mystery3.dat` on host `datasystems.denison.edu`.  You can assume the file is textual, and is a tab-separated data collection where each line consists of:\n",
    "\n",
    "    male_name <tab> male_count <tab> female_name <tab> female_count\n",
    "    \n",
    "for the top 10 name applications of each sex to the US Social Security Administration for the year 2015."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q2** Suppose the encoding of the file is unknown, but will be from one of the following:\n",
    "\n",
    "- 'UTF-8'\n",
    "- 'UTF-16BE'\n",
    "- 'UTF-16LE'\n",
    "- 'cp037'\n",
    "- 'latin_1'\n",
    "\n",
    "Write code to:\n",
    "\n",
    "- acquire the file from the web server\n",
    "- ensure the status_code is 200\n",
    "- assign to `content_type` the *value* of the `Content-Type` header line of the response\n",
    "- determine the *correct* encoding and assign to `real_encoding`\n",
    "- set the `.encoding` attribute of the response to `real_encoding`\n",
    "- assign to `tsv_body` the string text for the body of the response."
   ]
  },
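  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to narrow down an unknown encoding is to decode the raw bytes with each candidate and keep only those that yield sensible text.  The sketch below works on locally built bytes rather than a real response.  The sample `raw`, the helper `plausible`, and its printable-ASCII test are assumptions made for illustration and may need adjusting for other data.\n",
    "\n",
    "```python\n",
    "# Hypothetical sample: bytes standing in for the body of a response.\n",
    "raw = 'Noah\\t19635\\n'.encode('utf-16be')\n",
    "candidates = ['utf-8', 'utf-16be', 'utf-16le', 'cp037', 'latin_1']\n",
    "\n",
    "def plausible(text):\n",
    "    # Heuristic (an assumption): expect only printable ASCII, tabs, and newlines.\n",
    "    return all(ch in '\\t\\n' or ' ' <= ch <= '~' for ch in text)\n",
    "\n",
    "matches = []\n",
    "for enc in candidates:\n",
    "    try:\n",
    "        decoded = raw.decode(enc)\n",
    "    except UnicodeDecodeError:\n",
    "        continue              # these bytes are not valid in this encoding\n",
    "    if plausible(decoded):\n",
    "        matches.append(enc)\n",
    "```\n",
    "\n",
    "With a `requests` response, the same raw bytes are available through the `.content` attribute."
   ]
  },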
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "e7d6d070bf8146c863977bccc4e77164",
     "grade": false,
     "grade_id": "cell-4be125be19293b31",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "import requests\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "print(tsv_body)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7756a6774a7f3aa0c995f73492125c80",
     "grade": true,
     "grade_id": "cell-c7b0f0b8c66a829d",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q3** In this question, you will start with a *string* and create a *Dictionary of Lists* representation of the data entailed in the string.  It is suggested to use the result of the previous problem, `tsv_body`, as the starting point.  But to start independently, you can use the following string literal constant assignment to get to the same starting point:\n",
    "\n",
    "    tsv_body = \"Noah\\t19635\\tEmma\\t20455\\n\" \\\n",
    "               \"Liam\\t18374\\tOlivia\\t19691\\n\" \\\n",
    "               \"Mason\\t16627\\tSophia\\t17417\\n\" \\\n",
    "               \"Jacob\\t15949\\tAva\\t16378\\n\" \\\n",
    "               \"William\\t15909\\tIsabella\\t15617\\n\" \\\n",
    "               \"Ethan\\t15077\\tMia\\t14905\\n\" \\\n",
    "               \"James\\t14824\\tAbigail\\t12401\\n\" \\\n",
    "               \"Alexander\\t14547\\tEmily\\t11786\\n\" \\\n",
    "               \"Michael\\t14431\\tCharlotte\\t11398\\n\" \\\n",
    "               \"Benjamin\\t13700\\tHarper\\t10295\\n\"\n",
    "               \n",
    "Construct a file-like object from `tsv_body` and then use file object operations to create a dictionary of lists representation of the tab-separated data, assigning it to `DoL`.  Note that there is no header line in the data, so you should name the columns `malename`, `malecount`, `femalename`, `femalecount`."
   ]
  },
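  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the general shape of a dictionary of lists, here is a small sketch built from a made-up two-column string.  The names `sample`, `columns`, and `dol` are hypothetical, and the data is not the exercise data.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "# Hypothetical two-column, tab-separated sample with no header line.\n",
    "sample = 'red\\t3\\nblue\\t7\\n'\n",
    "columns = ['color', 'count']\n",
    "\n",
    "# One empty list per column, then append each field to its column's list.\n",
    "dol = {name: [] for name in columns}\n",
    "for line in io.StringIO(sample):\n",
    "    fields = line.rstrip('\\n').split('\\t')\n",
    "    for name, value in zip(columns, fields):\n",
    "        dol[name].append(value)\n",
    "```\n",
    "\n",
    "Every value stays a string in this sketch; a numeric column would need an explicit conversion such as `int()`."
   ]
  },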
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "66fc42b4d5380f9596a691c954c9ea78",
     "grade": false,
     "grade_id": "cell-8e6f74a53dcbf0e9",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "tsv_body = \"Noah\\t19635\\tEmma\\t20455\\n\" \\\n",
    "               \"Liam\\t18374\\tOlivia\\t19691\\n\" \\\n",
    "               \"Mason\\t16627\\tSophia\\t17417\\n\" \\\n",
    "               \"Jacob\\t15949\\tAva\\t16378\\n\" \\\n",
    "               \"William\\t15909\\tIsabella\\t15617\\n\" \\\n",
    "               \"Ethan\\t15077\\tMia\\t14905\\n\" \\\n",
    "               \"James\\t14824\\tAbigail\\t12401\\n\" \\\n",
    "               \"Alexander\\t14547\\tEmily\\t11786\\n\" \\\n",
    "               \"Michael\\t14431\\tCharlotte\\t11398\\n\" \\\n",
    "               \"Benjamin\\t13700\\tHarper\\t10295\\n\"\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "13f8f57060ea122c8cd63e6f7f1c0947",
     "grade": true,
     "grade_id": "cell-52311628c4f73476",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert len(DoL['malename']) == 10\n",
    "assert DoL['malename'][0] == 'Noah'\n",
    "assert DoL['femalename'][9] == 'Harper'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "943f19a53a35c60ed1538e9be35ad463",
     "grade": true,
     "grade_id": "cell-6378c904ae819202",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q4** Use `pandas` and `read_csv()` with a file-like object based on `tsv_body` to obtain a data frame named `df`.  Make sure you have reasonable column names.\n",
    "\n",
    "Be careful to call `read_csv` so that the separators are tabs, not commas."
   ]
  },
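  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hedged illustration of reading header-less, tab-separated text with `read_csv()`, the sketch below uses a made-up string.  The names `sample` and `frame` and the column names are hypothetical.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical two-column, tab-separated sample with no header line.\n",
    "sample = 'red\\t3\\nblue\\t7\\n'\n",
    "\n",
    "# sep selects the tab delimiter; header=None plus names supplies column labels.\n",
    "frame = pd.read_csv(io.StringIO(sample), sep='\\t', header=None,\n",
    "                    names=['color', 'count'])\n",
    "```\n",
    "\n",
    "Without `header=None`, `read_csv()` would treat the first data line as the column names."
   ]
  },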
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "574ee630f368e1f38872a03a99add78f",
     "grade": false,
     "grade_id": "cell-3c3d6bac2b935136",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "tsv_body = \"Noah\\t19635\\tEmma\\t20455\\n\" \\\n",
    "               \"Liam\\t18374\\tOlivia\\t19691\\n\" \\\n",
    "               \"Mason\\t16627\\tSophia\\t17417\\n\" \\\n",
    "               \"Jacob\\t15949\\tAva\\t16378\\n\" \\\n",
    "               \"William\\t15909\\tIsabella\\t15617\\n\" \\\n",
    "               \"Ethan\\t15077\\tMia\\t14905\\n\" \\\n",
    "               \"James\\t14824\\tAbigail\\t12401\\n\" \\\n",
    "               \"Alexander\\t14547\\tEmily\\t11786\\n\" \\\n",
    "               \"Michael\\t14431\\tCharlotte\\t11398\\n\" \\\n",
    "               \"Benjamin\\t13700\\tHarper\\t10295\\n\"\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "07e1288cd58965ac7ec68dde34a371cc",
     "grade": true,
     "grade_id": "cell-f5595d0d045d8f6e",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert len(df) == 10\n",
    "assert isinstance(df.columns[0],str)\n",
    "assert df.iloc[0,0] == 'Noah'\n",
    "assert df.iloc[0,1] == 19635\n",
    "assert df.iloc[9,2] == 'Harper'\n",
    "assert df.iloc[9,3] == 10295"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In many of the following exercises, we will show a `curl` incantation that obtains XML-formatted text data from the Internet.  Your task will be to translate the incantation into the equivalent `requests` module programming steps, and to obtain the *parsed* XML-based `ElementTree` structure from the result, assigning to variable `root` the root of the result.  You must **always** check the status code returned from the request before further processing.  In some cases, we will ask for a specific method from among those demonstrated in the textbook section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "4d8118d32da061d0df31343bd082ae22",
     "grade": false,
     "grade_id": "cell-e1cb47c5d7b2dfd2",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Q5** Using any method, get the XML data from `school0.xml`:\n",
    "\n",
    "    curl -s -o school0.xml \\\n",
    "         https://datasystems.denison.edu/data/school0.xml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "e4a719f7c0ceaf02fc938315df4db08a",
     "grade": false,
     "grade_id": "cell-44990515d3a7ff46",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "util.print_xml(root, nlines=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "97e0f884b236ae05f326890168a52c03",
     "grade": true,
     "grade_id": "cell-b38607585e9356f3",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q6** Using the bytes data in `.content`, a *file-like-object*, and `etree.parse()`, get the XML data from `school0.xml`.\n",
    "\n",
    "    curl -s -o school0.xml \\\n",
    "         https://datasystems.denison.edu/data/school0.xml"
   ]
  },
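  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parsing half of this approach can be sketched without a network request.  Here the hypothetical `xml_bytes` stands in for the `.content` of a response, and `tree` and `doc_root` are illustrative names.\n",
    "\n",
    "```python\n",
    "import io\n",
    "from lxml import etree\n",
    "\n",
    "# Hypothetical bytes standing in for the .content of a response.\n",
    "xml_bytes = b\"<?xml version='1.0' encoding='UTF-8'?>\\n<pets><pet name='Rex'/></pets>\"\n",
    "\n",
    "# etree.parse() accepts a file-like object; BytesIO wraps the bytes as one.\n",
    "tree = etree.parse(io.BytesIO(xml_bytes))\n",
    "doc_root = tree.getroot()    # the top-level Element of the parsed tree\n",
    "```\n",
    "\n",
    "Because the parser receives bytes, it can read the encoding from the XML declaration itself."
   ]
  },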
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "c734952d46144b0c3f276bb85ec0d790",
     "grade": false,
     "grade_id": "cell-2a101731e1d15e91",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "util.print_xml(root, nlines=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "aa4d35e43e422346b808e78f463c6035",
     "grade": true,
     "grade_id": "cell-7dd6de64c3c283a6",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q7** Write a function\n",
    "\n",
    "    getXMLdata(resource, location, protocol='http')\n",
    "\n",
    "that makes a request to `location` for `resource` with the specified protocol, then uses the bytes data in the `.content` of the response, with a *file-like-object*, and `etree.parse()`, to get the XML data.  On success, return the root of the tree.  On failure of either the request or the parse of the data, return `None`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "22be6d60f0a80fd60e39993760b695f6",
     "grade": false,
     "grade_id": "cell-8156b2e8688663b6",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "root = getXMLdata(\"/data/school0.xml\", \"datasystems.denison.edu\", \"https\")\n",
    "util.print_xml(root, nlines=15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5dd91c6278e5fca1a1a9052df28f1f5f",
     "grade": true,
     "grade_id": "cell-cb7277b1e202c4e4",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Q8** The `school0_32.xml` resource is encoded with `utf-32be`.  Set the `.encoding` attribute of the response, access the `.text` string body, and parse it using `fromstring()`.  Remember that `fromstring()` expects to start from an Element, not from the header line (the XML declaration), so you will need to skip the header to get the string to pass.\n",
    "\n",
    "    curl -s -o school0_32.xml \\\n",
    "         https://datasystems.denison.edu/data/school0_32.xml"
   ]
  },
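  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The header-skipping step can be sketched on a local string.  With `lxml`, passing `fromstring()` a Python `str` that still carries an encoding declaration raises a `ValueError`, which is why the declaration is removed first.  The hypothetical `xml_text` stands in for the `.text` of a response, and `body` and `doc_root` are illustrative names.\n",
    "\n",
    "```python\n",
    "from lxml import etree\n",
    "\n",
    "# Hypothetical decoded text standing in for the .text of a response.\n",
    "xml_text = \"<?xml version='1.0' encoding='utf-32be'?>\\n<pets><pet name='Rex'/></pets>\"\n",
    "\n",
    "# Drop everything up to and including the '?>' that ends the XML declaration.\n",
    "body = xml_text[xml_text.index('?>') + 2:].lstrip()\n",
    "doc_root = etree.fromstring(body)\n",
    "```\n",
    "\n",
    "This sketch assumes the text begins with a declaration; `index()` raises a `ValueError` if no `?>` is present."
   ]
  },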
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "76eab7ff4869bbf98366d008c9c56534",
     "grade": false,
     "grade_id": "cell-493b2d0830c02eae",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "util.print_xml(root, nlines=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "12225fd6c98ff51f3c5b196f12931668",
     "grade": true,
     "grade_id": "cell-2ff200546e2864d1",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert True"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}