{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Denison CS181/DA210 Homework\n", "\n", "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import io\n", "from lxml import etree\n", "import json\n", "import sys\n", "import os.path\n", "import pandas as pd\n", "\n", "datadir = \"publicdata\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q1** Consider the following table of `subjects` data:\n", "\n", "subject | name | department\n", "---------|-----|--------------\n", "CS | Computer Science | MATH\n", "MATH | Mathematics | MATH\n", "ENGL | English Literature | ENGL\n", "\n", "Using a *text editor*, create a file named `subjects.xml` in the current directory that contains a well-formed XML representation of this data. Once the file is created, write a Python code sequence to read and parse the file, and then, using the technique from this section, print the entire tree. As the penultimate step, before you print, assign the decoded string version of the tree to a variable named `subjects_str`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "e93ab9966e2b2d631aab76b8bc049101", "grade": false, "grade_id": "cell-352a2a50e91b618c", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "4b344f790b8cab4278de4fe85353a432", "grade": true, "grade_id": "cell-33dd26f22f1d5e39", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "path = os.path.join(\".\", \"subjects.xml\")\n", "assert os.path.isfile(path)\n", "assert isinstance(subjects_str, str)\n", "assert 75 < len(subjects_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q2** Now consider the `courses` table below. Using a text editor, create a file named `courses.xml` that contains an XML tree representing this table:\n", "\n", "subject | coursenum | title\n", "---------|-----------|-----------------------\n", "CS | 110 | Computing with Digital Media\n", "CS | 372 | Operating Systems\n", "MATH | 210 | Proof Techniques\n", "ENGL | 213 | Early British Literature\n", "\n", "Once created, write a Python code sequence to read and parse the file, and then, using the technique from this section, print the entire tree. Assign the string version of the courses tree to the variable `courses_str`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "d716c9113b683222048af52e7f6f946e", "grade": false, "grade_id": "cell-90c8b3ec459f1a09", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "784faed5cc0d1d0a1996daac8a2847f2", "grade": true, "grade_id": "cell-559f37743a566fde", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing Cell\n", "\n", "path = os.path.join(\".\", \"courses.xml\")\n", "assert os.path.isfile(path)\n", "assert isinstance(courses_str, str)\n", "assert 75 < len(courses_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q3** Suppose you wanted a tree that contained **both** of the above tables. Write a file named `school.xml` in the current directory that composes **as a single tree** both of the above component tables.\n", "\n", "As before, once created, write a Python code sequence to read and parse the file, and then print the entire tree. So that this problem does not depend on the correctness of the prior two questions, it will be graded manually, and you do not need any particular variable names." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "48dc8dad0bd6435c04f41365e7c4340c", "grade": true, "grade_id": "cell-04bdc0fe17d48c37", "locked": false, "points": 3, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q4** Write a function:\n", " \n", " getLocalXML(filename, datadir=\".\", parser=None)\n", " \n", "that performs the common steps of creating a path from the given `filename` and `datadir` and parsing the XML file, using the passed `parser`, if any, and returns the Element at the **root** of the tree. If `parser` is not passed, the standard `XMLParser` should be used.\n", "\n", "If the file is not found, or if the parse is unsuccessful (due to the XML not being \"well formed\"), the function should return `None`. Remember that if a parse is unsuccessful, the `etree` module raises an exception. That means that you should have a `try` block, and the `parse()` invocation should occur indented within that block. The `try` block is followed by an `except Exception as e:` line, and within that, you return `None`. If no exception is raised, code execution will proceed beyond the `try`/`except` block, and that is where you would return the root of the parsed tree." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "dc60129e2c9655261e8becb2445eadd7", "grade": false, "grade_id": "cell-81428647baf27b6c", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "d6144ce6aa101a8303fecf3330e31d89", "grade": true, "grade_id": "cell-6f7ca02f25f931f2", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "myparser = etree.XMLParser(remove_blank_text=True)\n", "# Testing cell\n", "wroot = getLocalXML(\"widombooks.xml\", datadir, myparser)\n", "assert len(wroot) == 8\n", "bad = getLocalXML(\"foo.xml\", datadir, myparser)\n", "assert bad == None\n", "bad2 = getLocalXML(\"bad.xml\", datadir)\n", "assert bad2==None" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }