{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Denison CS181/DA210 Homework\n", "\n", "Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\\rightarrow$Restart And Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from lxml import etree\n", "import pandas as pd\n", "import os.path\n", "\n", "datadir = \"publicdata\"\n", "\n", "myparser = etree.XMLParser(remove_blank_text=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bpath = os.path.join(datadir, \"breakfast.xml\")\n", "print(bpath)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q1** In the following cell, reproduce your function from last homework\n", "\n", "    getLocalXML(filename, datadir=\".\", parser=None)\n", " \n", "that performs the common steps of creating a path from the given `filename` and `datadir`, parsing the XML file with the passed `parser`, if any, and returning the Element at the **root** of the tree. If `parser` is not passed, the standard `XMLParser` should be used.\n", "\n", "If the file is not found, or if the parse is unsuccessful (due to XML not being \"well formed\"), the function should return `None`. Remember that if a parse is unsuccessful, the `etree` module raises an exception. That means you need a `try` block with the `parse()` invocation indented inside it. The `try` block is followed by an `except Exception as e:` line, and within that block you return `None`. If no exception is raised, code execution will proceed beyond the `try`/`except` block, and that is where you would return the root of the parsed tree."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "6d96b21ca080255baf1e5e778e2d15c2", "grade": false, "grade_id": "cell-80875b80b016b255", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "46c41ecc1e5e08407b88eac0a4770aef", "grade": true, "grade_id": "cell-b84232ff1935b471", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "myparser = etree.XMLParser(remove_blank_text=True)\n", "# Testing cell\n", "wroot = getLocalXML(\"widombooks.xml\", datadir, myparser)\n", "assert len(wroot) == 8\n", "bad2 = getLocalXML(\"bad.xml\", datadir)\n", "assert bad2==None\n", "assert isinstance(wroot, etree._Element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q2** Use your function to obtain the root Element from the data directory and the XML file named `breakfast.xml`, assigning to Python variable `broot`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "9ff3fd14b56b034b0b9547333e139f9d", "grade": false, "grade_id": "cell-c8f8f1f5c7e0fdbd", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "8fbdec9b20bf867fac15a3796b38df2d", "grade": true, "grade_id": "cell-a8a452ac07ba6026", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert isinstance(broot, etree._Element)\n", "assert len(broot) == 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q3** Using the Element `broot`, find all children with the tag `'food'` and store them in a list of Elements called `foodlist`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "128fa3374583d32447a28e70e5fdf399", "grade": false, "grade_id": "cell-35070b31963283bb", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "foodlist" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "537b7578a98a51ef5477b670a6efcc2e", "grade": true, "grade_id": "cell-69ec6b1ab4ae757b", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert isinstance(foodlist, list)\n", "assert len(foodlist) == 5\n", "assert isinstance(foodlist[0], etree._Element)\n", "assert foodlist[0].tag == 'food'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q4** Create two parallel lists 
consisting of the prices and the calories for each of the food elements under menu. You can use your solution to **Q3** or can use another method for iterating over the children of the root node. For each, you will access the attributes of the food node and collect the values of the two desired attributes. The final lists should be assigned to `prices` and `calories`, respectively. Make sure you do your type conversions so that prices are real-valued and calories are integers." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "aa0edc9549e9f1f5f2c333c547c84fb9", "grade": false, "grade_id": "cell-842361b9b983b7aa", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "c938398b9ec1ad26a40e97b688fd85db", "grade": true, "grade_id": "cell-15a9f8809f9ae555", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert isinstance(prices, list)\n", "assert len(prices) == 5\n", "assert isinstance(prices[0], float)\n", "assert prices[0] == 5.95\n", "assert prices[-1] == 6.95\n", "\n", "assert isinstance(calories, list)\n", "assert len(calories) == 5\n", "assert isinstance(calories[0], int)\n", "assert calories[0] == 650\n", "assert calories[-1] == 950" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q5** Look up the documentation for the `iter` method. Use this to iterate over all the `description`-tagged Elements starting from `broot` and accumulate a list, `dlist`, with the `description` **text** value from these Element nodes." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "020db8a77d186fe575cfe2a12ce93184", "grade": false, "grade_id": "cell-cb10c55ff6cba7e2", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "dlist" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "863d419554fe2871dae14328c7708fdc", "grade": true, "grade_id": "cell-64963fbd6b5eee63", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert isinstance(dlist, list)\n", "assert len(dlist) == 5\n", "assert isinstance(dlist[0], str)\n", "assert dlist[0].count('plenty') == 1\n", "assert dlist[-1].count(',') == 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q6** Assign to `wroot` the root `Element` object for the `widombooks.xml` in the data directory." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "3e98b699370234c02b1aeb699067a753", "grade": false, "grade_id": "cell-c340af5cd0f9e4ff", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "26cd2dfd498d4d2bd14345d66d06778a", "grade": true, "grade_id": "cell-97ef1944060ec677", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert isinstance(wroot, etree._Element)\n", "assert len(wroot) == 8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q7** Using the Element `wroot` from above, get the attributes of the first child tagged `'Book'`, and store them in `myAttrib`. The result should be the dictionary-like attribute mapping that `lxml` provides for an Element, not a converted plain `dict`."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "5729f1e4d4a6c551bc3fd182f58eabe0", "grade": false, "grade_id": "cell-0709f3c991ed9627", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "\n", "print(myAttrib)\n", "print(type(myAttrib))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "d9d62becc0797db53db1e5da1f65d237", "grade": true, "grade_id": "cell-2e0723559f16076f", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "\n", "assert isinstance(myAttrib, etree._Attrib)\n", "assert myAttrib['Price'] == '85'\n", "assert len(myAttrib) == 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q8** Using the Element `wroot`, find all children with the tag `'Book'` and store them in a list of Elements called `booklist`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "ffffe76caa0dadb33091afb0c339eda9", "grade": false, "grade_id": "cell-1c4c6f6fbc20a50f", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "print(booklist)\n", "booklist[0].tag" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "382c202553f0696baab3fb1ed39b61e9", "grade": true, "grade_id": "cell-b01bc4ccde48f4aa", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert isinstance(booklist, list)\n", "assert len(booklist) == 4\n", "assert isinstance(booklist[0], etree._Element)\n", "assert booklist[0].tag == 'Book'" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "75a73e4f624399c7dbe76dec8bad2b4a", "grade": false, "grade_id": "cell-2594153d95c35d93", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Q9** Using the Element `wroot`, find all descendent children with the tag `'Magazine'`, extract the title text from each, and store them in a list of strings called `titlelist` (one title per magazine in `widombooks.xml`)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "48e97ca37e78c951bdca39b269a12cd0", "grade": false, "grade_id": "cell-a5dceca149dd2bca", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Solution cell\n", "# YOUR CODE HERE\n", "raise NotImplementedError()\n", "\n", "print(titlelist)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "067e2e4fd75aa36d8ff5d786f36b1906", "grade": true, "grade_id": "cell-c2591f1d19c9010c", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# Testing cell\n", "assert len(titlelist) == 4\n", "assert \"Newsweek\" in titlelist\n", "assert \"Hector and Jeff's Database Hints\" in titlelist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q10** Write a function\n", "\n", "    findValue(node, tag)\n", " \n", "that, relative to `node`, finds the first subelement matching `tag` and returns its `.text` attribute if a match is found, or `None` if there is no match."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "41710c42b97cae4eaba12db433daaf1f", "grade": false, "grade_id": "cell-7283f9c7a466f921", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "ec781b99e897b06d65f674414ef28909", "grade": true, "grade_id": "cell-8e84685fc6b4e470", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert findValue(wroot, \"Supplies\") == None\n", "booklist = wroot.findall(\"Book\")\n", "assert isinstance(findValue(booklist[1], \"Remark\"), str)\n", "assert findValue(booklist[1], \"Remark\").count(\"Buy\") == 1" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }