Denison CS181/DA210 Homework

Before you turn this problem in, make sure everything runs as expected: restart the kernel and then run all cells (in the menubar, select Kernel → Restart And Run All).

Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE", and delete the raise NotImplementedError() line that accompanies each code placeholder.


In [ ]:
from lxml import etree
import pandas as pd
import os.path

datadir = "publicdata"

myparser = etree.XMLParser(remove_blank_text=True)
In [ ]:
bpath = os.path.join(datadir, "breakfast.xml")
print(bpath)

Q1 In the following cell, reproduce your function from the last homework

    getLocalXML(filename, datadir=".", parser=None)

that builds a path from the given filename and datadir, parses the XML file at that path (using parser, if one is passed), and returns the root Element of the resulting tree. If parser is not passed, the default XMLParser should be used; passing parser=None to etree.parse() does exactly that.

If the file is not found, or if the parse fails because the XML is not well formed, the function should return None. Remember that etree.parse() raises an exception in either case. That means the parse() call should be indented inside a try block. The try block is followed by an except Exception as e: clause, and the body of that clause is return None. If no exception is raised, execution continues past the try/except block, and that is where you return the root of the parsed tree.
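The try/except structure described above can be sketched as follows. This is a minimal illustration, not the getLocalXML solution: parse_root_or_none is a hypothetical helper that takes a full path instead of a filename and datadir, and the demo writes two throwaway files to a temporary directory. The import prefers lxml, as the notebook does, and falls back to the standard library's xml.etree.ElementTree, which supports the same parse() and getroot() calls.

```python
import os
import tempfile

# Prefer lxml (as in the notebook); fall back to the standard library,
# which provides the same parse()/getroot() calls used here.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def parse_root_or_none(path, parser=None):
    """Hypothetical helper: return the root Element of the XML file at path,
    or None if the file is missing or not well formed."""
    try:
        # parse() raises if the file cannot be read or the XML is malformed.
        # parser=None means "use the default parser".
        tree = etree.parse(path, parser)
    except Exception as e:
        # e holds the error (OSError, syntax error, ...); here we just give up.
        return None
    # Only reached when parse() succeeded.
    return tree.getroot()


# Demo with two throwaway files in a temporary directory.
tmpdir = tempfile.mkdtemp()
good = os.path.join(tmpdir, "good.xml")
bad = os.path.join(tmpdir, "bad.xml")
with open(good, "w") as f:
    f.write("<menu><food/><food/></menu>")
with open(bad, "w") as f:
    f.write("<menu><food></menu>")  # mismatched tags: not well formed

root = parse_root_or_none(good)
print(root.tag, len(root))                                     # menu 2
print(parse_root_or_none(bad))                                  # None
print(parse_root_or_none(os.path.join(tmpdir, "missing.xml")))  # None
```

Catching the broad Exception class keeps the function simple, since a missing file and a malformed file raise different exception types.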

In [ ]:
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing cell
myparser = etree.XMLParser(remove_blank_text=True)
wroot = getLocalXML("widombooks.xml", datadir, myparser)
assert isinstance(wroot, etree._Element)
assert len(wroot) == 8
bad2 = getLocalXML("bad.xml", datadir)
assert bad2 is None

Q2 Use your function to obtain the root Element of the XML file breakfast.xml in the data directory, and assign it to the Python variable broot.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
assert isinstance(broot, etree._Element)
assert len(broot) == 5

Q3 Using the Element broot, find all children with the tag 'food' and store them in a list of Elements called foodlist.
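As a reminder of how findall() behaves, here is a small sketch on a hypothetical miniature menu (the real breakfast.xml may be structured differently). It contrasts a plain tag, which matches only direct children, with the './/' path prefix, which matches at any depth. The import falls back to the standard library's ElementTree if lxml is unavailable; both support these calls.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document: two direct 'food' children plus one nested deeper.
root = etree.fromstring(
    "<menu>"
    "<food name='Waffles'/>"
    "<food name='Toast'/>"
    "<specials><food name='Pancakes'/></specials>"
    "</menu>"
)

# findall('food') matches only *direct* children tagged 'food'
# and returns them as a list of Elements, in document order.
direct = root.findall("food")
print(len(direct))                       # 2
print([f.get("name") for f in direct])   # ['Waffles', 'Toast']

# './/food' searches the whole subtree below root instead.
everywhere = root.findall(".//food")
print(len(everywhere))                   # 3
```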

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
foodlist
In [ ]:
assert isinstance(foodlist, list)
assert len(foodlist) == 5
assert isinstance(foodlist[0], etree._Element)
assert foodlist[0].tag == 'food'

Q4 Create two parallel lists holding the price and the calories of each food element under menu. You can reuse your solution to Q3 or use another way of iterating over the children of the root node. For each food element, access its attributes and collect the values of the two desired attributes. Assign the final lists to prices and calories, respectively. Attribute values are always strings, so do the type conversions: prices should be floats and calories should be integers.
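The attribute-and-conversion pattern can be sketched like this. The sample XML and the attribute names price and calories are assumptions for illustration (check breakfast.xml for the actual names), and price_list and calorie_list are hypothetical names chosen so the demo does not overwrite the prices and calories variables the tests check.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample; attribute names 'price' and 'calories' are assumed.
root = etree.fromstring(
    "<menu>"
    "<food price='5.95' calories='650'/>"
    "<food price='7.95' calories='900'/>"
    "</menu>"
)

price_list = []
calorie_list = []
for food in root.findall("food"):
    # get() returns the attribute value as a string (or None if absent),
    # so convert explicitly before appending.
    price_list.append(float(food.get("price")))
    calorie_list.append(int(food.get("calories")))

print(price_list)    # [5.95, 7.95]
print(calorie_list)  # [650, 900]
```

Building both lists in one loop keeps them parallel: index i in each list always refers to the same food element.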

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
assert isinstance(prices, list)
assert len(prices) == 5
assert isinstance(prices[0], float)
assert prices[0] == 5.95
assert prices[-1] == 6.95

assert isinstance(calories, list)
assert len(calories) == 5
assert isinstance(calories[0], int)
assert calories[0] == 650
assert calories[-1] == 950

Q5 Look up the documentation for the Element iter() method. Use it to iterate over all Elements tagged description in the tree rooted at broot, and accumulate in a list, dlist, the text value of each of these Elements.
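To see why iter() suits this question, the sketch below uses a hypothetical document in which one description sits deeper than the others. Passing a tag to iter() walks the entire subtree in document order and yields only Elements with that tag, whereas findall() with a plain tag looks only at direct children. The import falls back to the standard library's ElementTree if lxml is missing.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample with description Elements at two different depths.
root = etree.fromstring(
    "<menu>"
    "<food><description>Light and fluffy</description></food>"
    "<specials><food><description>Extra crispy</description></food></specials>"
    "</menu>"
)

# iter('description') visits every matching Element below (and including) root.
texts = [d.text for d in root.iter("description")]
print(texts)  # ['Light and fluffy', 'Extra crispy']

# findall with a plain tag only checks direct children, so it finds none here.
print(len(root.findall("description")))  # 0
```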

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
dlist
In [ ]:
assert isinstance(dlist, list)
assert len(dlist) == 5
assert isinstance(dlist[0], str)
assert dlist[0].count('plenty') == 1
assert dlist[-1].count(',') == 3

Q6 Assign to wroot the root Element of widombooks.xml in the data directory.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
assert isinstance(wroot, etree._Element)
assert len(wroot) == 8

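Q7 Using the Element wroot from above, get the attributes of the first child tagged 'Book' and store them in myAttrib. Use the Element's attrib property directly: it is a dictionary-like etree._Attrib object, and the test below checks for that type, so do not convert it to a plain dict.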
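The sketch below shows find() returning the first matching child and the attrib mapping that hangs off it. The Bookstore sample and its ISBN, Price, and Edition attributes are hypothetical stand-ins for widombooks.xml. With lxml, attrib is an etree._Attrib; the standard-library fallback uses a plain dict, but indexing and len() work the same way on both.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample; the real widombooks.xml attributes may differ.
root = etree.fromstring(
    "<Bookstore>"
    "<Book ISBN='X1' Price='10' Edition='2nd'><Title>A</Title></Book>"
    "<Book ISBN='X2' Price='20'><Title>B</Title></Book>"
    "</Bookstore>"
)

first_book = root.find("Book")  # first direct child tagged 'Book'
attrs = first_book.attrib       # mapping: attribute name -> string value

print(len(attrs))              # 3
print(attrs["Price"])          # 10  (a string, not a number)
print(type(attrs).__name__)    # _Attrib with lxml, dict with ElementTree
```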

In [ ]:
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()

print(myAttrib)
print(type(myAttrib))
In [ ]:
# Testing cell

assert isinstance(myAttrib, etree._Attrib)
assert myAttrib['Price'] == '85'
assert len(myAttrib) == 3

Q8 Using the Element wroot, find all children with the tag 'Book' and store them in a list of Elements called booklist.

In [ ]:
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
print(booklist)
booklist[0].tag
In [ ]:
# Testing cell
assert isinstance(booklist, list)
assert len(booklist) == 4
assert isinstance(booklist[0], etree._Element)
assert booklist[0].tag == 'Book'

Q9 Using the Element wroot, find all descendant Elements (at any depth) with the tag 'Magazine', extract the title text from each, and store the titles in a list of strings called titlelist (one title per magazine in widombooks.xml).
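One way to combine a descendant search with a child lookup is sketched below. It assumes, hypothetically, that each Magazine stores its title in a Title child element; the sample document and the titles in it are invented for illustration. The './/Magazine' path reaches magazines at any depth, and find('Title') then picks each magazine's first Title child.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample: one Magazine at the top level, one nested in a Shelf.
root = etree.fromstring(
    "<Bookstore>"
    "<Book><Title>Some Book</Title></Book>"
    "<Magazine><Title>Weekly Digest</Title></Magazine>"
    "<Shelf><Magazine><Title>Monthly Review</Title></Magazine></Shelf>"
    "</Bookstore>"
)

# './/Magazine' matches Magazine Elements at any depth below root.
titles = [m.find("Title").text for m in root.findall(".//Magazine")]
print(titles)  # ['Weekly Digest', 'Monthly Review']

# A plain 'Magazine' path would miss the nested one.
print(len(root.findall("Magazine")))  # 1
```

Note that the Book's Title is not picked up, because the search first narrows to Magazine Elements and only then looks for a Title inside each.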

In [ ]:
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()

print(titlelist)
In [ ]:
# Testing cell
assert len(titlelist) == 4
assert "Newsweek" in titlelist
assert "Hector and Jeff's Database Hints" in titlelist

Q10 Write a function

findValue(node, tag)

that finds the first subelement of node matching tag and returns its .text attribute if found, or None if no match is found.
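The behavior findValue builds on is that of find(), sketched here on a hypothetical Book element. find() returns the first matching subelement or None; a matched element's .text can itself be None when the element is empty. The related findtext() method exists in both lxml and the standard-library fallback and returns '' instead of None for an empty match.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample: a Title with text and an empty Remark.
book = etree.fromstring("<Book><Title>Intro to XML</Title><Remark/></Book>")

match = book.find("Title")
print(match.text)                 # Intro to XML
print(book.find("Price"))         # None: no subelement matches
print(book.find("Remark").text)   # None: matched, but the element has no text

# findtext() distinguishes the two cases differently.
print(repr(book.findtext("Remark")))  # ''
print(book.findtext("Price"))         # None
```

Test the result of find() with `is None` rather than plain truthiness: an Element with no children is falsy in both lxml and ElementTree, so `if match:` would wrongly skip a matched Title that contains only text.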

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
assert findValue(wroot, "Supplies") is None
booklist = wroot.findall("Book")
assert isinstance(findValue(booklist[1], "Remark"), str)
assert findValue(booklist[1], "Remark").count("Buy") == 1