Before you turn this problem in, make sure everything runs as expected. This is a combination of restarting the kernel and then running all cells (in the menubar, select Kernel$\rightarrow$Restart And Run All).
Make sure you fill in any place that says YOUR CODE HERE
or "YOUR ANSWER HERE".
import io
from lxml import etree
import json
import sys
import os.path
import pandas as pd
datadir = "publicdata"
Q1 Consider the following table of subjects
data:
subject | name | department |
---|---|---|
CS | Computer Science | MATH |
MATH | Mathematics | MATH |
ENGL | English Literature | ENGL |
Using a text editor, edit and create a file named subjects.xml
in the current directory that creates a legal XML representation of this data. Once created, write a Python code sequence to read and parse the file, and then, using the technique from this section, print the entire tree. In the penultimate step, you create a Python string to reference the decoded string version of the tree before you print. Name this variable subjects_str
.
# YOUR CODE HERE
raise NotImplementedError()
# Testing Cell
path = os.path.join(".", "subjects.xml")
assert os.path.isfile(path)
assert isinstance(subjects_str, str)
assert 75 < len(subjects_str)
Q2 Now consider the courses
table below. Using a text editor, edit and create courses.xml
that contains a an XML tree representing this table:
subject | coursenum | title |
---|---|---|
CS | 110 | Computing with Digital Media |
CS | 372 | Operating Systems |
MATH | 210 | Proof Techniques |
ENGL | 213 | Early British Literature |
Once created, write a Python code sequence to read and parse the file, and then, using the technique from this section, print the entire tree. Assign the string version of the courses tree to the variable courses_str
.
# YOUR CODE HERE
raise NotImplementedError()
# Testing Cell
path = os.path.join(".", "courses.xml")
assert os.path.isfile(path)
assert isinstance(courses_str, str)
assert 75 < len(courses_str)
Q3 Suppose you wanted a tree that contained both of the above tables. Write a file named school.xml
in the current directory that composes as a single tree both of the above component tables.
As before, once created, write a Python code sequence to read and parse the file, and the print the entire tree. In order to not depend on the correctness of the prior two questions, this problem will be graded manually, so you do not need any particular variable names.
# YOUR CODE HERE
raise NotImplementedError()
Q3 Write a function:
getLocalXML(filename, datadir=".", parser=None)
that performs the common steps of creating a path from the given filename
and datadir
and parses the XML file, using the passed parser
, if any, and returns the Element at the root of the tree. If parser is not passed, the standard XMLParser
should be used.
If the file is not found, or if the parse is unsuccessful (due to XML not being "well formed"), the function should return None
. Remember that if a parse is unsuccessful, the etree
module raises an exception. That means that you should have a try
block, and indented within that block, the parse()
invocation should occur. The try
block is followed by an except Exception as e:
line, and within that, your return None
. If no exception is raised, code execution will proceed beyond the try
/except
block, and that is where you would return the root of the parsed tree.
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
myparser = etree.XMLParser(remove_blank_text=True)
# Testing cell
wroot = getLocalXML("widombooks.xml", datadir, myparser)
assert len(wroot) == 8
bad = getLocalXML("foo.xml", datadir, myparser)
assert bad == None
bad2 = getLocalXML("bad.xml", datadir)
assert bad2==None