Before you turn this problem in, make sure everything runs as expected. To check, restart the kernel and then run all cells (in the menubar, select Kernel$\rightarrow$Restart And Run All).
Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE".
In the questions that follow, we are looking for XPath declarative solutions to the problems, not procedural solutions. You will only get 1/2 credit for procedural solutions.
Please begin by importing whatever modules you need, reading in and parsing the relevant datasets, and familiarizing yourself with them.
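As a warm-up, here is a minimal sketch of parsing and inspecting an XML document with lxml (the library the test cells refer to through `etree`). The `shop`/`item` document is a made-up stand-in, not one of the course datasets, and it is read from memory so the example runs on its own.

```python
from io import BytesIO
from lxml import etree

# Hypothetical sample data; NOT the real bookstore.xml.
sample = b"""<shop>
  <item code="a1"><name>Tea</name><cost>4.50</cost></item>
  <item code="a2"><name>Coffee</name><cost>7.25</cost></item>
</shop>"""

# For a file on disk you would pass a path instead, e.g. etree.parse("some_file.xml").
tree = etree.parse(BytesIO(sample))
root = tree.getroot()

print(root.tag)                        # name of the root element
print([child.tag for child in root])   # tags of the root's direct children
print(etree.tostring(root, pretty_print=True).decode())  # whole document as text
```

Printing a few elements like this is a quick way to learn which values live in child elements and which live in attributes before writing any queries.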
Q1: Using the provided bookstore.xml file, create a Python list called "books" containing the titles of all books. Your list books should be a list of strings.
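The test cells check for `etree._ElementUnicodeResult`, which is the type lxml returns for text selected with `text()`. The sketch below, on a made-up document rather than the assignment data, shows that these results still behave like ordinary Python strings.

```python
from lxml import etree

# Hypothetical sample data; element names are illustrative only.
root = etree.fromstring(
    "<shop><item><name>Tea</name></item><item><name>Coffee</name></item></shop>"
)

# text() selects the text nodes themselves rather than the <name> elements.
names = root.xpath("//item/name/text()")

print(names)
print(type(names[0]))             # lxml's "smart string" result type
print(isinstance(names[0], str))  # it is a subclass of str
```

Because the result type subclasses `str`, membership tests such as `'Tea' in names` work without any conversion.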
# Solution cell
books = []
# YOUR CODE HERE
raise NotImplementedError()
print(books)
type(books[0])
assert len(books) > 0 and type(books[0]) is etree._ElementUnicodeResult
assert 'Lover Birds' in books and 'Splish Splash' in books
assert len(books)==12
Q2: Create a list named less containing the ids of all books that cost less than $6. Note that id is an attribute.
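To illustrate the general pattern of filtering on a child element's value and then stepping to an attribute, here is a small sketch on a made-up document. The names `item`, `cost`, and `code` are assumptions for illustration and do not come from bookstore.xml.

```python
from lxml import etree

# Hypothetical sample data.
root = etree.fromstring(
    "<shop>"
    "<item code='a1'><cost>4.50</cost></item>"
    "<item code='a2'><cost>7.25</cost></item>"
    "<item code='a3'><cost>1.00</cost></item>"
    "</shop>"
)

# The predicate [cost < 5] compares the child's text as a number;
# /@code then selects the attribute value of each matching element.
cheap_codes = root.xpath("//item[cost < 5]/@code")

print(cheap_codes)
```

The filtering happens inside the XPath expression, so no Python loop or `if` is needed to pick out the matching elements.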
# Solution cell
less = []
# YOUR CODE HERE
raise NotImplementedError()
less
assert len(less) > 0 and type(less[0]) is etree._ElementUnicodeResult
assert 'bk104' in less
assert 'bk101' not in less
assert len(less)==7
Q3: Create a list called "eva" containing the titles of the books written by Eva Corets. Your list eva should be a list of strings.
# Solution cell
eva = []
# YOUR CODE HERE
raise NotImplementedError()
eva
assert len(eva) > 0 and type(eva[0]) is etree._ElementUnicodeResult
assert len(eva)==3
assert 'Maeve Ascendant' in eva
assert 'Paradox Lost' not in eva
Q4: Find the average price of all books in this file that are not fantasy, assigning it to the variable avgprice. Hints: First, use a single XPath query to get a list of the price strings (text). Then use a list comprehension to convert the strings to a list of float values. Finally, compute the average from the sum of the values and the length of the list.
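The three steps in the hint can be sketched as follows on a made-up document. The element names (`item`, `kind`, `cost`) and the values are assumptions chosen for illustration, not the assignment data.

```python
from lxml import etree

# Hypothetical sample data.
root = etree.fromstring("""<shop>
  <item><name>Tea</name><kind>drink</kind><cost>4.50</cost></item>
  <item><name>Bread</name><kind>food</kind><cost>3.00</cost></item>
  <item><name>Cheese</name><kind>food</kind><cost>5.00</cost></item>
</shop>""")

# Step 1: one XPath query returns the cost text of every non-drink item.
cost_strings = root.xpath("//item[kind != 'drink']/cost/text()")

# Step 2: a list comprehension converts the strings to floats.
costs = [float(c) for c in cost_strings]

# Step 3: the average is the sum divided by the number of values.
average = sum(costs) / len(costs)

print(cost_strings, average)
```

Note that `len(costs)` would be zero if the query matched nothing, so a real solution may want to guard against an empty result.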
# Solution cell
avgprice = 0
# YOUR CODE HERE
raise NotImplementedError()
avgprice
assert(avgprice > 23.82)
assert(avgprice < 24)
Q5: Create a list called lessFantasy containing the titles of the books that cost under $40 and are not in the fantasy genre.
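Two conditions can be combined inside one predicate with `and`. The sketch below, on a made-up document, also shows a subtle difference between `!=` and `not(... = ...)` when an element lacks the child being tested. All names here are hypothetical.

```python
from lxml import etree

# Hypothetical sample data; note that Mug has no <kind> child.
root = etree.fromstring("""<shop>
  <item><name>Tea</name><kind>drink</kind><cost>4.50</cost></item>
  <item><name>Bread</name><kind>food</kind><cost>3.00</cost></item>
  <item><name>Mug</name><cost>12.00</cost></item>
  <item><name>Cheese</name><kind>food</kind><cost>5.00</cost></item>
</shop>""")

# kind != 'drink' needs SOME <kind> child whose value differs from 'drink',
# so an item with no <kind> at all is not selected.
with_neq = root.xpath("//item[cost < 20 and kind != 'drink']/name/text()")

# not(kind = 'drink') is true whenever no <kind> child equals 'drink',
# so an item with no <kind> at all is selected.
with_not = root.xpath("//item[cost < 20 and not(kind = 'drink')]/name/text()")

print(with_neq)
print(with_not)
```

When every element has exactly one such child, the two forms select the same nodes; they differ only when the child is missing or repeated.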
# Solution cell
lessFantasy = []
# YOUR CODE HERE
raise NotImplementedError()
lessFantasy
assert len(lessFantasy)==6
assert 'Paradox Lost' in lessFantasy
assert 'Maeve Ascendant' not in lessFantasy
Q6: Using countries.xml, generate a list of the names of all the countries in the file, assigning it to the variable countries; then assign the number of countries to the variable countrycount. When you read in and parse the file, please name the root element croot.
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
assert(countrycount == 231)
assert('Uruguay' in countries)
assert type(croot) is etree._Element
Q7: Write a function findPop(root,country) that finds the population of a given country in the dataset countries.xml. Use an XPath expression and a format string. Return your answer as an integer.
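A minimal sketch of building an XPath query from a function argument with a format string, using a made-up document. The function names `find_stock` and `find_stock_var` and the element names are hypothetical. The second function shows lxml's XPath variables as an alternative that avoids quoting problems when the value itself contains a quote character.

```python
from lxml import etree

# Hypothetical sample data.
root = etree.fromstring(
    "<shop>"
    "<item><name>Tea</name><stock>40</stock></item>"
    "<item><name>Coffee</name><stock>15</stock></item>"
    "</shop>"
)

def find_stock(root, name):
    """Look up an item's stock by name using a formatted XPath string."""
    query = f"//item[name = '{name}']/stock/text()"
    matches = root.xpath(query)
    return int(matches[0])  # raises IndexError if nothing matched

def find_stock_var(root, name):
    """Same lookup, passing the value as an XPath variable instead."""
    matches = root.xpath("//item[name = $wanted]/stock/text()", wanted=name)
    return int(matches[0])

print(find_stock(root, "Tea"), find_stock_var(root, "Coffee"))
```

Both versions return a Python `int` because the text node is converted explicitly; the XPath query itself only ever returns strings for `text()`.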
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
# Testing cell
assert findPop(croot,'Cuba') == 10951334
assert findPop(croot,'Uruguay') == 3238952
Q8: Study the countries data carefully. Then use the position() function to create a node set consisting of, for countries in positions 5-55 inclusive, the population of the second city listed, if there are at least two cities listed. For example, nothing is in the node set for Aruba (no cities listed) or Armenia (only Yerevan listed), but the population of Cordoba is in the node set thanks to Argentina. Your answer should use a single XPath expression. Please store the results in a list secondPops of integers.
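To show how `position()` and a numeric index predicate interact, here is a sketch on a made-up document of aisles and items. The structure is an assumption for illustration and is unrelated to the layout of countries.xml.

```python
from lxml import etree

# Hypothetical sample data: four aisles holding 2, 1, 3 and 2 items.
root = etree.fromstring("""<shop>
  <aisle><item><stock>1</stock></item><item><stock>2</stock></item></aisle>
  <aisle><item><stock>3</stock></item></aisle>
  <aisle><item><stock>4</stock></item><item><stock>5</stock></item><item><stock>6</stock></item></aisle>
  <aisle><item><stock>7</stock></item><item><stock>8</stock></item></aisle>
</shop>""")

# position() is evaluated among the <aisle> children of <shop>, so this keeps
# aisles 2 and 3. item[2] is the second <item> within each kept aisle; an aisle
# with fewer than two items simply contributes nothing.
second_stock = root.xpath(
    "/shop/aisle[position() >= 2 and position() <= 3]/item[2]/stock/text()"
)

# Convert the selected text nodes to integers.
second_stock = [int(s) for s in second_stock]

print(second_stock)
```

Aisle 2 has only one item and drops out silently, which is the same behavior the question describes for countries with fewer than two cities.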
# YOUR CODE HERE
raise NotImplementedError()
assert len(secondPops) == 6
assert secondPops[0] == 1111811
Q9: With reference to the topnames dataset, please find all years where there was a count (either gender) that was strictly larger than 50,000. Please navigate to the appropriate attribute, rather than returning a list of elements. Store the result in a list named nodeset.
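The sketch below illustrates, on a made-up document, how a predicate over several children is satisfied if any one of them matches, and how a nested predicate restricts the test to one particular child before stepping to an attribute of the parent. The `log`/`day`/`reading` structure is hypothetical and is not the structure of the topnames dataset.

```python
from lxml import etree

# Hypothetical sample data.
root = etree.fromstring("""<log>
  <day date="mon"><reading sensor="a" value="40"/><reading sensor="b" value="60"/></day>
  <day date="tue"><reading sensor="a" value="70"/><reading sensor="b" value="10"/></day>
  <day date="wed"><reading sensor="a" value="20"/><reading sensor="b" value="30"/></day>
</log>""")

# True for a day if ANY of its readings has a value above 50.
any_high = root.xpath("//day[reading/@value > 50]/@date")

# The nested predicate narrows the test to sensor "a" readings only.
a_high = root.xpath("//day[reading[@sensor='a']/@value > 50]/@date")

print(any_high)
print(a_high)
```

Ending the path with `/@date` returns the attribute values directly, so the result is a list of strings rather than a list of elements.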
# YOUR CODE HERE
raise NotImplementedError()
assert nodeset[0] == '1915'
assert len(nodeset) == 78
Q10: With reference to the topnames dataset, please find all years where the top female name had a count that was strictly larger than 50,000. Please navigate to the appropriate attribute, rather than returning a list of elements. Store the result in a list named nodeset.
# YOUR CODE HERE
raise NotImplementedError()
assert nodeset[0] == '1915'
assert len(nodeset) == 68