Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).
Make sure you fill in any place that says YOUR CODE HERE
or "YOUR ANSWER HERE", as well as your name and collaborators below:
NAME = ""
COLLABORATORS = ""
The main goal of this in-class activity is to practice retrieving data from a web API. In the reading, you learned about the APIs for GitHub and TMDB; with this activity and the homework, you'll learn how to access Kiva loan data.
Step 1
The API has changed in the past year to become more complicated, but you can still access data from Kiva using the old API approach, which is described at the following link. The link goes through the Wayback Machine, which archives snapshots of web pages from earlier points in time:
https://web.archive.org/web/20190629032937/https://build.kiva.org/api
Step 2
Start playing with how to get data from http://api.kivaws.org
API methods can be tested easily in almost any browser. As an example, try out the loans/search method using HTML output:
http://api.kivaws.org/v1/loans/search.html?status=fundraising
API calls with the .html extension are designed for testing or debugging. If the browser or tool you are using easily supports viewing XML output, you might try using the .xml extension instead:
http://api.kivaws.org/v1/loans/search.xml?status=fundraising
Try changing up some of the parameters and see how the search results change. What URL would you use to access the same data in JSON format?
YOUR ANSWER HERE
From the reading, it should be clear that we are using a query parameter corresponding to the Python dictionary {'status': 'fundraising'}. Just as in the reading, we can use & to pass the URI multiple query parameters at once. Two more parameters that we commonly use (and that come up on your homework) are page and per_page. Both of these will hopefully be familiar to you from times you have used a search engine like Google.
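As an optional aside (a minimal sketch, not something you need for the activity), here is one way to see how a Python dictionary of query parameters becomes the &-separated string at the end of a URI; the parameter values below are placeholders only:
from urllib.parse import urlencode
# Turn a dictionary of query parameters into an &-separated query string.
params = {'status': 'fundraising', 'page': 2}   # example values only
query_string = urlencode(params)                # 'status=fundraising&page=2'
print('http://api.kivaws.org/v1/loans/search.html?' + query_string)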
In our example link above, the default is to take you to Page 1 (out of 145 at the time of writing):
http://api.kivaws.org/v1/loans/search.html?status=fundraising
You can see "Page 1 out of 145" in the top line of the results. Note that by the time you visit this link, these numbers might have changed if more loans were made and the data source updated correspondingly.
To get the second page of results instead of the first, you can add the page query parameter:
http://api.kivaws.org/v1/loans/search.html?status=fundraising&page=2
Note that the first line now says "Page 2 out of 145". You can also change how many results you want to see per page, just like a search engine. For example, to see 100 results per page, you would do:
http://api.kivaws.org/v1/loans/search.html?status=fundraising&per_page=100
Note that when showing 100 results per page, you only need 29 pages to get through all the results, instead of 145. How would you modify this URI to show the third page, with 50 results per page?
YOUR ANSWER HERE
Here are some more parameters that the loans/search method can take:
Here's how you'd make a request for all loans in Cambodia or Mongolia that are actively paying back, sorted by the amount of the loan:
http://api.kivaws.org/v1/loans/search.html?country_code=kh,mn&sort_by=loan_amount&status=in_repayment
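As a hedged sketch (this previews the requests-based approach used later in this notebook), the same Cambodia/Mongolia query could be issued from Python by letting requests assemble the query string from a dictionary:
import requests
# Reproduce the browser query above with a params dictionary; requests builds the URL.
params = {'country_code': 'kh,mn',        # Cambodia and Mongolia
          'sort_by': 'loan_amount',
          'status': 'in_repayment'}
resp = requests.get('http://api.kivaws.org/v1/loans/search.json', params=params)
print(resp.status_code)
print(resp.url)   # the fully assembled URI, including the & separators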
Please come up with three more examples that make use of the first eight parameters above (that is, everything except page).
YOUR ANSWER HERE
YOUR ANSWER HERE
YOUR ANSWER HERE
In all the examples above, you had to manually copy and paste the URLs into a web browser to test your results. This is obviously terrible from a computer science perspective.
One step in the right direction (to at least have a reproducible workflow all in one document) is to use curl. For example, with the first link given above, we have:
curl --get --url http://api.kivaws.org/v1/loans/search.html?status=fundraising
We can run this command in Jupyter via the following cell:
%%bash
curl --get --url http://api.kivaws.org/v1/loans/search.html?status=fundraising
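One caveat before the next exercise: in the shell, & separates commands, so a URL that contains more than one query parameter must be quoted (or the & escaped) or curl will only see the part before the &. For example (a sketch using the page=2 link from above):
%%bash
# Quote the URL so the shell does not interpret the & character.
curl --get --url 'http://api.kivaws.org/v1/loans/search.html?status=fundraising&page=2'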
Please use three %%bash or !curl cells below to run curl commands for the three links provided above in the examples about the query parameters page= and per_page=.
# YOUR CODE HERE
raise NotImplementedError()
# YOUR CODE HERE
raise NotImplementedError()
# YOUR CODE HERE
raise NotImplementedError()
While curl is fun and powerful, we have not yet illustrated how to actually store the results of a curl GET command for use in a program. Thankfully, we previously learned how to get results from the web into native Python data structures using the requests module. Please run the following cell.
import requests
import json
import io
from lxml import etree
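Before we go back to XML, here is a small optional sketch (using the requests module just imported) of what "native Python data structures" means in practice: ask the same endpoint for JSON and let requests decode it into a dictionary. The 'paging' and 'loans' keys shown in the comment match the JSON examples that appear later in this notebook.
# A minimal sketch: fetch fundraising loans as JSON and decode into a Python dict.
resp = requests.get('http://api.kivaws.org/v1/loans/search.json',
                    params={'status': 'fundraising'})
if resp.status_code == 200:
    data = resp.json()        # equivalent to json.loads(resp.text)
    print(data.keys())        # expect keys such as 'paging' and 'loans'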
We return now to our Kiva loan problem.
Step 3
Write, in a global cell, the programmatic way to get the data as XML, yielding the root Element of the XML tree.
url = "http://api.kivaws.org/v1/loans/search.xml"
searchTerms = {'status': 'fundraising'}
resp = requests.get(url, params=searchTerms)
print(resp.status_code)
xmldata = etree.parse(io.BytesIO(resp.content)).getroot()
for child in xmldata:
    print(child.tag, child.attrib, child.text)
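If you want to look one level deeper than the top-level children printed above, here is a hedged sketch that reuses the xmldata variable from the cell above; it assumes the response has a loans child containing individual loan elements, so adjust the tag names to match your own output.
# Explore one level further into the tree; tag names are assumptions, verify against your output.
loans = xmldata.find('loans')
if loans is not None and len(loans) > 0:
    first_loan = loans[0]
    for field in first_loan:
        print(field.tag, ':', field.text)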
Step 4 Make the above a function getRootKivas() with no parameters that returns a Python data structure, or None if there was a problem.
# YOUR CODE HERE
raise NotImplementedError()
Step 5 Refine your function to take and use a page parameter, so getRootKivas(p) gives the results from page p. Please solve this by creating a URI with a ? and bringing p into a format string.
# YOUR CODE HERE
raise NotImplementedError()
r = getRootKivas(42)
r.request.url
5b Now solve the same problem by passing a dictionary to get. Here is an example. Please write a general function.
kiva_api = 'http://api.kivaws.org/v1/'
endpoint = 'loans/newest.json'
getargs = {'page': 42}
r = requests.get(kiva_api + endpoint, params=getargs)
n = json.loads(r.text)
print(n['paging'])
print()
print(n['loans'][0])
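One advantage of the params-dictionary style over formatting the query string yourself is that requests URL-encodes each value and joins the pairs with & for you, so it is much harder to produce a malformed URI.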
# YOUR CODE HERE
raise NotImplementedError()
Step 6 Refine your function further to take and use a per_page parameter, so getRootKivas(p,n) gives the results from page p, when each page has n results.
# YOUR CODE HERE
raise NotImplementedError()
Step 7 Refine your function further to take and use a sector parameter, so getRootKivas(p,n,s) gives the results from page p, when each page has n results, and the sector is s (e.g., 'Agriculture').
# YOUR CODE HERE
raise NotImplementedError()
# Testing cell
r = getRootKivas(2,10,'agriculture')
n = json.loads(r.text)
print(n['paging'])
Step 8 Refine your function further to take and use theme and status parameters, so getRootKivas(p,n,sec,theme,stat) gives the results from page p, when each page has n results, the sector is sec, the theme is theme (e.g., 'Higher Education'), and the status is stat (e.g., 'funded').
# YOUR CODE HERE
raise NotImplementedError()
# Testing cell
r = getRootKivas(1,10,'agriculture','Higher Education','funded')
n = json.loads(r.text)
print(n['paging'])
Step 9 Refine your function further to take a parameter for the endpoint_type. In all the examples above, it was newest, but poking around on the Kiva website, search would also have worked. Think about what other types work.
# YOUR CODE HERE
raise NotImplementedError()
Step 10 Can you make your function even more general? For example, every invocation above seeks data along the path http://api.kivaws.org/v1/loans/. What else in that path can be modified to be a parameter?
# YOUR CODE HERE
raise NotImplementedError()
In the example above, we assumed the data would come to us in XML form. We now generalize that, then show how to build in query parameters.
Q1 Please write a function
kiva_newest(baseurl, apiobject, method, form='json')
that takes four string parameters (where the fourth is optional), builds a correct URL, and executes a requests.get, returning the result (or None if there is a problem). The fourth parameter is the format of the data. Please refer to the examples above.
# YOUR CODE HERE
raise NotImplementedError()
# Testing cell
baseurl = "https://api.kivaws.org/v1"
apiobject = "loans"
method = "newest"
resp = kiva_newest(baseurl, apiobject, method)
print(resp.text)
resp = kiva_newest(baseurl, apiobject, method,'json')
print(resp.text)
resp = kiva_newest(baseurl, apiobject, method,'xml')
print(resp.text)
By now we should be familiar with the query parameters page and per_page. Another useful query parameter is ids_only, which can be either true or false. Please take a moment to familiarize yourself with this parameter, e.g., by playing with the following URI:
https://api.kivaws.org/v1/loans/newest.json?page=3&ids_only=true
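If you would rather experiment with ids_only from Python than from the browser, here is a minimal sketch; the expectation that the loans entries become bare id numbers when ids_only=true is based on playing with the URI above, so verify it against what you actually see.
# A sketch: the same request as the URI above, issued with requests.
resp = requests.get('https://api.kivaws.org/v1/loans/newest.json',
                    params={'page': 3, 'ids_only': 'true'})
data = resp.json()
print(data['paging'])
print(data['loans'][:5])   # with ids_only=true these should be plain loan ids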
For more practice (in a different setting than the previous problem), please solve the following. Pay careful attention to the test invocation to understand the parameters.
Q2 Please write a function with five parameters
kiva_query(result, page, pp, ids_only, endpoint)
that uses endpoint and result to create a correct endpoint path for the newest loans, then creates a dictionary from the other three parameters suitable for use as query parameters with the Kiva API. Please pass the URI and query parameters to requests.get, returning the result. Once you have a working version, think about how to generalize it so that all arguments are optional.
# YOUR CODE HERE
raise NotImplementedError()
my_end="https://api.kivaws.org/v1/loans"
r = kiva_query(result='xml', page=5, pp=10, ids_only='false',endpoint=my_end)
r.request.path_url
# Parse the XML response, discarding whitespace-only text nodes so pretty-printing works.
stripparser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(io.BytesIO(r.content), stripparser)
root = tree.getroot()
print(etree.tostring(root, pretty_print=True).decode("utf-8"))
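As a warm-up for the next question, here is a hedged XPath sketch against the root parsed above; it pulls out loan names rather than sectors, and the path //loans/loan/name is an assumption, so adjust it to whatever tags you see in the pretty-printed output.
# XPath extraction on the parsed tree; the path is an assumption, adapt it to your XML.
names = root.xpath('//loans/loan/name/text()')
print(names[:5])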
Q3 Please use the result from the previous problem, and XPath, to extract a list sector_list of the sectors that appear in your query, e.g., "Agriculture", etc.
# YOUR CODE HERE
raise NotImplementedError()
print(sector_list)
Please visit the following link in Google Chrome and use the drop-down menus to select all years from 2002 to 2015:
http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php
Now use Dev Tools (in particular, the Network tab) so you can see what the POST request is really doing, and write down what you learned.
Q4 Write a function makePostDict(from_year, to_year) that takes the from year and to year (as strings) and returns a correct dictionary that could be sent with the POST.
# YOUR CODE HERE
raise NotImplementedError()
# Testing cell
D = makePostDict('2004','2011')
print(D)
# Testing cell
endpoint = 'http://httpbin.org/post'
r = requests.post(endpoint, data=D)
r.request.body
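Since httpbin.org/post simply echoes back what it receives, you can also confirm the form fields directly from the response (a quick sketch):
# httpbin returns the submitted form fields under the 'form' key of its JSON response.
print(r.json()['form'])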
Q5 Following what you learned in the book, formulate a curl POST for the example of 2002 to 2015.
# YOUR CODE HERE
raise NotImplementedError()
Q6 Write a function get_inflation(from_year, to_year) that uses the requests module to issue a POST request whose body is obtained via a call to makePostDict, returning the result of the requests.post invocation.
# YOUR CODE HERE
raise NotImplementedError()
r = get_inflation('2004','2011')
r.status_code
r.text
r.request.url
r.request.body