Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says YOUR CODE HERE or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [ ]:
NAME = ""
COLLABORATORS = ""

Chapter 23 - first half

The main goals of this in-class activity are:

  • get practice with curl in the terminal
  • learn how to use curl in an ipynb setting, with %%bash
  • practice with learning about APIs
  • practice with GET requests and file extensions
  • practice with POST

In the reading, you learned about APIs for GitHub and TMDB. With this activity and the homework you'll learn about Kiva loans.

Understanding Paginated Results

Step 1

  1. Read the Terms of Use Agreement: https://www.kiva.org/legal/terms
  2. Read the Code of Conduct: https://www.kiva.org/build/code-of-conduct
  3. Study the API: this worksheet gives you the information you need, so you can skip this step for now

The API has changed in the past year to become more complicated, but you can still access data from Kiva using the old API approach. That old approach is described at the following link, which uses the Wayback Machine, a service that archives web pages as they appeared at earlier points in time:

https://web.archive.org/web/20190629032937/https://build.kiva.org/api

Step 2

Start playing with how to get data from http://api.kivaws.org

API methods can be tested easily with most any browser. As an example, try out the loans/search method using HTML output:

http://api.kivaws.org/v1/loans/search.html?status=fundraising

API calls with the .html extension are designed for testing or debugging. If the browser or tool you are using easily supports viewing XML output you might try using the .xml extension instead:

http://api.kivaws.org/v1/loans/search.xml?status=fundraising

Try changing up some of the parameters and see how the search results change. What URL would you use to access the same data in JSON format?

YOUR ANSWER HERE

From the reading, it should be clear to you that we are using a query parameter corresponding to a Python dictionary {'status':'fundraising'}. Just like in the reading, we can use & to provide the URI with multiple query parameters at once. Here are two more that we commonly use (and that come up on your homework):

  • page
    • A number for the page of data to return (results are segmented into pages).
  • per_page
    • A number telling how many results per page you want to see

Both of these parameters will hopefully be familiar to you from times you have used a search engine like Google.
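
To make the link between a dictionary and a query string concrete, here is a minimal sketch that uses the standard library's `urllib.parse.urlencode` to join several parameters with `&`. The values chosen for `page` and `per_page` are only illustrative.

```python
from urllib.parse import urlencode

base = "http://api.kivaws.org/v1/loans/search.html"

# The same information as the query string, held in a Python dictionary.
# The page and per_page values are arbitrary examples.
params = {"status": "fundraising", "page": 2, "per_page": 100}

# urlencode joins key=value pairs with "&"; the "?" separates path from query.
query = urlencode(params)
url = f"{base}?{query}"
print(url)
```

Pasting the printed URL into a browser should behave the same as typing the parameters by hand.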

In our example link above, the default is to take you to Page 1 (out of 145 at the time of writing):

http://api.kivaws.org/v1/loans/search.html?status=fundraising

You can see "Page 1 out of 145" in the top line of the results. Note that, by the time you go to this link, it might have changed, if more loans were made and the data source updated correspondingly.

To get the second page of results instead of the first, you can add a query parameter using page:

http://api.kivaws.org/v1/loans/search.html?status=fundraising&page=2

Note that the first line now says "Page 2 out of 145". You can also change how many results you want to see per page, just like a search engine. For example, to see 100 results per page, you would do:

http://api.kivaws.org/v1/loans/search.html?status=fundraising&per_page=100

Note that when showing 100 results per page, you only need 29 pages to get through all the results, instead of 145. How would you modify this URI to show you the third page, with 50 results per page?

YOUR ANSWER HERE
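
The page counts quoted above can be checked with a little arithmetic. This sketch assumes the default page size is 20 results, which the worksheet does not state; it is inferred from the 145 and 29 page counts. The helper `result_range` is a hypothetical name used only for illustration.

```python
import math

pages_at_default = 145   # "Page 1 out of 145" from the text
default_per_page = 20    # assumption: default page size, not stated in the text

# With 145 pages, the last page holds between 1 and 20 results.
fewest_results = (pages_at_default - 1) * default_per_page + 1
most_results = pages_at_default * default_per_page

# Either bound needs the same number of pages at 100 results per page.
pages_low = math.ceil(fewest_results / 100)
pages_high = math.ceil(most_results / 100)
print(pages_low, pages_high)


def result_range(page, per_page):
    """Hypothetical helper: 1-based positions of the results shown on a page."""
    first = (page - 1) * per_page + 1
    last = page * per_page
    return first, last


print(result_range(2, 100))
```

Under that assumption, both bounds give 29 pages, which matches the count reported by the API.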

Here are some more parameters that the loans/search method can take:

  • status
    • Any of: fundraising,funded,in_repayment,paid,defaulted
  • gender
    • Any of: male,female
  • sector
    • Matches against a sector name such as agriculture
  • region
    • Any of: na,ca,sa,af,as,me,ee
  • country_code
    • Matches a two-letter ISO country code.
  • partner
    • Matches one or more partner IDs.
  • q
    • A general search string to match against various properties of loans
  • sort_by
    • Any of: popularity,loan_amount,oldest,expiration,newest,amount_remaining,repayment_term

Here's how you'd make a request for all loans in Cambodia or Mongolia that are actively paying back, sorted by the amount of the loan:

http://api.kivaws.org/v1/loans/search.html?country_code=kh,mn&sort_by=loan_amount&status=in_repayment
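
To see how a request like this breaks down into individual parameters, the following sketch splits the URL with the standard library's `urllib.parse`. Note that `parse_qs` returns each value as a list, and that the comma-separated country codes stay together as one value.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "http://api.kivaws.org/v1/loans/search.html"
    "?country_code=kh,mn&sort_by=loan_amount&status=in_repayment"
)

parts = urlsplit(url)            # separates scheme, host, path, and query
params = parse_qs(parts.query)   # turns the query string into a dictionary

print(parts.path)
for name, values in params.items():
    print(name, values)
```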

Please come up with three more examples that make use of the eight parameters above (that is, everything except page and per_page).

YOUR ANSWER HERE

YOUR ANSWER HERE

YOUR ANSWER HERE

Curl

In all the examples above, you needed to physically copy and paste the URLs into a web browser to test your results. This is obviously terrible from a computer science perspective.

One step in the right direction (to at least have a reproducible workflow all in one document) is to use curl. For example, with the first link given above, we have

curl --get --url "http://api.kivaws.org/v1/loans/search.html?status=fundraising"

We can run this command in Jupyter via the following cell:

In [ ]:
%%bash

curl --get --url "http://api.kivaws.org/v1/loans/search.html?status=fundraising"
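
One thing to watch for when a URL has more than one query parameter: in a shell, an unquoted `&` ends the command and runs it in the background, and `?` is a wildcard character, so the URL should be quoted. As a small sketch, Python's `shlex.quote` shows what a safely quoted version looks like. The command is only built as a string here, not executed.

```python
import shlex

url = "http://api.kivaws.org/v1/loans/search.html?status=fundraising&page=2"

# shlex.quote wraps the string in single quotes when it contains
# characters that the shell would otherwise interpret (here, ? and &).
command = f"curl --get --url {shlex.quote(url)}"
print(command)
```

The printed line can be pasted into a `%%bash` cell or a terminal as-is.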

Please use three %%bash or !curl cells below to run curl commands for the three links provided above in the examples about the query parameters page= and per_page=.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Programmatic

While curl is fun and powerful, we have not illustrated yet how to actually store the results of a curl GET command for use in a program. Thankfully, we previously learned how to get results from the web into Python native data structures using the requests module. Please run the following cell.

In [ ]:
import requests
import json
import io
from lxml import etree

We return now to our Kiva loan problem.

Step 3

Write, in a global cell, the code that gets the data as XML and yields the root Element of the XML tree.

In [ ]:
url = "http://api.kivaws.org/v1/loans/search.xml"
searchTerms = {'status': 'fundraising'}
resp = requests.get(url, params=searchTerms)
print(resp.status_code)
In [ ]:
xmldata = etree.parse(io.BytesIO(resp.content)).getroot()
In [ ]:
for child in xmldata:
    print(child.tag, child.attrib, child.text)
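
If the API is unreachable, the same parse-and-iterate pattern can be tried offline. The sketch below uses the standard library's `xml.etree.ElementTree` in place of `lxml` so that it runs without extra installs, and the XML document in it is a made-up stand-in, not an actual Kiva response.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample; the real response's element names and layout may differ.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
  <paging><page>1</page><pages>145</pages></paging>
  <loans type="list">
    <loan><id>1</id><sector>Agriculture</sector></loan>
  </loans>
</response>"""

# Same shape as the cell above: wrap the bytes in a file-like object,
# parse, and take the root element.
root = ET.parse(io.BytesIO(sample)).getroot()

child_tags = [child.tag for child in root]
for child in root:
    print(child.tag, child.attrib)
```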

Step 4 Make the above a function getRootKivas() with no parameters that returns a Python data structure, or None, if there was a problem.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
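
One possible way to organize the "or None, if there was a problem" part of Step 4 is sketched below, without any network call. The helper name `parse_root_or_none` is hypothetical, and the sketch uses the standard library's `xml.etree.ElementTree` rather than `lxml`; treating anything other than status 200 as a problem is also an assumption.

```python
import xml.etree.ElementTree as ET


def parse_root_or_none(status_code, content):
    """Hypothetical helper: return the XML root, or None if anything went wrong."""
    # Assumption: only a 200 status counts as success.
    if status_code != 200:
        return None
    try:
        return ET.fromstring(content)
    except ET.ParseError:
        # The body was not well-formed XML.
        return None


ok = parse_root_or_none(200, b"<a><b/></a>")
bad_status = parse_root_or_none(404, b"<a/>")
bad_xml = parse_root_or_none(200, b"<a>")
print(ok.tag, bad_status, bad_xml)
```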

Step 5 Refine your function to take and use a page parameter, so getRootKivas(p) gives the results from page p. Please solve this by creating a URI with a ? and bringing p into a format string.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
r = getRootKivas(42)
r.request.url

5b Now solve the same problem by passing a dictionary to get. Here is an example. Please write a general function.

In [ ]:
kiva_api = 'http://api.kivaws.org/v1/'
endpoint = 'loans/newest.json'
getargs = {'page': 42}
r = requests.get(kiva_api + endpoint, params=getargs)
n = json.loads(r.text)
print(n['paging'])
print()
print(n['loans'][0])
In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
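
The format-string approach of Step 5 and the dictionary approach of 5b should end up with the same query string for a simple value. The sketch below compares the two using `urllib.parse.urlencode`, which produces the same kind of encoding that `requests` applies to `params`. It also shows one difference: a format string does not escape special characters such as spaces, while the dictionary route does.

```python
from urllib.parse import urlencode

base = "http://api.kivaws.org/v1/loans/newest.json"
p = 42

# Step 5 style: build the query by hand with a format string.
url_from_format = f"{base}?page={p}"

# Step 5b style: let a library turn a dictionary into the query string.
url_from_dict = base + "?" + urlencode({"page": p})

print(url_from_format == url_from_dict)

# A value containing a space is escaped only by the dictionary route.
escaped = urlencode({"theme": "Higher Education"})
print(escaped)
```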

Step 6 Refine your function further to take and use a per_page parameter, so getRootKivas(p,n) gives the results from page p, when each page has n results.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Step 7 Refine your function further to take and use a sector parameter, so getRootKivas(p,n,s) gives the results from page p, when each page has n results, and the sector is s (e.g., 'Agriculture').

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing cell
r = getRootKivas(2,10,'agriculture')
n = json.loads(r.text)
print(n['paging'])

Step 8 Refine your function further to take and use theme and status parameters, so getRootKivas(p,n,sec,theme,stat) gives the results from page p, when each page has n results, and the sector is sec, and the theme is theme (e.g., 'Higher Education') and the status is stat (e.g. 'funded').

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing cell
r = getRootKivas(1,10,'agriculture','Higher Education','funded')
n = json.loads(r.text)
print(n['paging'])
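
As the parameter list grows across Steps 6 to 8, it can help to build the query dictionary in one place and leave out anything the caller did not supply. The helper `build_params` below is a hypothetical name, shown only as a sketch of that idea; it does not make a request.

```python
def build_params(page=None, per_page=None, sector=None, theme=None, status=None):
    """Hypothetical helper: collect only the query parameters that were given."""
    candidates = {
        "page": page,
        "per_page": per_page,
        "sector": sector,
        "theme": theme,
        "status": status,
    }
    # Drop entries the caller left as None so they never reach the URL.
    return {name: value for name, value in candidates.items() if value is not None}


some = build_params(page=2, per_page=10, sector="agriculture")
everything = build_params(1, 10, "agriculture", "Higher Education", "funded")
print(some)
print(everything)
```

The filtering step is optional when using `requests`, since it already skips dictionary keys whose value is `None`, but doing it explicitly makes the resulting dictionary easier to inspect.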

Step 9 Refine your function further to take a parameter for the endpoint_type. In all the examples above, it was newest, but poking around on the Kiva website, search would have also worked. Think about what other types work.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Step 10 Can you make your function even more general? For example, every invocation above seeks data along the path http://api.kivaws.org/v1/loans/. What else in that path can be modified to be a parameter?

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
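
When thinking about which pieces of the path could become parameters, it may help to take one of the URLs apart. This sketch uses the standard library to split a URL from the examples above into its components.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://api.kivaws.org/v1/loans/newest.json"

parts = urlsplit(url)

# Break the path into its slash-separated segments.
segments = parts.path.strip("/").split("/")

# The last segment combines a method name and a format extension.
method, extension = posixpath.splitext(segments[-1])

print(parts.scheme, parts.netloc)
print(segments)
print(method, extension)
```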

A more in-depth example

In the example above, we assumed the data would come to us in XML form. We now generalize that, then show how to build in query parameters.

Q1 Please write a function

kiva_newest(baseurl, apiobject, method, form = 'json')

that takes four string parameters (where the fourth is optional), builds a correct URL, and executes a requests.get, returning the result (or None if there is a problem). The fourth parameter is the format of the data. Please refer to the examples above.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing cell

baseurl = "https://api.kivaws.org/v1"
apiobject = "loans"
method = "newest"

resp = kiva_newest(baseurl, apiobject, method)
print(resp.text)
In [ ]:
resp = kiva_newest(baseurl, apiobject, method,'json')
print(resp.text)
In [ ]:
resp = kiva_newest(baseurl, apiobject, method,'xml')
print(resp.text)
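
The URL-building half of Q1 can be checked on its own, before any request is made. The sketch below is one possible arrangement, with the hypothetical name `build_kiva_url`; it shows how a default argument makes the fourth parameter optional.

```python
def build_kiva_url(baseurl, apiobject, method, form="json"):
    """Hypothetical helper: assemble base URL, object, method, and format."""
    # rstrip guards against a doubled slash if baseurl ends with "/".
    return f"{baseurl.rstrip('/')}/{apiobject}/{method}.{form}"


default_url = build_kiva_url("https://api.kivaws.org/v1", "loans", "newest")
xml_url = build_kiva_url("https://api.kivaws.org/v1/", "loans", "newest", "xml")
print(default_url)
print(xml_url)
```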

By now we should be familiar with the query parameters for page and per_page. Another useful query parameter is ids_only, which can be either true or false. Please take a moment to familiarize yourself with this parameter, e.g., by playing with the following URI:

https://api.kivaws.org/v1/loans/newest.json?page=3&ids_only=true
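
A small detail worth noticing: the URI above spells the value in lowercase, while a Python boolean converts to the string `True` with a capital letter. Whether the API treats the two spellings the same is not something this worksheet states, so the sketch below simply converts to the lowercase form. The helper `api_bool` is a hypothetical name.

```python
from urllib.parse import urlencode


def api_bool(flag):
    """Hypothetical helper: spell a Python boolean the way the URI above does."""
    return "true" if flag else "false"


# Passing the Python boolean directly keeps Python's capitalized spelling.
raw = urlencode({"page": 3, "ids_only": True})

# Converting first gives the lowercase spelling used in the example URI.
converted = urlencode({"page": 3, "ids_only": api_bool(True)})

print(raw)
print(converted)
```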

For more practice (in a different setting than the previous problem), please solve the following. Pay careful attention to the test invocation to understand the parameters.

Q2 Please write a function with five parameters

kiva_query(result,page,pp,ids_only,endpoint)

that uses endpoint and result to create a correct endpoint path for newest loans, then creates a dictionary for the other three parameters suitable for query parameters to go with the Kiva API. Please pass the URI and query parameters to requests.get, returning the result. Once you have a working version, please think about how to generalize it to make all arguments optional.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
my_end="https://api.kivaws.org/v1/loans"

r = kiva_query(result='xml', page=5, pp=10, ids_only='false',endpoint=my_end)
r.request.path_url
In [ ]:
stripparser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(io.BytesIO(r.content), stripparser)
root = tree.getroot()
In [ ]:
root = etree.parse(io.BytesIO(r.content)).getroot()
In [ ]:
print(etree.tostring(root, pretty_print=True).decode("utf-8"))

Q3 Please use the result from the previous problem, and XPath, to extract a list sector_list of sectors that appear in your query, e.g. "Agriculture", etc.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
print(sector_list)
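
For practice with path expressions away from the live API, here is a sketch on a made-up document. It uses the limited XPath support in the standard library's `xml.etree.ElementTree`; with `lxml`, the `xpath` method accepts full XPath expressions instead. The element names `loan` and `sector` are assumptions about the response layout.

```python
import xml.etree.ElementTree as ET

# Made-up sample; element names are assumptions, not the confirmed Kiva layout.
sample = """<response><loans type="list">
<loan><id>1</id><sector>Agriculture</sector></loan>
<loan><id>2</id><sector>Retail</sector></loan>
<loan><id>3</id><sector>Agriculture</sector></loan>
</loans></response>"""

root = ET.fromstring(sample)

# ".//loan/sector" finds every sector element that is a child of a loan.
sectors = [element.text for element in root.findall(".//loan/sector")]

# Collapse repeats if only the distinct sector names are wanted.
unique_sectors = sorted(set(sectors))

print(sectors)
print(unique_sectors)
```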

Practice with POST

Please visit the following link in Google Chrome and use the drop-down menus to select all years from 2002 to 2015

http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php

Now use Dev Tools so you can see what POST is really doing, and write down what you learned.

Q4 Write a function makePostDict(from_year,to_year) that takes the from year and to year (as strings) and returns a correct dictionary that could be sent with the POST.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
# Testing cell
D = makePostDict('2004','2011')
print(D)
In [ ]:
# Testing cell
endpoint = 'http://httpbin.org/post'
r = requests.post(endpoint, data=D)
r.request.body
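
The body printed by the testing cell above is a form-encoded string, because `requests` form-encodes a dictionary passed as `data`. The sketch below shows that encoding with the standard library. The field names `start` and `end` are placeholders only; the real names are whatever you observe in Dev Tools.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder field names, NOT the ones the site actually uses.
form = {"start": "2004", "end": "2011"}

# A form-encoded body looks just like a query string, without the "?".
body = urlencode(form)
print(body)

# Decoding the body recovers the fields, each value wrapped in a list.
decoded = parse_qs(body)
print(decoded)
```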

Q5 Following what you learned in the book, formulate a curl POST for the example of 2002 to 2015.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Q6 Write a function get_inflation(from_year,to_year) that uses the requests module to issue a POST request whose body is obtained via a call to makePostDict, returning the result of the requests.post invocation.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
r = get_inflation('2004','2011')
r.status_code
r.text
r.request.url
r.request.body