Developing to the GitHub (v3) API

Authorization Code Grant

This OAuth flow is used when the client/app/consumer of data is separate from the user, and when the app needs to access protected resources maintained by the Provider on behalf of the user. Note that there is another type of delegated-authority OAuth flow, the implicit grant. In that flow, the User is redirected to the Provider, and the subsequent redirect back to the App yields the token directly. This suffers from the lack of authentication between the App and the Provider, but can simplify the process when the Resource Owner and the App are closely aligned.

As in our final project, we want to "separate concerns". The end result of this notebook is one or more user access tokens, with one user access token for each different resource owner supported by this App.

We need to "communicate" between notebooks, so that a token obtained here can be brought into a different notebook (nominally the data acquisition notebook). So we will use a creds.json file that can be both read and updated by this notebook, and would be read by another notebook in order to be able to see the tokens created here. We can also use it to store something like the authorization redirect url, to be able to shortcut its creation.

Become a Registered Developer and Create an App

For GitHub, as for any other environment that separates the resource owner from the app/client/consumer, the provider must be able to identify/authenticate the app/client/consumer. This involves the human developer creating an account and then defining a particular App. The same developer may, in fact, create multiple distinct Apps, but for now we will assume a single App.

In the case of GitHub, we first logged in as our normal GitHub user. We then followed the directions as posted under GitHub Developer:

https://docs.github.com/en/free-pro-team@latest/developers/apps/creating-an-oauth-app

Summarizing:

When logged into GitHub, in the upper right corner of the window is either your picture or a small square icon, with a dropdown for accessing account settings.

  1. From that dropdown, select "Settings"
  2. On the Settings page, from the set of choices on the left, select "Developer Settings"
  3. From Developer settings, from the set of choices on the left, select OAuth Apps.
  4. From the OAuth Apps page, in the upper right, click New OAuth App
  5. Fill in the registration form, including the Authorization callback URL
  6. Then click Register Application

On step 5: Consider carefully what you enter as the Authorization callback URL. It is used as part of the OAuth back-and-forth to get an authorization code to the App through the Resource Owner. When, at the end of the authorization process, the User/Owner clicks "Accept" (or the equivalent), this is where the Provider redirects in order to communicate the code to the App. We could use something that does not have a running web server, like https://localhost/callback/, because if the redirect goes to a location and resource that does not actually serve up a web page, the attempted URL will still include the code, and we will be able to copy and paste it into our credentials and/or notebook. But what we put in here with the Provider and what we use as the redirect_uri in the constructed URL must match.

Once the application is registered, we are interested in 3 pieces of information.

  1. The Client ID (or App ID or consumer id)
  2. The Client Secret (or App secret or App password)
  3. The Authorization callback URL
    • most often, exactly the same as we specified in registering, although some providers, like eBay, create their own.

These three pieces of information are necessary in downstream operations, so it makes sense to record them in a file (creds.json), so that this and other notebooks can get at them easily. My creds.json has a top level that is a dictionary, and that dictionary maps from a provider (like "github") to a dictionary. The github dictionary maps from the pieces of information that I need (for now, a client_id, a client_secret, and a redirect_uri) to the character string values generated by the Provider. So for me the creds.json currently looks like this:

{
    "github": {"client_id": "304b9ef49da340d86570", 
               "client_secret": "994ee891af84642471e1a4820b1ba84fe4ce57e3", 
               "redirect_uri": "https://caileighmarshall.github.io/cs181project/"}
}

The idea here is that the creds.json file acts as a kind of database for the application, maintaining information colocated with the App that can be used for storing and reading the info needed for authentication and authorization.
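
The util module used below is not shown in this notebook, so the following is only a minimal sketch of how helpers like read_creds and update_creds could work, assuming the creds.json layout described above (a top-level dictionary keyed by provider). The function bodies here are an assumption for illustration, not the actual util code.

```python
import json
import os


def read_creds(provider, file="creds.json"):
    """Return the dictionary stored under `provider`, or {} if it is absent."""
    if not os.path.exists(file):
        return {}
    with open(file, "r", encoding="utf-8") as f:
        all_creds = json.load(f)
    return all_creds.get(provider, {})


def update_creds(provider, creds, file="creds.json"):
    """Replace the entry for `provider`, leaving other providers untouched."""
    all_creds = {}
    if os.path.exists(file):
        with open(file, "r", encoding="utf-8") as f:
            all_creds = json.load(f)
    all_creds[provider] = creds
    with open(file, "w", encoding="utf-8") as f:
        json.dump(all_creds, f, indent=4)
```

Rewriting the whole file on each update keeps the sketch simple; it is adequate for a single-user notebook but would need locking if several processes wrote at once.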

In [1]:
import requests

import os
import os.path
import sys
import importlib
import webbrowser
import base64
from datetime import datetime, timedelta
from requests.auth import HTTPBasicAuth

if os.path.isdir(os.path.join("../../..", "modules")):
    module_dir = os.path.join("../../..", "modules")
else:
    module_dir = os.path.join("../..", "modules")

module_path = os.path.abspath(module_dir)
if module_path not in sys.path:
    sys.path.append(module_path)

import util
importlib.reload(util)
Out[1]:
<module 'util' from '/Users/tcbressoud/Dropbox/cs181-DataSystems/cs181-bressoud/f20_class/modules/util.py'>

Reading the contents of creds.json into a Python dictionary data structure is as we have done before:

In [2]:
creds = util.read_creds("github", file="creds.json")
print(str(creds))
{'client_id': '', 'client_secret': '', 'redirect_uri': 'https://caileighmarshall.github.io/cs181project/'}

Construct the URL to give to the User/Resource Owner

Line 1 of the OAuth Dance

For GitHub, we are using the documentation at

https://docs.github.com/en/free-pro-team@latest/developers/apps/authorizing-oauth-apps for the so-called Web Application Flow. Step 1 in the list on that page describes the URL we need. From the documentation, we can determine the protocol, location, resource, and parameters to be used in the constructed URL. In particular, for GitHub, we have:

variable     | value                  | comment
-------------|------------------------|--------------------------------------------------------
protocol     | https                  | Given in GET specification
location     | github.com             | Given in GET specification
resource     | /login/oauth/authorize | Given in GET specification
client_id    | creds['client_id']     | Based on App registration and entering info in creds.json
redirect_uri | creds['redirect_uri']  | Based on App registration and entering info in creds.json
scope        | user%20repo%20read:org | Representative value; see the note below

Note on scope: possible values are specified at https://developer.github.com/apps/building-oauth-apps/scopes-for-oauth-apps/. The value here is just representative, and depends on what the App needs to do. For GitHub, when we need more than one scope, we space-separate and URL-encode (to get the %20). Many other providers use comma-separated lists, and some scopes look like URLs instead of simple strings.

In [3]:
state = util.random_string()
scope = "user repo read:org"
creds['state'] = state
creds['scope'] = scope
util.update_creds("github", creds, file="creds.json")
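
The state value is sent to the Provider and reflected back in the redirect, so it should be hard to guess. The implementation of util.random_string is not shown here; the sketch below is one way such a helper might be written, assuming an 8-character string of uppercase letters and digits. The length and alphabet are assumptions.

```python
import secrets
import string


def random_string(length=8):
    """Return an unpredictable string of uppercase letters and digits."""
    alphabet = string.ascii_uppercase + string.digits
    # secrets.choice draws from a cryptographically strong source,
    # which matters because state guards against forged redirects.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```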

Build the URL

In [4]:
url = util.buildURL("/login/oauth/authorize", "github.com")
print(url)
https://github.com/login/oauth/authorize
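
For readers without the util module, a helper with the same effect as the buildURL call above can be sketched with the standard library. The name build_url and its signature are hypothetical; only the resulting URL is taken from the output above.

```python
from urllib.parse import urlunparse


def build_url(resource_path, location, protocol="https"):
    """Combine protocol, location, and resource path into a URL string."""
    # urlunparse takes (scheme, netloc, path, params, query, fragment).
    return urlunparse((protocol, location, resource_path, "", "", ""))


url = build_url("/login/oauth/authorize", "github.com")
```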

Construct the Query Parameters Dictionary

In [5]:
paramsD = {
    'client_id': creds['client_id'],
    'response_type': 'code',
    'redirect_uri': creds['redirect_uri'],
    'state': creds['state'],
    'scope': creds['scope']
}
In [6]:
session = requests.Session()
req = requests.Request('GET', url, params=paramsD)

prepped = session.prepare_request(req)
user_url = prepped.url
print(user_url)
https://github.com/login/oauth/authorize?client_id=&response_type=code&redirect_uri=https%3A%2F%2Fcaileighmarshall.github.io%2Fcs181project%2F&state=38BQS4JZ&scope=user+repo+read%3Aorg
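
Notice that the printed URL encodes the spaces in scope as + rather than %20. Both are form-style encodings of a space in a query string and decode to the same value. A small standard-library sketch (independent of requests) shows the two forms side by side:

```python
from urllib.parse import parse_qs, quote, urlencode

params = {"scope": "user repo read:org"}

# Default: spaces become '+' (the form seen in the printed URL).
plus_form = urlencode(params)

# quote_via=quote: spaces become '%20' instead.
percent_form = urlencode(params, quote_via=quote)

# Decoding either form recovers the original space-separated scopes.
decoded_plus = parse_qs(plus_form)["scope"][0]
decoded_percent = parse_qs(percent_form)["scope"][0]
```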

User Agent/Resource Owner Interacts with Authorization Server

Line 2 of the OAuth Dance

Given the URL above, an App would redirect the User, working in their web browser, to the Provider's Authorization Server. We can simulate this in a couple of ways:

  • Copy and Paste of this link into a web browser
  • Use a Python function to bring up a web browser tab, using the URL as its address
    • An example of this second approach is shown in the cell below:

After you run the next cell, and you (presumably) approve (i.e. you give delegated authority to the application), you will be redirected to the redirect_uri site. When you get there, copy the long string of characters, called a code and displayed in red, and then come back to this notebook.

In [52]:
webbrowser.open(user_url)
Out[52]:
True

What happens next depends on

  1. whether the constructed URL is correct,
  2. whether the User is already logged in to the Provider, and
  3. whether or not the authorization has already been tried before.

If the constructed URL is invalid, the most likely reason is that the redirect_uri given at App registration does not match the one in the constructed URL, and most Providers will summarily reject the URL. Other mismatches can cause a rejection as well (like the client_id or another URL parameter).

If the user is not already logged in, then

Line 3 of the OAuth Dance

The user authenticates. If we are not saving and autofilling passwords, and are not already logged in, we go through the normal username/password/two-factor authentication between the User and the Provider. This happens because the location of the URL is that of the Provider, and the principal entering the address is presumed to be the User/Resource Owner (and not the developer).

Line 4 of the OAuth Dance

Once authenticated, the Resource Owner is presented, by the Provider, with a screen asking for authorization for the App to have permission for the desired scope. Other details may be given so that the user can make an informed choice. If the User has previously granted authorization for this App and for this scope (i.e. this is a replay of a prior authorization), then the User may not see this screen at all, and may be skipped ahead to the next step.

Authorization Server Redirects User's Web Browser to App/Client

Line 5 of the OAuth Dance

Clicking the Accept (or equivalent) button is like clicking a link at the provider. Like the work done above in this notebook, the Provider is creating a URL that is the so-called redirect provided by the Accept button. Its location is intended to be at the App, and it encodes the information needed by the App, namely the authorization code (code), and it reflects the state back to increase security.

So the User's browser is given an HTTP URL whose location is based on the redirect_uri provided earlier and whose URL parameters include the code and state keys with their associated values.

But our App (this notebook) is not running a web server to process the HTTP GET that would happen based on this URL.

If the redirect_uri were https://localhost/callback/, the local machine (localhost) would have no server listening on the default HTTPS port (443), nor would there be a resource named /callback/ at that location. So the web browser would show a "This Site cannot be reached" or similar error.

But look at the address line of the browser! It contains the attempted URL. Important to us are the code=<value> and state=<value> URL parameters in that address line.
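
Instead of picking the values out of the address line by eye, we could paste the whole attempted URL and let Python extract them. The sketch below uses only the standard library; the function name extract_code and the example URL values are hypothetical. It also checks that the reflected state matches the one we sent, which is the purpose of including state in the first place.

```python
from urllib.parse import parse_qs, urlparse


def extract_code(redirected_url, expected_state):
    """Pull the authorization code out of a redirect URL after checking state."""
    query = parse_qs(urlparse(redirected_url).query)
    code = query.get("code", [None])[0]
    state = query.get("state", [None])[0]
    if code is None:
        raise ValueError("no code parameter in the redirected URL")
    if state != expected_state:
        # A mismatch means this redirect does not belong to our request.
        raise ValueError("state mismatch; do not use this code")
    return code


# Hypothetical address-line contents, for illustration only.
example = "https://localhost/callback/?code=abc123&state=38BQS4JZ"
code_from_url = extract_code(example, expected_state="38BQS4JZ")
```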

User/Resource Owner Communicates code and state to the App

Line 6 of the OAuth Dance

A word of caution

For some Providers, their authorization codes may have a limited lifetime. This means that if the App does not authenticate and exchange the code for a token soon enough, the process may need to be repeated. Also note that an issued code is a "one time deal". To prevent replay attacks, the code may only be exchanged for a token a single time.

For a regular application, the redirect_uri would take the resource owner to a web server under the control of the app, which would take the code provided, and put it in a database.

We simulate that with copy-and-paste.

Paste the code between the string delimiters in the following cell, and then execute it.

In [47]:
code = "6bdaad26d15b1a7dc4e4"

This is our equivalent of conveying, through the user, the code generated by the provider and approved by the user, and storing it in the "database" of our credentials file.

In [48]:
creds['code'] = code
util.update_creds("github", creds, file="creds.json")

App Exchanges Code for a Token

Line 7 and Line 8 of the OAuth Dance

This corresponds, for GitHub, to Step 2 in their OAuth Authorization Options (Web application flow). See https://developer.github.com/apps/building-oauth-apps/authorization-options-for-oauth-apps/.

This is again a time for gathering information from the Provider on the information necessary to make the appropriate GitHub request to get this done. We must determine the HTTP method, protocol, location, resource, and parameters to be used. In particular, for GitHub, we have:

variable      | value                     | comment
--------------|---------------------------|--------------------------------------------------------
method        | POST                      | Given in initial specification
protocol      | https                     | Given in initial specification
location      | github.com                | Given in initial specification
resource      | /login/oauth/access_token | Given in initial specification
client_id     | creds['client_id']        | Based on App registration and entering info in creds.json
client_secret | creds['client_secret']    | Based on App registration and entering info in creds.json
code          | creds['code']             | Saved in creds.json above
state         | creds['state']            | Saved in creds.json above

Build URL

In [7]:
url = util.buildURL("/login/oauth/access_token", "github.com")
url
Out[7]:
'https://github.com/login/oauth/access_token'

Using the documentation, determine WHERE request parameters go in an exchange of code for token

  1. Query parameters for client_id, client_secret, code, and state
  2. Accept header to accept application/json
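
To make the placement concrete, the sketch below builds (but does not send) a POST request with the standard library, putting the four values in the query string and the Accept value in a header. The placeholder strings stand in for the real values from creds.json and are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

token_url = "https://github.com/login/oauth/access_token"

# Placeholder values; the real ones come from creds.json.
query_params = {
    "client_id": "CLIENT_ID",
    "client_secret": "CLIENT_SECRET",
    "code": "CODE",
    "state": "STATE",
}

# Query parameters are appended to the URL; Accept travels as a header.
token_request = Request(
    token_url + "?" + urlencode(query_params),
    headers={"Accept": "application/json"},
    method="POST",
)
```

Constructing the Request object performs no network activity, so it is safe to inspect token_request.full_url and token_request.headers before anything is sent.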

Construct the Query Parameters Dictionary

In [39]:
# Only the four parameters listed above are sent in the exchange.
paramsD = {
    'client_id': creds['client_id'],
    'client_secret': creds['client_secret'],
    'code': creds['code'],
    'state': creds['state']
}

Construct the Header Parameters Dictionary

In [40]:
headerD = {'Accept': 'application/json'}
In [41]:
resp = requests.post(url, headers=headerD, params=paramsD)
In [42]:
resp.status_code
Out[42]:
200
In [43]:
resp.text
Out[43]:
'{"access_token":"8147be10c75d1be53100e20d714afb40ba5470bc","token_type":"bearer","scope":""}'
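
A status code of 200 does not by itself guarantee that the body holds a token; a Provider may report a problem, such as an expired or already-used code, inside the JSON body. The sketch below parses the body defensively. It assumes that failures are reported under an error key with an optional error_description, which is how I understand GitHub's documentation; the function name extract_token is hypothetical.

```python
import json


def extract_token(response_text):
    """Return the access token from a token-exchange response body."""
    payload = json.loads(response_text)
    if "error" in payload:
        # Assumed error shape: {"error": ..., "error_description": ...}
        detail = payload.get("error_description", "")
        raise RuntimeError(f"{payload['error']}: {detail}")
    return payload["access_token"]
```
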
In [ ]:
# resp.json is a method, so it must be called to get the parsed body
creds['token'] = resp.json()["access_token"]
util.update_creds("github", creds, file="creds.json")