This OAuth flow is used when the client/app/consumer of data is separate from the user, and when the app needs to access protected resources maintained by the Provider on behalf of the user. Note that there is another type of delegated authority OAuth flow, the implicit grant. In that flow, the User is redirected to the Provider, and the subsequent redirect back to the App yields the token directly. This suffers from the lack of authentication between the App and the Provider, but can simplify the process when the Resource Owner and the App are closely aligned.
As in our final project, we want to "separate concerns". The end result of this notebook is one or more user access tokens, with one user access token for each resource owner supported by this App.
We need to "communicate" between notebooks, so that a token obtained here can be brought into a different notebook (nominally the data acquisition notebook). So we will use a creds.json file that can be both read and updated by this notebook, and that another notebook can read in order to see the tokens created here. We can also use it to store something like the authorization redirect URL, to shortcut its creation.
For GitHub, as for any other environment that separates the resource owner from the app/client/consumer, the provider must be able to identify/authenticate the app/client/consumer. This involves the human developer creating an account and then defining a particular App. The same developer may, in fact, create multiple distinct Apps, but for now we will assume a single App.
In the case of GitHub, we first logged in as our normal GitHub user. We then followed the directions as posted under GitHub Developer:
https://docs.github.com/en/free-pro-team@latest/developers/apps/creating-an-oauth-app
Summarizing:
When logged into GitHub, in the upper right corner of the window is either your picture, or a small square icon, with a drop down for accessing account stuff.
On step 5: Consider what you put here. This is used as part of the OAuth back-and-forth to get an authorization code to the App through the Resource Owner. When, at the end of the authorization process, the User/Owner clicks "Accept" (or the equivalent), this is where the Provider redirects in order to communicate the code to the App. We could use something that does not have a running web server, like https://localhost/callback/, because if the redirect goes to a location and resource that does not actually serve up a web page, the URL that was attempted will still include the code, and we will be able to copy and paste it into our credentials and/or notebook. But what we put in here with the Provider and what we use in the constructed authorization URL must match.
Once the application is registered, we are interested in 3 pieces of information.
These three pieces of information are necessary in downstream operations, so it makes sense to record them in a file (creds.json), so that this and other notebooks can get at them easily. My creds.json has a top level that is a dictionary, and that dictionary maps from a provider (like "github") to a dictionary. The github dictionary maps from the pieces of information that I need (for now, a client_id, a client_secret, and a redirect_uri) to the character string values from the App registration. So for me the creds.json currently looks like this:
{
"github": {"client_id": "304b9ef49da340d86570",
"client_secret": "994ee891af84642471e1a4820b1ba84fe4ce57e3",
"redirect_uri": "https://caileighmarshall.github.io/cs181project/"}
}
The idea here is that the creds.json file acts as a kind of database for the application, maintaining information colocated with the App that can be used for storing and reading the info needed for authentication and authorization.
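The util module used later in this notebook is not shown here, so the following is only a minimal sketch of what helpers like read_creds and update_creds might look like, assuming the file layout described above (a top-level dictionary keyed by provider). The function bodies, the placeholder values, and the throwaway file path are assumptions for illustration, not the actual module.

```python
import json
import os
import tempfile


def read_creds(provider, file="creds.json"):
    """Return the dictionary stored under `provider`, or {} if absent."""
    if not os.path.exists(file):
        return {}
    with open(file, "r", encoding="utf-8") as f:
        all_creds = json.load(f)
    return all_creds.get(provider, {})


def update_creds(provider, creds, file="creds.json"):
    """Replace the entry for `provider`, leaving other providers untouched."""
    all_creds = {}
    if os.path.exists(file):
        with open(file, "r", encoding="utf-8") as f:
            all_creds = json.load(f)
    all_creds[provider] = creds
    with open(file, "w", encoding="utf-8") as f:
        json.dump(all_creds, f, indent=2)


# Demonstration against a throwaway file, using placeholder values only.
demo_file = os.path.join(tempfile.mkdtemp(), "creds.json")
update_creds("github", {"client_id": "PLACEHOLDER_ID"}, file=demo_file)

demo = read_creds("github", file=demo_file)
demo["state"] = "abc123"          # add a new piece of information
update_creds("github", demo, file=demo_file)

print(read_creds("github", file=demo_file))
```

Because each update rewrites only the entry for one provider, a second notebook reading the same file would see both the registration values and anything added later, such as a token.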
import requests
import os
import os.path
import sys
import importlib
import webbrowser
import base64
from datetime import datetime, timedelta
from requests.auth import HTTPBasicAuth
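# Locate the shared "modules" directory, which may be two or three levels up.
if os.path.isdir(os.path.join("../../..", "modules")):
    module_dir = os.path.join("../../..", "modules")
else:
    module_dir = os.path.join("../..", "modules")
module_path = os.path.abspath(module_dir)
# Make the util module importable, then reload it to pick up any edits.
if module_path not in sys.path:
    sys.path.append(module_path)
import util
importlib.reload(util)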
Reading the contents of creds.json into a Python dictionary data structure is as we have done before:
creds = util.read_creds("github", file="creds.json")
print(str(creds))
Line 1 of the OAuth Dance
For GitHub, we are using the documentation at
https://docs.github.com/en/free-pro-team@latest/developers/apps/authorizing-oauth-apps for a so-called Web Application Flow to determine the URL, which is Step 1 in the list at that page. Through the documentation, we are able to determine the protocol, location, resource, and parameters to be used in the constructed URL. In particular, for GitHub, we have:
variable | value | comment |
---|---|---|
protocol | https | Given in GET specification |
location | github.com | Given in GET specification |
resource | /login/oauth/authorize | Given in GET specification |
client_id | creds['client_id'] | Based on App registration and entering info in creds.json |
redirect_uri | creds['redirect_uri'] | Based on App registration and entering info in creds.json |
scope | user%20repo%20read:org | Possible values specified at https://developer.github.com/apps/building-oauth-apps/scopes-for-oauth-apps/. This value is just representative, and depends on what the App needs to do. For GitHub, when we need more than one scope, we space-separate and URL-encode (to get the %20). Many other providers use comma-separated lists, and some scopes look like URLs instead of simple strings. |
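To make the space-separate-then-encode step concrete, here is a small sketch using only the standard library. The list of scopes is the representative one from the table; the variable names are illustrative.

```python
from urllib.parse import quote, urlencode

scopes = ["user", "repo", "read:org"]
scope = " ".join(scopes)          # space-separated, as GitHub expects

# quote() percent-encodes the space; listing ":" as safe leaves read:org intact.
encoded = quote(scope, safe=":")
print(encoded)                    # user%20repo%20read:org

# urlencode() uses quote_plus by default, so the space becomes "+" and the
# colon becomes %3A. Both forms decode to the same value in a query string.
print(urlencode({"scope": scope}))  # scope=user+repo+read%3Aorg
```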
state = util.random_string()
scope = "user repo read:org"
creds['state'] = state
creds['scope'] = scope
util.update_creds("github", creds, file="creds.json")
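The state value only helps if it is hard to guess. Since util.random_string is not shown here, the following is a hedged sketch of one way such a helper could be written with the standard secrets module; the name make_state and the default length are assumptions.

```python
import secrets


def make_state(nbytes=16):
    """Return an unguessable, URL-safe string to use as the OAuth state."""
    return secrets.token_urlsafe(nbytes)


state = make_state()
print(state)
```

A fresh value per authorization request means a redirect that echoes back a different state can be recognized as not belonging to this request.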
Build the URL
url = util.buildURL("/login/oauth/authorize", "github.com")
print(url)
Construct the Query Parameters Dictionary
paramsD = {
    'client_id': creds['client_id'],
    'response_type': 'code',
    'redirect_uri': creds['redirect_uri'],
    'state': creds['state'],
    'scope': creds['scope']
}
session = requests.Session()
req = requests.Request('GET', url, params=paramsD)
prepped = session.prepare_request(req)
user_url = prepped.url
print(user_url)
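For comparison, the same kind of URL can be assembled with the standard library alone. This is a sketch under stated assumptions: the helper name build_authorize_url and all argument values are placeholders, and only the host and resource come from the table above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_authorize_url(client_id, redirect_uri, state, scope):
    """Assemble the authorization URL from its parts."""
    base = "https://github.com/login/oauth/authorize"
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "scope": scope,
    })
    return f"{base}?{query}"


example_url = build_authorize_url(
    "PLACEHOLDER_ID", "https://localhost/callback/", "abc123", "user repo read:org"
)
print(example_url)

# Parsing the URL back recovers the original values, a quick sanity check
# that nothing was mangled by the encoding.
parsed = parse_qs(urlsplit(example_url).query)
print(parsed["scope"][0])
```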
Line 2 of the OAuth Dance
Given the URL above, an App would redirect the User, working in their web browser, to the Provider's Authorization Server. We can simulate this in a couple of ways:
After you run the next cell, and you (presumably) approve (i.e. you give delegated authority to the application), you will be redirected to the redirect_uri site. When you get there, copy the long string of characters, called the code and displayed in red, and then come back to this notebook.
webbrowser.open(user_url)
What happens next depends on
If the constructed URL is invalid, the most likely reason is that the redirect_uri given at App registration does not match the one in the constructed URL, and most Providers will summarily reject the URL. There could be other mismatches as well (like the client_id or another URL parameter).
If the user is not already logged in, then
Line 3 of the OAuth Dance
The user authenticates. If the user is not already logged in, and the browser is not saving and autofilling passwords, we go through the normal username/password/two-factor authentication between the User and the Provider. This is because the location of the URL is that of the Provider, and the principal entering the address is presumed to be the User/Resource Owner (and not the developer).
Line 4 of the OAuth Dance
Once authenticated, the Resource Owner is presented, by the Provider, with a screen asking for authorization for the App to have permission for the desired scope. Other details may be given so the user can make an informed choice. If the User has previously granted authorization for this App and for this scope and with this state (i.e. this is a replay of a prior authorization), then the User may not see this screen at all, and may be skipped ahead to the next step.
Line 5 of the OAuth Dance
Clicking the Accept (or equivalent) button is like clicking a link at the Provider. Like the work done above in this notebook, the Provider is creating a URL that is the so-called redirect provided by the Accept button. Its location is intended to be at the App, and it encodes the information needed by the App, namely the authorization code (code), as well as reflecting the state to increase security.
So the User's browser is given an HTTP URL whose location is based on the redirect_uri provided earlier and whose URL parameters include the code and state keys with their associated values.
But our App (this notebook) is not running a web server to process the HTTP GET that would happen based on this URL.
The local machine (localhost) would not have a server running on port 443 (the default for an https URL), nor would there be a resource named /callback/ at that location. So the web browser will show a "This Site cannot be reached" or similar error.
But look at the address line of the browser! It contains the attempted URL. Important to us are the code=<value> and state=<value> URL parameters in that address line.
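Rather than picking the values out of the address line by eye, the whole attempted URL can be pasted and parsed. The sketch below assumes a hypothetical helper name, extract_code, and a made-up example URL; the check for an error parameter follows the general OAuth 2.0 convention and is an assumption about what a given Provider sends when the user declines.

```python
import secrets
from urllib.parse import urlsplit, parse_qs


def extract_code(redirected_url, expected_state):
    """Return the code from a pasted redirect URL after checking the state."""
    query = parse_qs(urlsplit(redirected_url).query)
    if "error" in query:
        raise ValueError("authorization failed: " + query["error"][0])
    returned_state = query.get("state", [""])[0]
    # Compare as bytes so the constant-time comparison accepts any characters.
    if not secrets.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch; this redirect does not match our request")
    if "code" not in query:
        raise ValueError("no code parameter found in the URL")
    return query["code"][0]


# Made-up URL of the kind that would appear in the browser's address line.
pasted = "https://localhost/callback/?code=EXAMPLECODE123&state=abc123"
print(extract_code(pasted, "abc123"))   # EXAMPLECODE123
```

Checking the echoed state against the one saved earlier is what gives the state parameter its security value; a mismatch suggests the redirect did not originate from the request this App made.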
code and state to the App
Line 6 of the OAuth Dance
A word of caution
For some Providers, their authorization codes may have a limited lifetime. This means that if the App does not authenticate and exchange the code for a token soon enough, the process may need to be repeated. Also note that an issued code is a "one time deal". To prevent replay attacks, the code may only be exchanged for a token a single time.
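One way to guard against exchanging a stale code is to record when it was obtained and check its age before the exchange. This is only a sketch: the ten-minute lifetime, the helper name code_is_fresh, and the idea of storing a timestamp alongside the code are all assumptions, and the actual lifetime should be taken from the Provider's documentation.

```python
from datetime import datetime, timedelta, timezone

# Assumed lifetime, for illustration only; check the Provider's documentation.
ASSUMED_CODE_LIFETIME = timedelta(minutes=10)


def code_is_fresh(obtained_at, now=None, lifetime=ASSUMED_CODE_LIFETIME):
    """Return True if a code obtained at `obtained_at` is still within `lifetime`."""
    now = now or datetime.now(timezone.utc)
    return now - obtained_at < lifetime


obtained = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(code_is_fresh(obtained, now=obtained + timedelta(minutes=3)))    # True
print(code_is_fresh(obtained, now=obtained + timedelta(minutes=15)))   # False

# An ISO-format string is JSON-friendly, so it could sit next to the code in a file.
stamp = obtained.isoformat()
print(datetime.fromisoformat(stamp) == obtained)                       # True
```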
For a regular application, the redirect_uri would take the resource owner to a web server under the control of the app, which would take the code provided, and put it in a database.
We simulate that with copy-and-paste.
Copy the code between the string delimiters, and then execute the following cell.
code = "6bdaad26d15b1a7dc4e4"
This is our equivalent of conveying, through the user, the code generated by the Provider and approved by the user, and of storing it in the "database" that is our credentials file.
creds['code'] = code
util.update_creds("github", creds, file="creds.json")
Line 7 and Line 8 of the OAuth Dance
This corresponds, for GitHub, to Step 2 in their OAuth Authorization Options (Web application flow). See https://developer.github.com/apps/building-oauth-apps/authorization-options-for-oauth-apps/.
This is again a time for gathering information from the Provider on the information necessary to make the appropriate GitHub request to get this done. We must determine the HTTP method, protocol, location, resource, and parameters to be used. In particular, for GitHub, we have:
variable | value | comment |
---|---|---|
method | POST | Given in initial specification |
protocol | https | Given in initial specification |
location | github.com | Given in initial specification |
resource | /login/oauth/access_token | Given in initial specification |
client_id | creds['client_id'] | Based on App registration and entering info in creds.json |
client_secret | creds['client_secret'] | Based on App registration and entering info in creds.json |
code | creds['code'] | Saved in creds.json above |
state | creds['state'] | Saved in creds.json above |
Build URL
url = util.buildURL("/login/oauth/access_token", "github.com")
url
Using the documentation, determine WHERE request parameters go in an exchange of code for token
Construct the Query Parameters Dictionary
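# Only the four values listed in the parameter table above are sent;
# the dangling 'scope' entry is dropped because it had no value.
paramsD = {
    'client_id': creds['client_id'],
    'client_secret': creds['client_secret'],
    'code': creds['code'],
    'state': creds['state']
}
Construct the Header Parameters Dictionary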
headerD = {'Accept': 'application/json'}
resp = requests.post(url, headers=headerD, params=paramsD)
resp.status_code
resp.text
# resp.json is a method, so it must be called to get the decoded body.
creds['token'] = resp.json()["access_token"]
util.update_creds("github", creds, file="creds.json")
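A code that has expired or has already been exchanged will not yield a token, so it can help to inspect the decoded response before saving anything. The sketch below assumes the response body is JSON with either an access_token key or an error key; the helper name extract_token and both sample payloads are illustrative, not captured from a real exchange.

```python
def extract_token(payload):
    """Return the access token from a decoded token-endpoint response."""
    if "error" in payload:
        detail = payload.get("error_description", "no description given")
        raise RuntimeError(f"token exchange failed: {payload['error']} ({detail})")
    if "access_token" not in payload:
        raise RuntimeError("response did not contain an access_token")
    return payload["access_token"]


# Illustrative payloads only; real responses may carry different fields.
ok = {"access_token": "EXAMPLE_TOKEN", "token_type": "bearer", "scope": "repo,user"}
bad = {"error": "bad_verification_code", "error_description": "code incorrect or expired"}

print(extract_token(ok))          # EXAMPLE_TOKEN
try:
    extract_token(bad)
except RuntimeError as exc:
    print(exc)
```

In the notebook, the argument would be the decoded body of the POST response, and a raised error would be the signal to repeat the authorization step and obtain a fresh code.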