{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Developing to the GitHub (v3) API\n", "\n", "## Authorization Code Grant\n", "\n", "This **OAuth flow** is used when the client/app/consumer of data is **separate** from the user, and when the app needs to access protected resources maintained by the Provider on behalf of the user. Note that there is another type of delegated-authority OAuth flow, the **implicit grant**. In that flow, the User is redirected to the Provider, and the subsequent redirect back to the App yields the token directly. This suffers from the lack of authentication between the App and the Provider, but can simplify the process when the Resource Owner and the App are closely aligned.\n", "\n", "As in our final project, we want to \"separate concerns\", so the end result of this notebook is one or more **user access tokens**, with one user access token for each resource owner supported by this App.\n", "\n", "We need to \"communicate\" between notebooks, so that a token obtained here can be brought into a different notebook (nominally the data acquisition notebook). So we will use a `creds.json` file that can be both read and updated by this notebook, and that another notebook can read in order to see the tokens created here. We can also use it to store something like the authorization redirect URL, so that we do not have to rebuild it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Become a Registered Developer and Create an App\n", "\n", "For GitHub, as for any other environment that separates the resource owner from the app/client/consumer, the provider must be able to **identify/authenticate** the app/client/consumer. This involves the human developer creating an account and then defining a particular App. The same developer may, in fact, create multiple distinct Apps, but for now we will assume a single App." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the case of GitHub, we first logged in as our normal GitHub user. We then followed the directions as posted under GitHub Developer:\n", "\n", "https://docs.github.com/en/free-pro-team@latest/developers/apps/creating-an-oauth-app\n", "\n", "Summarizing:\n", "\n", "When logged into GitHub, in the upper right corner of the window is either your picture or a small square icon, with a dropdown for accessing account options.\n", "\n", "1. From that dropdown, select \"Settings\"\n", "2. On the Settings page, from the set of choices on the left, select \"Developer Settings\"\n", "3. From Developer settings, from the set of choices on the left, select OAuth Apps.\n", "4. From the OAuth Apps page, in the upper right, click New OAuth App\n", "5. Fill in:\n", "    - Application name: whatever you want\n", "    - Homepage URL: https://personal.denison.edu/~bressoud/datasystems (or some other valid URL)\n", "    - Application callback URL: https://caileighmarshall.github.io/cs181project/\n", "6. Then click Register Application\n", "\n", "On step 5: Consider what you put for the callback URL. This is used as part of the OAuth back-and-forth to get an authorization code to the App **through** the Resource Owner. When, at the end of the authorization process, the User/Owner clicks \"Accept\" (or the equivalent), this is where the Provider does a redirect in order to communicate the **code** to the App. We could use something that does not have a running web server, like `https://localhost/callback/`, because, if the redirect goes to a location and resource that does not actually serve up a web page, the URL that was attempted will *include* the code, and we will be able to copy and paste it into our credentials and/or notebook. But what we register here with the Provider and the `redirect_uri` we use when constructing the authorization URL must match.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the application is registered, we are interested in 3 pieces of information.\n", "1. 
The Client ID (or App ID or consumer ID)\n", "2. The Client Secret (or App secret or App password)\n", "3. The Authorization callback URL\n", "    - most often, exactly the same as we specified in registering, although some providers, like eBay, create their own.\n", " \n", "These three pieces of information are necessary in downstream operations, so it makes sense to record them in a file (in `creds.json`), so that this and other notebooks can get at them easily. My `creds.json` has a top level that is a dictionary, and that dictionary maps from a provider (like `\"github\"`) to a dictionary. The github dictionary maps from the pieces of information that I need (for now, a `client_id`, a `client_secret`, and a `redirect_uri`) to the character string values generated by the Provider. So for me the `creds.json` currently looks like this:\n", "\n", "```json\n", "{\n", " \"github\": {\"client_id\": \"304b9ef49da340d86570\", \n", " \"client_secret\": \"994ee891af84642471e1a4820b1ba84fe4ce57e3\", \n", " \"redirect_uri\": \"https://caileighmarshall.github.io/cs181project/\"}\n", "}\n", "```\n", "The idea here is that the `creds.json` file acts as a kind of database for the application: it lives alongside the App and stores the information needed for authentication and authorization, so that it can be read back later." 
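, "\n", "\n",
"A helper pair along the following lines could read and update one provider's entry in this file. This is only a sketch: the `util.read_creds` and `util.update_creds` functions used later in this notebook are not shown here, so the names `load_provider_creds` and `save_provider_creds` are hypothetical stand-ins built on the standard `json` module.\n",
"\n",
"```python\n",
"import json\n",
"import os\n",
"\n",
"\n",
"def load_provider_creds(provider, file='creds.json'):\n",
"    # Return the dictionary stored under `provider`, or {} if it is absent\n",
"    if not os.path.exists(file):\n",
"        return {}\n",
"    with open(file, 'r') as f:\n",
"        all_creds = json.load(f)\n",
"    return all_creds.get(provider, {})\n",
"\n",
"\n",
"def save_provider_creds(provider, creds, file='creds.json'):\n",
"    # Replace the entry for `provider`, leaving other providers untouched\n",
"    all_creds = {}\n",
"    if os.path.exists(file):\n",
"        with open(file, 'r') as f:\n",
"            all_creds = json.load(f)\n",
"    all_creds[provider] = creds\n",
"    with open(file, 'w') as f:\n",
"        json.dump(all_creds, f, indent=2)\n",
"```\n",
"\n",
"Keeping one top-level key per provider means a second notebook can read the `github` entry without knowing about any other provider stored in the same file."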
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "nbgrader": { "grade": false, "grade_id": "cell-b9abcf27cf7faf8f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import requests\n", "\n", "import os\n", "import os.path\n", "import sys\n", "import importlib\n", "import webbrowser\n", "import base64\n", "from datetime import datetime, timedelta\n", "from requests.auth import HTTPBasicAuth\n", "\n", "# Locate the shared modules directory, which may be two or three levels up\n", "if os.path.isdir(os.path.join(\"../../..\", \"modules\")):\n", "    module_dir = os.path.join(\"../../..\", \"modules\")\n", "else:\n", "    module_dir = os.path.join(\"../..\", \"modules\")\n", "\n", "# Make that directory importable so that util can be found\n", "module_path = os.path.abspath(module_dir)\n", "if module_path not in sys.path:\n", "    sys.path.append(module_path)\n", "\n", "import util\n", "importlib.reload(util)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reading the contents of `creds.json` into a Python dictionary data structure is as we have done before:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'client_id': '', 'client_secret': '', 'redirect_uri': 'https://caileighmarshall.github.io/cs181project/'}\n" ] } ], "source": [ "creds = util.read_creds(\"github\", file=\"creds.json\")\n", "print(creds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Construct the URL to give to the User/Resource Owner\n", "\n", "> **Line 1** of the OAuth Dance\n", "\n", "For GitHub, we are using the documentation at\n", "\n", "https://docs.github.com/en/free-pro-team@latest/developers/apps/authorizing-oauth-apps\n", "for a so-called **Web Application Flow** to determine the URL, which is Step 1 in the list at that page. 
Through the documentation, we are able to determine the protocol, location, resource, and parameters to be used in the constructed URL. In particular, for GitHub, we have:\n", "\n", "variable | value | comment\n", "---------|-------|:--------\n", "protocol | `https` | Given in GET specification\n", "location | `github.com` | Given in GET specification\n", "resource | `/login/oauth/authorize` | Given in GET specification\n", "client_id | `creds['client_id']` | Based on App registration and entering info in `creds.json`\n", "redirect_uri | `creds['redirect_uri']` | Based on App registration and entering info in `creds.json`\n", "scope | `user%20repo%20read:org` | Possible values specified at https://developer.github.com/apps/building-oauth-apps/scopes-for-oauth-apps/. This value is just representative, and depends on what the App needs to do. For GitHub, when we need more than one scope, we space-separate and URL-encode (to get the `%20`). Many other providers use comma-separated lists, and some scopes look like URLs instead of simple strings.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "state = util.random_string()\n", "scope = \"user repo read:org\"\n", "creds['state'] = state\n", "creds['scope'] = scope\n", "util.update_creds(\"github\", creds, file=\"creds.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Build the URL**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://github.com/login/oauth/authorize\n" ] } ], "source": [ "url = util.buildURL(\"/login/oauth/authorize\", \"github.com\")\n", "print(url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Construct the Query Parameters Dictionary**" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "paramsD = {\n", " 'client_id': creds['client_id'],\n", " 'response_type': 'code',\n", " 
'redirect_uri': creds['redirect_uri'],\n", " 'state': creds['state'],\n", " 'scope': creds['scope']\n", "}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://github.com/login/oauth/authorize?client_id=&response_type=code&redirect_uri=https%3A%2F%2Fcaileighmarshall.github.io%2Fcs181project%2F&state=38BQS4JZ&scope=user+repo+read%3Aorg\n" ] } ], "source": [ "# Prepare (but do not send) the request so that requests builds and encodes the full URL\n", "session = requests.Session()\n", "req = requests.Request('GET', url, params=paramsD)\n", "\n", "prepped = session.prepare_request(req)\n", "user_url = prepped.url\n", "print(user_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### User Agent/Resource Owner Interacts with Authorization Server\n", "\n", "> **Line 2** of the OAuth Dance\n", "\n", "Given the URL above, an App would redirect the User, working in their web browser, to the Provider's Authorization Server. We can simulate this in a couple of ways:\n", "- Copy and paste this link into a web browser\n", "- Use a Python function to bring up a web browser tab, using the URL as its address\n", " - An example of this second approach is shown in the cell below:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> After you run the next cell, and you (presumably) approve (i.e. you give delegated authority to the application), you will be redirected to the `redirect_uri` site. When you get there, copy the long string of characters, called a `code` and displayed in red, and then come back to this notebook." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "webbrowser.open(user_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happens next depends on:\n", "1. whether the constructed URL is correct, \n", "2. 
whether the User is already logged in to the Provider, and \n", "3. whether or not the Authorization has already been tried before. \n", "\n", "If the constructed URL is invalid, the most likely reason is that the `redirect_uri` given at App registration does not match the one in the constructed URL, and most Providers will summarily reject the URL. Other mismatches between the registered information and the URL (like the `client_id` or another URL parameter) can cause the same rejection.\n", "\n", "If the user is **not** already logged in, then\n", "\n", "> **Line 3** of the OAuth Dance\n", "\n", "The user authenticates. If the browser is not saving and autofilling passwords, and the User is not already logged in, we go through the normal username/password/two-factor authentication between the User and the Provider. This is because the location of the URL is that of the Provider, and the principal entering the address is presumed to be the User/Resource Owner (and not the developer).\n", "\n", "> **Line 4** of the OAuth Dance\n", "\n", "Once authenticated, the Resource Owner is presented, by the Provider, with a screen asking for authorization for the **App** to have permission for the desired **scope**. Other details may be given so that the user can make an informed choice. If the User has **previously** granted authorization for this App and for this scope and with this state (i.e. this is a replay of a prior authorization), then the User may not see this screen at all, and may be skipped ahead to the next step." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Authorization Server Redirects User's Web Browser to App/Client\n", "\n", "> **Line 5** of the OAuth Dance\n", "\n", "Clicking the **Accept** (or equivalent) button is like clicking a link at the provider. Like the work done above in this notebook, the Provider is **creating** a URL that is the so-called redirect provided by the Accept button. 
Its location is intended to be **at the App**, and it encodes the information needed by the App, namely the **authorization code** (**code**), and it also reflects the **state** back to increase security.\n", "\n", "So the User's browser is given an HTTP URL whose location is based on the `redirect_uri` provided earlier and whose URL parameters include the `code` and `state` keys with their associated values.\n", "\n", "**But our App (this notebook) is not running a web server to process the HTTP GET that would happen based on this URL**\n", "\n", "With a `redirect_uri` such as `https://localhost/callback/`, the local machine (localhost) would not have a server running on port 443 (the default for `https`), nor would there be a resource named `/callback/` at that location. So the web browser will show a \"This site cannot be reached\" or similar error.\n", "\n", "But look at the address line of the browser! It contains the attempted URL. Important to us are the `code=` and `state=` URL parameters in that address line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### User/Resource Owner Communicates `code` and `state` to the App\n", "\n", "> **Line 6** of the OAuth Dance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A word of caution**\n", "\n", "For some Providers, authorization codes have a limited lifetime. This means that if the App does not authenticate and exchange the code for a token soon enough, the process may need to be repeated. Also note that an issued code is a \"one time deal\". To prevent replay attacks, the code may only be exchanged for a token a single time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a regular application, the `redirect_uri` would take the resource owner to a web server under the control of the app, which would take the `code` provided, and put it in a database.\n", "\n", "We simulate that with copy-and-paste.\n", "\n", "Paste the code between the string delimiters in the following cell, and then execute it." 
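, "\n", "\n",
"As an alternative to picking the code out by eye, the whole address line can be pasted and parsed. The following is a minimal sketch using the standard library `urllib.parse`. The function name `extract_code` and the sample URL are hypothetical, and it assumes the Provider returns `code` and `state` as query parameters, as described above.\n",
"\n",
"```python\n",
"from urllib.parse import urlparse, parse_qs\n",
"\n",
"\n",
"def extract_code(redirected_url, expected_state):\n",
"    # parse_qs maps each query key to a list of its values\n",
"    query = parse_qs(urlparse(redirected_url).query)\n",
"    code = query.get('code', [None])[0]\n",
"    state = query.get('state', [None])[0]\n",
"    if code is None:\n",
"        raise ValueError('no code parameter in redirected URL')\n",
"    # The reflected state must match the one we sent in the authorization URL\n",
"    if state != expected_state:\n",
"        raise ValueError('state mismatch')\n",
"    return code\n",
"\n",
"\n",
"# Hypothetical address line copied from the browser\n",
"pasted = 'https://localhost/callback/?code=abc123&state=XYZ'\n",
"print(extract_code(pasted, 'XYZ'))\n",
"```\n",
"\n",
"In this notebook the second argument would be `creds['state']`, so that a redirect carrying a different state is rejected before the code is stored."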
] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "code = \"6bdaad26d15b1a7dc4e4\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is our equivalent of the provider conveying the code, approved by the user, through the user's browser to the App: we store it in our \"database\", the credentials file." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "creds['code'] = code\n", "util.update_creds(\"github\", creds, file=\"creds.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### App Exchanges Code for a Token\n", "\n", "> **Line 7** and **Line 8** of the OAuth Dance\n", "\n", "This corresponds, for GitHub, to Step 2 in their OAuth Authorization Options (Web application flow); see https://developer.github.com/apps/building-oauth-apps/authorization-options-for-oauth-apps/.\n", "\n", "Here again we consult the Provider's documentation for the information needed to make the appropriate GitHub request. We must determine the HTTP method, protocol, location, resource, and parameters to be used. 
In particular, for GitHub, we have:\n", "\n", "variable | value | comment\n", "---------|-------|:--------\n", "method | `POST` | Given in initial specification\n", "protocol | `https` | Given in initial specification\n", "location | `github.com` | Given in initial specification\n", "resource | `/login/oauth/access_token` | Given in initial specification\n", "client_id | `creds['client_id']` | Based on App registration and entering info in `creds.json`\n", "client_secret | `creds['client_secret']` | Based on App registration and entering info in `creds.json`\n", "code | `creds['code']` | Saved in creds.json above\n", "state | `creds['state']` | Saved in creds.json above" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Build URL**" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://github.com/login/oauth/access_token'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url = util.buildURL(\"/login/oauth/access_token\", \"github.com\")\n", "url" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Using the documentation, determine WHERE request parameters go in an exchange of code for token**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Query parameters for client_id, client_secret, code, and state\n", "2. 
`Accept` header set to `application/json`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Construct the Query Parameters Dictionary**" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "paramsD = {\n", "    'client_id': creds['client_id'],\n", "    'client_secret': creds['client_secret'],\n", "    'code': creds['code'],\n", "    'state': creds['state']\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Construct the Header Parameters Dictionary**" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "headerD = {'Accept': 'application/json'}" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "resp = requests.post(url, headers=headerD, params=paramsD)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "200" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resp.status_code" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'{\"access_token\":\"8147be10c75d1be53100e20d714afb40ba5470bc\",\"token_type\":\"bearer\",\"scope\":\"\"}'" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "resp.text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# resp.json is a method: call it to parse the JSON body into a dictionary\n", "creds['token'] = resp.json()[\"access_token\"]\n", "# Persist the token so that other notebooks can read it from creds.json\n", "util.update_creds(\"github\", creds, file=\"creds.json\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }