{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import os\n", "import os.path\n", "import json" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def getmysql_creds(dirname=\".\",filename=\"creds.json\"):\n", " \"\"\" Using directory and filename parameters, open a credentials file\n", " and obtain the five parts needed for a connection string to\n", " a remote provider using the \"mysql\" dictionary within\n", " an outer dictionary. \n", " \n", " Return a scheme, server, user, password, and database\n", " \"\"\"\n", " assert os.path.isfile(os.path.join(dirname, filename))\n", " with open(os.path.join(dirname, filename)) as f:\n", " D = json.load(f)\n", " mysql = D[\"mysql\"]\n", " database = None\n", " if \"database\" in mysql:\n", " database = mysql[\"database\"]\n", " return mysql[\"scheme\"], mysql[\"server\"], mysql[\"user\"], mysql[\"pass\"], database\n", "\n", "def getsqlite_creds(dirname=\".\",filename=\"creds.json\"):\n", " \"\"\" Using directory and filename parameters, open a credentials file\n", " and obtain the two parts needed for a connection string to\n", " a local provider using the \"sqlite\" dictionary within\n", " an outer dictionary. \n", " \n", " Return a scheme and a dbfile\n", " \"\"\"\n", " assert os.path.isfile(os.path.join(dirname, filename))\n", " with open(os.path.join(dirname, filename)) as f:\n", " D = json.load(f)\n", " sqlite = D[\"sqlite\"]\n", " return sqlite[\"scheme\"], sqlite[\"dbdir\"], sqlite[\"database\"]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Independent Client(s)\n", "\n", "Working with databases gives us the opportunity to use an independent client as a tool to look at the tables of a database and even to create and try out SQL queries. This indpendent route allows us to separate the concerns of the Python programming that we do in Jupyter notebooks from problem solving data queries.\n", "\n", "In this class, I tend to use one of two independent clients:\n", "\n", "- **Table Plus**, a cross-platform (Mac, PC, Linux) with a free trial version. The trial version only allows two open tabs and/or two database connections, but can be sufficient for many \"check it out\" scenarios. \n", " - Link: https://tableplus.com/\n", " - Works with both remote MySQL databases as well as local SQLite file-based databases.\n", "- **MySQL Workbench**, another cross-platorm client application that is more full-featured than Table Plus, but is specific to remote MySQL databases.\n", " - Link: https://www.mysql.com/products/workbench/\n", " \n", "Go ahead and download and install at least the TablePlus program on your own machine.\n", "\n", "We will not spend time configuring these now, but you should take note of the information needed to connect to remote MySQL databases and to local SQLite databases from the cells below, as this same information will be used to configure connection setups in both of these independent clients." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set User Credentials\n", "\n", "With a shared resource at a provider like a MySQL RDBMS, we need to use credentials to authenticate ourselves to the server, and need the logical location of the server itself.\n", "\n", "For these notebooks, these are kept in a text file named 'creds.json', stored either in the same directory or in a data directory. For this notebook, this is stored in the same directory as the notebook.\n", "\n", "- Right click on the `creds.json` file and select *Open With*->*Editor*\n", "- Replace the mysql dictionary's key for \"user\" (currently `\"nostudent\"`) with the base part of your email address (i.e. without the `.denison.edu`). \n", "- Also Replace the mysql dictionary's key for \"pass\" (currently `\"nostudent\"`) with the base part of your email address. Your password on the mysql server, at present, is the same as your user. \n", "- The server should be correct, mapped to `\"hadoop.mathsci.denison.edu\"`. \n", "- The scheme should be correct, mapped to `\"mysql+mysqlconnector\"`.\n", "- The database should be correct, mapped to `\"book\"`.\n", "\n", "**Make sure to use double quotes for strings** ... this is `JSON`, not Python, and we have to follow JSON syntax.\n", "\n", "Once this is complete, execute the following cell to get these values into global variables." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Server: hadoop.mathsci.denison.edu\n", "MySQL user name: nostudent\n", "MySQL user password nostudent\n", "Database: 'book'\n" ] } ], "source": [ "scheme, server, user, password, database = getmysql_creds()\n", "print(\"Server:\", server)\n", "print(\"MySQL user name:\", user)\n", "print(\"MySQL user password\", password)\n", "print(\"Database:\", repr(database))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Package Double Check\n", "\n", "If you followed all the instructions at the initial install of Anaconda on your computer, these packages should already be installed on your machine. \n", "\n", "1. Check for what SQL-related packages are installed: Click the `+` in the upper left of Jupyter Lab and open a Terminal. If you have a mac, at the command prompt, type the following command (but without the dollar sign):\n", "\n", " `$ conda list | grep sql`\n", "\n", " I get the following:\n", " ```\n", " ipython-sql 0.3.9 pypi_0 pypi\n", " mysql-connector-c 6.1.11 hccea1a4_0 \n", " mysql-connector-python 8.0.18 py37h3febbb0_1 \n", " sqlalchemy 1.3.9 py37h1de35cc_0 \n", " sqlite 3.30.1 ha441bb4_0 \n", " sqlparse 0.3.0 pypi_0 pypi\n", " ```\n", "\n", "If you have a PC, the above command will not work for you but you can still run the two commands below (again, without typing the dollar sign).\n", "\n", "2. If `mysql-connector-python` is not present:\n", "\n", " `$ conda install mysql-connector-python`\n", " \n", " Accept the defaults and install.\n", " \n", "3. If `ipython-sql` is not present:\n", "\n", " `$ pip install ipython-sql`\n", " \n", " Accept the defaults and install" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Notebook Extension to Enable \"SQL Magic\"\n", "\n", "This extension is provided by the `ipython-sql` package installed previously, and allows the three forms of sql within Notebooks (code cells, not markdown cells)\n", "\n", "1. `%%sql`: Magic to make an entire multi-line cell be interpreted as an SQL to be sent as a request to a connected server.\n", "2. Single line `%sql`: Magic to allow a single line SQL, placed in its entirety to the right of the `%sql` and the only thing in the cell.\n", "3. Embedded `%sql`: Magic to allow the SQL command *following the `%sql`* to be executed as part of a larger Python code block. This allows the integration of Python code and variables incorporated with sql requests.\n", "\n", "**Execute the following cell to load the sql-magics extension**\n", "\n", "> Note that if this cell is executed a second time within the same kernel, it will not reload. Normally this is fine, unless the sql magics module has changed." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "%load_ext sql" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## REMOTE DATABASE SERVER\n", "\n", "### Connection String\n", "\n", "A connection string rolls togther the information about the scheme, the user, the password, and the server. At this point, it is still just a string that we have constructed in Python, and named `cstring`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "template = '{}://{}:{}@{}/{}'\n", "cstring = template.format(scheme, user, password, server, database if database is not None else \"\")\n", "print(\"Connection string:\", cstring)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Establish Connection from Client to Server\n", "\n", "The cell below uses the Python variable constructed earlier to actually establish a connection with the server. If when you execute this cell, you get a `'Connected'` message, you are ready to proceed. If not, then the most likely problem is one with the JSON specifications of your credentials. Much less likely is a problem with the user that was created in the MySQL Server.\n", "\n", "If you cannot get a connection established, switch over to MySQL Workbench and try from that different client." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%sql $cstring" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Make a Query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try and execute the following line, which uses an \"inline\" sql magic to exectute a query to the database and retrieve the result." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "nbgrader": { "grade": true, "grade_id": "cell-bbdb09230afb60cf", "locked": false, "points": 0, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///../../dbfiles/book.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
codepopgdplifecell
CHN1386.412143.576.41469.88
FRA66.872586.2982.569.02
GBR66.062637.8781.279.1
IND1338.662652.5568.81168.9
USA325.1519485.478.5391.6
" ], "text/plain": [ "[('CHN', 1386.4, 12143.5, 76.4, 1469.88),\n", " ('FRA', 66.87, 2586.29, 82.5, 69.02),\n", " ('GBR', 66.06, 2637.87, 81.2, 79.1),\n", " ('IND', 1338.66, 2652.55, 68.8, 1168.9),\n", " ('USA', 325.15, 19485.4, 78.5, 391.6)]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT * from indicators0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----------------\n", "\n", "----------------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LOCAL DATABASE\n", "\n", "### Connection String\n", "\n", "For a local database file, a connection string rolls togther the information about the scheme, the directory holding the database files, and the name of the database. At this point, it is still just a string that we have constructed in Python, and named `cstring`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connection string: sqlite:///../../dbfiles/book.db\n" ] } ], "source": [ "scheme, dbdir, database = getsqlite_creds()\n", "template = '{}:///{}/{}.db'\n", "cstring = template.format(scheme, dbdir, database)\n", "print(\"Connection string:\", cstring)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Establish Connection from Client to Server\n", "\n", "The cell below uses the Python variable constructed earlier to actually establish a connection with the server. If you cannot get a connection established, switch over to `TablePlus` and try from that different client." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "%sql $cstring" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Make a Query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try and execute the following line, which uses an \"inline\" sql magic to exectute a query to the database and retrieve the result." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "nbgrader": { "grade": true, "grade_id": "cell-bbdb09230afb60cf", "locked": false, "points": 0, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///../../dbfiles/book.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
codepopgdplifecell
CHN1386.412143.576.41469.88
FRA66.872586.2982.569.02
GBR66.062637.8781.279.1
IND1338.662652.5568.81168.9
USA325.1519485.478.5391.6
" ], "text/plain": [ "[('CHN', 1386.4, 12143.5, 76.4, 1469.88),\n", " ('FRA', 66.87, 2586.29, 82.5, 69.02),\n", " ('GBR', 66.06, 2637.87, 81.2, 79.1),\n", " ('IND', 1338.66, 2652.55, 68.8, 1168.9),\n", " ('USA', 325.15, 19485.4, 78.5, 391.6)]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT * from indicators0" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Create Assignment", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }