In [1]:
import pandas as pd
import os
import os.path
import json
In [2]:
def getmysql_creds(dirname=".",filename="creds.json"):
    """ Using directory and filename parameters, open a credentials file
        and obtain the five parts needed for a connection string to
        a remote provider using the "mysql" dictionary within
        an outer dictionary.  
        
        Return a scheme, server, user, password, and database
    """
    assert os.path.isfile(os.path.join(dirname, filename))
    with open(os.path.join(dirname, filename)) as f:
        D = json.load(f)
    mysql = D["mysql"]
    database = None
    if "database" in mysql:
        database = mysql["database"]
    return mysql["scheme"], mysql["server"], mysql["user"], mysql["pass"], database

def getsqlite_creds(dirname=".",filename="creds.json"):
    """ Using directory and filename parameters, open a credentials file
        and obtain the two parts needed for a connection string to
        a local provider using the "sqlite" dictionary within
        an outer dictionary.  
        
        Return a scheme and a dbfile
    """
    assert os.path.isfile(os.path.join(dirname, filename))
    with open(os.path.join(dirname, filename)) as f:
        D = json.load(f)
    sqlite = D["sqlite"]
    return sqlite["scheme"], sqlite["dbdir"], sqlite["database"]

Get Independent Client(s)

Working with databases gives us the opportunity to use an independent client as a tool to look at the tables of a database and even to create and try out SQL queries. This indpendent route allows us to separate the concerns of the Python programming that we do in Jupyter notebooks from problem solving data queries.

In this class, I tend to use one of two independent clients:

  • Table Plus, a cross-platform (Mac, PC, Linux) with a free trial version. The trial version only allows two open tabs and/or two database connections, but can be sufficient for many "check it out" scenarios.
    • Link: https://tableplus.com/
    • Works with both remote MySQL databases as well as local SQLite file-based databases.
  • MySQL Workbench, another cross-platorm client application that is more full-featured than Table Plus, but is specific to remote MySQL databases.

Go ahead and download and install at least the TablePlus program on your own machine.

We will not spend time configuring these now, but you should take note of the information needed to connect to remote MySQL databases and to local SQLite databases from the cells below, as this same information will be used to configure connection setups in both of these independent clients.

Set User Credentials

With a shared resource at a provider like a MySQL RDBMS, we need to use credentials to authenticate ourselves to the server, and need the logical location of the server itself.

For these notebooks, these are kept in a text file named 'creds.json', stored either in the same directory or in a data directory. For this notebook, this is stored in the same directory as the notebook.

  • Right click on the creds.json file and select Open With->Editor
  • Replace the mysql dictionary's key for "user" (currently "nostudent") with the base part of your email address (i.e. without the .denison.edu).
  • Also Replace the mysql dictionary's key for "pass" (currently "nostudent") with the base part of your email address. Your password on the mysql server, at present, is the same as your user.
  • The server should be correct, mapped to "hadoop.mathsci.denison.edu".
  • The scheme should be correct, mapped to "mysql+mysqlconnector".
  • The database should be correct, mapped to "book".

Make sure to use double quotes for strings ... this is JSON, not Python, and we have to follow JSON syntax.

Once this is complete, execute the following cell to get these values into global variables.

In [3]:
scheme, server, user, password, database = getmysql_creds()
print("Server:", server)
print("MySQL user name:", user)
print("MySQL user password", password)
print("Database:", repr(database))
Server: hadoop.mathsci.denison.edu
MySQL user name: nostudent
MySQL user password nostudent
Database: 'book'

Package Double Check

If you followed all the instructions at the initial install of Anaconda on your computer, these packages should already be installed on your machine.

  1. Check for what SQL-related packages are installed: Click the + in the upper left of Jupyter Lab and open a Terminal. If you have a mac, at the command prompt, type the following command (but without the dollar sign):

    $ conda list | grep sql

    I get the following:

     ipython-sql               0.3.9                    pypi_0    pypi
     mysql-connector-c         6.1.11               hccea1a4_0  
     mysql-connector-python    8.0.18           py37h3febbb0_1  
     sqlalchemy                1.3.9            py37h1de35cc_0  
     sqlite                    3.30.1               ha441bb4_0  
     sqlparse                  0.3.0                    pypi_0    pypi

If you have a PC, the above command will not work for you but you can still run the two commands below (again, without typing the dollar sign).

  1. If mysql-connector-python is not present:

    $ conda install mysql-connector-python

    Accept the defaults and install.

  2. If ipython-sql is not present:

    $ pip install ipython-sql

    Accept the defaults and install

Load Notebook Extension to Enable "SQL Magic"

This extension is provided by the ipython-sql package installed previously, and allows the three forms of sql within Notebooks (code cells, not markdown cells)

  1. %%sql: Magic to make an entire multi-line cell be interpreted as an SQL to be sent as a request to a connected server.
  2. Single line %sql: Magic to allow a single line SQL, placed in its entirety to the right of the %sql and the only thing in the cell.
  3. Embedded %sql: Magic to allow the SQL command following the %sql to be executed as part of a larger Python code block. This allows the integration of Python code and variables incorporated with sql requests.

Execute the following cell to load the sql-magics extension

Note that if this cell is executed a second time within the same kernel, it will not reload. Normally this is fine, unless the sql magics module has changed.

In [4]:
%load_ext sql

REMOTE DATABASE SERVER

Connection String

A connection string rolls togther the information about the scheme, the user, the password, and the server. At this point, it is still just a string that we have constructed in Python, and named cstring.

In [ ]:
template = '{}://{}:{}@{}/{}'
cstring = template.format(scheme, user, password, server, database if database is not None else "")
print("Connection string:", cstring)

Establish Connection from Client to Server

The cell below uses the Python variable constructed earlier to actually establish a connection with the server. If when you execute this cell, you get a 'Connected' message, you are ready to proceed. If not, then the most likely problem is one with the JSON specifications of your credentials. Much less likely is a problem with the user that was created in the MySQL Server.

If you cannot get a connection established, switch over to MySQL Workbench and try from that different client.

In [ ]:
%sql $cstring

Make a Query

Try and execute the following line, which uses an "inline" sql magic to exectute a query to the database and retrieve the result.

In [7]:
%sql SELECT * from indicators0
 * sqlite:///../../dbfiles/book.db
Done.
Out[7]:
code pop gdp life cell
CHN 1386.4 12143.5 76.4 1469.88
FRA 66.87 2586.29 82.5 69.02
GBR 66.06 2637.87 81.2 79.1
IND 1338.66 2652.55 68.8 1168.9
USA 325.15 19485.4 78.5 391.6


LOCAL DATABASE

Connection String

For a local database file, a connection string rolls togther the information about the scheme, the directory holding the database files, and the name of the database. At this point, it is still just a string that we have constructed in Python, and named cstring.

In [5]:
scheme, dbdir, database = getsqlite_creds()
template = '{}:///{}/{}.db'
cstring = template.format(scheme, dbdir, database)
print("Connection string:", cstring)
Connection string: sqlite:///../../dbfiles/book.db

Establish Connection from Client to Server

The cell below uses the Python variable constructed earlier to actually establish a connection with the server. If you cannot get a connection established, switch over to TablePlus and try from that different client.

In [6]:
%sql $cstring

Make a Query

Try and execute the following line, which uses an "inline" sql magic to exectute a query to the database and retrieve the result.

In [7]:
%sql SELECT * from indicators0
 * sqlite:///../../dbfiles/book.db
Done.
Out[7]:
code pop gdp life cell
CHN 1386.4 12143.5 76.4 1469.88
FRA 66.87 2586.29 82.5 69.02
GBR 66.06 2637.87 81.2 79.1
IND 1338.66 2652.55 68.8 1168.9
USA 325.15 19485.4 78.5 391.6
In [ ]: