SQL: Advanced Queries

In [1]:
import pandas as pd
import os
import os.path
import json
import sys
import importlib

module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if not module_path in sys.path:
    sys.path.append(module_path)

import dbutil
importlib.reload(dbutil)

%load_ext sql

Instructions

Set User Credentials

Edit creds.json to reflect your mysql user and password

This must be done prior to executing the following cell

In general, you will be able to choose whether you are using the remote MySQL database or the SQLite database(s) by setting the dbsource variable to "mysql" or "sqlite" respectively. The function dbutil.db_cstring function computes a connection string for the chosen dbsource using the information in the creds.json file. If the last argument to this function is present, the generated connection string uses that datbase as superceding the name of the database in creds.json.

In [3]:
dbsource = "sqlite"
db = "book"
cstring = dbutil.db_cstring(dbsource, "creds.json", ".", db)

%sql $cstring

12.3.5 Join Table from Subquery

  1. Develop the subquery. Do you see the flaw in the following?
In [6]:
%%sql
SELECT year, code, MAX(pop)
FROM indicators
GROUP BY code
LIMIT 15
 * sqlite:///../../dbfiles/book.db
Done.
Out[6]:
year code MAX(pop)
2017 ABW 0.11
2018 AFG 37.17
2018 AGO 30.81
1990 ALB 3.29
2004 AND 0.08
2018 ARE 9.63
2018 ARG 44.49
1989 ARM 3.54
1997 ASM 0.06
2017 ATG 0.1
2018 AUS 24.99
2018 AUT 8.85
2018 AZE 9.94
2018 BDI 11.18
2018 BEL 11.42
In [10]:
%%sql
SELECT code, MAX(pop) AS max_pop
FROM indicators
GROUP BY code
LIMIT 7
 * sqlite:///../../dbfiles/book.db
Done.
Out[10]:
code max_pop
ABW 0.11
AFG 37.17
AGO 30.81
ALB 3.29
AND 0.08
ARE 9.63
ARG 44.49
In [13]:
%%sql
SELECT I.code, I.year, maxTable.max_pop FROM 
   (SELECT code, MAX(pop) AS max_pop
     FROM indicators
     GROUP BY code) AS maxTable
   INNER JOIN indicators AS I
     ON (I.pop = maxTable.max_pop) AND (I.code = maxTable.code)
ORDER BY I.code
LIMIT 14
 * sqlite:///../../dbfiles/book.db
Done.
Out[13]:
code year max_pop
ABW 2017 0.11
ABW 2018 0.11
AFG 2018 37.17
AGO 2018 30.81
ALB 1990 3.29
AND 2004 0.08
AND 2005 0.08
AND 2006 0.08
AND 2007 0.08
AND 2008 0.08
AND 2009 0.08
AND 2010 0.08
AND 2011 0.08
AND 2012 0.08
In [19]:
%%sql
SELECT code, MIN(year) AS year, ROUND(AVG(max_pop),2) AS max_pop FROM
 (SELECT I.code, I.year, maxTable.max_pop FROM 
   (SELECT code, MAX(pop) AS max_pop
     FROM indicators
     GROUP BY code) AS maxTable
   INNER JOIN indicators AS I
     ON (I.pop = maxTable.max_pop) AND (I.code = maxTable.code))  AS inner
GROUP BY code
ORDER BY code
LIMIT 10
 * sqlite:///../../dbfiles/book.db
Done.
Out[19]:
code year max_pop
ABW 2017 0.11
AFG 2018 37.17
AGO 2018 30.81
ALB 1990 3.29
AND 2004 0.08
ARE 2018 9.63
ARG 2018 44.49
ARM 1989 3.54
ASM 1997 0.06
ATG 2017 0.1
In [21]:
%%sql
SELECT region, ROUND(AVG(life),2) AS avg_life,
       COUNT(*) AS num_country, COUNT(life) AS num_life
FROM indicators LEFT JOIN countries USING (code)
WHERE year = 2017 
GROUP BY region
ORDER BY avg_life DESC
 * sqlite:///../../dbfiles/book.db
Done.
Out[21]:
region avg_life num_country num_life
North America 80.8 3 3
Europe & Central Asia 77.83 58 52
Latin America & Caribbean 75.1 42 36
Middle East & North Africa 74.74 21 21
East Asia & Pacific 74.63 37 31
South Asia 70.81 8 8
Sub-Saharan Africa 62.04 48 48
None None 1 0
In [1]:
from IPython.display import Image
Image("classimages/school_schema.jpg", width=600, height=600)
Out[1]: