Tabular Data

File Description Chapter(s)
topnames.csv
Top baby names by year and by sex, based on applications to the US Social Security Administration. Columns: year, sex, name, count. Years range from 1880 to 2018. Chapters 6, 7, 8, 9.
topfemale.csv
Top female baby names, mapping year to name and count for the years 1880 through 2018. Chapter 6.
topmale.csv
Top male baby names, mapping year to name and count for the years 1880 through 2018. Chapter 6.
namesbyyear.csv
Top baby names from 2014 through 2018, with one column per year and one row per sex (female, male). Chapters 6, 9.
countsbyyear.csv
Application counts for the most popular baby names from 2014 through 2018, with one column per year and one row per sex (female, male). Chapter 6.
namesbyyear2.csv
Top baby names together with their application counts from 2014 through 2018, with one column per year and one row per sex (female, male); each cell holds both a name and a count. Chapter 6.
gendercount.csv
Top baby names and counts, with one row per year and the dependent data columns FemaleName, FemaleCount, MaleName, and MaleCount. Chapter 6.
indicators2016.csv
Economic indicator data from 2016 for five countries, one row per country, where the country ISO code determines the country name, pop, gdp, life, and cell values. Chapters 7, 8.
indicators.csv
Economic indicator data for 207 countries for the years 1960 through 2017. The (country) code and year variables together uniquely identify a row and determine the values in the remaining columns. Chapters 7, 8, 9.
countries.csv
Country information for 207 countries, one row per country, uniquely identified by country code (ccode). Columns include ccode, country (the country name), region (the world region the country belongs to), and income (an income category ranging from low income to high income). Chapters 8, 9.
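Several of these descriptions turn on which columns uniquely identify a row: ccode in countries.csv, code and year in indicators.csv. A minimal sketch, using only the Python standard library, for checking such a claim against a CSV file. It assumes the file has a header row, and the function name `duplicate_keys` is hypothetical, not part of the book's code.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def duplicate_keys(lines: Iterable[str], key_cols: Sequence[str]) -> Dict[Tuple[str, ...], int]:
    """Return every key that occurs in more than one row, mapped to its row count.

    `lines` is any iterable of CSV text lines whose first line is a header,
    such as a file opened with newline="". An empty result means the columns
    named in `key_cols` uniquely identify each row.
    """
    reader = csv.DictReader(lines)
    # Build the composite key for each row from the named columns.
    counts = Counter(tuple(row[col] for col in key_cols) for row in reader)
    # Keep only keys seen more than once: these violate uniqueness.
    return {key: n for key, n in counts.items() if n > 1}
```

For example, passing an open handle to indicators.csv with key columns `["code", "year"]` should return an empty dictionary if the description holds. The column names `code` and `year` are assumed from the description and may be spelled differently in the actual file; a missing column raises `KeyError`.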



Relational Databases

Database Description Files
book
Set of tables supporting the book's examples, first described in Chapter 11.
Schema book.jpg
SQLite book.db
MySQL book.sql.zip
school
Database of tables about courses, students, instructors, and departments, covered in Chapter 12 and beyond.
Schema school.jpg
SQLite school.db
MySQL school.sql.zip
nycflights13
Database of flights, planes, and airlines in and out of the New York City airports in 2013.
SQLite nycflights13.db
enron
Database containing a subset of the emails sent to and from Enron employees that were recovered during the investigation following the company's fraud. (Provided as a Google Drive link because of its size.)
SQLite enron.db
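Before working with one of the SQLite files (for example book.db or nycflights13.db), it can help to confirm which tables it contains. A minimal sketch using Python's built-in sqlite3 module, which queries SQLite's catalog table sqlite_master; the helper name `list_tables` is hypothetical. The MySQL variants are SQL dump archives and are not covered by this sketch.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the sorted names of the user tables in an SQLite database file."""
    # Build a file: URI with mode=ro so the file is opened read-only and
    # SQLite raises an error instead of silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    # closing() guarantees the connection is closed; the connection's own
    # context manager only commits or rolls back a transaction.
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `list_tables("school.db")` from the directory holding the downloaded file returns the table names as a list of strings; the read-only URI keeps the exploration from modifying the database.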



Hierarchical Data

Data Set Format Variants Description Chapter(s)
ind0
Format variants: ind0.xml, ind0Dict.json, ind0List.json, ind0_html.xml
Economic indicator data (pop and gdp) for three countries for the years 2007 and 2017. Chapters 15, 16, 17.
indicators
Format variants: indicators.json, indicators.xml
Economic indicator data (pop, gdp, life, cell, imports, exports) for 207 countries for the years 1960 through 2018. Chapters 15, 17.
topnames
Format variants: topnames.xml, topnames_html.xml
Most popular baby names, based on applications to the US Social Security Administration, for the years 1880 through 2018, recording the top female name and top male name for each year along with their application counts. Chapters 15, 17.
school0
Format variants: school0.xml, school0.json
Hierarchical subset of the school data set, limited to two departments (ART and MATH) and their associated courses, classes, and instructors. Chapters 15, 17.
school
Format variants: school.xml
Hierarchical version of the school data set, incorporating all departments and their associated courses, classes, and instructors. Chapters 15, 16, 17.
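Because the same data appears in several hierarchical variants, one quick way to get oriented in an unfamiliar XML file is to count how often each path of element tags from the root occurs. A minimal sketch with the standard library's xml.etree.ElementTree; it assumes nothing about the element names actually used in ind0.xml or school.xml, and `tag_paths` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(root: ET.Element) -> Counter:
    """Count occurrences of each tag path (e.g. "/a/b/c") at and beneath root."""
    counts: Counter = Counter()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        counts[path] += 1
        # Iterating over an Element yields its direct children in document order.
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts
```

For instance, `tag_paths(ET.parse("school0.xml").getroot())` lists every tag path with its count, which shows at a glance how departments, courses, and instructors nest. Tags in an XML namespace appear in ElementTree's `{uri}tag` form.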



Other Files


File Description Chapter(s)
hello.txt
Text file containing a single line of characters. Chapter 2.
twolines.txt
Text file containing two lines of characters. Chapter 2.
tennyson.txt
Text file containing multiple lines of characters. Chapter 2.
twolines.utf16.txt
Text file containing two lines of characters, encoded in UTF-16. Chapter 2.
baby_2010_female_name.txt
Text file of popular female baby names from the US Social Security Administration for 2010, one name per line. Chapter 2.
baby_2010_male_name.txt
Text file of popular male baby names from the US Social Security Administration for 2010, one name per line. Chapter 2.
baby_2010_female_namecount.txt
Text file of popular female baby names and their application counts from the US Social Security Administration for 2010, one name and count per line. Chapter 2.
names.json
Text file in JSON format containing a list of the top female baby names from 2010, from the US Social Security Administration. Chapter 2.
config.json
Text file in JSON format containing a dictionary that maps configuration key names to configuration values. Chapter 2.
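The Chapter 2 files differ mainly in how they must be read: twolines.utf16.txt has to be decoded as UTF-16, and the JSON files are parsed into Python objects rather than read as raw lines. A minimal sketch of both cases, assuming the other text files are UTF-8, that the UTF-16 file begins with a byte order mark (which Python's "utf-16" codec uses to choose the byte order), and that config.json holds a single top-level object. `read_lines` and `read_config` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List


def read_lines(path: str, encoding: str = "utf-8") -> List[str]:
    """Read a text file and return its lines without line-ending characters."""
    # The encoding must match how the file was written (e.g. "utf-16").
    with open(path, encoding=encoding) as f:
        return f.read().splitlines()


def read_config(path: str) -> Dict[str, Any]:
    """Load a JSON file whose top level is an object mapping keys to values."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    # names.json holds a list instead, so reject anything that is not a dict.
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return config
```

Under these assumptions, `read_lines("twolines.utf16.txt", encoding="utf-16")` returns its two lines, whereas reading the same file with the UTF-8 default would typically fail with a `UnicodeDecodeError`.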



Web Scraping Examples





Exercise Data Files





Copyright © 2020 and 2021, Thomas Bressoud.