Denison CS181/DA210 Homework

Before you turn this problem in, make sure everything runs as expected: restart the kernel and then run all cells (in the menubar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE".


In [ ]:
import os

datadir = "publicdata"

Q1: Write a function

lineLengths(filepath)

that processes a file line by line and returns a list of tuples, one per line. Each tuple consists of two values: the line number and the length of the line, excluding any leading or trailing whitespace (spaces, tabs, or newlines). Line numbers in a file start at 1. So for the hello.txt file:

Hello, world!
Welcome to 'Introduction to Data Systems'.

the result is the list of tuples: [(1, 13), (2, 42)]
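The two building blocks this question relies on, stripping whitespace before measuring a length and numbering items from 1, can be tried in isolation. The snippet below is only a sketch on made-up in-memory strings (the names raw, stripped_length, and numbered are illustrative assumptions, and nothing is read from a file); it is not the lineLengths solution itself.

```python
# Illustrative string only; it is not read from any file.
raw = "  \tHello, world! \n"

# str.strip() removes leading and trailing whitespace (spaces, tabs, newlines),
# so the length is measured on the visible text only.
stripped_length = len(raw.strip())
print(stripped_length)

# enumerate(..., start=1) pairs each item with a 1-based position,
# matching the convention that line numbers start at 1.
numbered = list(enumerate(["first", "second"], start=1))
print(numbered)
```

Note that strip() leaves whitespace inside the line untouched; only the two ends are trimmed.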

In [ ]:
# Solution cell

# YOUR CODE HERE
raise NotImplementedError()
filepath = os.path.join(datadir, "tennyson.txt")
lineLengths(filepath)
In [ ]:
# Testing cell

filepath = os.path.join(datadir, "tennyson.txt")
assert lineLengths(filepath) == [(1, 26), (2, 24), (3, 0), (4, 40), (5, 14)]

filepath = os.path.join(datadir, "hello.txt")
assert lineLengths(filepath) == [(1, 13), (2, 42)]

Q2: Consider a variation of the babynames and counts file as depicted below:

22127,      Jacob
18002,      Ethan
17350,    Michael
17179,     Jayden
17051,    William
16756,  Alexander

Each line of the file captures one data case/observation. The values are separated by commas, with the count occurring first and the name second, and spaces are used to align the columns of data to make it easier for a human reader.

Write a function

readNamesCounts(filepath)

that processes the file at filepath and returns a tuple whose first element is a reference to a list of names, and whose second element is a reference to a list of integer counts.
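As a hedged illustration of the line format described above, here is one way a single line could be taken apart. The variable names (sample, count_text, name_text, count, name) are assumptions made for this sketch, the line is a hard-coded string instead of file input, and accumulating the two lists is left to the exercise.

```python
# One line in the format shown above, hard-coded for illustration.
sample = "17350,    Michael\n"

# Split on the comma: the count comes first, the name second.
count_text, name_text = sample.split(",")

# int() converts the count text to an integer.
count = int(count_text)

# strip() drops the alignment spaces and the trailing newline from the name.
name = name_text.strip()

print(count, name)
```

Splitting on "," assumes that names never contain a comma, which holds for the sample lines shown in the question.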

In [ ]:
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
path = os.path.join(datadir, "babynames.txt")
assert os.path.isfile(path)
namelist, countlist = readNamesCounts(path)
In [ ]:
# Testing cell
path = os.path.join(datadir, "babynames.txt")
assert os.path.isfile(path)
namelist, countlist = readNamesCounts(path)
assert len(namelist) == 6
assert len(countlist) == 6
assert "Jacob" in namelist[0]