Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).
Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:
NAME = ""
COLLABORATORS = ""
import os
import os.path
import pandas as pd
datadir = "publicdata"
Q1 Fill in the remaining code in the following cell to build a List of Dictionaries representation of the data in `madison_temp.csv` in the data directory, assigning the result to `LoD`. Your code is a single assignment that adds exactly one field to a dictionary row.
path = os.path.join(datadir, "madison_temp.csv")
LoD = []
with open(path, 'r') as f:
    columns = f.readline().strip().split(',')
    for line in f:
        D = {}
        fields = line.strip().split(',')
        for column_number, value in enumerate(fields):
            # YOUR CODE HERE
            raise NotImplementedError()
        LoD.append(D)
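For comparison only, the same List of Dictionaries shape can be produced with the standard library's `csv.DictReader`. This is a different technique from the manual loop above, shown as a minimal sketch on a made-up in-memory CSV; the names `toy_csv` and `toy_lod` and the data values are hypothetical and are not taken from `madison_temp.csv`.

```python
import csv
import io

# Hypothetical stand-in for a small CSV file; not the real madison_temp.csv.
toy_csv = "id,city,reading\n1,Springfield,40\n2,Shelbyville,38\n"

with io.StringIO(toy_csv) as f:
    # DictReader uses the header line as the keys of each row dictionary.
    toy_lod = [dict(row) for row in csv.DictReader(f)]

print(toy_lod[0])  # {'id': '1', 'city': 'Springfield', 'reading': '40'}
```

Each dictionary is one row, keyed by column name. Every value read this way is a string, including the ones that look like numbers.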
Q2 Using the same incantation as the book in Section 6.3.1 on data frame creation, create a pandas data frame named `temps` using the `LoD` created in Q1.
# YOUR CODE HERE
raise NotImplementedError()
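As a rough illustration of data frame creation from a list of dictionaries, the sketch below passes a toy list to the `pd.DataFrame` constructor. The names `toy_lod` and `toy_df` and their contents are hypothetical, and the book's exact call in Section 6.3.1 may differ in its arguments.

```python
import pandas as pd

# Hypothetical toy rows: each dictionary is one row, keys are column names.
toy_lod = [
    {"id": "1", "city": "Springfield", "reading": "40"},
    {"id": "2", "city": "Shelbyville", "reading": "38"},
]

# The constructor infers the columns from the dictionary keys.
toy_df = pd.DataFrame(toy_lod)
print(toy_df)
```

With no index specified, pandas assigns a default integer row index starting at 0.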
Q3 Use the methods shown in Section 6.3.2 to obtain the prefix and the suffix of the `temps` data frame, one in each of the following two cells.
# YOUR CODE HERE
raise NotImplementedError()
# YOUR CODE HERE
raise NotImplementedError()
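Assuming the prefix and suffix methods the book has in mind are `head()` and `tail()`, a minimal sketch on a made-up frame looks like this. The frame `toy_df` and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical 20-row frame, used only to show which rows come back.
toy_df = pd.DataFrame({"n": range(20), "square": [i * i for i in range(20)]})

first_rows = toy_df.head()   # first 5 rows when no count is given
last_rows = toy_df.tail(3)   # last 3 rows

print(first_rows)
print(last_rows)
```

Both methods return a new data frame and accept an optional row count.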
Q4 In a single assignment line, using an attribute of the data frame object, assign to `nrows` and `ncols` the number of rows and columns in the temperatures data set.
# YOUR CODE HERE
raise NotImplementedError()
print(nrows, ncols)
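One attribute that carries both dimensions is `shape`; whether it is the one the question intends is an assumption. The sketch below shows what it holds for a hypothetical frame `toy_df`.

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns.
toy_df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

toy_shape = toy_df.shape  # a tuple: (number of rows, number of columns)
print(toy_shape)  # (3, 2)
```

Because `shape` is an attribute rather than a method, it is read without parentheses.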
Q5 In the following code cell, invoke the data frame method that gives detailed information about the data frame (its `Index`, columns, and data types for each of the columns). Then, in the markdown cell below that, write your observations on the `Index` and the data types. Given the data in the data set, which data types are not correct if we want to work with the data set? (Hint: On page 168, in the first paragraph, we see a similar set of observations for the `topnames` data set.)
# YOUR CODE HERE
raise NotImplementedError()
YOUR ANSWER HERE
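Assuming the method in question is `info()`, the following sketch shows the kind of report it prints, alongside the `dtypes` attribute for a programmatic view. The frame `toy_df` is hypothetical; its `count` column deliberately stores digits as strings.

```python
import pandas as pd

# Hypothetical frame: "count" holds digit strings, "ratio" holds real floats.
toy_df = pd.DataFrame(
    {"label": ["x", "y"], "count": ["10", "20"], "ratio": [0.5, 0.25]}
)

toy_df.info()               # prints index range, column names, non-null counts, dtypes
toy_dtypes = toy_df.dtypes  # the same per-column types as a Series
print(toy_dtypes)
```

A column of digit strings is reported with a non-numeric type, so arithmetic on it would not behave like arithmetic on numbers.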
Q6 Print the `columns` attribute of the data frame and then the `index` attribute of the data frame.
# YOUR CODE HERE
raise NotImplementedError()
Q7 Carefully consider the data set. In the cell that follows, identify the independent variable(s) and the dependent variable(s).
YOUR ANSWER HERE
Q8 As done in the book (pg. 169), set the `Index` (the logical row identifiers) to be the variable or variables you have identified as the independent variable(s). If you identified more than one independent variable, specify a list. If there is just a single independent variable, you can specify either a list with one element or just the string column name. Assign the result to `temps2`.
# YOUR CODE HERE
raise NotImplementedError()
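The two ways of specifying the index described above can be sketched with `set_index`, assuming that is the method the book uses on page 169. The frame `toy_df` and its column names are hypothetical and say nothing about which columns of the temperature data are independent.

```python
import pandas as pd

# Hypothetical frame; column names are illustrative only.
toy_df = pd.DataFrame(
    {
        "state": ["OH", "OH", "WI"],
        "year": [2000, 2001, 2000],
        "pop": [11.4, 11.4, 5.4],
    }
)

single = toy_df.set_index("state")           # one column, given as a string
multi = toy_df.set_index(["state", "year"])  # several columns, given as a list

print(single.index.name)        # state
print(list(multi.index.names))  # ['state', 'year']
```

`set_index` returns a new data frame by default, leaving the original unchanged, which is why the result is assigned to a new name.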
Q9 Using the method that gives a prefix of the data, invoke the method to show the first 10 rows of `temps2`.
# YOUR CODE HERE
raise NotImplementedError()
Q10 Repeat the method to find information about a data frame, this time using the `temps2` data frame. How does the result differ from the information gathered earlier about `temps`?
# YOUR CODE HERE
raise NotImplementedError()
YOUR ANSWER HERE
Q11 The last DataFrame method discussed in Section 6.3 allows us to transform an existing data frame and convert data types associated with columns. Invoke this method on either the `temps` or `temps2` data frame and convert the `EMXT` and `EMNT` columns to be integer numbers. Assign the result to `temps3`, and show the last 10 rows of the result.
# YOUR CODE HERE
raise NotImplementedError()
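Assuming the conversion method meant here is `astype`, the sketch below converts two string columns of a hypothetical frame to integers by passing a dictionary from column name to target type. The names `toy_df`, `hi`, and `lo` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose numeric-looking columns are stored as strings.
toy_df = pd.DataFrame(
    {"label": ["x", "y", "z"], "hi": ["10", "20", "30"], "lo": ["-1", "0", "7"]}
)

# Only the listed columns are converted; "label" is left as it was.
converted = toy_df.astype({"hi": int, "lo": int})

print(converted.dtypes)
print(converted["hi"].sum())  # 60
```

`astype` raises an error if any value in a listed column cannot be parsed as the target type, so it is worth checking the column contents first.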
Q12 At the end of Section 6.3.1, the book demonstrates how to construct a data frame directly from a CSV file. Do so now to create a data frame named `earthquakes` from the CSV file named `"earthquake_all_month.csv"` in the data directory.
# YOUR CODE HERE
raise NotImplementedError()
earthquakes.head()
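A common way to build a data frame straight from CSV text is `pd.read_csv`; whether the book uses exactly this call is an assumption. The sketch reads a made-up in-memory CSV through `io.StringIO` so it runs without any file; the column names and values are hypothetical and are not taken from `earthquake_all_month.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
toy_csv = io.StringIO(
    "time,mag,place\n"
    "2020-01-01T00:00:00Z,4.5,Somewhere\n"
    "2020-01-02T00:00:00Z,5.1,Elsewhere\n"
)

# read_csv accepts a path or any file-like object.
toy_quakes = pd.read_csv(toy_csv)

print(toy_quakes.shape)  # (2, 3)
print(toy_quakes.dtypes)
```

Unlike the manual loop in Q1, `read_csv` infers column types, so a column of decimal values arrives as floating point rather than as strings.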
Q_n In any time remaining, explore the `earthquakes` data frame, performing as many of the methods from Q3 through Q11 as are applicable.