Before you turn this problem in, make sure everything runs as expected. To check, restart the kernel and then run all cells (in the menubar, select Kernel → Restart And Run All).
Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE".
import os
import os.path
import pandas as pd
datadir = "publicdata"
Q1 Make the following into a pandas data frame, assigning it to the variable df.
{'foo': ['one','one','one','two','two','two'],
'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6]}
Suppose the values one and two from column foo should head columns (as the data stands, it takes more than one row to interpret a single observation), and the cell values themselves should come from the baz column. What transformation/reshaping operation should be used to obtain a tidy version of this data? Include your answer as a comment in your code cell.
# YOUR CODE HERE
raise NotImplementedError()
df
# Testing cell
assert True
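As a refresher on the construction step, here is a minimal sketch on hypothetical data (not the Q1 dictionary) showing that pd.DataFrame turns a dictionary of equal-length lists into a data frame: each key becomes a column label, each list supplies that column's values, and pandas assigns a default 0-based row index.

```python
import pandas as pd

# Hypothetical data: each key becomes a column label and each list
# supplies that column's values, row by row.
data = {"city": ["Oslo", "Oslo", "Lima", "Lima"],
        "year": [2020, 2021, 2020, 2021],
        "temp": [6.1, 6.4, 19.2, 19.0]}
demo = pd.DataFrame(data)

# All lists must have the same length; the row labels default to a
# RangeIndex (0, 1, 2, ...).
print(demo.shape)          # (4, 3)
print(list(demo.columns))  # ['city', 'year', 'temp']
```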
Q2 What arguments would need to be passed to this operation for it to do its job?
YOUR ANSWER HERE
Q3 Perform the operation and assign the result to df2.
# YOUR CODE HERE
raise NotImplementedError()
df2
# Testing cell
assert True
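For reference, here is a sketch of pandas' DataFrame.pivot on hypothetical long-format sales data (the store, quarter, and sales names are invented for illustration). It shows the role of the three keyword arguments: index picks the column whose values become row labels, columns picks the column whose values become new column headers, and values picks the column that fills the cells.

```python
import pandas as pd

# Hypothetical long-format data: one row per (store, quarter) pair.
long_df = pd.DataFrame({
    "store": ["X", "X", "Y", "Y"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [10, 12, 7, 9],
})

# index   -> values of "store" become the row labels
# columns -> values of "quarter" become the new column labels
# values  -> values of "sales" fill the cells
wide_df = long_df.pivot(index="store", columns="quarter", values="sales")
print(wide_df)
```

Each (index, columns) pair must occur at most once; pivot raises a ValueError on duplicates rather than combining them.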
Q4 Make the following into a pandas data frame. Assign it to df.
{'A': {0: 'a', 1: 'b', 2: 'c'},
'B': {0: 2, 1: 4, 2: 6},
'C': {0: 1, 1: 3, 2: 5},
'D': {0: 1, 1: 2, 2: 4}}
Suppose further that we have determined that columns B and D are really values of a variable called X. What transformation/reshaping operation should be used to obtain a tidy version of this data? Enter your answer as a comment in the code cell where you create the data frame.
# YOUR CODE HERE
raise NotImplementedError()
df
# Testing cell
assert True
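The Q4 data is a dictionary of dictionaries rather than a dictionary of lists. A small sketch on hypothetical data shows how pd.DataFrame reads that shape: the outer keys become column labels and the inner keys become row (index) labels.

```python
import pandas as pd

# Hypothetical nested dictionary: outer keys -> column labels,
# inner keys -> row (index) labels.
nested = {"name": {0: "ann", 1: "bo"},
          "h1": {0: 5, 1: 7},
          "h2": {0: 6, 1: 8}}
demo = pd.DataFrame(nested)
print(demo)
```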
Q5 At a minimum, what arguments would need to be passed to this operation for it to do its job?
YOUR ANSWER HERE
Q6 Perform the operation and assign the result to df2.
# YOUR CODE HERE
raise NotImplementedError()
df2
# Testing cell
assert True
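For reference, here is a sketch of pandas' DataFrame.melt on hypothetical data in which the column headers h1 and h2 are really values of a single variable (called "hour" here; all names are invented for illustration). It shows the keyword arguments that control which columns are kept as identifiers and which are gathered into a variable/value pair.

```python
import pandas as pd

# Hypothetical wide data: h1 and h2 are really values of one variable, "hour".
wide = pd.DataFrame({"name": ["ann", "bo"],
                     "h1": [5, 7],
                     "h2": [6, 8]})

# id_vars    -> columns repeated on every output row as identifiers
# value_vars -> columns whose headers become values of a new variable
# var_name   -> name of the column holding the former headers
# value_name -> name of the column holding the former cell values
long = wide.melt(id_vars=["name"], value_vars=["h1", "h2"],
                 var_name="hour", value_name="reading")
print(long)
```

When value_vars is given, any column listed in neither id_vars nor value_vars is dropped from the result, so a column that should be kept belongs in id_vars.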
Q7 Consider the file ratings.csv. It has columns for first name, last name, RatingA (ratings of one particular restaurant, A), and RatingB (ratings of a different restaurant, B). The name of a "rater" should be a single variable, and the restaurants themselves should appear as values in the data set rather than as column headers. Transform the given dataset into a tidy data set named ratings_tidy. Do not give the new data set a row label index.
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
ratings_tidy
# Testing cell
assert True
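The following is one possible sketch of the Q7 pattern, not a reference solution. It assumes hypothetical column names First, Last, RatingA, and RatingB and uses a small in-memory CSV instead of reading os.path.join(datadir, "ratings.csv"); the actual file's headers may differ. It reads "a single variable" for the rater as one combined name column, then melts the two rating columns into a restaurant/rating pair.

```python
import io
import pandas as pd

# Hypothetical stand-in for ratings.csv; column names and rows are invented.
csv_text = """First,Last,RatingA,RatingB
Ann,Lee,4,5
Bo,Kim,3,2
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Combine the two name parts into a single "rater" variable.
raw["rater"] = raw["First"] + " " + raw["Last"]

# Gather RatingA/RatingB into one restaurant column and one rating column.
tidy = raw.melt(id_vars=["rater"], value_vars=["RatingA", "RatingB"],
                var_name="restaurant", value_name="rating")

# Turn the former headers "RatingA"/"RatingB" into plain labels "A"/"B".
tidy["restaurant"] = tidy["restaurant"].str.replace("Rating", "", regex=False)

# melt already returns a default RangeIndex; reset_index(drop=True) makes
# explicit that no meaningful row label index is kept.
tidy = tidy.reset_index(drop=True)
print(tidy)
```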
Q8 Consider the file restaurants_gender.csv, which holds aggregated data: each row maps an id, a restaurant, and a gender to an average rating. Relative to this aggregation, the data is already tidy as it stands. Pivot the restaurants_gender data into a matrix presentation with restaurant down one axis (as the row label index) and gender across the other axis (as the column label Index), a form well suited to presentation. Store the result as rest_mat.
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
rest_mat
# Testing cell
assert True
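A sketch of the matrix presentation on a hypothetical stand-in for restaurants_gender.csv (the column names id, restaurant, gender, and avg_rating are assumptions; check the real file's headers). It uses DataFrame.pivot_table, which behaves like pivot but aggregates with aggfunc when a (restaurant, gender) pair repeats instead of raising an error; with unique pairs, the mean simply returns each original value.

```python
import pandas as pd

# Hypothetical stand-in: one row per (restaurant, gender) pair with an
# already-aggregated average rating.
agg = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "restaurant": ["A", "A", "B", "B"],
    "gender": ["F", "M", "F", "M"],
    "avg_rating": [4.5, 4.0, 3.0, 3.5],
})

# restaurant values -> row label index, gender values -> column labels,
# each cell -> the (mean) avg_rating for that pair. The id column is unused.
mat = agg.pivot_table(values="avg_rating", index="restaurant",
                      columns="gender", aggfunc="mean")
print(mat)
```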