Denison CS181/DA210 Homework

Before you turn this problem in, make sure everything runs as expected. This is a combination of restarting the kernel and then running all cells (in the menubar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE".

import os
import os.path
import pandas as pd

datadir = "publicdata"  # directory holding the data files read below

Q1 Make the following into a pandas data frame, assigning it to variable df.

{'foo': ['one','one','one','two','two','two'],
 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
 'baz': [1, 2, 3, 4, 5, 6]}

Suppose the values one and two in column foo should become column headings (as it stands, it takes more than one row to interpret a single observation) and the cell values should come from the baz column. What transformation/reshaping operation should be used to obtain a tidy version of this data? Include your answer as a comment in your code cell.
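As a minimal sketch of the long-to-wide reshaping described here, the cell below uses hypothetical city/year/rainfall data rather than the homework data. The names long_df, wide_df, city, year, and rainfall are illustrative only.

```python
import pandas as pd

# Hypothetical long-format data (not the homework data): each row holds one
# (city, year) reading, so one city's observation is spread over two rows.
long_df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "year": [2020, 2021, 2020, 2021],
    "rainfall": [760, 810, 15, 12],
})

# pivot: values of `year` become column headings, values of `rainfall`
# fill the cells, and `city` becomes the row-label index.
wide_df = long_df.pivot(index="city", columns="year", values="rainfall")
print(wide_df)
```

Passing index, columns, and values by keyword matches recent pandas releases, where DataFrame.pivot accepts these arguments only as keywords.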

# YOUR CODE HERE
raise NotImplementedError()
df

# Testing cell

assert True

Q2 What arguments would this operation need in order to do its job?

YOUR ANSWER HERE

Q3 Perform the operation and assign the result to df2.

# YOUR CODE HERE
raise NotImplementedError()
df2

# Testing cell

assert True

Q4 Make the following into a pandas data frame. Assign it to df.

{'A': {0: 'a', 1: 'b', 2: 'c'},
 'B': {0: 2, 1: 4, 2: 6},
 'C': {0: 1, 1: 3, 2: 5},
 'D': {0: 1, 1: 2, 2: 4}}

Suppose further that we have determined that the column labels B and D are really values of a variable called X. What transformation/reshaping operation should be used to obtain a tidy version of this data? Enter your answer as a comment in the code cell where you create the data frame.
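A short sketch of the opposite direction, wide to long, using hypothetical quiz-score data built from a dict of dicts like the one above (outer keys become column labels, inner keys become the row-label index). The names student, quiz1, quiz2, quiz, and score are illustrative assumptions, not part of the assignment.

```python
import pandas as pd

# Hypothetical wide data (not the homework data), built from a dict of dicts.
scores = pd.DataFrame({
    "student": {0: "ana", 1: "ben"},
    "quiz1": {0: 7, 1: 9},
    "quiz2": {0: 8, 1: 6},
})

# melt: the column labels quiz1/quiz2 are really values of a variable
# (here called "quiz"), so they are moved down into rows.
long_scores = scores.melt(
    id_vars=["student"],            # columns kept as identifiers
    value_vars=["quiz1", "quiz2"],  # columns whose labels become values
    var_name="quiz",                # name for the new variable column
    value_name="score",             # name for the column of cell values
)
print(long_scores)
```

When value_vars is given, any column listed in neither id_vars nor value_vars is dropped from the result.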

# YOUR CODE HERE
raise NotImplementedError()
df

# Testing cell

assert True

Q5 At a minimum, what arguments would this operation need in order to do its job?
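To see which arguments melt strictly needs, the hypothetical example below passes only id_vars. The site/jan/feb data is invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: one identifier column plus two measurement columns.
wide_table = pd.DataFrame({
    "site": ["north", "south"],
    "jan": [1.5, 2.0],
    "feb": [1.0, 2.5],
})

# With only id_vars supplied, every other column is melted, and the new
# columns get pandas' default names "variable" and "value".
long_table = wide_table.melt(id_vars=["site"])
print(long_table)
```

Supplying var_name and value_name is optional, but it usually gives the new columns more meaningful names than the defaults.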

YOUR ANSWER HERE

Q6 Perform the operation and assign the result to df2.

# YOUR CODE HERE
raise NotImplementedError()
df2

# Testing cell

assert True

Q7 Consider the file ratings.csv. It has columns for first name, last name, RatingA (a rating of one restaurant, A), and RatingB (a rating of a different restaurant, B). The name of a "rater" should be a single variable, and the restaurants themselves should appear as values in the data set rather than as parts of column names. Transform the given data set into a tidy data set named ratings_tidy. Do not give the new data set a row-label index.
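The following is one possible sketch of this kind of reshaping. It assumes ratings.csv uses the column names first, last, RatingA, and RatingB (the real headers may differ), and it assumes that "a single variable" for the rater means joining first and last name into one column. An inline CSV string with invented names stands in for the file so the cell runs on its own.

```python
import io
import pandas as pd

# Hypothetical stand-in for ratings.csv; in the notebook you would instead
# read pd.read_csv(os.path.join(datadir, "ratings.csv")).
csv_text = """first,last,RatingA,RatingB
Ana,Diaz,4,5
Ben,Cho,3,2
"""
ratings = pd.read_csv(io.StringIO(csv_text))

# Assumption: the rater is identified by joining first and last name.
ratings["rater"] = ratings["first"] + " " + ratings["last"]

# Move RatingA/RatingB into rows so the restaurant becomes a value.
tidy_sketch = ratings.melt(
    id_vars=["rater"],
    value_vars=["RatingA", "RatingB"],
    var_name="restaurant",
    value_name="rating",
)
# Strip the "Rating" prefix so the restaurant column holds just A or B.
tidy_sketch["restaurant"] = tidy_sketch["restaurant"].str.replace("Rating", "", regex=False)
print(tidy_sketch)
```

melt returns a default integer index, so no extra step is needed to avoid a row-label index.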

# Solution cell

# YOUR CODE HERE
raise NotImplementedError()
ratings_tidy

# Testing cell
assert True

Q8 Consider the file restaurants_gender.csv, which aggregates other data: each row maps an id, restaurant, and gender to an average rating. Relative to this aggregation, the data is tidy as it stands. Pivot the restaurants_gender data into a matrix presentation with restaurant down one axis (as the row-label index) and gender across the other axis (as the column-label index), a form that might make for a good presentation. Store the result as rest_mat.
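A minimal sketch of the matrix presentation, using hypothetical aggregated data whose column names (id, restaurant, gender, avg_rating) are assumptions rather than the actual headers of restaurants_gender.csv.

```python
import pandas as pd

# Hypothetical aggregated data in the shape described above:
# each row maps (id, restaurant, gender) to an average rating.
agg = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "restaurant": ["Deli", "Deli", "Cafe", "Cafe"],
    "gender": ["F", "M", "F", "M"],
    "avg_rating": [4.5, 3.5, 2.0, 3.0],
})

# Restaurants down the rows, genders across the columns; the id column
# is not used because values= selects only avg_rating.
matrix = agg.pivot(index="restaurant", columns="gender", values="avg_rating")
print(matrix)
```

pivot requires each (restaurant, gender) pair to occur at most once and raises a ValueError otherwise; pivot_table with an aggfunc is the usual alternative when duplicates need to be combined.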

# Solution cell

# YOUR CODE HERE
raise NotImplementedError()
rest_mat

# Testing cell
assert True