Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [ ]:
NAME = ""
COLLABORATORS = ""

In [ ]:
import os
import os.path
import pandas as pd

datadir = "publicdata"
In [ ]:
# Example1: Column Values as a Mashup

df1 = pd.read_csv(os.path.join(datadir, "mashup1.csv"))
df1.head()
In [ ]:
# 1. function to extract code
# 2. function to extract country
# 3. apply to get code vector
# 4. apply to get country vector
# 5. drop current country
# 6. add new code and new country
# 7. cleanup on index and column order
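One possible way to walk through the seven steps above is sketched below. The layout of `mashup1.csv` is not shown here, so the sketch runs on a made-up frame `toy1` whose `country` column is assumed to hold values like `"CAN: Canada"`. The `": "` separator, the `pop` column, the numbers, and the helper names `get_code` and `get_country` are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real mashup1.csv may use a different layout.
toy1 = pd.DataFrame({
    "country": ["CAN: Canada", "FRA: France", "IND: India"],
    "pop": [30.0, 60.0, 1300.0],   # invented values
})

def get_code(mashup):
    """Step 1: return the text before the first ':' (assumed separator)."""
    return mashup.split(":", 1)[0].strip()

def get_country(mashup):
    """Step 2: return the text after the first ':' (assumed separator)."""
    return mashup.split(":", 1)[1].strip()

codes = toy1["country"].apply(get_code)          # step 3: code vector
countries = toy1["country"].apply(get_country)   # step 4: country vector

tidy1 = toy1.drop(columns="country")             # step 5: drop the mashup column
tidy1["code"] = codes                            # step 6: add the new columns
tidy1["country"] = countries
tidy1 = tidy1[["code", "country", "pop"]]        # step 7: column order
tidy1 = tidy1.set_index("code")                  # step 7: code becomes the index
```

If the real separator differs, only the two helper functions should need to change.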
In [ ]:
# Example2: Year/Month Mashup/String

df2 = pd.read_csv(os.path.join(datadir, "metropolis.csv"))
df2.head()
In [ ]:
# Steps?
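A minimal sketch of one set of steps, assuming the mashup is a single string column such as `"2019-01"`. The column names `yearmonth` and `riders`, the `"-"` separator, and the values are invented for illustration and may not match `metropolis.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for the year/month mashup.
toy2 = pd.DataFrame({
    "yearmonth": ["2019-01", "2019-02", "2020-11"],
    "riders": [120, 135, 98],   # invented values
})

# Split the string once; expand=True yields one column per piece (0 and 1).
parts = toy2["yearmonth"].str.split("-", expand=True)

# Drop the mashup column and put integer year and month columns in front.
tidy2 = toy2.drop(columns="yearmonth")
tidy2.insert(0, "year", parts[0].astype(int))
tidy2.insert(1, "month", parts[1].astype(int))
```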
In [ ]:
# Example3: Multiple rows per country

df3 = pd.read_csv(os.path.join(datadir, "mult_rows.csv"))
df3.head()
In [ ]:
# What operation to solve this one?
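One candidate operation is a pivot. The sketch below assumes each country appears once per indicator, with the indicator name in one column and its value in another. The column names `ind` and `value` and all numbers are made up, so the real `mult_rows.csv` may call for different arguments.

```python
import pandas as pd

# Hypothetical toy data: one row per (country, indicator) pair.
toy3 = pd.DataFrame({
    "country": ["Canada", "Canada", "France", "France"],
    "ind": ["pop", "gdp", "pop", "gdp"],
    "value": [30.0, 700.0, 60.0, 1300.0],   # invented values
})

# Each distinct value of 'ind' becomes its own column, one row per country.
tidy3 = toy3.pivot(index="country", columns="ind", values="value")
tidy3.columns.name = None   # drop the leftover 'ind' label on the column axis
```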
In [ ]:
# Example4: Multiple rows per country/year combination

df4 = pd.read_csv(os.path.join(datadir, "mult_rows2.csv"))
df4.head()
In [ ]:
# What operation(s) to solve this?
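When the rows are keyed by both country and year, one option is to index on every key column plus the indicator and then unstack the indicator level. This is a sketch on invented data; the column names `ind` and `value` are assumptions about `mult_rows2.csv`.

```python
import pandas as pd

# Hypothetical toy data: one row per (country, year, indicator).
toy4 = pd.DataFrame({
    "country": ["Canada", "Canada", "Canada", "Canada"],
    "year": [2000, 2000, 2010, 2010],
    "ind": ["pop", "gdp", "pop", "gdp"],
    "value": [30.0, 700.0, 34.0, 1600.0],   # invented values
})

# Move the keys and the indicator into the index, then pivot the
# indicator level out into columns.
tidy4 = toy4.set_index(["country", "year", "ind"])["value"].unstack("ind")
tidy4.columns.name = None
```

The result keeps a two-level row index of `country` and `year`; calling `reset_index()` would turn those levels back into ordinary columns.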
In [ ]:
# Example5: Single indicator variable with multiple years

df5 = pd.read_csv(os.path.join(datadir, "pop_columns.csv"))
df5.head()
In [ ]:
# Steps to resolve??
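A plausible resolution is a melt, which turns the year columns into values of a single `year` column. The sketch assumes the year column labels are strings such as `"2000"`; the frame and its numbers are invented and may not match `pop_columns.csv`.

```python
import pandas as pd

# Hypothetical toy data: one column per year, all holding population.
toy5 = pd.DataFrame({
    "country": ["Canada", "France"],
    "2000": [30.0, 60.0],   # invented values
    "2010": [34.0, 65.0],
})

# Column labels become values of 'year'; cell values become 'pop'.
tidy5 = toy5.melt(id_vars="country", var_name="year", value_name="pop")
tidy5["year"] = tidy5["year"].astype(int)   # labels arrive as strings
tidy5 = tidy5.sort_values(["country", "year"]).reset_index(drop=True)
```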
In [ ]:
# Example6: Multiple indicator variables with multiple years

df6 = pd.read_csv(os.path.join(datadir, "popgdp_columns.csv"))
df6.head()
In [ ]:
# Steps to resolve??
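With several indicators spread across year columns, one approach is to melt, split the combined label, and then pivot the indicator back out. The sketch assumes labels of the form `pop_2000`; that naming, the `"_"` separator, and the numbers are assumptions, and `popgdp_columns.csv` may encode the labels differently.

```python
import pandas as pd

# Hypothetical toy data: one column per (indicator, year) pair.
toy6 = pd.DataFrame({
    "country": ["Canada", "France"],
    "pop_2000": [30.0, 60.0],      # invented values
    "pop_2010": [34.0, 65.0],
    "gdp_2000": [700.0, 1300.0],
    "gdp_2010": [1600.0, 2600.0],
})

# 1. Melt every indicator/year column into a long label/value pair.
long6 = toy6.melt(id_vars="country", var_name="ind_year", value_name="value")

# 2. Split the combined label into its indicator and year parts.
parts = long6["ind_year"].str.split("_", expand=True)
long6["ind"] = parts[0]
long6["year"] = parts[1].astype(int)

# 3. Pivot the indicator out so each row is one country/year.
tidy6 = long6.set_index(["country", "year", "ind"])["value"].unstack("ind")
tidy6.columns.name = None
```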
In [ ]:
# Example7: Two tables for a single mapping

df7a = pd.read_csv(os.path.join(datadir, "topfemale.csv"))
df7b = pd.read_csv(os.path.join(datadir, "topmale.csv"))
df7a.head()
In [ ]:
df7b.head()
In [ ]:
# How to solve??
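If the two tables share the same columns and differ only in which sex they describe, one option is to record that distinction as a new column and stack the tables. The frames below are invented stand-ins for `df7a` and `df7b`; the column names `year`, `name`, and `count` and the `sex` labels are assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for the female and male tables.
female = pd.DataFrame({"year": [2000, 2001], "name": ["Ann", "Amy"], "count": [10, 12]})
male = pd.DataFrame({"year": [2000, 2001], "name": ["Bob", "Ben"], "count": [11, 9]})

# Tag each table with the information its file name carried, then stack.
combined = pd.concat(
    [female.assign(sex="F"), male.assign(sex="M")],
    ignore_index=True,
)

# year and sex together identify a row in this toy example.
combined = combined.set_index(["year", "sex"]).sort_index()
```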
In [ ]:
# Example8: One table with multiple mappings

df8 = pd.read_csv(os.path.join(datadir, "mixed_table.csv"))
df8.head()
In [ ]:
# How to solve?
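One way to handle a table that mixes two mappings is to split it into one table per mapping. The sketch assumes a country-level mapping (`code` to `country`) mixed with a country/year-level mapping (`code`, `year` to `pop`). These columns and values are invented, and `mixed_table.csv` may combine different mappings.

```python
import pandas as pd

# Hypothetical toy data: the country name repeats on every yearly row.
toy8 = pd.DataFrame({
    "code": ["CAN", "CAN", "FRA", "FRA"],
    "country": ["Canada", "Canada", "France", "France"],
    "year": [2000, 2010, 2000, 2010],
    "pop": [30.0, 34.0, 60.0, 65.0],   # invented values
})

# Mapping 1: code -> country, one row per country.
countries8 = toy8[["code", "country"]].drop_duplicates().set_index("code")

# Mapping 2: (code, year) -> pop, one row per country/year.
pop8 = toy8[["code", "year", "pop"]].set_index(["code", "year"])
```

The shared `code` column allows the two tables to be joined again when needed.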