import os
import pandas as pd

datadir = "publicdata"

path = os.path.join(datadir, "topnames.csv")
topnames0 = pd.read_csv(path)
topnames = topnames0.set_index(['year', 'sex'])
names0 = topnames0.head(10)
names = topnames.head(10)

path = os.path.join(datadir, "indicators2016.csv")
ind0 = pd.read_csv(path)
ind = ind0.set_index('code')

large_country = ind['pop'] > 1000
large_country

code
CAN    False
CHN     True
IND     True
RUS    False
USA    False
VNM    False
Name: pop, dtype: bool

Row Selection by Condition

ind[large_country]

ind[ind['pop'] > 1000]

ind['life'] > 77

code
CAN     True
CHN    False
IND    False
RUS    False
USA     True
VNM    False
Name: life, dtype: bool

(ind['pop'] > 1000) | (ind['life'] > 77)

code
CAN     True
CHN     True
IND     True
RUS    False
USA     True
VNM    False
dtype: bool

What happens if we omit the parentheses? Why?
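The question above comes down to operator precedence. The sketch below uses a small hypothetical frame (`df`, with made-up codes and values that are not taken from indicators2016.csv) to compare the two spellings. In Python, `|` binds more tightly than `>`, so the unparenthesized form is parsed as a chained comparison rather than an element-wise "or", and it is expected to raise an exception instead of returning a mask.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the real indicator values).
df = pd.DataFrame(
    {"pop": [1500.0, 30.0, 90.0], "life": [70.0, 82.0, 75.0]},
    index=pd.Index(["AAA", "BBB", "CCC"], name="code"),
)

# With parentheses: each comparison yields a boolean Series,
# and | then combines the two element by element.
mask = (df["pop"] > 1000) | (df["life"] > 77)

# Without parentheses, | is applied before >, so Python reads
#   df["pop"] > (1000 | df["life"]) > 77
# which is a chained comparison, not an element-wise "or".
try:
    bad = df["pop"] > 1000 | df["life"] > 77
    failed = False
except (TypeError, ValueError) as err:
    failed = True
    error_name = type(err).__name__  # which exception pandas raised

print(mask)
print("unparenthesized form failed:", failed)
```

The exact exception type and message can depend on the column dtypes and the pandas version, which is why the sketch catches both `TypeError` and `ValueError`.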

Other subsets and Sorting

nlargest
nsmallest
sort_index
sort_values
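A minimal sketch of the four methods listed above, applied to a hypothetical frame. The codes and numbers are invented for illustration and are not the contents of indicators2016.csv; only the column names `pop` and `life` follow the notebook.

```python
import pandas as pd

# Hypothetical toy data, deliberately stored out of order.
df = pd.DataFrame(
    {"pop": [30.0, 1500.0, 90.0, 300.0], "life": [82.0, 70.0, 75.0, 79.0]},
    index=pd.Index(["BBB", "AAA", "DDD", "CCC"], name="code"),
)

# The two rows with the largest 'pop', largest first.
top2 = df.nlargest(2, "pop")

# The two rows with the smallest 'life', smallest first.
bottom2 = df.nsmallest(2, "life")

# Rows ordered by the index labels (here, the 'code' index).
by_code = df.sort_index()

# Rows ordered by a column's values, descending.
by_life_desc = df.sort_values("life", ascending=False)

print(top2, bottom2, by_code, by_life_desc, sep="\n\n")
```

All four calls return a new DataFrame and leave `df` unchanged, so the results can be chained with further selection.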

      country      pop       gdp   life     cell
code
CHN     China  1378.66  11199.15  76.25  1364.93
IND     India  1324.17   2263.79  68.56  1127.81

Combinations of selecting rows and projecting columns
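One common way to combine the two operations is `.loc[rows, columns]`, where the row part is a boolean mask and the column part is a label or list of labels. The sketch below assumes a hypothetical frame with invented codes, country names, and values. It shows one possible approach and is not necessarily the one this notebook goes on to use.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame(
    {
        "country": ["Aland", "Bland", "Cland"],
        "pop": [1500.0, 30.0, 90.0],
        "life": [70.0, 82.0, 75.0],
    },
    index=pd.Index(["AAA", "BBB", "CCC"], name="code"),
)

# Select rows with a boolean mask and project two columns in one step.
large = df["pop"] > 1000
subset = df.loc[large, ["country", "life"]]

# A single column label returns a Series instead of a DataFrame.
long_lived_pop = df.loc[df["life"] > 72, "pop"]

print(subset)
print(long_lived_pop)
```

Doing both steps in a single `.loc` call avoids chained indexing such as `df[mask]["pop"]`, which works for reading but can trigger warnings when used for assignment.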