The matplotlib basemap toolkit is a library for plotting 2D data on maps in
Python. The Basemap constructor takes as input points
specifying the latitude and longitude of the location of interest, along with
an optional argument, resolution
, that specifies the desired quality of
the rendered map. This Basemap instance exposes the readshapefile
method, which contains the necessary details to render 2D maps with various
types of boundaries. The Basemap constructor can accept many additional
arguments, but they will not be covered here. Instead, I’ll demonstrate how to
render a map of Chicago, Illinois that represents variation in population
density by zip code using matplotlib’s default colormaps. The end result will
be rendered as follows:
Installing basemap for Python3
The original basemap implementation was developed for Python 2, and to my knowledge no analog exists for Python 3. Instead, the examples in this post utilize basemap as distributed and maintained via conda-forge. Assuming a functional version of Anaconda for Python 3 is available, installation is as straightforward as:
$ conda install -c conda-forge basemap
The basemap documentation page announced that basemap is under new management, and the Cartopy project will replace basemap going forward. It may be worthwhile updating this article at some point in the future demonstrating use of the Cartopy API. But for now, we’ll stick with basemap.
Acquiring Shapefiles and Zip Code Data
In this post, we reference the shapefiles available for download
here. The
archive will contain a number of files which are then consumed by the
readshapefile
method of the basemap instance. The Chicago population data
by zipcode can be downloaded here.
Displaying Zipcode Boundaries
To begin, we first demonstrate how to create an un-shaded 2D map of Chicago with zip code boundaries rendered. Chicago’s coordinates are (41.8781N, 87.6298W), and we can use this knowledge to construct a 4-point window that surrounds the geographic location of interest:
"""
Render a 2D map of Chicago with zip code boundaries.
"""
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import os
import os.path
us_shape_file_dir = "cb_2017_us_zcta510_500k"
os.chdir(us_shape_file_dir)
# Chicago coordinates.
lowerlon = -88.2
upperlon = -87.2
lowerlat = 41.62
upperlat = 42.05
m = Basemap(
llcrnrlon=lowerlon,
llcrnrlat=lowerlat,
urcrnrlon=upperlon,
urcrnrlat=upperlat,
resolution='c',
projection='lcc',
lat_0=lowerlat,
lat_1=upperlat,
lon_0=lowerlon,
lon_1=upperlon
)
shp_info = m.readshapefile(os.path.basename(us_shape_file_dir), 'state')
plt.gca().axis("off")
plt.show()
Running this produces the following:
The Basemap constructor argument resolution
can be set to one of the
following values:
- c (crude)
- l (low)
- i (intermediate)
- h (high)
- f (full)
- None
If None, no boundary data will be read in. Note that resolution drops off by roughly 80% between selections, but higher resolution datasets are much slower to render.
To represent population density by zip code graphically is slightly more involved. Instead of describing step-by-step how to go about it, I’ll instead present the code, then will comment on a few sections after the rendering is generated:
from mpl_toolkits.basemap import Basemap
from matplotlib.collections import LineCollection
import matplotlib as mpl
from matplotlib.colors import rgb2hex
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
# Read in population data.
pop_path = "ChicagoPopulation.csv"
DF = read_csv("Sheet1")
colormap = plt.cm.Purples
us_shape_file_dir = "cb_2017_us_zcta510_500k"
os.chdir(us_shape_file_dir)
# Chicago coordinates.
lowerlon = -88.2
upperlon = -87.2
lowerlat = 41.62
upperlat = 42.05
m = Basemap(
llcrnrlon=lowerlon,
llcrnrlat=lowerlat,
urcrnrlon=upperlon,
urcrnrlat=upperlat,
projection="lcc",
resolution="c",
lat_0=lowerlat,
lat_1=upperlat,
lon_0=lowerlon,
lon_1=upperlon
)
shp_info = m.readshapefile(
os.path.basename(us_shape_file_dir),'states',drawbounds=True
)
# Convert integer ZIP5 field to character dtype.
DF['ZIP5'] = DF['ZIP5'].astype(str)
# Read population density info into popdens dict. Take square root of
# actual density for better color mapping.
popdens = {
str(i):np.sqrt(j) for (i, j) in zip(DF.ZIP5.values,DF.POPULATION.values)
}
# Choose a color for each state based on population density. Range
# vmin-vmax has arbitrarily been set to 0-6. Fee lfree to experiment
# with other ranges.
ziplist = []
colors = {}
vmin = 0.
vmax = 6.
# Filter m.states_info to only Chicago zipcodes.
zip_info = m.states_info
popdiv = (max(popdens.values())/(vmax-vmin))
popdensscl = {i:(j/popdiv) for (i,j) in popdens.items()}
for d in zip_info:
iterzip = d["ZCTA5CE10"]
if iterzip in popdensscl.keys():
iterpop = popdensscl.get(iterzip,0)
colors[iterzip] = colormap(iterpop/vmax)[:3]
ziplist.append(iterzip)
for nshape,seg in enumerate(m.states):
i, j = zip(*seg)
if ziplist[nshape] in popdensscl.keys():
color = rgb2hex(colors[ziplist[nshape]])
edgecolor = "#000000"
plt.fill(i,j,color,edgecolor=edgecolor);
# (Optional) include colorbar.
sm = plt.cm.ScalarMappable(
cmap=colormap,norm=mpl.colors.Normalize(vmin=vmin, vmax=vmax)
)
mm = plt.cm.ScalarMappable(cmap=colormap)
mm.set_array([vmin, vmax])
plt.colorbar(mm,ticks=np.arange(vmin, vmax+1, 1),orientation="vertical")
plt.title("Chicago Population by ZIP5")
plt.gca().axis("off")
plt.show()
Running the code above will produce the following:
matplotlib includes many additional colormaps. Available options can be found
here. For representing
population variation across regions, be sure to stick with the Sequential
color mappings.
Note that when reading in the population density file into the popdens
dict, we take the square root of the population. The motivation for doing this
is to make the transition from low to high population zip codes more
noticeable. Feel free to experiment.
The zip_info
object is a list of dicts containing information about
each zip code. The dict elements are:
>>> zip_info[0]
{'AFFGEOID10': '8600000US35442',
'ALAND10': 610213891,
'AWATER10': 10838694,
'GEOID10': '35442',
'RINGNUM': 1,
'SHAPENUM': 1,
'ZCTA5CE10': '35442'}
zip_info
contains 42703 elements, one for each US zip code (the zip code
represented by the dict is given by the ‘ZCTA5CE10’ key). Similarly,
Basemap.states
is a list, also of length 42703, that contains nested lists
of tuples specifying the extact boundaries of the target shapefiles. For
example, the first element of m.states
above is a list containing 175
tuples that help to describe the perimiter of zipcode ‘35442’.
In the last part of the script, we include a colorbar that helps with interpeting the color variation across zip codes. The color bar can be positioned either horizontally or vertically. This is controlled by colorbar’s orientation argument.
Conclusion
This post was intended as a quick introduction to geoprocessing with Python. We rendered a 2D map of the City of Chicago, and demonstrated how to represent variation across geographic boundaries, in this case zip codes. As mentioned in the introduction, the Cartopy project has taken over management of basemap, so for serious geoprocessing projects, leveraging the Cartopy API probably makes the most sense. Until next time, happy coding!