No API? No Problem - Mapping Coordinates to ZIP Codes for Free – The Pleasure of Finding Things Out: A blog by James Triveri

ZIP Codes are widely used in analytics because they serve as a convenient proxy for geographic location, even though they are not true areal units. In the United States, ZIP Codes are created and managed by the United States Postal Service for the purpose of delivering mail. They represent collections of delivery routes (not enclosed polygons), and because these routes can change over time and don’t necessarily align with geographic or administrative boundaries, ZIP Codes do not have precise, fixed geographic boundaries.

The US Census Bureau releases ZIP Code Tabulation Area (ZCTA) shapefiles, which are generalized areal representations of the geographic extent and distribution of point-based ZIP Codes built using census tabulation blocks. Note however, ZCTAs are not the same as ZIP Codes, and not all valid ZIP Codes are represented by a ZCTA.

Despite these shortcomings, ZIP Codes can still reflect meaningful differences in building stock, emergency services and environmental hazards. They are also typically small and granular enough to allow for regional differentiation, and large enough to avoid some of the volatility seen at finer geographic levels like census block. Additionally, ZIP Codes are commonly used by regulators and in industry datasets, which makes them a practical choice for pricing, underwriting and reporting.

In this article, I’ll walk through two approaches that can be used to assign 5-digit ZIP Code to a sample collection of coordinate pairs without having to resort to a paid API.

OpenAddresses

OpenAddresses is an open data project that provides free, worldwide address data collected from government sources, open datasets, and community contributions. It includes individual address records containing street names, cities, ZIP Codes, longitude and latitude and sometimes administrative areas. The data is organized by country and region and is regularly updated. It is especially useful for geocoding, reverse geocoding, and location enrichment tasks where access to millions of structured address points is required.

For US locations, the data are divided into regions. For this demonstration, our goal is to assign a single zip code (not ZCTA) to 10 longitude latitude pairs falling somewhere in Iowa, so we’ll download us_midwest.zip, which is ~1GB.

Within the zip archive, we are interested in the statewide.csv files. For Iowa, the exact path would be something like …/us/ia/statewide.csv. We read this file into a Pandas DataFrame and inspect the first frew records:

%load_ext watermark

%watermark --python --conda --hostname --machine --iversions

Python implementation: CPython
Python version       : 3.12.11
IPython version      : 9.4.0

conda environment: osm

Compiler    : GCC 13.3.0
OS          : Linux
Release     : 6.8.0-60-generic
Machine     : x86_64
Processor   : x86_64
CPU cores   : 16
Architecture: 64bit

Hostname: fermi


import warnings

import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

np.set_printoptions(suppress=True, precision=8, linewidth=1000)
pd.options.mode.chained_assignment = None
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
warnings.filterwarnings("ignore")

# Example Iowa statewide file. Data available at https://results.openaddresses.io/
dfoa = pd.read_csv(
    "openaddr-collected-us_midwest/us/ia/statewide.csv"
)

dfoa.head(10)

	LON	LAT	NUMBER	STREET	UNIT	CITY	DISTRICT	REGION	POSTCODE	ID	HASH
0	-93.586810	41.714348	729	Southeast Kensington ROAD	NaN	Ankeny	Polk County	IA	50021	NaN	85640de13c151e73
1	-91.124941	40.821785	1826	EDWARDS STREET	NaN	BURLINGTON	DES MOINES	IA	52601	NaN	c46a81890ea9dade
2	-95.835598	41.202768	222	PEREGRINE PL	NaN	COUNCIL BLUFFS	POTTAWATTAMIE	IA	51501	NaN	ff20a503ae7884c0
3	-92.984547	41.157328	1037	565TH TRAIL	NaN	LOVILIA	MONROE COUNTY	IA	50150	NaN	50faddd361096939
4	-95.202703	42.656680	1410	MICHIGAN ST	NaN	STORM LAKE	BUENA VISTA	IA		NaN	7780387c272717bc
5	-93.637272	41.529334	6010	Southwest 15th STREET	NaN	Des Moines	Polk County	IA	50315	NaN	c83fc3e7c9ff08ec
6	-92.924891	40.626100	308	SOUTH LIBERTY STREET	NaN	CINCINNATI	APPANOOSE COUNTY	IA	NaN	NaN	05a0d2c060942c72
7	-91.154685	40.755431	5151	US HWY 61	NaN	BURLINGTON	DES MOINES	IA	52601	NaN	142186b7c0ead2b3
8	-93.734875	41.671672	6165	Northwest 86th STREET	Suite 206	Johnston	Polk County	IA	50131	NaN	07012a2220b2dd84
9	-92.916058	42.032234	205	W INGLEDUE ST	NaN	MARSHALLTOWN	MARSHALL CO	IA	50158	NaN	c1a3720939481dac

Ultimately a nearest-neighbor lookup will be used to assign a 5-digit ZIP Code to the 10 sample points using LON and LAT associated with known addresses, therefore we drop records with missing values for LON, LAT or POSTCODE and perform a bit of additional clean-up. In the last step, the original OpenAddresses dataset is converted to a GeoDataFrame:


# Clean and filter.
dfoa = dfoa.dropna(subset=["LON", "LAT", "POSTCODE"])
dfoa["POSTCODE"] = dfoa["POSTCODE"].astype(str).str.strip()
dfoa = dfoa[dfoa.POSTCODE!=""].reset_index(drop=True)

# Retain ZIP5 only (drop the plus four code).
dfoa["POSTCODE"] = dfoa["POSTCODE"].astype(str).str.zfill(5)
dfoa["ZIP5"] = dfoa["POSTCODE"].str.split("-", n=1, expand=True)[0]
dfoa = dfoa[["LON", "LAT", "ZIP5"]]

# Create GeoDataFrame.
oa = gpd.GeoDataFrame(
    dfoa,
    geometry=gpd.points_from_xy(dfoa.LON, dfoa.LAT),
    crs="EPSG:4326"
)

print(f"Iowa addresses in OpenAddresses dataset: {len(oa):,}")

oa.head()

Iowa addresses in OpenAddresses dataset: 1,314,249

	LON	LAT	ZIP5	geometry
0	-93.586810	41.714348	50021	POINT (-93.58681 41.71435)
1	-91.124941	40.821785	52601	POINT (-91.12494 40.82178)
2	-95.835598	41.202768	51501	POINT (-95.8356 41.20277)
3	-92.984547	41.157328	50150	POINT (-92.98455 41.15733)
4	-93.637272	41.529334	50315	POINT (-93.63727 41.52933)

We create a sample of 10 coordinate pairs that fall within the Iowa state boundary:

# Sample points of interest. All fall within Iowa state boundary.
pois = pd.DataFrame({
    "id": np.arange(10),
    "lon": [-91.4239, -92.5043, -90.6057, -93.6110, -93.1085,
            -90.6103, -95.8953, -92.4293, -93.5519, -95.0369],
    "lat": [41.9252, 40.6608, 41.5667, 42.0368, 41.3333, 
            41.5861, 41.2213, 42.4960, 41.6058, 40.7294]
    }
)

# Create GeoDataFrame.
pois = gpd.GeoDataFrame(
    pois,
    geometry=gpd.points_from_xy(pois.lon, pois.lat),
    crs="EPSG:4326"
)

pois.head(10)

	id	lon	lat	geometry
0	0	-91.4239	41.9252	POINT (-91.4239 41.9252)
1	1	-92.5043	40.6608	POINT (-92.5043 40.6608)
2	2	-90.6057	41.5667	POINT (-90.6057 41.5667)
3	3	-93.6110	42.0368	POINT (-93.611 42.0368)
4	4	-93.1085	41.3333	POINT (-93.1085 41.3333)
5	5	-90.6103	41.5861	POINT (-90.6103 41.5861)
6	6	-95.8953	41.2213	POINT (-95.8953 41.2213)
7	7	-92.4293	42.4960	POINT (-92.4293 42.496)
8	8	-93.5519	41.6058	POINT (-93.5519 41.6058)
9	9	-95.0369	40.7294	POINT (-95.0369 40.7294)

Let’s visualize the 10 points to get an idea of their distribution across the state:


import folium

f = folium.Figure(width=700, height=500)

lats, lons = pois.lat.tolist(), pois.lon.tolist()
coords = list(zip(lats, lons))

# Center map on central Iowa. 
mid_lon, mid_lat = -93.0977, 41.8780
 
m = folium.Map(
    location=[mid_lat, mid_lon], 
    zoom_start=7
).add_to(f)

for lat, lon in coords:

    folium.CircleMarker(
        location=[lat, lon], 
        radius=5, 
        color="red", 
        fill_color="red", 
        fill=True,
        fill_opacity=1
        ).add_to(m)

m

Within GeoPandas, sjoin_nearest performs a spatial join that matches each geometry in one GeoDataFrame to the nearest geometry in another GeoDataFrame based on geometric distance. Instead of checking for intersection or containment like a standard spatial join, sjoin_nearest finds the closest match in terms of Euclidean distance (in projected coordinates) or great-circle distance if using geographic coordinates.

In our example, each of the 10 points will get assigned the ZIP Code associated with the nearest known address point based on great-circle distance:


# Perform nearest neighbor lookup to known lon-lat zip code.
pois = gpd.sjoin_nearest(
    pois,
    oa[["ZIP5", "geometry"]],
    how="left",
    distance_col="dist_meters"
)

pois.head()

	id	lon	lat	geometry	index_right	ZIP5	dist_meters
0	0	-91.4239	41.9252	POINT (-91.4239 41.9252)	896565	52314	0.000180
1	1	-92.5043	40.6608	POINT (-92.5043 40.6608)	684538	52537	0.000204
2	2	-90.6057	41.5667	POINT (-90.6057 41.5667)	659368	52806	0.000064
3	3	-93.6110	42.0368	POINT (-93.611 42.0368)	1261954	50010	0.000030
4	4	-93.1085	41.3333	POINT (-93.1085 41.3333)	207898	50138	0.000654

In the resulting join, index_right specifies which row index from the OpenAddresses dataset served as the nearest match to the sample point.

Since our sample data is represented using a geographic coordinate reference system, dist_meters represents the distance to the nearest known address in terms of decimal degrees, not meters (even if we named the column "dist_meters"). If accurate distance measurements to the nearest known address are required, it is necessary to re-project the data to a projected CRS, then re-execute the spatial join:


# Using a projected CRS.
pois = pd.DataFrame({
    "id": np.arange(10),
    "lon": [-91.4239, -92.5043, -90.6057, -93.6110, -93.1085,
            -90.6103, -95.8953, -92.4293, -93.5519, -95.0369],
    "lat": [41.9252, 40.6608, 41.5667, 42.0368, 41.3333, 
            41.5861, 41.2213, 42.4960, 41.6058, 40.7294]
    }
)

# Create GeoDataFrame.
pois = gpd.GeoDataFrame(
    pois,
    geometry=gpd.points_from_xy(pois.lon, pois.lat),
    crs="EPSG:4326"
)

# Re-project to a projected CRS for accurate distance calculations.
pois = pois.to_crs("EPSG:3857")

# Ensure same CRS for OpenAddresses data.
oa = oa.to_crs("EPSG:3857")

pois = gpd.sjoin_nearest(
    pois, 
    oa[["ZIP5", "geometry"]], 
    distance_col="dist_meters"
)

pois.head(10)

	id	lon	lat	geometry	index_right	ZIP5	dist_meters
0	0	-91.4239	41.9252	POINT (-10177261.994 5149781.344)	896565	52314	20.088399
1	1	-92.5043	40.6608	POINT (-10297531.572 4962437.751)	684538	52537	29.855588
2	2	-90.6057	41.5667	POINT (-10086180.387 5096292.577)	659368	52806	9.535298
3	3	-93.6110	42.0368	POINT (-10420728.853 5166493.501)	1261954	50010	4.511569
4	4	-93.1085	41.3333	POINT (-10364790.809 5061628.338)	207898	50138	78.010282
5	5	-90.6103	41.5861	POINT (-10086692.457 5099179.465)	519778	52806	10.161212
6	6	-95.8953	41.2213	POINT (-10675015.965 5045038.366)	395965	51501	90.983468
7	7	-92.4293	42.4960	POINT (-10289182.61 5235569.854)	550407	50613	49.319472
8	8	-93.5519	41.6058	POINT (-10414149.871 5102111.883)	187076	50317	11.959234
9	9	-95.0369	40.7294	POINT (-10579459.315 4972509.787)	358873	51632	292.360969

For the first sample (id = 0), dist_meters indicates the nearest known address is ~20 meters away.

Finding k-Nearest ZIP Codes with BallTree

sjoin_nearest performs a spatial join by finding the nearest geometry in one GeoDataFrame for each geometry in another using a spatial index. When sjoin_nearest is called, it uses the spatial index to find candidate nearest geometries efficiently, then refines the match using actual geometric distance.

One limitation of using sjoin_nearest is that only a single ZIP Code will be assigned to each sample. There may be cases where knowing the ZIP Codes of the five or ten nearest addresses might be of interest, possibly as an indicator of the homogeneity of ZIP Codes in a given region: If the 10 nearest addresses all share the same ZIP Code, it is likely that the sample point shares that ZIP Code (assuming a relatively small distance to each address).

We can identify and retrieve ZIP Codes associated with the k-nearest addresses and corresponding distances using a BallTree. Within scikit-learn, BallTree implements a fast nearest-neighbor search using a space-partitioning tree. It is not geospatially aware and does not work directly with geometry objects, and instead works with raw coordinate arrays. In the next cell, we create a BallTree instance, and identify the 10 nearest ZIP Codes for each sample in pois. Note that BallTree with haversine distance expects input in radians, not degrees:


from sklearn.neighbors import BallTree

# Convert lon/lat to radians for BallTree when using haversine 
# metric.
# OpenAddresses data.
oa_coords = np.deg2rad(oa[["LON", "LAT"]].values)

# 10 sample data points. 
poi_coords = np.deg2rad(pois[["lon", "lat"]].values)

# Create BallTree instance.
bt = BallTree(oa_coords, metric="haversine")

# Query top 10 neighbors.
dists, indices = bt.query(poi_coords, k=10)

print(f"dists:\n{dists}\n")

print(f"indices:\n{indices}\n")

dists:
[[0.00000078 0.00000128 0.00000155 0.0000019  0.00000203 0.00000257 0.00000312 0.00000329 0.0000034  0.00000356]
 [0.00000043 0.00000734 0.00000983 0.0000122  0.00001249 0.00001339 0.000014   0.00001426 0.00001839 0.00002441]
 [0.00000014 0.00000024 0.00000029 0.00000048 0.00000056 0.0000006  0.00000061 0.00000093 0.00000093 0.00000094]
 [0.00000003 0.0000005  0.0000005  0.0000005  0.00000062 0.00000064 0.00000076 0.00000085 0.00000114 0.00000131]
 [0.00000287 0.00000313 0.00000334 0.00000416 0.00000419 0.00000437 0.00000449 0.00000452 0.00000454 0.00000457]
 [0.00000067 0.00000076 0.00000078 0.00000083 0.0000009  0.00000114 0.00000118 0.00000121 0.00000123 0.00000125]
 [0.00000158 0.00000411 0.00000447 0.00000459 0.00000587 0.00000624 0.0000063  0.00000662 0.00000686 0.00000708]
 [0.00000245 0.00000253 0.00000458 0.00000494 0.00000522 0.0000057  0.00000577 0.00000604 0.00000629 0.00000641]
 [0.00000059 0.00000117 0.00000119 0.00000134 0.00000143 0.00000153 0.00000159 0.00000168 0.00000172 0.00000194]
 [0.0000043  0.00000476 0.00000553 0.00000741 0.00001087 0.00001098 0.0000111  0.00001117 0.00001213 0.00001238]]

indices:
[[ 547709  457711  131556  510091   13847  856680  896565  192896 1064575  158127]
 [ 684538 1093125 1130773  540050  784185 1083753  600773  614572   58343  334934]
 [ 659368  286207  432808  627125  391029 1006928  338206  390721  575752  267197]
 [1261954 1280697  443539 1027393  567643 1038369  116769  799895  331017   70198]
 [ 312170  797161  506549   92729  392748  322733  532624  969750 1307702   79626]
 [ 241231  519778  282477  333948 1158412  575133  486008 1200591  238422 1134008]
 [ 395965  605176  682735  450244 1282644  889027 1276471  355442 1138247  620641]
 [ 766037  732231 1011916  550407 1161263 1088782  351109 1042692   25833 1034179]
 [ 187076 1143818 1077490  343559  836727 1266959 1041256   48610  677851  868357]
 [1245842  165862  471164  764378  276749 1284297  358873  362510 1020793  984672]]

indices is a 10 x 10 array representing the indices into the original OpenAddresses GeoDataFrame corresponding to the 10 nearest addresses. We can lookup the associated ZIP Codes as follows:


# Lookup ZIP Codes for first sample (row 0)

idx = indices[0]

oa.loc[idx, "ZIP5"].tolist()

['52314',
 '52314',
 '52314',
 '52314',
 '52314',
 '52314',
 '52314',
 '52314',
 '52314',
 '52314']

For the first point of interest, the 10 nearest ZIP Codes are the same, indicating a degree of homegeneity for that region.

dists is also a 10 x 10 array representing the distances to each of the 10 nearest addresses. When using the haversine metric, the distances will be returned in radians. To get distance in meters, multiply by 6,371,000 (assumes an Earth radius of 6,371km):


# Convert radians to meters.
dists*= 6_371_000  

dists.round(2)

array([[  4.99,   8.14,   9.9 ,  12.13,  12.96,  16.39,  19.89,  20.93,  21.68,  22.66],
       [  2.74,  46.77,  62.65,  77.71,  79.54,  85.31,  89.2 ,  90.82, 117.18, 155.54],
       [  0.9 ,   1.5 ,   1.82,   3.03,   3.58,   3.81,   3.91,   5.94,   5.95,   5.98],
       [  0.21,   3.17,   3.18,   3.21,   3.95,   4.09,   4.86,   5.44,   7.28,   8.36],
       [ 18.31,  19.97,  21.31,  26.53,  26.72,  27.86,  28.58,  28.78,  28.91,  29.13],
       [  4.24,   4.87,   4.99,   5.28,   5.74,   7.29,   7.53,   7.68,   7.82,   7.98],
       [ 10.08,  26.16,  28.49,  29.22,  37.38,  39.73,  40.13,  42.16,  43.71,  45.09],
       [ 15.6 ,  16.12,  29.2 ,  31.48,  33.29,  36.34,  36.75,  38.48,  40.05,  40.83],
       [  3.77,   7.46,   7.61,   8.55,   9.09,   9.74,  10.13,  10.73,  10.98,  12.35],
       [ 27.4 ,  30.32,  35.26,  47.18,  69.28,  69.94,  70.69,  71.14,  77.29,  78.86]])

Notice the distances increase from left to right. The nearest neighboring address will be at index 0 for each row, the furthest at the last index.

Let’s attach the ZIP Codes and distances to the original pois GeoDataFrame:


# Add ZIP Codes and distances to pois GeoDataFrame.
pois["zip5_bt"] = [oa.loc[ii, "ZIP5"].tolist() for ii in indices]
pois["dist_bt"] = [dists[ii].tolist() for ii in range(len(dists))]

pois.head(10)

	id	lon	lat	geometry	index_right	ZIP5	dist_meters	zip5_bt	dist_bt
0	0	-91.4239	41.9252	POINT (-10177261.994 5149781.344)	896565	52314	20.088399	[52314, 52314, 52314, 52314, 52314, 52314, 523...	[4.985134882078796, 8.141201258446523, 9.90310...
1	1	-92.5043	40.6608	POINT (-10297531.572 4962437.751)	684538	52537	29.855588	[52537, 52537, 52537, 52537, 52537, 52537, 525...	[2.7405501278443767, 46.76946329403264, 62.646...
2	2	-90.6057	41.5667	POINT (-10086180.387 5096292.577)	659368	52806	9.535298	[52806, 52806, 52806, 52806, 52806, 52806, 528...	[0.9037957011912015, 1.500812774709375, 1.8239...
3	3	-93.6110	42.0368	POINT (-10420728.853 5166493.501)	1261954	50010	4.511569	[50010, 50010, 50010, 50010, 50010, 50010, 500...	[0.21342223699654192, 3.17473669691422, 3.1849...
4	4	-93.1085	41.3333	POINT (-10364790.809 5061628.338)	207898	50138	78.010282	[50138, 50138, 50138, 50138, 50138, 50138, 501...	[18.31278035974411, 19.967693631899426, 21.309...
5	5	-90.6103	41.5861	POINT (-10086692.457 5099179.465)	519778	52806	10.161212	[52806, 52806, 52806, 52806, 52806, 52806, 528...	[4.242762617075453, 4.870854521824348, 4.99311...
6	6	-95.8953	41.2213	POINT (-10675015.965 5045038.366)	395965	51501	90.983468	[51501, 51501, 51501, 51501, 51501, 51501, 515...	[10.084917943102482, 26.164037229203412, 28.48...
7	7	-92.4293	42.4960	POINT (-10289182.61 5235569.854)	550407	50613	49.319472	[50613, 50613, 50613, 50613, 50613, 50613, 506...	[15.604875075255679, 16.124558588950354, 29.19...
8	8	-93.5519	41.6058	POINT (-10414149.871 5102111.883)	187076	50317	11.959234	[50317, 50317, 50317, 50317, 50317, 50317, 503...	[3.772941024353452, 7.462518447657111, 7.60926...
9	9	-95.0369	40.7294	POINT (-10579459.315 4972509.787)	358873	51632	292.360969	[51632, 51632, 51632, 51632, 51632, 51632, 516...	[27.402712270050465, 30.31954566212826, 35.258...

We confirm that the ZIP Code identified using sjoin_nearest (ZIP5) matches the nearest address ZIP Code for each of the 10 points of interest.

As final diagnostic, we can visualize all the points from the OpenAddresses dataset that map to one of our POIs nearest ZIP Codes, along with the convex hull enclosing those points. This will provide additional context missing from a purely distance based diagnostic. We focus on the point with id = 1, and perform the following steps:

Filter oa to addresses having ZIP Code = 52537.
Combine resulting geometries using .unary_union.
Call .convex_hull to compute the convex hull polygon enclosing the result from the previous step.
Visualize address points, convex hull and original point of interest.


# Filter oa to addresses having ZIP Code = 52537.
# Repoject to WGS84 for visualization.
oa52537 = oa[oa.ZIP5=="52537"].to_crs("EPSG:4326").reset_index(drop=True)

print(f"Addresses in ZIP Code 52537: {len(oa52537):,}")

oa52537.head(10)

Addresses in ZIP Code 52537: 1,830

	LON	LAT	ZIP5	geometry
0	-92.452221	40.750748	52537	POINT (-92.45222 40.75075)
1	-92.330243	40.803125	52537	POINT (-92.33024 40.80313)
2	-92.561039	40.628394	52537	POINT (-92.56104 40.62839)
3	-92.468560	40.710687	52537	POINT (-92.46856 40.71069)
4	-92.315096	40.747551	52537	POINT (-92.3151 40.74755)
5	-92.219034	40.748971	52537	POINT (-92.21903 40.74897)
6	-92.341592	40.712831	52537	POINT (-92.34159 40.71283)
7	-92.420783	40.702076	52537	POINT (-92.42078 40.70208)
8	-92.221492	40.778673	52537	POINT (-92.22149 40.77867)
9	-92.428911	40.876458	52537	POINT (-92.42891 40.87646)


# Get convex hull for ZIP Code = 52537.
cvh = oa52537.unary_union.convex_hull

# Get POI lat and lon.
poi_y = pois[pois.id==1]["lat"].item()
poi_x = pois[pois.id==1]["lon"].item()

# Initialize map.
f = folium.Figure(width=800, height=525)
m = folium.Map(location=[poi_y, poi_x], zoom_start=10).add_to(f)


folium.Marker(
    location=[poi_y, poi_x],
    icon=folium.Icon(color="red"),
    popup=f"id=1: {poi_x} {poi_y}"
    ).add_to(m)

# Add address points to map.
for point in oa52537.geometry:
    folium.CircleMarker(
        location=[point.y, point.x],
        radius=3.5,
        color="#4f9de4",
        fill=True,
        weight=1,
        fill_opacity=0.35
    ).add_to(m)


# Add the convex hull polygon to the map.
folium.Polygon(
    locations=[[lat, lon] for lon, lat in cvh.exterior.coords],
    color="#184771",
    fill=False
).add_to(m)


m

The POI (red marker) is surrounded by many addresses with the same ZIP Code. This provides additional confidence that the assignment is reasonable and relatively accurate.

Conclusion

ZCTAs <> ZIP Codes! They are not one-to-one. Some ZIP Codes don’t have ZCTA equivalents. ZCTAs are statistical areas and do not map exactly to USPS ZIP Codes. ZIp Codes represent collections of delivery routes (not enclosed polygons).
ZCTAs are updated only with each decennial census, while USPS updates ZIP Codes constantly.
ZIP Codes that span multiple counties or regions may be misrepresented in ZCTA form.
Even though the OpenAddresses dataset contains over 500 million addresses, coverage is variable, and is generally better in urban/suburban areas than in rural areas. Some large metropolitan areas may be missing address data altogether.
If you’re building something regulatory or production-facing, you should disclose that the USPS does not publish ZIP Code boundaries, and this method uses nearest inferred ZIP Codes from publicly available data.