No API? No Problem - Mapping Coordinates to ZIP Codes for Free
Assigning ZIP Codes to coordinates without paying for API access
Python
Geospatial
Published
July 20, 2025
ZIP Codes are widely used in analytics because they serve as a convenient proxy for geographic location, even though they are not true areal units. In the United States, ZIP Codes are created and managed by the United States Postal Service for the purpose of delivering mail. They represent collections of delivery routes (not enclosed polygons), and because these routes can change over time and don’t necessarily align with geographic or administrative boundaries, ZIP Codes do not have precise, fixed geographic boundaries.
The US Census Bureau releases ZIP Code Tabulation Area (ZCTA) shapefiles, which are generalized areal representations of the geographic extent and distribution of point-based ZIP Codes built using census tabulation blocks. Note however, ZCTAs are not the same as ZIP Codes, and not all valid ZIP Codes are represented by a ZCTA.
Despite these shortcomings, ZIP Codes can still reflect meaningful differences in building stock, emergency services and environmental hazards. They are also typically small and granular enough to allow for regional differentiation, and large enough to avoid some of the volatility seen at finer geographic levels like census block. Additionally, ZIP Codes are commonly used by regulators and in industry datasets, which makes them a practical choice for pricing, underwriting and reporting.
In this article, I’ll walk through two approaches that can be used to assign 5-digit ZIP Code to a sample collection of coordinate pairs without having to resort to a paid API.
OpenAddresses
OpenAddresses is an open data project that provides free, worldwide address data collected from government sources, open datasets, and community contributions. It includes individual address records containing street names, cities, ZIP Codes, longitude and latitude and sometimes administrative areas. The data is organized by country and region and is regularly updated. It is especially useful for geocoding, reverse geocoding, and location enrichment tasks where access to millions of structured address points is required.
For US locations, the data are divided into regions. For this demonstration, our goal is to assign a single zip code (not ZCTA) to 10 longitude latitude pairs falling somewhere in Iowa, so we’ll download us_midwest.zip, which is ~4GB compressed.
Within the zip archive, we are interested in the statewide-addresses-state.geojson files. For Iowa, the exact path would be something like ./../us/ia/statewide-addresses-state.geojson. We read this file into a GeoDataFrame and inspect the first few records:
Python implementation: CPython
Python version : 3.11.11
IPython version : 8.31.0
conda environment: module3
Compiler : MSC v.1942 64 bit (AMD64)
OS : Windows
Release : 10
Machine : AMD64
Processor : Intel64 Family 6 Model 170 Stepping 4, GenuineIntel
CPU cores : 22
Architecture: 64bit
Hostname: JTRIZPC11
numpy : 1.26.3
folium : 0.16.0
pandas : 2.2.0
geopandas: 0.14.3
import warningsimport numpy as npimport pandas as pdimport geopandas as gpdfrom shapely.geometry import Pointnp.set_printoptions(suppress=True, precision=8, linewidth=1000)pd.options.mode.chained_assignment =Nonepd.set_option("display.max_columns", None)pd.set_option("display.width", None)warnings.filterwarnings("ignore")# Example Iowa statewide file. Data available at https://results.openaddresses.io/oa = gpd.read_file("./ia/statewide-addresses-state.geojson")print(f"oa.shape: {oa.shape}")oa.head(10)
oa.shape: (1467745, 10)
hash
number
street
unit
city
district
region
postcode
id
geometry
0
a5899128ed263520
729
Southeast Kensington ROAD
Ankeny
Polk County
IA
50021
POINT (-93.58681 41.71435)
1
211e49dde5333505
1826
EDWARDS STREET
BURLINGTON
DES MOINES
IA
52601
POINT (-91.12494 40.82178)
2
d1da12189105c053
222
PEREGRINE PL
COUNCIL BLUFFS
POTTAWATTAMIE
IA
51501
POINT (-95.83560 41.20277)
3
32d64202ed448b32
1037
565TH TRAIL
LOVILIA
MONROE COUNTY
IA
50150
POINT (-92.98455 41.15733)
4
b1707b5d2717b540
1410
MICHIGAN ST
STORM LAKE
BUENA VISTA
IA
POINT (-95.20270 42.65668)
5
3d92dd6500c35ceb
6010
Southwest 15th STREET
Des Moines
Polk County
IA
50315
POINT (-93.63727 41.52933)
6
fc135f3ab81d87ac
308
SOUTH LIBERTY STREET
CINCINNATI
APPANOOSE COUNTY
IA
POINT (-92.92489 40.62610)
7
8d873ba4490e4f85
5151
US HWY 61
BURLINGTON
DES MOINES
IA
52601
POINT (-91.15469 40.75543)
8
a3a763e4eae64973
6165
Northwest 86th STREET
Suite 206
Johnston
Polk County
IA
50131
POINT (-93.73488 41.67167)
9
d5a4c455796ec14b
205
W INGLEDUE ST
MARSHALLTOWN
MARSHALL CO
IA
50158
POINT (-92.91606 42.03223)
Ultimately a nearest-neighbor lookup will be used to assign a 5-digit ZIP Code to the 10 sample points using coordinates associated with known addresses, therefore we drop records with missing geometry or postcode and perform a bit of additional clean-up. In the last step:
# Clean and filter.oa = oa.dropna(subset=["postcode", "geometry"])oa["postcode"] = oa["postcode"].astype(str).str.strip()oa = oa[oa.postcode!=""].reset_index(drop=True)# Retain ZIP5 only (drop the plus four code).oa["postcode"] = oa["postcode"].astype(str).str.zfill(5)oa["zip5"] = oa["postcode"].str.split("-", n=1, expand=True)[0]oa = oa[["zip5", "geometry"]]print(f"Iowa addresses in OpenAddresses dataset: {len(oa):,}")oa.head()
Iowa addresses in OpenAddresses dataset: 1,314,249
zip5
geometry
0
50021
POINT (-93.58681 41.71435)
1
52601
POINT (-91.12494 40.82178)
2
51501
POINT (-95.83560 41.20277)
3
50150
POINT (-92.98455 41.15733)
4
50315
POINT (-93.63727 41.52933)
We create a sample of 10 coordinate pairs that fall within the Iowa state boundary:
# Sample points of interest. All fall within Iowa state boundary.pois = pd.DataFrame({"id": np.arange(10),"lon": [-91.4239, -92.5043, -90.6057, -93.6110, -93.1085,-90.6103, -95.8953, -92.4293, -93.5519, -95.0369],"lat": [41.9252, 40.6608, 41.5667, 42.0368, 41.3333, 41.5861, 41.2213, 42.4960, 41.6058, 40.7294] })# Create GeoDataFrame.pois = gpd.GeoDataFrame( pois, geometry=gpd.points_from_xy(pois.lon, pois.lat), crs="EPSG:4326")pois.head(10)
id
lon
lat
geometry
0
0
-91.4239
41.9252
POINT (-91.42390 41.92520)
1
1
-92.5043
40.6608
POINT (-92.50430 40.66080)
2
2
-90.6057
41.5667
POINT (-90.60570 41.56670)
3
3
-93.6110
42.0368
POINT (-93.61100 42.03680)
4
4
-93.1085
41.3333
POINT (-93.10850 41.33330)
5
5
-90.6103
41.5861
POINT (-90.61030 41.58610)
6
6
-95.8953
41.2213
POINT (-95.89530 41.22130)
7
7
-92.4293
42.4960
POINT (-92.42930 42.49600)
8
8
-93.5519
41.6058
POINT (-93.55190 41.60580)
9
9
-95.0369
40.7294
POINT (-95.03690 40.72940)
Let’s visualize the 10 points to get an idea of their distribution across the state:
import foliumlats, lons = pois.lat.tolist(), pois.lon.tolist()coords =list(zip(lats, lons))# Center map on central Iowa. mid_lon, mid_lat =-93.0977, 41.8780m = folium.Map( location=[mid_lat, mid_lon], zoom_start=8)for lat, lon in coords: folium.CircleMarker( location=[lat, lon], radius=5, color="red", fill_color="red", fill=True, fill_opacity=1 ).add_to(m)m
Make this Notebook Trusted to load map: File -> Trust Notebook
Within GeoPandas, sjoin_nearest performs a spatial join that matches each geometry in one GeoDataFrame to the nearest geometry in another GeoDataFrame based on geometric distance. Instead of checking for intersection or containment like a standard spatial join, sjoin_nearest finds the closest match in terms of Euclidean distance (in projected coordinates) or great-circle distance if using geographic coordinates.
In our example, each of the 10 points will get assigned the ZIP Code associated with the nearest known address point based on great-circle distance:
# Perform nearest neighbor lookup to known lon-lat zip code.pois = gpd.sjoin_nearest( pois, oa[["zip5", "geometry"]], how="left", distance_col="dist_meters")pois.head()
id
lon
lat
geometry
index_right
zip5
dist_meters
0
0
-91.4239
41.9252
POINT (-91.42390 41.92520)
896565
52314
0.000180
1
1
-92.5043
40.6608
POINT (-92.50430 40.66080)
684538
52537
0.000204
2
2
-90.6057
41.5667
POINT (-90.60570 41.56670)
659368
52806
0.000064
3
3
-93.6110
42.0368
POINT (-93.61100 42.03680)
1261954
50010
0.000030
4
4
-93.1085
41.3333
POINT (-93.10850 41.33330)
207898
50138
0.000654
In the resulting join, index_right specifies which row index from the OpenAddresses dataset served as the nearest match to the sample point.
Since our sample data is represented using a geographic coordinate reference system, dist_meters represents the distance to the nearest known address in terms of decimal degrees, not meters (even if we named the column "dist_meters"). If accurate distance measurements to the nearest known address are required, it is necessary to re-project the data to a projected CRS, then re-execute the spatial join:
# Using a projected CRS.pois = pd.DataFrame({"id": np.arange(10),"lon": [-91.4239, -92.5043, -90.6057, -93.6110, -93.1085,-90.6103, -95.8953, -92.4293, -93.5519, -95.0369],"lat": [41.9252, 40.6608, 41.5667, 42.0368, 41.3333, 41.5861, 41.2213, 42.4960, 41.6058, 40.7294] })# Create GeoDataFrame.pois = gpd.GeoDataFrame( pois, geometry=gpd.points_from_xy(pois.lon, pois.lat), crs="EPSG:4326")# Re-project to a projected CRS for accurate distance calculations.pois = pois.to_crs("EPSG:3857")# Ensure same CRS for OpenAddresses data. Unpack lat and lon for next section.oa["lon"] = oa.geometry.xoa["lat"] = oa.geometry.yoa = oa.to_crs("EPSG:3857")pois = gpd.sjoin_nearest( pois, oa[["zip5", "geometry"]], distance_col="dist_meters")pois.head(10)
id
lon
lat
geometry
index_right
zip5
dist_meters
0
0
-91.4239
41.9252
POINT (-10177261.994 5149781.344)
896565
52314
20.088399
1
1
-92.5043
40.6608
POINT (-10297531.572 4962437.751)
684538
52537
29.855588
2
2
-90.6057
41.5667
POINT (-10086180.387 5096292.577)
659368
52806
9.535298
3
3
-93.6110
42.0368
POINT (-10420728.853 5166493.501)
1261954
50010
4.511569
4
4
-93.1085
41.3333
POINT (-10364790.809 5061628.338)
207898
50138
78.010282
5
5
-90.6103
41.5861
POINT (-10086692.457 5099179.465)
519778
52806
10.161212
6
6
-95.8953
41.2213
POINT (-10675015.965 5045038.366)
395965
51501
90.983468
7
7
-92.4293
42.4960
POINT (-10289182.610 5235569.854)
550407
50613
49.319472
8
8
-93.5519
41.6058
POINT (-10414149.871 5102111.883)
187076
50317
11.959234
9
9
-95.0369
40.7294
POINT (-10579459.315 4972509.787)
358873
51632
292.360969
For the first sample (id = 0), dist_meters indicates the nearest known address is ~20 meters away.
Finding k-Nearest ZIP Codes with BallTree
sjoin_nearest performs a spatial join by finding the nearest geometry in one GeoDataFrame for each geometry in another using a spatial index. When sjoin_nearest is called, it uses the spatial index to find candidate nearest geometries efficiently, then refines the match using actual geometric distance.
One limitation of using sjoin_nearest is that only a single ZIP Code will be assigned to each sample. There may be cases where knowing the ZIP Codes of the five or ten nearest addresses might be of interest, possibly as an indicator of the homogeneity of ZIP Codes in a given region: If the 10 nearest addresses all share the same ZIP Code, it is likely that the sample point shares that ZIP Code (assuming a relatively small distance to each address).
We can identify and retrieve ZIP Codes associated with the k-nearest addresses and corresponding distances using a BallTree. Within scikit-learn, BallTree implements a fast nearest-neighbor search using a space-partitioning tree. It is not geospatially aware and does not work directly with geometry objects, and instead works with raw coordinate arrays. In the next cell, we create a BallTree instance, and identify the 10 nearest ZIP Codes for each sample in pois. Note that BallTree with haversine distance expects input in radians, not degrees:
from sklearn.neighbors import BallTree# Convert lon/lat to radians for BallTree when using haversine # metric.# OpenAddresses data.oa_coords = np.deg2rad(oa[["lat", "lon"]].values)# 10 sample data points. poi_coords = np.deg2rad(pois[["lat", "lon"]].values)# Create BallTree instance.bt = BallTree(oa_coords, metric="haversine")# Query top 10 neighbors.dists, indices = bt.query(poi_coords, k=10)print(f"dists:\n{dists}\n")print(f"indices:\n{indices}\n")
indices is a 10 x 10 array representing the indices into the original OpenAddresses GeoDataFrame corresponding to the 10 nearest addresses. We can lookup the associated ZIP Codes as follows:
# Lookup ZIP Codes for first sample (row 0)idx = indices[0]oa.loc[idx, "zip5"].tolist()
For the first point of interest, the 10 nearest ZIP Codes are the same, indicating a degree of homegeneity for that region.
dists is also a 10 x 10 array representing the distances to each of the 10 nearest addresses. When using the haversine metric, the distances will be returned in radians. To get distance in meters, multiply by 6,371,000 (assumes an Earth radius of 6,371km):
# Convert radians to meters.dists*=6_371_000dists.round(2)
Notice the distances increase from left to right. The nearest neighboring address will be at index 0 for each row, the furthest at the last index.
Let’s attach the ZIP Codes and distances to the original pois GeoDataFrame:
# Add ZIP Codes and distances to pois GeoDataFrame.pois["zip5_bt"] = [oa.loc[ii, "zip5"].tolist() for ii in indices]pois["dist_bt"] = [dists[ii].tolist() for ii inrange(len(dists))]pois.head(10)
id
lon
lat
geometry
index_right
zip5
dist_meters
zip5_bt
dist_bt
0
0
-91.4239
41.9252
POINT (-10177261.994 5149781.344)
896565
52314
20.088399
[52314, 52314, 52314, 52314, 52314, 52314, 523...
[4.985134881214199, 8.141201259829977, 9.90310...
1
1
-92.5043
40.6608
POINT (-10297531.572 4962437.751)
684538
52537
29.855588
[52537, 52537, 52537, 52537, 52537, 52537, 525...
[2.740550127855483, 46.76946329287494, 62.6460...
2
2
-90.6057
41.5667
POINT (-10086180.387 5096292.577)
659368
52806
9.535298
[52806, 52806, 52806, 52806, 52806, 52806, 528...
[0.9037957011905808, 1.500812774709375, 1.8239...
3
3
-93.6110
42.0368
POINT (-10420728.853 5166493.501)
1261954
50010
4.511569
[50010, 50010, 50010, 50010, 50010, 50010, 500...
[0.21342223699654192, 3.17473669691422, 3.1849...
4
4
-93.1085
41.3333
POINT (-10364790.809 5061628.338)
207898
50138
78.010282
[50138, 50138, 50138, 50138, 50138, 50138, 501...
[18.31278035976892, 19.967693631552844, 21.309...
5
5
-90.6103
41.5861
POINT (-10086692.457 5099179.465)
519778
52806
10.161212
[52806, 52806, 52806, 52806, 52806, 52806, 528...
[4.24276261571108, 4.870854523238734, 4.993111...
6
6
-95.8953
41.2213
POINT (-10675015.965 5045038.366)
395965
51501
90.983468
[51501, 51501, 51501, 51501, 51501, 51501, 515...
[10.084917943102482, 26.16403722913567, 28.485...
7
7
-92.4293
42.4960
POINT (-10289182.610 5235569.854)
550407
50613
49.319472
[50613, 50613, 50613, 50613, 50613, 50613, 506...
[15.604875076006634, 16.124558588950354, 29.19...
8
8
-93.5519
41.6058
POINT (-10414149.871 5102111.883)
187076
50317
11.959234
[50317, 50317, 50317, 50317, 50317, 50317, 503...
[3.7729410257543035, 7.462518448995617, 7.6092...
9
9
-95.0369
40.7294
POINT (-10579459.315 4972509.787)
358873
51632
292.360969
[51632, 51632, 51632, 51632, 51632, 51632, 516...
[27.402712270050465, 30.319545661080536, 35.25...
We confirm that the ZIP Code identified using sjoin_nearest (ZIP5) matches the nearest address ZIP Code for each of the 10 points of interest.
As final diagnostic, we can visualize all the points from the OpenAddresses dataset that map to one of our POIs nearest ZIP Codes, along with the convex hull enclosing those points. This will provide additional context missing from a purely distance based diagnostic. We focus on the point with id = 1, and perform the following steps:
Filter oa to addresses having ZIP Code = 52537.
Combine resulting geometries using .unary_union.
Call .convex_hull to compute the convex hull polygon enclosing the result from the previous step.
Visualize address points, convex hull and original point of interest.
# Filter oa to addresses having ZIP Code = 52537.# Repoject to WGS84 for visualization.oa52537 = oa[oa.zip5=="52537"].to_crs("EPSG:4326").reset_index(drop=True)print(f"Addresses in ZIP Code 52537: {len(oa52537):,}")oa52537.head(10)
Addresses in ZIP Code 52537: 1,830
zip5
geometry
lon
lat
0
52537
POINT (-92.45222 40.75075)
-92.452221
40.750748
1
52537
POINT (-92.33024 40.80313)
-92.330243
40.803125
2
52537
POINT (-92.56104 40.62839)
-92.561039
40.628394
3
52537
POINT (-92.46856 40.71069)
-92.468560
40.710687
4
52537
POINT (-92.31510 40.74755)
-92.315096
40.747551
5
52537
POINT (-92.21903 40.74897)
-92.219034
40.748971
6
52537
POINT (-92.34159 40.71283)
-92.341592
40.712831
7
52537
POINT (-92.42078 40.70208)
-92.420783
40.702076
8
52537
POINT (-92.22149 40.77867)
-92.221492
40.778673
9
52537
POINT (-92.42891 40.87646)
-92.428911
40.876458
# Get convex hull for ZIP Code = 52537.cvh = oa52537.unary_union.convex_hull# Get POI lat and lon.poi_y = pois[pois.id==1]["lat"].item()poi_x = pois[pois.id==1]["lon"].item()# Initialize map.f = folium.Figure(width=800, height=525)m = folium.Map(location=[poi_y, poi_x], zoom_start=10).add_to(f)folium.Marker( location=[poi_y, poi_x], icon=folium.Icon(color="red"), popup=f"id=1: {poi_x}{poi_y}" ).add_to(m)# Add address points to map.for point in oa52537.geometry: folium.CircleMarker( location=[point.y, point.x], radius=3.5, color="#4f9de4", fill=True, weight=1, fill_opacity=0.35 ).add_to(m)# Add the convex hull polygon to the map.folium.Polygon( locations=[[lat, lon] for lon, lat in cvh.exterior.coords], color="#184771", fill=False).add_to(m)m
The POI (red marker) is surrounded by many addresses with the same ZIP Code. This provides additional confidence that the assignment is reasonable and relatively accurate.
Conclusion
ZCTAs <> ZIP Codes! They are not one-to-one. Some ZIP Codes don’t have ZCTA equivalents. ZCTAs are statistical areas and do not map exactly to USPS ZIP Codes. ZIp Codes represent collections of delivery routes (not enclosed polygons).
ZCTAs are updated only with each decennial census, while USPS updates ZIP Codes constantly.
ZIP Codes that span multiple counties or regions may be misrepresented in ZCTA form.
Even though the OpenAddresses dataset contains over 500 million addresses, coverage is variable, and is generally better in urban/suburban areas than in rural areas. Some large metropolitan areas may be missing address data altogether.
If you’re building something regulatory or production-facing, you should disclose that the USPS does not publish ZIP Code boundaries, and this method uses nearest inferred ZIP Codes from publicly available data.