No API? No Problem - Mapping Coordinates to ZIP Codes for Free
Assigning ZIP Codes to coordinates without paying for API access
Python
Geospatial
Published
July 20, 2025
ZIP Codes are widely used in analytics because they serve as a convenient proxy for geographic location, even though they are not true areal units. In the United States, ZIP Codes are created and managed by the United States Postal Service for the purpose of delivering mail. They represent collections of delivery routes (not enclosed polygons), and because these routes can change over time and don’t necessarily align with geographic or administrative boundaries, ZIP Codes do not have precise, fixed geographic boundaries.
The US Census Bureau releases ZIP Code Tabulation Area (ZCTA) shapefiles, which are generalized areal representations of the geographic extent and distribution of point-based ZIP Codes built using census tabulation blocks. Note however, ZCTAs are not the same as ZIP Codes, and not all valid ZIP Codes are represented by a ZCTA.
Despite these shortcomings, ZIP Codes can still reflect meaningful differences in building stock, emergency services and environmental hazards. They are also typically small and granular enough to allow for regional differentiation, and large enough to avoid some of the volatility seen at finer geographic levels like census block. Additionally, ZIP Codes are commonly used by regulators and in industry datasets, which makes them a practical choice for pricing, underwriting and reporting.
In this article, I’ll walk through two approaches that can be used to assign 5-digit ZIP Code to a sample collection of coordinate pairs without having to resort to a paid API.
OpenAddresses
OpenAddresses is an open data project that provides free, worldwide address data collected from government sources, open datasets, and community contributions. It includes individual address records containing street names, cities, ZIP Codes, longitude and latitude and sometimes administrative areas. The data is organized by country and region and is regularly updated. It is especially useful for geocoding, reverse geocoding, and location enrichment tasks where access to millions of structured address points is required.
For US locations, the data are divided into regions. For this demonstration, our goal is to assign a single zip code (not ZCTA) to 10 longitude latitude pairs falling somewhere in Iowa, so we’ll download us_midwest.zip, which is ~1GB.
Within the zip archive, we are interested in the statewide.csv files. For Iowa, the exact path would be something like …/us/ia/statewide.csv. We read this file into a Pandas DataFrame and inspect the first frew records:
Python implementation: CPython
Python version : 3.12.11
IPython version : 9.4.0
conda environment: osm
Compiler : GCC 13.3.0
OS : Linux
Release : 6.8.0-60-generic
Machine : x86_64
Processor : x86_64
CPU cores : 16
Architecture: 64bit
Hostname: fermi
import warningsimport numpy as npimport pandas as pdimport geopandas as gpdfrom shapely.geometry import Pointnp.set_printoptions(suppress=True, precision=8, linewidth=1000)pd.options.mode.chained_assignment =Nonepd.set_option("display.max_columns", None)pd.set_option("display.width", None)warnings.filterwarnings("ignore")# Example Iowa statewide file. Data available at https://results.openaddresses.io/dfoa = pd.read_csv("openaddr-collected-us_midwest/us/ia/statewide.csv")dfoa.head(10)
LON
LAT
NUMBER
STREET
UNIT
CITY
DISTRICT
REGION
POSTCODE
ID
HASH
0
-93.586810
41.714348
729
Southeast Kensington ROAD
NaN
Ankeny
Polk County
IA
50021
NaN
85640de13c151e73
1
-91.124941
40.821785
1826
EDWARDS STREET
NaN
BURLINGTON
DES MOINES
IA
52601
NaN
c46a81890ea9dade
2
-95.835598
41.202768
222
PEREGRINE PL
NaN
COUNCIL BLUFFS
POTTAWATTAMIE
IA
51501
NaN
ff20a503ae7884c0
3
-92.984547
41.157328
1037
565TH TRAIL
NaN
LOVILIA
MONROE COUNTY
IA
50150
NaN
50faddd361096939
4
-95.202703
42.656680
1410
MICHIGAN ST
NaN
STORM LAKE
BUENA VISTA
IA
NaN
7780387c272717bc
5
-93.637272
41.529334
6010
Southwest 15th STREET
NaN
Des Moines
Polk County
IA
50315
NaN
c83fc3e7c9ff08ec
6
-92.924891
40.626100
308
SOUTH LIBERTY STREET
NaN
CINCINNATI
APPANOOSE COUNTY
IA
NaN
NaN
05a0d2c060942c72
7
-91.154685
40.755431
5151
US HWY 61
NaN
BURLINGTON
DES MOINES
IA
52601
NaN
142186b7c0ead2b3
8
-93.734875
41.671672
6165
Northwest 86th STREET
Suite 206
Johnston
Polk County
IA
50131
NaN
07012a2220b2dd84
9
-92.916058
42.032234
205
W INGLEDUE ST
NaN
MARSHALLTOWN
MARSHALL CO
IA
50158
NaN
c1a3720939481dac
Ultimately a nearest-neighbor lookup will be used to assign a 5-digit ZIP Code to the 10 sample points using LON and LAT associated with known addresses, therefore we drop records with missing values for LON, LAT or POSTCODE and perform a bit of additional clean-up. In the last step, the original OpenAddresses dataset is converted to a GeoDataFrame:
# Clean and filter.dfoa = dfoa.dropna(subset=["LON", "LAT", "POSTCODE"])dfoa["POSTCODE"] = dfoa["POSTCODE"].astype(str).str.strip()dfoa = dfoa[dfoa.POSTCODE!=""].reset_index(drop=True)# Retain ZIP5 only (drop the plus four code).dfoa["POSTCODE"] = dfoa["POSTCODE"].astype(str).str.zfill(5)dfoa["ZIP5"] = dfoa["POSTCODE"].str.split("-", n=1, expand=True)[0]dfoa = dfoa[["LON", "LAT", "ZIP5"]]# Create GeoDataFrame.oa = gpd.GeoDataFrame( dfoa, geometry=gpd.points_from_xy(dfoa.LON, dfoa.LAT), crs="EPSG:4326")print(f"Iowa addresses in OpenAddresses dataset: {len(oa):,}")oa.head()
Iowa addresses in OpenAddresses dataset: 1,314,249
LON
LAT
ZIP5
geometry
0
-93.586810
41.714348
50021
POINT (-93.58681 41.71435)
1
-91.124941
40.821785
52601
POINT (-91.12494 40.82178)
2
-95.835598
41.202768
51501
POINT (-95.8356 41.20277)
3
-92.984547
41.157328
50150
POINT (-92.98455 41.15733)
4
-93.637272
41.529334
50315
POINT (-93.63727 41.52933)
We create a sample of 10 coordinate pairs that fall within the Iowa state boundary:
# Sample points of interest. All fall within Iowa state boundary.pois = pd.DataFrame({"id": np.arange(10),"lon": [-91.4239, -92.5043, -90.6057, -93.6110, -93.1085,-90.6103, -95.8953, -92.4293, -93.5519, -95.0369],"lat": [41.9252, 40.6608, 41.5667, 42.0368, 41.3333, 41.5861, 41.2213, 42.4960, 41.6058, 40.7294] })# Create GeoDataFrame.pois = gpd.GeoDataFrame( pois, geometry=gpd.points_from_xy(pois.lon, pois.lat), crs="EPSG:4326")pois.head(10)
id
lon
lat
geometry
0
0
-91.4239
41.9252
POINT (-91.4239 41.9252)
1
1
-92.5043
40.6608
POINT (-92.5043 40.6608)
2
2
-90.6057
41.5667
POINT (-90.6057 41.5667)
3
3
-93.6110
42.0368
POINT (-93.611 42.0368)
4
4
-93.1085
41.3333
POINT (-93.1085 41.3333)
5
5
-90.6103
41.5861
POINT (-90.6103 41.5861)
6
6
-95.8953
41.2213
POINT (-95.8953 41.2213)
7
7
-92.4293
42.4960
POINT (-92.4293 42.496)
8
8
-93.5519
41.6058
POINT (-93.5519 41.6058)
9
9
-95.0369
40.7294
POINT (-95.0369 40.7294)
Let’s visualize the 10 points to get an idea of their distribution across the state:
import foliumf = folium.Figure(width=700, height=500)lats, lons = pois.lat.tolist(), pois.lon.tolist()coords =list(zip(lats, lons))# Center map on central Iowa. mid_lon, mid_lat =-93.0977, 41.8780m = folium.Map( location=[mid_lat, mid_lon], zoom_start=7).add_to(f)for lat, lon in coords: folium.CircleMarker( location=[lat, lon], radius=5, color="red", fill_color="red", fill=True, fill_opacity=1 ).add_to(m)m
Within GeoPandas, sjoin_nearest performs a spatial join that matches each geometry in one GeoDataFrame to the nearest geometry in another GeoDataFrame based on geometric distance. Instead of checking for intersection or containment like a standard spatial join, sjoin_nearest finds the closest match in terms of Euclidean distance (in projected coordinates) or great-circle distance if using geographic coordinates.
In our example, each of the 10 points will get assigned the ZIP Code associated with the nearest known address point based on great-circle distance:
# Perform nearest neighbor lookup to known lon-lat zip code.pois = gpd.sjoin_nearest( pois, oa[["ZIP5", "geometry"]], how="left", distance_col="dist_meters")pois.head()
id
lon
lat
geometry
index_right
ZIP5
dist_meters
0
0
-91.4239
41.9252
POINT (-91.4239 41.9252)
896565
52314
0.000180
1
1
-92.5043
40.6608
POINT (-92.5043 40.6608)
684538
52537
0.000204
2
2
-90.6057
41.5667
POINT (-90.6057 41.5667)
659368
52806
0.000064
3
3
-93.6110
42.0368
POINT (-93.611 42.0368)
1261954
50010
0.000030
4
4
-93.1085
41.3333
POINT (-93.1085 41.3333)
207898
50138
0.000654
In the resulting join, index_right specifies which row index from the OpenAddresses dataset served as the nearest match to the sample point.
Since our sample data is represented using a geographic coordinate reference system, dist_meters represents the distance to the nearest known address in terms of decimal degrees, not meters (even if we named the column "dist_meters"). If accurate distance measurements to the nearest known address are required, it is necessary to re-project the data to a projected CRS, then re-execute the spatial join:
# Using a projected CRS.pois = pd.DataFrame({"id": np.arange(10),"lon": [-91.4239, -92.5043, -90.6057, -93.6110, -93.1085,-90.6103, -95.8953, -92.4293, -93.5519, -95.0369],"lat": [41.9252, 40.6608, 41.5667, 42.0368, 41.3333, 41.5861, 41.2213, 42.4960, 41.6058, 40.7294] })# Create GeoDataFrame.pois = gpd.GeoDataFrame( pois, geometry=gpd.points_from_xy(pois.lon, pois.lat), crs="EPSG:4326")# Re-project to a projected CRS for accurate distance calculations.pois = pois.to_crs("EPSG:3857")# Ensure same CRS for OpenAddresses data.oa = oa.to_crs("EPSG:3857")pois = gpd.sjoin_nearest( pois, oa[["ZIP5", "geometry"]], distance_col="dist_meters")pois.head(10)
id
lon
lat
geometry
index_right
ZIP5
dist_meters
0
0
-91.4239
41.9252
POINT (-10177261.994 5149781.344)
896565
52314
20.088399
1
1
-92.5043
40.6608
POINT (-10297531.572 4962437.751)
684538
52537
29.855588
2
2
-90.6057
41.5667
POINT (-10086180.387 5096292.577)
659368
52806
9.535298
3
3
-93.6110
42.0368
POINT (-10420728.853 5166493.501)
1261954
50010
4.511569
4
4
-93.1085
41.3333
POINT (-10364790.809 5061628.338)
207898
50138
78.010282
5
5
-90.6103
41.5861
POINT (-10086692.457 5099179.465)
519778
52806
10.161212
6
6
-95.8953
41.2213
POINT (-10675015.965 5045038.366)
395965
51501
90.983468
7
7
-92.4293
42.4960
POINT (-10289182.61 5235569.854)
550407
50613
49.319472
8
8
-93.5519
41.6058
POINT (-10414149.871 5102111.883)
187076
50317
11.959234
9
9
-95.0369
40.7294
POINT (-10579459.315 4972509.787)
358873
51632
292.360969
For the first sample (id = 0), dist_meters indicates the nearest known address is ~20 meters away.
Finding k-Nearest ZIP Codes with BallTree
sjoin_nearest performs a spatial join by finding the nearest geometry in one GeoDataFrame for each geometry in another using a spatial index. When sjoin_nearest is called, it uses the spatial index to find candidate nearest geometries efficiently, then refines the match using actual geometric distance.
One limitation of using sjoin_nearest is that only a single ZIP Code will be assigned to each sample. There may be cases where knowing the ZIP Codes of the five or ten nearest addresses might be of interest, possibly as an indicator of the homogeneity of ZIP Codes in a given region: If the 10 nearest addresses all share the same ZIP Code, it is likely that the sample point shares that ZIP Code (assuming a relatively small distance to each address).
We can identify and retrieve ZIP Codes associated with the k-nearest addresses and corresponding distances using a BallTree. Within scikit-learn, BallTree implements a fast nearest-neighbor search using a space-partitioning tree. It is not geospatially aware and does not work directly with geometry objects, and instead works with raw coordinate arrays. In the next cell, we create a BallTree instance, and identify the 10 nearest ZIP Codes for each sample in pois. Note that BallTree with haversine distance expects input in radians, not degrees:
from sklearn.neighbors import BallTree# Convert lon/lat to radians for BallTree when using haversine # metric.# OpenAddresses data.oa_coords = np.deg2rad(oa[["LON", "LAT"]].values)# 10 sample data points. poi_coords = np.deg2rad(pois[["lon", "lat"]].values)# Create BallTree instance.bt = BallTree(oa_coords, metric="haversine")# Query top 10 neighbors.dists, indices = bt.query(poi_coords, k=10)print(f"dists:\n{dists}\n")print(f"indices:\n{indices}\n")
indices is a 10 x 10 array representing the indices into the original OpenAddresses GeoDataFrame corresponding to the 10 nearest addresses. We can lookup the associated ZIP Codes as follows:
# Lookup ZIP Codes for first sample (row 0)idx = indices[0]oa.loc[idx, "ZIP5"].tolist()
For the first point of interest, the 10 nearest ZIP Codes are the same, indicating a degree of homegeneity for that region.
dists is also a 10 x 10 array representing the distances to each of the 10 nearest addresses. When using the haversine metric, the distances will be returned in radians. To get distance in meters, multiply by 6,371,000 (assumes an Earth radius of 6,371km):
# Convert radians to meters.dists*=6_371_000dists.round(2)
Notice the distances increase from left to right. The nearest neighboring address will be at index 0 for each row, the furthest at the last index.
Let’s attach the ZIP Codes and distances to the original pois GeoDataFrame:
# Add ZIP Codes and distances to pois GeoDataFrame.pois["zip5_bt"] = [oa.loc[ii, "ZIP5"].tolist() for ii in indices]pois["dist_bt"] = [dists[ii].tolist() for ii inrange(len(dists))]pois.head(10)
id
lon
lat
geometry
index_right
ZIP5
dist_meters
zip5_bt
dist_bt
0
0
-91.4239
41.9252
POINT (-10177261.994 5149781.344)
896565
52314
20.088399
[52314, 52314, 52314, 52314, 52314, 52314, 523...
[4.985134882078796, 8.141201258446523, 9.90310...
1
1
-92.5043
40.6608
POINT (-10297531.572 4962437.751)
684538
52537
29.855588
[52537, 52537, 52537, 52537, 52537, 52537, 525...
[2.7405501278443767, 46.76946329403264, 62.646...
2
2
-90.6057
41.5667
POINT (-10086180.387 5096292.577)
659368
52806
9.535298
[52806, 52806, 52806, 52806, 52806, 52806, 528...
[0.9037957011912015, 1.500812774709375, 1.8239...
3
3
-93.6110
42.0368
POINT (-10420728.853 5166493.501)
1261954
50010
4.511569
[50010, 50010, 50010, 50010, 50010, 50010, 500...
[0.21342223699654192, 3.17473669691422, 3.1849...
4
4
-93.1085
41.3333
POINT (-10364790.809 5061628.338)
207898
50138
78.010282
[50138, 50138, 50138, 50138, 50138, 50138, 501...
[18.31278035974411, 19.967693631899426, 21.309...
5
5
-90.6103
41.5861
POINT (-10086692.457 5099179.465)
519778
52806
10.161212
[52806, 52806, 52806, 52806, 52806, 52806, 528...
[4.242762617075453, 4.870854521824348, 4.99311...
6
6
-95.8953
41.2213
POINT (-10675015.965 5045038.366)
395965
51501
90.983468
[51501, 51501, 51501, 51501, 51501, 51501, 515...
[10.084917943102482, 26.164037229203412, 28.48...
7
7
-92.4293
42.4960
POINT (-10289182.61 5235569.854)
550407
50613
49.319472
[50613, 50613, 50613, 50613, 50613, 50613, 506...
[15.604875075255679, 16.124558588950354, 29.19...
8
8
-93.5519
41.6058
POINT (-10414149.871 5102111.883)
187076
50317
11.959234
[50317, 50317, 50317, 50317, 50317, 50317, 503...
[3.772941024353452, 7.462518447657111, 7.60926...
9
9
-95.0369
40.7294
POINT (-10579459.315 4972509.787)
358873
51632
292.360969
[51632, 51632, 51632, 51632, 51632, 51632, 516...
[27.402712270050465, 30.31954566212826, 35.258...
We confirm that the ZIP Code identified using sjoin_nearest (ZIP5) matches the nearest address ZIP Code for each of the 10 points of interest.
As final diagnostic, we can visualize all the points from the OpenAddresses dataset that map to one of our POIs nearest ZIP Codes, along with the convex hull enclosing those points. This will provide additional context missing from a purely distance based diagnostic. We focus on the point with id = 1, and perform the following steps:
Filter oa to addresses having ZIP Code = 52537.
Combine resulting geometries using .unary_union.
Call .convex_hull to compute the convex hull polygon enclosing the result from the previous step.
Visualize address points, convex hull and original point of interest.
# Filter oa to addresses having ZIP Code = 52537.# Repoject to WGS84 for visualization.oa52537 = oa[oa.ZIP5=="52537"].to_crs("EPSG:4326").reset_index(drop=True)print(f"Addresses in ZIP Code 52537: {len(oa52537):,}")oa52537.head(10)
Addresses in ZIP Code 52537: 1,830
LON
LAT
ZIP5
geometry
0
-92.452221
40.750748
52537
POINT (-92.45222 40.75075)
1
-92.330243
40.803125
52537
POINT (-92.33024 40.80313)
2
-92.561039
40.628394
52537
POINT (-92.56104 40.62839)
3
-92.468560
40.710687
52537
POINT (-92.46856 40.71069)
4
-92.315096
40.747551
52537
POINT (-92.3151 40.74755)
5
-92.219034
40.748971
52537
POINT (-92.21903 40.74897)
6
-92.341592
40.712831
52537
POINT (-92.34159 40.71283)
7
-92.420783
40.702076
52537
POINT (-92.42078 40.70208)
8
-92.221492
40.778673
52537
POINT (-92.22149 40.77867)
9
-92.428911
40.876458
52537
POINT (-92.42891 40.87646)
# Get convex hull for ZIP Code = 52537.cvh = oa52537.unary_union.convex_hull# Get POI lat and lon.poi_y = pois[pois.id==1]["lat"].item()poi_x = pois[pois.id==1]["lon"].item()# Initialize map.f = folium.Figure(width=800, height=525)m = folium.Map(location=[poi_y, poi_x], zoom_start=10).add_to(f)folium.Marker( location=[poi_y, poi_x], icon=folium.Icon(color="red"), popup=f"id=1: {poi_x}{poi_y}" ).add_to(m)# Add address points to map.for point in oa52537.geometry: folium.CircleMarker( location=[point.y, point.x], radius=3.5, color="#4f9de4", fill=True, weight=1, fill_opacity=0.35 ).add_to(m)# Add the convex hull polygon to the map.folium.Polygon( locations=[[lat, lon] for lon, lat in cvh.exterior.coords], color="#184771", fill=False).add_to(m)m
The POI (red marker) is surrounded by many addresses with the same ZIP Code. This provides additional confidence that the assignment is reasonable and relatively accurate.
Conclusion
ZCTAs <> ZIP Codes! They are not one-to-one. Some ZIP Codes don’t have ZCTA equivalents. ZCTAs are statistical areas and do not map exactly to USPS ZIP Codes. ZIp Codes represent collections of delivery routes (not enclosed polygons).
ZCTAs are updated only with each decennial census, while USPS updates ZIP Codes constantly.
ZIP Codes that span multiple counties or regions may be misrepresented in ZCTA form.
Even though the OpenAddresses dataset contains over 500 million addresses, coverage is variable, and is generally better in urban/suburban areas than in rural areas. Some large metropolitan areas may be missing address data altogether.
If you’re building something regulatory or production-facing, you should disclose that the USPS does not publish ZIP Code boundaries, and this method uses nearest inferred ZIP Codes from publicly available data.