"""
Very inefficient GeoHashing subroutine. For demonstration purposes only.
"""
def get_geohash(lat, lon, precision=12):
"""
Generate GeoHash of lat/lon at specified precision.
Parameters
----------
lat: float
Latitude of point of interest.
lon: float
Longitude of point of interest
precision: int
Precision of GeoHash. Higher values result in
smaller bounding regions.
Returns
-------
geohash, bbox as list
"""
= "0123456789bcdefghjkmnpqrstuvwxyz"
base32 = []
bits
= -90, 90
min_lat, max_lat = -180, 180
min_lon, max_lon
for ii in range(5 * precision):
if ii % 2 == 0:
# Bisect longitude (E-W).
= (min_lon + max_lon) / 2
mid_lon if lon >= mid_lon:
1)
bits.append(= mid_lon
min_lon else:
0)
bits.append(= mid_lon
max_lon
else:
# Bisect latitude (N-S).
= (min_lat + max_lat) / 2
mid_lat if lat >= mid_lat:
1)
bits.append(= mid_lat
min_lat else:
0)
bits.append(= mid_lat
max_lat
# Create single bit string from list of 0/1s.
= "".join([str(ii) for ii in bits])
bitstr
# Partition bitstr into groups of 5 bits.
= [bitstr[(5 * ii):(5 * (ii + 1))] for ii in range(precision)]
quints
# Convert binary digits to decimal digits to get indices into base32.
= [int(ii, base=2) for ii in quints]
indices
# Lookup characters associated with each index and concatenate.
= "".join([base32[ii] for ii in indices])
geohash
# GeoHash bounding box is just the final, min_lat, min_lon, max_lat, max_lon.
= [min_lat, min_lon, max_lat, max_lon]
bbox
return(geohash, bbox)
I recently became interested in GeoHashing, and wanted to develop an understanding of the algorithm with the goal of implementing it myself. I was surprised to find it to be quite simple and intuitive. In what follows, I’ll demonstrate how to generate GeoHashes for a given latitude and longitude and compare results against pygeohash
, a Python library for all things GeoHash-related. I’ll also walkthrough an approach that can be used to find the neighboring cells of a given GeoHash, and render them on a map.
A Geohash is a unique identifier of a specific region on the Earth. The basic idea is the Earth gets divided into rectangular regions of user-defined size, and each region is assigned a unique id which is called its Geohash. For a given location on earth, the algorithm converts an arbitrary precision latitude and longitude into a string, the idea being that regions with a similar string prefix will be closer together. Conceptually, GeoHashing reduces proximity search to string prefix matching. As each character encodes additional precision, shared prefixes denote geographic proximity. But the converse is not true: Two points may be close in terms of relative proximity, but have no common GeoHash characters (think of two points on opposite sides of the prime meridian).
Geohashes also provide a degree of anonymity since it isn’t necessary to expose exact GPS coordinates. The location of an entity up to a bounding box cell at a given precision is all that is known.
Before we begin, it is important to note that GeoHash libraries in the wild use a much more efficient generating mechanism than what is presented here. My goal is to demonstrate the concept with maximum clarity as opposed to maximum efficiency. If you decide to use GeoHashing in a real-world application, use an existing library.
Algorithm
A GeoHash is a hierarchical spatial index: In order to represent a point, the world is recursively divided into smaller and smaller grid cells with each additional bit until the desired precision is reached. Functionally, the algorithm works by storing the intermediate results of two binary searches. In the resulting bit string, even-indexed bits represent longitude, while odd-indexed bits represent latitude. The user specifies a level of precision, usually between 1 and 12, and a GeoHash of that length is returned. The table below gives the dimensions of GeoHash cells at each level of precision (taken from here):
Precision Dimension
1: 5,000km x 5,000km
2: 1,250km x 625km
3: 156km x 156km
4: 31.9km x 19.5km
5: 4.89km x 4.89km
6: 1.22km x 0.61km
7: 153m x 153m
8: 38.2m x 19.1m
9: 4.77m x 4.77m
10: 1.19m x 0.596m
11: 149mm x 149mm
12: 37.2mm x 18.6mm
The values in the table represent the maximum dimension at each level of precision: As locations move away from the equator, the dimensions of each cell get smaller.
For a GeoHash of length 12, the coordinate can be found in a cell having dimension 37.2mm x 18.6mm, which is an impressive level of precision for a 12 character string!
The GeoHash symbol map consists of 32 characters:
base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
base32
consists of digits 0 thru 9 plus all lowercase letters excluding a, i, l, o
. As the search space is recursively partitioned, each group of 5 bits, when converted to a decimal integer, maps to one of the values in base32
above.
I’ll present the algorithm first, then walkthrough it step-by-step. What follows is a basic implementation of GeoHashing, which takes as input the latitude and longitude for a point of interest, and returns the GeoHash and bounding box to the specified level of precision:
We define base32
and initialize bits
to hold our 0s and 1s. The initial minimum and maximum latitudes and longitudes define the bounds of the search space.
The range of iteration is specified as range(5 * precision)
. For a GeoHash having precision=6, the bits
list will contain 5 * 6 = 30 values. We multiply by 5 since the base32
character map has length 32, and 2^5 bit arrangements cover each of the 32 indices.
If ii
is even, we bisect longitude. If lon >= mid_lon
we append 1 to bits
and update min_lat
. Otherwise we append 0 and update max_lon
. If ii
is odd, we execute the same logic but for latitude. In this way, the final bit string represents interleaved longitude and latitude bits.
Once iteration completes, the individual elements of bits
are concatenated into a single bit string bitstr
. Next, we partition bitstr
into groups of 5 bits (quints
), which are converted to decimal integers, which are then used as indices to lookup elements from base32
. These characters are concatenated into a single string representing the GeoHash.
To demonstrate, lets find the GeoHash for the Merchandise Mart (41.88876, -87.63516) at precision = 6. Once iteration completes, bits
will consist of 30 1s and 0s:
[0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
Which gets combined into a single bit string:
'011001010100011111001001101011'
Partitioned into groups of 5 bits:
['01100', '10101', '00011', '11100', '10011', '01011']
Converted to decimal integers:
[12, 21, 3, 28, 19, 11]
Which serve as indices into base32
, whose corresponding elements and are concatenated into the GeoHash:
'dp3wmc'
Our function also returns the bounding box of the precision = 6 GeoHash:
[41.8853759765625, -87.637939453125, 41.890869140625, -87.626953125]
Let’s compare the results of our implementation against pygeohash to ensure consistency:
import pygeohash
= 41.88876, -87.63516, 6
lat, lon, precision =lat, longitude=lon, precision=precision) pygeohash.encode(latitude
'dp3wmc'
Let’s visualize the precision = 6 GeoHash using folium:
import folium
from folium.features import DivIcon
= 41.88876, -87.63516, 6
lat, lon, precision
# Use our subroutine to get geohash and bounding box.
= get_geohash(lat, lon, precision=precision)
geohash, bbox
# Unpack bbox.
= bbox
min_lat, min_lon, max_lat, max_lon
# Get mid_lat and mid_lon for GeoHash id placement.
= (min_lat + max_lat) / 2
mid_lat = (min_lon + max_lon) / 2
mid_lon
= folium.Map(
m =[lat, lon],
location#width=900,
#height=600,
=16,
zoom_start=True,
zoom_control=True,
no_touch="OpenStreetMap"
tiles
)
# precision = 6 GeoHash bounding box.
folium.Rectangle(
[(min_lat, min_lon), (max_lat, max_lon)], ="blue", fill_opacity=.15
fill_color
).add_to(m)
# Red dot at Merchandise Mart.
folium.CircleMarker(=[lat, lon], radius=5, color="red", fill_color="red",
location=1
fill_opacity
).add_to(m)
# precision = 6 GeoHash id.
map.Marker(
folium.
[mid_lat, mid_lon],=DivIcon(
icon=(250,36),
icon_size=(100,50),
icon_anchor=f'<div style="font-size: 40pt">{geohash}</div>',
html
)
).add_to(m)
m
Identify GeoHash Neighboring Cells
We can use the get_geohash
function to help identify cells that neighbor a given GeoHash. Once the bounding box for the target GeoHash is known we simply increment those coordinates by a small amount, then lookup the GeoHash and bounding box associated with the new coordinate:
= 41.88876, -87.63516, 6
lat, lon, precision
= 1e-10
eps
# Center GeoHash id and bounding box.
= get_geohash(lat, lon, precision=precision)
geohash, bbox
= bbox
min_lat, min_lon, max_lat, max_lon
# Get GeoHash id and bounding box for Northwest cell.
= get_geohash(max_lat + eps, min_lon - eps, precision=precision)
gh_nw, bb_nw
# Get GeoHash id and bounding box for Northeast cell.
= get_geohash(max_lat + eps, max_lon + eps, precision=precision)
gh_ne, bb_ne
# Get GeoHash id and bounding box for Southeast cell.
= get_geohash(min_lat - eps, max_lon + eps, precision=precision)
gh_se, bb_se
# Get GeoHash id and bounding box for Southwest cell.
= get_geohash(min_lat - eps, min_lon - eps, precision=precision) gh_sw, bb_sw
Which gives us the bounding boxes in NW, NE, SE and SW directions. Notice that all 5 boxes share the same prefix out to 4 characters:
Using the results from the NW, NE, SE and SW directions, we obtain the GeoHashes in the N, S, E and W directions:
# For N, get mid point between NE and NW cells.
= bb_nw
min_lat_nw, min_lon_nw, max_lat_nw, max_lon_nw = bb_ne
min_lat_ne, min_lon_ne, max_lat_ne, max_lon_ne = (min_lat_nw + max_lat_nw) / 2
n_lat = (min_lon_nw + max_lon_ne) / 2
n_lon
# For S, get mid point between SE and SW cells.
= bb_sw
min_lat_sw, min_lon_sw, max_lat_sw, max_lon_sw = bb_se
min_lat_se, min_lon_se, max_lat_se, max_lon_se = (min_lat_sw + max_lat_sw) / 2
s_lat = (min_lon_sw + max_lon_se) / 2
s_lon
# For E, get mid point between SE and NE cells.
= bb_ne
min_lat_ne, min_lon_ne, max_lat_ne, max_lon_ne = bb_se
min_lat_se, min_lon_se, max_lat_se, max_lon_se = (max_lat_ne + min_lat_se) / 2
e_lat = (min_lon_se + max_lon_se) / 2
e_lon
# For W, get mid point between SW and NW cells.
= bb_nw
min_lat_nw, min_lon_nw, max_lat_nw, max_lon_nw = bb_sw
min_lat_sw, min_lon_sw, max_lat_sw, max_lon_sw = (max_lat_nw + min_lat_sw) / 2
w_lat = (min_lon_sw + max_lon_sw) / 2 w_lon
Which we can visualize using folium, resulting in:
We can encapsulate this logic within a function we’ll call get_neighbors
:
def get_neighbors(lat, lon, precision=12):
"""
Find 8 adjacent neighbors to GeoHash associated
with (lat, lon) at specified precision.
Parameters
----------
lat: float
Latitude of point of interest.
lon: float
Longitude of point of interest
precision: int
Precision of GeoHash. Higher values result in
smaller bounding regions.
Returns
-------
dict
"""
= {}
dneighbors = 1e-10
eps
# Center GeoHash id and bounding box.
= get_geohash(lat, lon, precision=precision)
geohash, bbox = bbox
min_lat, min_lon, max_lat, max_lon
"center"] = {"geohash": geohash, "bbox": bbox}
dneighbors[
= {
dcorners "nw": {"lat": max_lat + eps, "lon": min_lon - eps},
"ne": {"lat": max_lat + eps, "lon": max_lon + eps},
"sw": {"lat": min_lat - eps, "lon": min_lon - eps},
"se": {"lat": min_lat - eps, "lon": max_lon + eps},
}
for kk in dcorners:
= dcorners[kk]["lat"], dcorners[kk]["lon"]
lat_, lon_ = get_geohash(lat_, lon_, precision=precision)
gh, bb = {"geohash": gh, "bbox": bb}
dneighbors[kk]
# Find GeoHash for N, S, E and W directions.
= dneighbors["nw"]["bbox"]
min_lat_nw, min_lon_nw, max_lat_nw, max_lon_nw = dneighbors["ne"]["bbox"]
min_lat_ne, min_lon_ne, max_lat_ne, max_lon_ne = (min_lat_nw + max_lat_nw) / 2
n_lat = (min_lon_nw + max_lon_ne) / 2
n_lon = get_geohash(n_lat, n_lon, precision=precision)
n_gh, n_bb "n"] = {"geohash": n_gh, "bbox": n_bb}
dneighbors[
= dneighbors["sw"]["bbox"]
min_lat_sw, min_lon_sw, max_lat_sw, max_lon_sw = dneighbors["se"]["bbox"]
min_lat_se, min_lon_se, max_lat_se, max_lon_se = (min_lat_sw + max_lat_sw) / 2
s_lat = (min_lon_sw + max_lon_se) / 2
s_lon = get_geohash(s_lat, s_lon, precision=precision)
s_gh, s_bb "s"] = {"geohash": s_gh, "bbox": s_bb}
dneighbors[
= dneighbors["ne"]["bbox"]
min_lat_ne, min_lon_ne, max_lat_ne, max_lon_ne = dneighbors["se"]["bbox"]
min_lat_se, min_lon_se, max_lat_se, max_lon_se = (max_lat_ne + min_lat_se) / 2
e_lat = (min_lon_se + max_lon_se) / 2
e_lon = get_geohash(e_lat, e_lon, precision=precision)
e_gh, e_bb "e"] = {"geohash": e_gh, "bbox": e_bb}
dneighbors[
= dneighbors["nw"]["bbox"]
min_lat_nw, min_lon_nw, max_lat_nw, max_lon_nw = dneighbors["sw"]["bbox"]
min_lat_sw, min_lon_sw, max_lat_sw, max_lon_sw = (max_lat_nw + min_lat_sw) / 2
w_lat = (min_lon_sw + max_lon_sw) / 2
w_lon = get_geohash(w_lat, w_lon, precision=precision)
w_gh, w_bb "w"] = {"geohash": w_gh, "bbox": w_bb}
dneighbors[
return(dneighbors)
Running get_neighbors
and iterating over the results yields:
= 41.88876, -87.63516, 6
lat, lon, precision
= get_neighbors(lat, lon, precision=precision)
dn
for kk in dn:
= dn[kk]["geohash"], dn[kk]["bbox"]
gh, bb print(f"{[kk]}: geohash={gh}, bbox={bb}")
['center']: geohash=dp3wmc, bbox=[41.8853759765625, -87.637939453125, 41.890869140625, -87.626953125]
['nw']: geohash=dp3wmd, bbox=[41.890869140625, -87.64892578125, 41.8963623046875, -87.637939453125]
['ne']: geohash=dp3wq4, bbox=[41.890869140625, -87.626953125, 41.8963623046875, -87.615966796875]
['sw']: geohash=dp3wm8, bbox=[41.8798828125, -87.64892578125, 41.8853759765625, -87.637939453125]
['se']: geohash=dp3wq0, bbox=[41.8798828125, -87.626953125, 41.8853759765625, -87.615966796875]
['n']: geohash=dp3wmf, bbox=[41.890869140625, -87.637939453125, 41.8963623046875, -87.626953125]
['s']: geohash=dp3wmb, bbox=[41.8798828125, -87.637939453125, 41.8853759765625, -87.626953125]
['e']: geohash=dp3wq1, bbox=[41.8853759765625, -87.626953125, 41.890869140625, -87.615966796875]
['w']: geohash=dp3wm9, bbox=[41.8853759765625, -87.64892578125, 41.890869140625, -87.637939453125]
Finally, we demonstrate how to recreate the previous figure (target GeoHash cell with eight adjacent neighbors):
import folium
from folium.features import DivIcon
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
= 41.88876, -87.63516, 6
lat, lon, precision
# Get GeoHashes and bounding boxes.
= get_neighbors(lat, lon, precision=6)
dn
# Separate color for each cell.
= mpl.colormaps["gist_rainbow"]
fcolors = [fcolors(ii) for ii in np.linspace(0, 1, len(dn))]
colors_rgba = [mpl.colors.to_hex(ii, keep_alpha=False) for ii in colors_rgba]
colors_hex
= folium.Map(
m =[lat, lon],
location# width=900,
# height=600,
=16,
zoom_start=True,
zoom_control=True,
no_touch="OpenStreetMap"
tiles
)
for ii, kk in enumerate(dn.keys()):
= colors_hex[ii]
color = dn[kk]["geohash"], dn[kk]["bbox"]
geohash, bbox = bbox
min_lat, min_lon, max_lat, max_lon = (min_lat + max_lat) / 2
mid_lat = (min_lon + max_lon) / 2
mid_lon
# GeoHash bounding box.
folium.Rectangle(
[(min_lat, min_lon), (max_lat, max_lon)], =color, fill_opacity=.25
fill_color
).add_to(m)
map.Marker(
folium.
[mid_lat, mid_lon],=DivIcon(
icon=(250,36),
icon_size=(75,25),
icon_anchor=f'<div style="font-size: 30pt">{geohash}</div>',
html
)
).add_to(m)
m