Geo features
Haversine distances to anchor points, hand-rolled geohash encoding, and rotated coordinates.
featurely.geo
Location-based feature encodings for latitude and longitude.
These helpers turn raw coordinates into representations a linear model can use: distances to fixed anchor points, discrete spatial cells, and rotated axes. Raw latitude and longitude only let a linear model fit a single plane over the map; these encodings expose distance decay and neighborhood structure that the plane cannot capture.
haversine_distance(lat1, lon1, lat2, lon2)
Great-circle distance in kilometers between coordinate pairs.
The haversine formula treats Earth as a sphere, which is accurate to roughly 0.5 percent. That is more than enough for feature-engineering distances.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lat1 | float \| ndarray \| Series | Latitude of the first point; scalar or array-like, degrees. | required |
| lon1 | float \| ndarray \| Series | Longitude of the first point; scalar or array-like, degrees. | required |
| lat2 | float \| ndarray \| Series | Latitude of the second point; scalar or array-like, degrees. | required |
| lon2 | float \| ndarray \| Series | Longitude of the second point; scalar or array-like, degrees. | required |
Returns:
| Type | Description |
|---|---|
| float \| ndarray | Distance in kilometers, matching the broadcast shape of the inputs. |
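The haversine computation described above can be sketched as follows. This is a minimal illustration, not the source in src/featurely/geo.py: the function name `haversine_km` and the mean Earth radius of 6371.0088 km are assumptions made for the example.

```python
import numpy as np

# Mean Earth radius in km. Assumed for this sketch; the library's constant may differ.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth; inputs in degrees."""
    # Convert every input to radians; np.asarray lets scalars and arrays broadcast.
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clip guards against rounding pushing `a` slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# One degree of longitude along the equator is about 111.2 km.
print(haversine_km(0.0, 0.0, 0.0, 1.0))
```

Because the arithmetic is plain NumPy, passing an array for the first point and a scalar for the second returns one distance per row.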
Source code in src/featurely/geo.py
compute_city_distances(df, cities, lat_col='Latitude', lon_col='Longitude')
Return distance-to-anchor candidate features in kilometers.
One column per anchor point plus dist_nearest_city, which collapses the set into a single proximity measure. Many spatial outcomes decay with distance from activity centers, a pattern raw coordinates cannot express linearly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input frame with coordinate columns. | required |
| cities | dict[str, tuple[float, float]] | Mapping of anchor name to (latitude, longitude). Column names follow the pattern | required |
| lat_col | str | Name of the latitude column. | 'Latitude' |
| lon_col | str | Name of the longitude column. | 'Longitude' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A frame of distance columns plus |
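A rough sketch of the distance-to-anchor idea is shown below. It is not the library's implementation: the helper names and the `dist_to_<name>` column scheme are placeholders invented for this example (the actual naming pattern is not shown on this page). Only dist_nearest_city comes from the description above.

```python
import numpy as np
import pandas as pd


def _haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    # Spherical great-circle distance; the radius is an assumed mean value.
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def city_distances_sketch(df, cities, lat_col="Latitude", lon_col="Longitude"):
    out = pd.DataFrame(index=df.index)
    for name, (city_lat, city_lon) in cities.items():
        # "dist_to_<name>" is a placeholder column name, not the library's pattern.
        out[f"dist_to_{name}"] = _haversine_km(
            df[lat_col], df[lon_col], city_lat, city_lon
        )
    # Collapse the per-anchor columns into a single proximity measure.
    out["dist_nearest_city"] = out.min(axis=1)
    return out


points = pd.DataFrame({"Latitude": [0.0, 0.0], "Longitude": [0.0, 2.0]})
anchors = {"A": (0.0, 0.0), "B": (0.0, 3.0)}
print(city_distances_sketch(points, anchors))
```

In this toy frame the first row sits on anchor A and the second row is closer to anchor B, so dist_nearest_city picks a different anchor for each row.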
Source code in src/featurely/geo.py
encode_geohash(lat, lon, precision=4)
Encode one coordinate pair as a geohash string.
Geohashing interleaves bits from successive binary subdivisions of the longitude and latitude ranges, then packs each group of 5 bits into a base32 character. Nearby points usually share a prefix, and shorter hashes give coarser spatial cells: precision 4 uses 20 bits (10 per axis), so each cell spans about 0.35 degrees of longitude by 0.18 degrees of latitude, roughly 39 km by 19.5 km at the equator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lat | float | Latitude in degrees. | required |
| lon | float | Longitude in degrees. | required |
| precision | int | Number of base32 characters in the hash. | 4 |
Returns:
| Type | Description |
|---|---|
| str | The geohash string. |
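The bit-interleaving scheme described above can be sketched in pure Python. This follows the standard geohash algorithm and is meant as an illustration only; the function name `geohash_sketch` is hypothetical, and the library's own code may be organized differently.

```python
# Standard geohash base32 alphabet (digits plus letters, without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_sketch(lat, lon, precision=4):
    """Encode one (lat, lon) pair in degrees as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # accumulator for the current 5-bit group
    n_bits = 0      # number of bits collected in the current group
    use_lon = True  # geohash starts with a longitude bit, then alternates

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:
            # Five bits select one base32 character.
            chars.append(_BASE32[bits])
            bits = 0
            n_bits = 0
    return "".join(chars)


print(geohash_sketch(57.64911, 10.40744))  # u4pr
```

Raising `precision` extends the string, so a longer hash computed this way always begins with the shorter one for the same point.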
Source code in src/featurely/geo.py
compute_geohash_cells(df, precision=4, min_cell_count=100, lat_col='Latitude', lon_col='Longitude')
Return one-hot geohash cell membership candidates.
Cells with fewer than min_cell_count rows are pooled into a shared "other" bucket so the linear model does not fit dummy coefficients to nearly empty cells. Membership indicators are computed without the target, so this encoding carries no target-leakage risk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input frame with coordinate columns. | required |
| precision | int | Geohash length; higher values give smaller cells. | 4 |
| min_cell_count | int | Minimum rows per cell before pooling into "other". | 100 |
| lat_col | str | Name of the latitude column. | 'Latitude' |
| lon_col | str | Name of the longitude column. | 'Longitude' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A frame of one-hot indicator columns named |
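The pooling step can be illustrated on a Series of precomputed geohash strings. This is a sketch under assumptions: the function name `pool_and_one_hot` and the `cell_` column prefix are placeholders for the example (the library's column naming is not shown on this page), and the one-hot step uses `pandas.get_dummies`.

```python
import pandas as pd


def pool_and_one_hot(cells, min_cell_count=100, prefix="cell"):
    """Pool rare geohash cells into "other", then one-hot encode."""
    counts = cells.value_counts()
    # Cells with fewer than min_cell_count rows are considered rare.
    rare = counts[counts < min_cell_count].index
    # Keep frequent cells as-is; replace rare ones with the shared bucket.
    pooled = cells.where(~cells.isin(rare), "other")
    # The "cell" prefix is a placeholder, not the library's naming scheme.
    return pd.get_dummies(pooled, prefix=prefix, dtype=int)


cells = pd.Series(["u4pr", "u4pr", "u4pr", "s000", "9q8y"])
print(pool_and_one_hot(cells, min_cell_count=2))
```

With `min_cell_count=2`, the two single-row cells share one "other" column while the three-row cell keeps its own indicator. The counts come from whatever frame is passed in.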
Source code in src/featurely/geo.py
compute_rotated_coordinates(df, angle_deg, lat_col='Latitude', lon_col='Longitude')
Return coordinate axes rotated by an arbitrary angle.
Raw latitude and longitude only let a linear model fit gradients that run north-south or east-west. Rotating the frame exposes gradients that run diagonally across the map, for example along a coastline or a mountain range. A 45-degree rotation reproduces the classic sum-and-difference encoding up to scale, since cos 45° = sin 45° = 1/√2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input frame with coordinate columns. | required |
| angle_deg | float | Rotation angle in degrees, counterclockwise. Column names embed the angle, e.g. | required |
| lat_col | str | Name of the latitude column. | 'Latitude' |
| lon_col | str | Name of the longitude column. | 'Longitude' |
Returns:
| Type | Description |
|---|---|
| DataFrame | A frame with the two rotated coordinate columns. |
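A minimal sketch of the rotation follows. Several details are assumptions for illustration: longitude is treated as the x axis and latitude as the y axis, the axes (not the points) are rotated counterclockwise, and the column names `rot_x` and `rot_y` are placeholders (the library embeds the angle in its column names, in a pattern not shown on this page).

```python
import numpy as np
import pandas as pd


def rotate_coordinates_sketch(df, angle_deg, lat_col="Latitude", lon_col="Longitude"):
    """Project (lon, lat) onto axes rotated counterclockwise by angle_deg."""
    theta = np.radians(angle_deg)
    x = df[lon_col].to_numpy(dtype=float)  # assumed x axis: longitude
    y = df[lat_col].to_numpy(dtype=float)  # assumed y axis: latitude
    return pd.DataFrame(
        {
            # Placeholder names; the library's names embed the angle.
            "rot_x": x * np.cos(theta) + y * np.sin(theta),
            "rot_y": -x * np.sin(theta) + y * np.cos(theta),
        },
        index=df.index,
    )


frame = pd.DataFrame({"Latitude": [1.0], "Longitude": [3.0]})
print(rotate_coordinates_sketch(frame, 45.0))
```

Under these assumptions a 45-degree rotation gives (longitude + latitude) / √2 and (latitude − longitude) / √2, which is the sum-and-difference encoding up to scale.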
Source code in src/featurely/geo.py