This is quite a hackish solution to this problem.
Sure, I could also choose the boring way and use plain JavaScript, one JSON file per country, and load POI data with AJAX on demand: you specify the target country in the search form (or filter by it), fetch the matching JSON or GeoJSON file from the server with AJAX, and parse it for coordinates. You can handle the whole thing in JavaScript, et voilà, done. But... this is boring, and that's not what I want.
I want a solution that additionally satisfies the following needs:
See a live demo in action: https://finalmedia.de/cdn/osterbruecken.de/ostermap/index.htm (German).
First I fetched a dump of the geonames database. For this test case I just needed data for Germany, so I fetched this file: http://download.geonames.org/export/dump/DE.zip
I extracted the file DE.txt out of it, parsed the tab-separated file (TSV) with tr and cut (or you can use awk if you prefer), and used grep to get all POIs, marked with ";P".
I reduced the charset and transformed everything to lowercase, allowing only the following characters:
a-zöäü ß .-
You can use the following chain to do that:
tr "\t" ";" < DE.txt | cut -d";" -f2,5,6,7 | grep ";P" |\ tr -d "," | cut -d";" -f1,2,3 | tr ";" "," |\ tr -dc "0-9a-zA-ZöäüÖÄÜß\n ,.-" | tr "A-ZÖÄÜ" "a-zöäü" > cities.txt
This exports all lines to a new file called cities.txt, in the following format:
city,lat,lon
UPDATE: The geonames.org database was not very satisfying. So I switched to official OpenStreetMap database dumps in the OpenStreetMap Protocolbuffer Binary Format, available from http://download.geofabrik.de, in this case germany-latest.osm.pbf (>3.8 GB, around 60 GB uncompressed), and extracted all city and street names out of it. Use osmconvert.c (local mirror: osmconvert.c) from the osmconvert toolset to extract the data. (Hint: build a 64-bit executable and use a machine with a lot of RAM for this! Processing the dataset germany-latest.osm needs about 14 GB of RAM and takes some hours to finish.) If you don't have enough RAM, be sure to add a sufficiently large swap file first:
dd if=/dev/zero bs=1M count=40000 of=/swap.40gb.img status=progress
chmod 600 /swap.40gb.img
mkswap /swap.40gb.img
echo "/swap.40gb.img none swap sw 0 0" >> /etc/fstab   # append, don't overwrite your fstab!
swapon -a
./osmconvert germany-latest.osm.pbf --max-objects=900000000 --all-to-nodes \
  --csv="name @lat @lon" --csv-separator="," | grep -v -E "^," > cities.txt
Process your cities.txt and sort out all duplicate names (it's a quick hack; perhaps I will rename those in an improved version later):
sort -k1 -t, cities.txt | uniq > uniq_cities.txt

All cities are now stored in the file uniq_cities.txt, line by line with their coordinates, like this:
zwötzen,50.84858,12.08635
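If you would rather keep duplicate names than drop them, one idea for the improved version mentioned above is to number repeated names with awk. A sketch, as an alternative to the sort | uniq step above:

sort -t"," -k1,1 cities.txt | awk -F"," 'BEGIN { OFS = "," } {
  # count occurrences of each name and append a counter from the 2nd one on,
  # so "berlin" stays "berlin" and duplicates become "berlin.2", "berlin.3", ...
  n[$1]++
  if (n[$1] > 1) $1 = $1 "." n[$1]
  print
}' > uniq_cities.txt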
Then I wrote a small script that reads those lines and makes lots of broken symlinks out of them, putting them into a folder called "search".
#!/bin/bash
mkdir -p search
cat uniq_cities.txt | while read line
do
  url="http://osm.org/#map=/`echo $line | tr -dc "0-9.,-" | cut -d"," -f2,3 | tr "," "/"`"
  symlink="search/`echo $line | cut -d"," -f1 | tr -dc "a-zA-ZöäüÖÄÜß. -" | tr "A-ZÖÄÜ" "a-zöäü"`"
  ln -s "$url" "$symlink"
done
The name of each broken symlink is the name of the city, and the symlink points to a URL like

http://osm.org/#map=/<lat>/<lon>

with the coordinates of that city.
Sure, this is just an example. You can use your own tile server and your own map, like I did in the demo "Ostermap" mentioned before.
This way you'll get a lot of broken symlinks like these:
...
lrwxrwxrwx 1 user group 23 Jun 23 23:00 ührde -> http://osm.org/#map=/51.70547/10.20814
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhrendorf -> http://osm.org/#map=/53.86275/9.41756
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhrsleben -> http://osm.org/#map=/52.20087/11.26443
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhry -> http://osm.org/#map=/52.29693/10.85758
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhsmannsdorf -> http://osm.org/#map=/51.33048/14.90316
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhyst -> http://osm.org/#map=/51.36469/14.506
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhyst am taucher -> http://osm.org/#map=/51.19249/14.21843
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uichteritz -> http://osm.org/#map=/51.20652/11.92215
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uiffingen -> http://osm.org/#map=/49.5024/9.59269
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uigenau -> http://osm.org/#map=/49.31204/11.01731
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uigendorf -> http://osm.org/#map=/48.18048/9.57969
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uissigheim -> http://osm.org/#map=/49.67984/9.57134
lrwxrwxrwx 1 user group 23 Jun 23 23:00 ulbargen -> http://osm.org/#map=/53.37535/7.58291
lrwxrwxrwx 1 user group 23 Jun 23 23:00 ulbering -> http://osm.org/#map=/48.35362/13.01465
lrwxrwxrwx 1 user group 23 Jun 23 23:00 ulberndorf -> http://osm.org/#map=/50.87472/13.67231
...
Now I can use those broken symlinks with gatling, a tiny and really fast httpd by Felix von Leitner.
I have already used gatling for Leaflet, with my own maps and tiles made with glosm.
You can find the project "Ostermap" at the demo URL mentioned above. I just rendered the map for Saarland.
Gatling recognizes broken symlinks. If the link target contains "://" (like in "http://" or "https://"), it turns it into a valid HTTP redirect and sends your browser to the given URL. This is a really nice feature. Thanks, Fefe. This way, any given city name gets redirected to Leaflet or OpenStreetMap with the matching coordinates.
So I can start a locally listening gatling and enter the following URL in a browser
http://127.0.0.1/search/ulbargen
to get the geo-location of the city ulbargen and directly show it on the map.
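To check the redirect without a browser, here is a quick curl sketch (assuming gatling listens on 127.0.0.1, port 80):

# HEAD request; gatling should answer with the symlink target as Location header
curl -sI http://127.0.0.1/search/ulbargen | grep -i "^location"
# expected, according to the listing above:
# Location: http://osm.org/#map=/53.37535/7.58291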
Furthermore, you can supply a minimal vanilla JavaScript that takes the value of an input box, transforms the given text to lowercase, and rewrites window.location to the subfolder "search/ulbargen" as described above. This tiny index.htm does the magic:
<html>
<input title="please enter the name of the point of interest" id="name" value="ulbargen">
<input type="button" value="suche"
  onclick="window.location+='search/'+document.getElementById('name').value.toLowerCase()">
</html>
If an invalid POI name is entered, gatling just responds with 404 file not found. You can additionally write an AJAX script that catches the 404 response and displays something like "sorry, POI not found, please retry". Or specify your own 404 error page.
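You can also observe the 404 behaviour directly from the shell (a sketch):

# print only the HTTP status code; a bogus POI name should yield 404
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1/search/doesnotexist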
First: By storing the geo-information in a broken symlink, I get a very compact storage format for the coordinates, without being limited by the underlying filesystem's block size for a single regular file.
Storing the coordinates in regular files named after the city is not very efficient: the whole "database" of this example would need over 240 MB in total. Even if a file just contains those few bytes for the coordinates, it occupies about 4096 bytes on your drive because of the filesystem's block size (see Wikipedia if you want to know more about this). So lots of small regular files would waste a lot of storage capacity.
If you don't believe it, create such files yourself and compare the sizes with the following commands:
echo hello > regular_file
ls -slh1 regular_file
stat regular_file
du -hcs regular_file

ln -s "hello again" symlink_file
ls -slh1 symlink_file
stat symlink_file
du -hcs symlink_file
Sure, you can change the block size of your filesystem by reformatting the block device, or possibly do some tweaks with tune2fs. But even then the minimal block size of ext3 is 1024 bytes, such changes are no out-of-the-box solution, and they could have disadvantages for other services on your system.
With symlinks, the whole "database" is just about 1.8 MB in total, since each symlink and its inode need just 128 bytes in this case. It doesn't get "blown up" to 4k by the minimum block size of the underlying filesystem.
Now you can also make a tarball out of the folder for distributing the "database". The xz tarball is about 1.1 MB then.
You can simply add a new POI like this
ln -s "http://osm.org/#map=/49.49361/7.26694" "osterbrücken"
and remove it again just by deleting the symlink
rm osterbrücken
You can also specify the zoom level for individual POIs if you want to. Just use:
ln -s "http://osm.org/#map=14/49.49361/7.26694" "osterbrücken"
You could also distribute street names this way, for example by making subfolders for cities and putting street POIs inside as symlinks. This works with gatling's built-in directory indexing.
Since city names are not unique, I should also consider generating folders for duplicate names and putting each symlink into such a folder.
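A sketch of how such a nested layout could look (the street names and coordinates here are made up for illustration):

# streets of a city as symlinks inside a city subfolder; gatling's
# directory index then lists all streets under /search/osterbruecken/
mkdir -p search/osterbruecken
ln -s "http://osm.org/#map=18/49.4936/7.2669" "search/osterbruecken/hauptstrasse"
ln -s "http://osm.org/#map=18/49.4940/7.2680" "search/osterbruecken/kirchweg"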
Have a look at RFC 5870, which describes the URL scheme for geo coordinates: a Uniform Resource Identifier for geographic locations in WGS-84 (World Geodetic System). But then you have to evaluate this URI in your client application. Also, since gatling expects "://" and a geo URI is just "geo:74.4294,19.0245", you won't get a successful redirect; you would have to change gatling's source code in http.c to parse this correctly.
You can fetch my pois.de.txt.xz (51 MB) with 4,919,091 entries in the format "name,lat,lon". The dataset is based on an extraction of the OpenStreetMap database dump (20150701), so it is licensed under the Open Data Commons Open Database License (ODbL) and copyright © OpenStreetMap contributors.
Here is a version of the current POI names dataset as a cdb database.
This is a different approach, which no longer needs gatling and broken symlinks.
This is the concept: we generate a (sorted) djb cdb (constant database) for key/value search and run search requests against it. You can generate multiple cdb files, containing cities, streets, POIs, etc.
djb cdb is a simple, minimalistic constant key/value database file with fast hashtable lookup, created by Dan J. Bernstein.
You can build, query and serve them with a native cdb implementation, daemontools, and dash.
You will need libcdb and tcpserver from djb's ucspi-tcp. On Debian you can install them like this:
apt-get install libcdb1 pv ucspi-tcp dos2unix
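To get a feeling for the -m (map) format used below, here is a minimal round trip, assuming the tinycdb implementation of the cdb utility:

# create a tiny test database from "key value" lines (-m = plain map format)
printf "neuerweg 49.1234 7.5678\n" | cdb -m -c test.cdb
# look up the key; prints the stored value "49.1234 7.5678"
cdb -m -q test.cdb neuerweg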
You will need current OSM dumps in PBF file format (already mentioned above, e.g. from download.geofabrik.de), and you need osmconvert.c to read and convert the PBF files. Then you can generate the cdb files with this script:
#!/bin/bash
# make a geonames cdb file from all your current pbf osm export files in the
# current working directory. needs osmconvert, libcdb and pv
find . -name "*.pbf" -exec sh -c 'osmconvert "{}" --max-objects=90000000000 \
  --all-to-nodes --csv="name @lat @lon" --csv-separator="," | grep -v -E "^," | \
  tr -d "\\t:> " | tr "," " " | pv -N "{}" -l | cdb -m -c "{}.cdb"' "{}" \;
As you can see, I chose to delete spaces. So you have to search for "NeuerWeg" instead of "Neuer Weg".
The following scripts are your webserver. You won't need gatling, just tcpserver from ucspi-tcp and the cdb utilities.
Stdin input is restricted to a-z, A-Z, 0-9 and "-". So, sorry, no umlauts yet. TODO: add urldecode, and add öäüÖÄÜß to the tr -dc whitelist on stdin.
Remember: to save space in the database, all spaces are removed. So search for "NeuerWeg" instead of "Neuer Weg".
Now we have two variants: the old one, a classical serverside backend; and the new one, which is much more interesting: a clientside implementation that uses HTTP range requests to query the cdb file from any static webserver.
Whole Germany
Use with dash, not with bash.
#!/bin/dash
# germany_serv.sh
# name to coord list
echo "HTTP/1.1 200 OK"
echo "Content-Type: text/plain"
echo
cdb -m -q germany.20200209.cdb \
  "$(timeout 2 head -n1|head -c 128|cut -d " " -f 2|tr -dc "0-9a-zA-Z-")" || echo "no result"
exit 0
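You can test the script locally without tcpserver by piping a raw request line into it (assuming the database file germany.20200209.cdb is present):

# simulate a browser request; the script cuts "NeuerWeg" out of the request line
printf "GET /NeuerWeg HTTP/1.0\r\n\r\n" | ./germany_serv.sh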
Just Saarland (with redirect header to online map tiles)
Use with dash, not with bash.
#!/bin/dash
# saar_serv.sh redirect
echo "HTTP/1.1 302 Moved Temporarily"
echo "Content-Length: 0"
echo -n "Location: https://osterbruecken.de/ostermap/?pos="
cdb -m -q saarland.20191123.cdb \
  "$(timeout 2 head -n1|head -c 64|cut -d " " -f 2|tr -dc "0-9a-zA-Z-")" | \
  head -n1 | tr -dc "0-9. " | tr " " ","
echo
exit 0
And here is the wrapper script you can use with daemontools:
#!/bin/sh
# start_saarserv.sh (tcpwrapper)
# run as restricted user!!! (setuidgid)
ulimit 12000
exec setuidgid nobody tcpserver -R -H -D -c 40000 127.0.0.1 8000 recordio ./saar_serv.sh
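To put it under daemontools supervision, a sketch assuming the usual /etc/service layout watched by svscan:

# the wrapper becomes the run script of a new service directory;
# note: saar_serv.sh and the cdb file must be reachable from there
mkdir -p /etc/service/saarserv
cp start_saarserv.sh /etc/service/saarserv/run
chmod +x /etc/service/saarserv/run
# svscan picks it up automatically; check the status with:
svstat /etc/service/saarserv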
Connect benchmark test script (uses the http@ client from ucspi-tcp):
ulimit 64000; yes | xargs -P 0 sh -c "http@ 127.0.0.1 NeuerWeg 8000"
An example browser request would be http://127.0.0.1:8000/NeuerWeg
Here is my clientside CDB implementation in JavaScript. You can place a static cdb file on your static webserver and do HTTP range requests on it, without downloading the whole file.
You can also use my cdb_generator (written in JavaScript) to generate a cdb file from simple txt files containing key->value pairs line by line.
But this one is more interesting: my cdb_find (written in JavaScript, using HTTP range requests) queries the cdb file on the server without downloading the whole file. This is especially useful for very large cdb files.
Important hint: you have to enable range requests on your webserver (e.g. Apache) in your vhost to make this work:
MaxRanges unlimited
MaxRangeReversals unlimited
MaxRangeOverlaps unlimited
or
MaxRanges 2000
MaxRangeReversals 2000
MaxRangeOverlaps 2000
HTTP range requests allow your browser to request a specific byte range out of a large file instead of downloading the whole file at once. The range is simply specified in a header when making the request.
Since we have a well-sorted and structured cdb file, we can fetch the whole cdb index (only the first 2048 bytes), then perform a hashtable lookup, calculate the position, and fetch further bytes from the file to gain the final result(s). Since our cdb file is already sorted, we can expect reasonable performance.
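Just to illustrate the first step from the shell (a sketch with a hypothetical URL; the cdb header is exactly 2048 bytes: 256 hash table pointers of two 32-bit little-endian integers each):

# fetch only the index of the cdb via an HTTP range request
curl -s -r 0-2047 "https://example.com/germany.20200209.cdb" -o header.bin
# inspect the (position, slot count) pairs; od -t u4 assumes a
# little-endian machine, matching the cdb on-disk byte order
od -A d -t u4 header.bin | head -n 4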
By downloading the whole database you can build an offline version, which is quite nice. To replicate the database to different servers, you can use rsync with --inplace on a remote copy, followed by an atomic move in the remote filesystem.
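A sketch of that replication step (host and paths are placeholders):

# update a remote temp copy in place, then switch it atomically;
# mv within the same filesystem is atomic, so readers never see a torn file
rsync --inplace germany.20200209.cdb host:/srv/www/germany.cdb.tmp
ssh host 'mv /srv/www/germany.cdb.tmp /srv/www/germany.cdb'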
Keep in mind that a classic djb cdb file with 32-bit offsets has a maximum file size of 4 GB.
We generate an xxhash64 for each search term. We divide this hash into folders and subfolders. In the last subfolder we place a symlink pointing back to the folder itself, encoding the geo coordinate in the name of the symlink. Simple clientside hashing and lookup in JavaScript then leads us to the expected search results.
We add a point (example: finalmedia) like this:
The xxhash64 of the string "finalmedia" is "1ffffb19962d5e48", so we do:
apt-get install xxhash
echo -n finalmedia | xxh64sum | cut -d" " -f1 | fold -w 2 | tr "\n" "/"
# results in 1f/ff/fb/19/96/2d/5e/48/
mkdir -p search/1f/ff/fb/19/96/2d/5e/48
cd search/1f/ff/fb/19/96/2d/5e/48
ln -s .. 49.4945,7.2669
ln -s .. 49.4953,7.2683
So our symlink filenames carry the coords, and the target of each link is just the folder itself. Therefore it's also possible to assign multiple coords to the same search term, just by adding more symlinks with different coords to the same folder!
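The clientside JavaScript does the same hashing; the equivalent lookup from the shell looks like this (a sketch, assuming xxh64sum from the xxhash package):

term="finalmedia"
# hash the term and turn the hex digest into the folder path
path="search/$(echo -n "$term" | xxh64sum | cut -d" " -f1 | fold -w 2 | paste -sd/ -)"
# the symlink names in that folder are the search results
ls "$path"
# -> 49.4945,7.2669  49.4953,7.2683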
Follow this link to try out my implementation. Hint: just change the search term to the string "Finalmedia".
Have a look at the source code of the page to find out how it works.
German postcodes (PLZ) to geo coordinates: plz2geo.txt
RFC 1876 (January 1996) defines DNS LOC records. You can check one with dig: dig dkdhr.com LOC +short -> 42 21 43.528 N 71 5 6.284 W -25.00m 1m 3000m 10m.
It's easy to serve the plz2geo data from a nameserver. Example: query 66606.plz.cafeface.de to get 49.46627 N 7.18821 E. The data has to be encoded in LOC format:
This RFC defines the format of a new Resource Record (RR) for the Domain Name System (DNS), and reserves a corresponding DNS type mnemonic (LOC) and numerical code (29).

2. RDATA Format

       MSB                                           LSB
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      0|        VERSION        |         SIZE          |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      2|       HORIZ PRE       |       VERT PRE        |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      4|                   LATITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      6|                   LATITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      8|                   LONGITUDE                   |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     10|                   LONGITUDE                   |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     12|                   ALTITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     14|                   ALTITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
       (octet)

where:

VERSION   Version number of the representation. This must be zero. Implementations are required to check this field and make no assumptions about the format of unrecognized versions.

SIZE      The diameter of a sphere enclosing the described entity, in centimeters, expressed as a pair of four-bit unsigned integers, each ranging from zero to nine, with the most significant four bits representing the base and the second number representing the power of ten by which to multiply the base. This allows sizes from 0e0 (<1cm) to 9e9 (90,000km) to be expressed. This representation was chosen such that the hexadecimal representation can be read by eye; 0x15 = 1e5. Four-bit values greater than 9 are undefined, as are values with a base of zero and a non-zero exponent. Since 20000000m (represented by the value 0x29) is greater than the equatorial diameter of the WGS 84 ellipsoid (12756274m), it is therefore suitable for use as a "worldwide" size.

HORIZ PRE The horizontal precision of the data, in centimeters, expressed using the same representation as SIZE. This is the diameter of the horizontal "circle of error", rather than a "plus or minus" value.

VERT PRE  The vertical precision of the data, in centimeters, expressed using the same representation as for SIZE. This is the total potential vertical error, rather than a "plus or minus" value. (This was chosen to match the interpretation of SIZE; to get a "plus or minus" value, divide by 2.) Note that if altitude above or below sea level is used as an approximation for altitude relative to the [WGS 84] ellipsoid, the precision value should be adjusted.

LATITUDE  The latitude of the center of the sphere described by the SIZE field, expressed as a 32-bit integer, most significant octet first (network standard byte order), in thousandths of a second of arc. 2^31 represents the equator; numbers above that are north latitude.

LONGITUDE The longitude of the center of the sphere described by the SIZE field, expressed as a 32-bit integer, most significant octet first (network standard byte order), in thousandths of a second of arc, rounded away from the prime meridian. 2^31 represents the prime meridian; numbers above that are east longitude.

ALTITUDE  The altitude of the center of the sphere described by the SIZE field, expressed as a 32-bit integer, most significant octet first (network standard byte order), in centimeters, from a base of 100,000m below the [WGS 84] reference spheroid used by GPS (semimajor axis a=6378137.0, reciprocal flattening rf=298.257223563). Altitude above (or below) sea level may be used as an approximation of altitude relative to the [WGS 84] spheroid, though due to the Earth's surface not being a perfect spheroid, there will be differences.
(For example, the geoid (which sea level approximates) for the continental US ranges from 10 meters to 50 meters below the [WGS 84] spheroid. Adjustments to ALTITUDE and/or VERT PRE will be necessary in most cases. The Defense Mapping Agency publishes geoid height values relative to the [WGS 84] ellipsoid.)
LOC ( d1 [m1 [s1]] {"N"|"S"} d2 [m2 [s2]] {"E"|"W"} alt["m"]
      [siz["m"] [hp["m"] [vp["m"]]]] )

(The parentheses are used for multi-line data as specified in [RFC 1035] section 5.1.)

where:

d1:          [0 .. 90]                      (degrees latitude)
d2:          [0 .. 180]                     (degrees longitude)
m1, m2:      [0 .. 59]                      (minutes latitude/longitude)
s1, s2:      [0 .. 59.999]                  (seconds latitude/longitude)
alt:         [-100000.00 .. 42849672.95] BY .01 (altitude in meters)
siz, hp, vp: [0 .. 90000000.00]             (size/precision in meters)
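Putting it together, a hypothetical zone file entry for the postcode example above could look like this (49.46627° = 49° 27' 58.572", 7.18821° = 7° 11' 17.556"; the zone name is an assumption):

; encodes 49.46627 N 7.18821 E, altitude 0m
66606.plz.cafeface.de. IN LOC 49 27 58.572 N 7 11 17.556 E 0m
; query it with:
;   dig 66606.plz.cafeface.de LOC +short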