This is quite a hackish solution to this problem.
Sure, I could also choose the boring way and use plain JavaScript, one JSON file per country, and load POI data with AJAX on demand: you specify the target country in the search form (or filter by it), fetch the matching JSON or GeoJSON file from the server with AJAX, and parse it for coordinates. You can handle the whole thing in JavaScript, et voilà, done. But... this is boring, and that's not what I want.
I want a solution that additionally satisfies the following needs:
See a live demo in action: https://finalmedia.de/cdn/osterbruecken.de/ostermap/index.htm (German).
First I fetched a dump of the geonames database. For this test case I just needed data for Germany, so I fetched this file: http://download.geonames.org/export/dump/DE.zip
I extracted the file DE.txt out of it, parsed the tab-separated file (TSV) with tr and cut (or you can use awk if you prefer), and used grep to get all POIs, marked with ";P".
I reduced the charset and transformed everything to lowercase, allowing only the following characters:
a-zöäü ß .-
You can use the following chain to do that:
tr "\t" ";" < DE.txt | cut -d";" -f2,5,6,7 | grep ";P" |\ tr -d "," | cut -d";" -f1,2,3 | tr ";" "," |\ tr -dc "0-9a-zA-ZöäüÖÄÜß\n ,.-" | tr "A-ZÖÄÜ" "a-zöäü" > cities.txt
This exports all lines to a new file called cities.txt, in the following format:
city,lat,lon
UPDATE: The geonames.org database was not very satisfying. So I switched to official OpenStreetMap database dumps in the OpenStreetMap Protocolbuffer Binary Format, available from http://download.geofabrik.de, in this case germany-latest.osm.pbf (>3.8 GB, around 60 GB uncompressed), and extracted all city and street names out of it. Use osmconvert.c (local mirror: osmconvert.c) from the osmconvert toolset to extract the data. (Hint: build a 64-bit executable and use a machine with a lot of RAM for this! Processing the dataset germany-latest.osm needs about 14 GB of RAM and takes some hours to finish.) If you don't have enough RAM, be sure to add a sufficiently large swap file first:
dd if=/dev/zero bs=1M count=40000 of=/swap.40gb.img status=progress
chmod 600 /swap.40gb.img
mkswap /swap.40gb.img
echo "/swap.40gb.img none swap sw 0 0" >> /etc/fstab   # append, don't overwrite your fstab!
swapon -a
./osmconvert germany-latest.osm.pbf --max-objects=900000000 --all-to-nodes \
  --csv="name @lat @lon" --csv-separator="," | grep -v -E "^," > cities.txt
Process your cities.txt and sort out all duplicate names (it's a quick hack; perhaps I will rename those in an improved version later):
sort -k1 -t, cities.txt | uniq > uniq_cities.txt

All cities are now stored in the file uniq_cities.txt, line by line with their coordinates, like this:
zwötzen,50.84858,12.08635
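If you would rather keep duplicate names than drop them, one idea for the improved version mentioned above is to number repeated names with awk. A sketch, as an alternative to the sort | uniq step above:

sort -t"," -k1,1 cities.txt | awk -F"," 'BEGIN { OFS = "," } {
  # count occurrences of each name and append a counter from the 2nd one on,
  # so "berlin" stays "berlin" and duplicates become "berlin.2", "berlin.3", ...
  n[$1]++
  if (n[$1] > 1) $1 = $1 "." n[$1]
  print
}' > uniq_cities.txt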
Then I wrote a small script that reads those lines and makes lots of broken symlinks out of them, putting them into a folder called "search".
#!/bin/bash
mkdir -p search
cat uniq_cities.txt | while read line
do
  url="http://osm.org/#map=/`echo $line | tr -dc "0-9.,-" | cut -d"," -f2,3 | tr "," "/"`"
  symlink="search/`echo $line | cut -d"," -f1 | tr -dc "a-zA-ZöäüÖÄÜß. -" | tr "A-ZÖÄÜ" "a-zöäü"`"
  ln -s "$url" "$symlink"
done
The name of each broken symlink is the name of the city, and the symlink points to a URL like

http://osm.org/#map=/<lat>/<lon>

with the coordinates of that city.
Sure, this is just an example. You can use your own tile server and your own map, like I did in the demo "Ostermap" mentioned before.
This way you'll get a lot of broken symlinks like these:
...
lrwxrwxrwx 1 user group 23 Jun 23 23:00 ührde -> http://osm.org/#map=/51.70547/10.20814
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhrendorf -> http://osm.org/#map=/53.86275/9.41756
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhrsleben -> http://osm.org/#map=/52.20087/11.26443
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhry -> http://osm.org/#map=/52.29693/10.85758
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhsmannsdorf -> http://osm.org/#map=/51.33048/14.90316
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhyst -> http://osm.org/#map=/51.36469/14.506
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uhyst am taucher -> http://osm.org/#map=/51.19249/14.21843
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uichteritz -> http://osm.org/#map=/51.20652/11.92215
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uiffingen -> http://osm.org/#map=/49.5024/9.59269
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uigenau -> http://osm.org/#map=/49.31204/11.01731
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uigendorf -> http://osm.org/#map=/48.18048/9.57969
lrwxrwxrwx 1 user group 23 Jun 23 23:00 uissigheim -> http://osm.org/#map=/49.67984/9.57134
lrwxrwxrwx 1 user group 23 Jun 23 23:00 ulbargen -> http://osm.org/#map=/53.37535/7.58291
lrwxrwxrwx 1 user group 23 Jun 23 23:00 ulbering -> http://osm.org/#map=/48.35362/13.01465
lrwxrwxrwx 1 user group 23 Jun 23 23:00 ulberndorf -> http://osm.org/#map=/50.87472/13.67231
...
Now I can use those broken symlinks with gatling, a tiny and really fast httpd by Felix von Leitner.
I have already used gatling for Leaflet, with my own maps and tiles made with glosm.
You can find the project "Ostermap" at the demo URL mentioned above. I just rendered the map for Saarland.
Gatling recognizes broken symlinks. If the link target contains "://" (like in "http://" or "https://"), it turns it into a valid HTTP redirect and sends your browser to the given URL. This is a really nice feature. Thanks, Fefe. This way, any given city name gets redirected to Leaflet or OpenStreetMap with the matching coordinates.
So I can start a locally listening gatling and enter the following URL in a browser
http://127.0.0.1/search/ulbargen
to get the geo-location of the city ulbargen and directly show it on the map.
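To check the redirect without a browser, here is a quick curl sketch (assuming gatling listens on 127.0.0.1, port 80):

# HEAD request; gatling should answer with the symlink target as Location header
curl -sI http://127.0.0.1/search/ulbargen | grep -i "^location"
# expected, according to the listing above:
# Location: http://osm.org/#map=/53.37535/7.58291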
Furthermore, you can supply a minimal vanilla JavaScript that takes the value of an input box, transforms the given text to lowercase, and rewrites window.location to the subfolder "search/ulbargen" as described above. This tiny index.htm does the magic:
<html>
<input title="please enter the name of the point of interest" id="name" value="ulbargen">
<input type="button" value="suche"
  onclick="window.location+='search/'+document.getElementById('name').value.toLowerCase()">
</html>
If an invalid POI name is entered, gatling just responds with 404 file not found. You can additionally write an AJAX script that catches the 404 response and displays something like "sorry, POI not found, please retry". Or specify your own 404 error page.
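You can also observe the 404 behaviour directly from the shell (a sketch):

# print only the HTTP status code; a bogus POI name should yield 404
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1/search/doesnotexist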
First: By storing the geo-information in a broken symlink, I get a very compact storage format for the coordinates, without being limited by the underlying filesystem's block size for a single regular file.
Storing the coordinates in regular files named after the city is not very efficient: the whole "database" of this example would need over 240 MB in total. Even if a file just contains those few bytes for the coordinates, it occupies about 4096 bytes on your drive because of the filesystem's block size (see Wikipedia if you want to know more about this). So lots of small regular files would waste a lot of storage capacity.
If you don't believe it, create such files yourself and compare the sizes with the following commands:
echo hello > regular_file
ls -slh1 regular_file
stat regular_file
du -hcs regular_file

ln -s "hello again" symlink_file
ls -slh1 symlink_file
stat symlink_file
du -hcs symlink_file
Sure, you can change the block size of your filesystem by reformatting the block device, or possibly do some tweaks with tune2fs. But even then the minimal block size of ext3 is 1024 bytes, such changes are no out-of-the-box solution, and they could have disadvantages for other services on your system.
With symlinks, the whole "database" is just about 1.8 MB in total, since each symlink and its inode need just 128 bytes in this case. It doesn't get "blown up" to 4k by the minimum block size of the underlying filesystem.
Now you can also make a tarball out of the folder for distributing the "database". The xz tarball is about 1.1 MB then.
You can simply add a new POI like this
ln -s "http://osm.org/#map=/49.49361/7.26694" "osterbrücken"
and remove it again just by deleting the symlink
rm osterbrücken
You can also specify the zoom level for individual POIs if you want to. Just use:
ln -s "http://osm.org/#map=14/49.49361/7.26694" "osterbrücken"
You could also distribute street names this way, for example by making subfolders for cities and putting street POIs inside as symlinks. This works with gatling's built-in directory indexing.
Since city names are not unique, I should also consider generating folders for duplicate names and putting each symlink into such a folder.
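A sketch of how such a nested layout could look (the street names and coordinates here are made up for illustration):

# streets of a city as symlinks inside a city subfolder; gatling's
# directory index then lists all streets under /search/osterbruecken/
mkdir -p search/osterbruecken
ln -s "http://osm.org/#map=18/49.4936/7.2669" "search/osterbruecken/hauptstrasse"
ln -s "http://osm.org/#map=18/49.4940/7.2680" "search/osterbruecken/kirchweg"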
Have a look at RFC 5870, which describes the URL scheme for geo coordinates: a Uniform Resource Identifier for geographic locations in WGS-84 (World Geodetic System). But then you have to evaluate this URI in your client application. Also, since gatling expects "://" and a geo URI is just "geo:74.4294,19.0245", you won't get a successful redirect; you would have to change gatling's source code in http.c to parse this correctly.
You can fetch my pois.de.txt.xz (51 MB) with 4,919,091 entries in the format "name,lat,lon". The dataset is based on an extraction of the OpenStreetMap database dump (20150701), so it is licensed under the Open Data Commons Open Database License (ODbL) and copyright © OpenStreetMap contributors.
Here is a version of the current POI names dataset as a cdb database.
This is a different approach, which no longer needs gatling and broken symlinks.
This is the concept: we generate a (sorted) djb cdb (constant database) for key/value search and run search requests against it. You can generate multiple cdb files, containing cities, streets, POIs, etc.
djb cdb is a simple, minimalistic constant key/value database file with fast hashtable lookup, created by Dan J. Bernstein.
You can build, query and serve them with a native cdb implementation, daemontools, and dash.
You will need libcdb and tcpserver from djb's ucspi-tcp. On Debian you can install them like this:
apt-get install libcdb1 pv ucspi-tcp dos2unix
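To get a feeling for the -m (map) format used below, here is a minimal round trip, assuming the tinycdb implementation of the cdb utility:

# create a tiny test database from "key value" lines (-m = plain map format)
printf "neuerweg 49.1234 7.5678\n" | cdb -m -c test.cdb
# look up the key; prints the stored value "49.1234 7.5678"
cdb -m -q test.cdb neuerweg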
You will need current OSM dumps in PBF file format (already mentioned above, e.g. from download.geofabrik.de), and you need osmconvert.c to read and convert the PBF files. Then you can generate the cdb files with this script:
#!/bin/bash
# make a geonames cdb file from all your current pbf osm export files in the
# current working directory. needs osmconvert, libcdb and pv
find . -name "*.pbf" -exec sh -c 'osmconvert "{}" --max-objects=90000000000 \
  --all-to-nodes --csv="name @lat @lon" --csv-separator="," | grep -v -E "^," | \
  tr -d "\\t:> " | tr "," " " | pv -N "{}" -l | cdb -m -c "{}.cdb"' "{}" \;
As you can see, I chose to delete spaces. So you have to search for "NeuerWeg" instead of "Neuer Weg".
The following scripts are your webserver. You won't need gatling, just tcpserver from ucspi-tcp and the cdb utilities.
Stdin input is restricted to a-z, A-Z, 0-9 and "-". So, sorry, no umlauts yet. TODO: add urldecode, and add öäüÖÄÜß to the tr -dc whitelist on stdin.
Remember: to save space in the database, all spaces are removed. So search for "NeuerWeg" instead of "Neuer Weg".
Now we have two variants: the old one, a classical serverside backend; and the new one, which is much more interesting: a clientside implementation that uses HTTP range requests to query the cdb file from any static webserver.
Whole Germany
Use with dash, not with bash.
#!/bin/dash
# germany_serv.sh
# name to coord list
echo "HTTP/1.1 200 OK"
echo "Content-Type: text/plain"
echo
cdb -m -q germany.20200209.cdb \
  "$(timeout 2 head -n1|head -c 128|cut -d " " -f 2|tr -dc "0-9a-zA-Z-")" || echo "no result"
exit 0
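You can test the script locally without tcpserver by piping a raw request line into it (assuming the database file germany.20200209.cdb is present):

# simulate a browser request; the script cuts "NeuerWeg" out of the request line
printf "GET /NeuerWeg HTTP/1.0\r\n\r\n" | ./germany_serv.sh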
Just Saarland (with redirect header to online map tiles)
Use with dash, not with bash.
#!/bin/dash
# saar_serv.sh redirect
echo "HTTP/1.1 302 Moved Temporarily"
echo "Content-Length: 0"
echo -n "Location: https://osterbruecken.de/ostermap/?pos="
cdb -m -q saarland.20191123.cdb \
  "$(timeout 2 head -n1|head -c 64|cut -d " " -f 2|tr -dc "0-9a-zA-Z-")" | \
  head -n1 | tr -dc "0-9. " | tr " " ","
echo
exit 0
And here is the wrapper script you can use with daemontools:
#!/bin/sh
# start_saarserv.sh (tcpwrapper)
# run as restricted user!!! (setuidgid)
ulimit 12000
exec setuidgid nobody tcpserver -R -H -D -c 40000 127.0.0.1 8000 recordio ./saar_serv.sh
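To put it under daemontools supervision, a sketch assuming the usual /etc/service layout watched by svscan:

# the wrapper becomes the run script of a new service directory;
# note: saar_serv.sh and the cdb file must be reachable from there
mkdir -p /etc/service/saarserv
cp start_saarserv.sh /etc/service/saarserv/run
chmod +x /etc/service/saarserv/run
# svscan picks it up automatically; check the status with:
svstat /etc/service/saarserv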
Connect benchmark test script (uses the http@ client from ucspi-tcp):
ulimit 64000; yes | xargs -P 0 sh -c "http@ 127.0.0.1 NeuerWeg 8000"
An example browser request would be http://127.0.0.1:8000/NeuerWeg
Here is my clientside CDB implementation in JavaScript. You can place a static cdb file on your static webserver and do HTTP range requests on it, without downloading the whole file.
You can also use my cdb_generator (written in JavaScript) to generate a cdb file from simple txt files containing key->value pairs line by line.
But this one is more interesting: my cdb_find (written in JavaScript, using HTTP range requests) queries the cdb file on the server without downloading the whole file. This is especially useful for very large cdb files.
Important hint: you have to enable range requests on your webserver (e.g. Apache) in your vhost to make this work:
MaxRanges unlimited
MaxRangeReversals unlimited
MaxRangeOverlaps unlimited
or
MaxRanges 2000
MaxRangeReversals 2000
MaxRangeOverlaps 2000
HTTP range requests allow your browser to request a specific byte range out of a large file instead of downloading the whole file at once. The range is simply specified in a header when making the request.
Since we have a well-sorted and structured cdb file, we can fetch the whole cdb index (only the first 2048 bytes), then perform a hashtable lookup, calculate the position, and fetch further bytes from the file to gain the final result(s). Since our cdb file is already sorted, we can expect reasonable performance.
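Just to illustrate the first step from the shell (a sketch with a hypothetical URL; the cdb header is exactly 2048 bytes: 256 hash table pointers of two 32-bit little-endian integers each):

# fetch only the index of the cdb via an HTTP range request
curl -s -r 0-2047 "https://example.com/germany.20200209.cdb" -o header.bin
# inspect the (position, slot count) pairs; od -t u4 assumes a
# little-endian machine, matching the cdb on-disk byte order
od -A d -t u4 header.bin | head -n 4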
By downloading the whole database you can build an offline version, which is quite nice. To replicate the database to different servers, you can use rsync with --inplace on a remote copy, followed by an atomic move in the remote filesystem.
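A sketch of that replication step (host and paths are placeholders):

# update a remote temp copy in place, then switch it atomically;
# mv within the same filesystem is atomic, so readers never see a torn file
rsync --inplace germany.20200209.cdb host:/srv/www/germany.cdb.tmp
ssh host 'mv /srv/www/germany.cdb.tmp /srv/www/germany.cdb'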
Keep in mind that a classic djb cdb file with 32-bit offsets has a maximum file size of 4 GB.
We generate an xxhash64 for each search term. We divide this hash into folders and subfolders. In the last subfolder we place a symlink pointing back to the folder itself, encoding the geo coordinate in the name of the symlink. Simple clientside hashing and lookup in JavaScript then leads us to the expected search results.
We add a point (example: finalmedia) like this:
The xxhash64 of the string "finalmedia" is "1ffffb19962d5e48", so we do:
apt-get install xxhash
echo -n finalmedia | xxh64sum | cut -d" " -f1 | fold -w 2 | tr "\n" "/"
# results in 1f/ff/fb/19/96/2d/5e/48/
mkdir -p search/1f/ff/fb/19/96/2d/5e/48
cd search/1f/ff/fb/19/96/2d/5e/48
ln -s .. 49.4945,7.2669
ln -s .. 49.4953,7.2683
So our symlink filenames carry the coords, and the target of each link is just the folder itself. Therefore it's also possible to assign multiple coords to the same search term, just by adding more symlinks with different coords to the same folder!
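The clientside JavaScript does the same hashing; the equivalent lookup from the shell looks like this (a sketch, assuming xxh64sum from the xxhash package):

term="finalmedia"
# hash the term and turn the hex digest into the folder path
path="search/$(echo -n "$term" | xxh64sum | cut -d" " -f1 | fold -w 2 | paste -sd/ -)"
# the symlink names in that folder are the search results
ls "$path"
# -> 49.4945,7.2669  49.4953,7.2683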
Follow this link to try out my implementation. Hint: just change the search term to the string "Finalmedia".
Have a look at the source code of the page to find out how it works.
German postcodes (PLZ) to geo coordinates: plz2geo.txt
RFC 1876 (January 1996) defines DNS LOC records. You can check one with dig: dig dkdhr.com LOC +short -> 42 21 43.528 N 71 5 6.284 W -25.00m 1m 3000m 10m.
It's easy to serve the plz2geo data from a nameserver. Example: query 66606.plz.cafeface.de to get 49.46627 N 7.18821 E. The data has to be encoded in LOC format:
This RFC defines the format of a new Resource Record (RR) for the Domain Name System (DNS), and reserves a corresponding DNS type mnemonic (LOC) and numerical code (29).

2. RDATA Format

       MSB                                           LSB
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      0|        VERSION        |         SIZE          |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      2|       HORIZ PRE       |       VERT PRE        |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      4|                   LATITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      6|                   LATITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
      8|                   LONGITUDE                   |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     10|                   LONGITUDE                   |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     12|                   ALTITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
     14|                   ALTITUDE                    |
       +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
       (octet)

where:

VERSION   Version number of the representation. This must be zero. Implementations are required to check this field and make no assumptions about the format of unrecognized versions.

SIZE      The diameter of a sphere enclosing the described entity, in centimeters, expressed as a pair of four-bit unsigned integers, each ranging from zero to nine, with the most significant four bits representing the base and the second number representing the power of ten by which to multiply the base. This allows sizes from 0e0 (<1cm) to 9e9 (90,000km) to be expressed. This representation was chosen such that the hexadecimal representation can be read by eye; 0x15 = 1e5. Four-bit values greater than 9 are undefined, as are values with a base of zero and a non-zero exponent. Since 20000000m (represented by the value 0x29) is greater than the equatorial diameter of the WGS 84 ellipsoid (12756274m), it is therefore suitable for use as a "worldwide" size.

HORIZ PRE The horizontal precision of the data, in centimeters, expressed using the same representation as SIZE. This is the diameter of the horizontal "circle of error", rather than a "plus or minus" value.

VERT PRE  The vertical precision of the data, in centimeters, expressed using the same representation as for SIZE. This is the total potential vertical error, rather than a "plus or minus" value. (This was chosen to match the interpretation of SIZE; to get a "plus or minus" value, divide by 2.) Note that if altitude above or below sea level is used as an approximation for altitude relative to the [WGS 84] ellipsoid, the precision value should be adjusted.

LATITUDE  The latitude of the center of the sphere described by the SIZE field, expressed as a 32-bit integer, most significant octet first (network standard byte order), in thousandths of a second of arc. 2^31 represents the equator; numbers above that are north latitude.

LONGITUDE The longitude of the center of the sphere described by the SIZE field, expressed as a 32-bit integer, most significant octet first (network standard byte order), in thousandths of a second of arc, rounded away from the prime meridian. 2^31 represents the prime meridian; numbers above that are east longitude.

ALTITUDE  The altitude of the center of the sphere described by the SIZE field, expressed as a 32-bit integer, most significant octet first (network standard byte order), in centimeters, from a base of 100,000m below the [WGS 84] reference spheroid used by GPS (semimajor axis a=6378137.0, reciprocal flattening rf=298.257223563). Altitude above (or below) sea level may be used as an approximation of altitude relative to the [WGS 84] spheroid, though due to the Earth's surface not being a perfect spheroid, there will be differences.
(For example, the geoid (which sea level approximates) for the continental US ranges from 10 meters to 50 meters below the [WGS 84] spheroid. Adjustments to ALTITUDE and/or VERT PRE will be necessary in most cases. The Defense Mapping Agency publishes geoid height values relative to the [WGS 84] ellipsoid.)
LOC ( d1 [m1 [s1]] {"N"|"S"} d2 [m2 [s2]] {"E"|"W"} alt["m"]
      [siz["m"] [hp["m"] [vp["m"]]]] )

(The parentheses are used for multi-line data as specified in [RFC 1035] section 5.1.)

where:

d1:          [0 .. 90]                      (degrees latitude)
d2:          [0 .. 180]                     (degrees longitude)
m1, m2:      [0 .. 59]                      (minutes latitude/longitude)
s1, s2:      [0 .. 59.999]                  (seconds latitude/longitude)
alt:         [-100000.00 .. 42849672.95] BY .01 (altitude in meters)
siz, hp, vp: [0 .. 90000000.00]             (size/precision in meters)
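Putting it together, a hypothetical zone file entry for the postcode example above could look like this (49.46627° = 49° 27' 58.572", 7.18821° = 7° 11' 17.556"; the zone name is an assumption):

; encodes 49.46627 N 7.18821 E, altitude 0m
66606.plz.cafeface.de. IN LOC 49 27 58.572 N 7 11 17.556 E 0m
; query it with:
;   dig 66606.plz.cafeface.de LOC +short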