The OpenStreetMap movement has a great trick up their sleeve for manipulating their data – osmosis. It’s the sort of tool that map, GIS and navigation companies probably have developed in-house, only I’m prepared to bet that this one’s more elegant than most of theirs. For straightforward tasks such as loading some OSM data into Postgres the documentation was clear to me. However, I have been scratching my head more than somewhat with the more subtle plumbing.
I needed to load some OSM data into Postgres, munge it and write it back out to a file in the same format as the one I downloaded. Simple, right? And, on the tin, that’s what osmosis says it does, but the devil is in the detail.
First I tried loading with --read-pbf and --write-pgsql. OK, that works. Then to extract, the reverse, right? Just --read-pgsql and --write-pbf. Wrong! The detail to note here is that --read-pgsql outputs a ‘dataset’ and --write-pbf is looking for an ‘entity stream’. Frankly, I’m not bothered to try and understand why these are or should be different in osmosis, but they are, and they don’t mix.
Second effort was to use --write-apidb and --read-apidb. The theory’s good in that --read-apidb does spit out a stream within osmosis that --write-pbf is happy with. Except APIDB is, comparatively, a pain to use on my Ubuntu 12.04 system: it looks for some (Postgres?) extensions that weren’t installed, nor easily installed after 5 minutes’ Googling. I tried it anyway, since the API DB schema creation script seemed to almost work. But, nah, --write-apidb failed. Anyway, I’d have had to change my munging code, which relies on the schema that --write-pgsql loads into (with the tags in an hstore, which, by the way, is a totally groovy feature of PostgreSQL!).
Back to the drawing board, or RTFM.
The Solution – Some Plumbing
I’ll cut straight to the last page, here’s what I needed. First clear out any cruft in the database instance:
/home/userx/opt/osmosis/bin/osmosis --truncate-pgsql host="localhost" database="foo" user="foo" password="foo"
Then load the data up:
/home/userx/opt/osmosis/bin/osmosis --read-pbf file=wherever.osm.pbf --write-pgsql host="localhost" database="foo" user="foo" password="foo"
Then munge the data in PostgreSQL to your heart’s content.
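For a flavour of what the munging step might look like, here is a minimal sketch. It assumes the snapshot schema that --write-pgsql loads into, with a nodes table whose tags column is an hstore; the function name munge_sql and the tag key and values are made up for illustration, not taken from my real code. The script only prints the SQL, so nothing touches the database until you pipe it to psql yourself.

```shell
#!/bin/sh
# Sketch only. Assumes the snapshot schema used by --write-pgsql, where the
# nodes table has an hstore column called tags. munge_sql is a hypothetical
# helper and the tag key and values are invented for the example.
munge_sql() {
  cat <<'SQL'
-- Add or overwrite one tag on every node that carries amenity=cafe.
UPDATE nodes
SET tags = tags || hstore('cuisine', 'coffee_shop')
WHERE tags -> 'amenity' = 'cafe';
SQL
}

# Print the statement; pipe it to psql to apply it, for example:
#   munge_sql | psql -h localhost -U foo -d foo
munge_sql
```

The hstore || operator merges the new key into the existing tags, replacing the value if the key is already present.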
Then export to a new PBF file:
/home/userx/opt/osmosis/bin/osmosis --read-pgsql host="localhost" database="foo" user="foo" password="foo" outPipe.0=pg --dd inPipe.0=pg outPipe.0=dd --write-pbf inPipe.0=dd file=wherever_munged.osm.pbf
The trick is knowing about the --dd (a.k.a. --dataset-dump) task, which converts a dataset into an entity stream on the fly inside osmosis. You also need to understand the plumbing: name the pipe coming out of --read-pgsql and explicitly use that named pipe as the input to --dd. In this case it is called ‘pg’, but it could equally have been ‘giraffe’. Similarly, you need to explicitly name the output of --dd and give that name as the input to --write-pbf. In this case I used the name ‘dd’.
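To make the pipe wiring easier to see, here is a small sketch that assembles the export command from its three tasks and prints it instead of running it. The helper build_export_cmd and its arguments are my own invention for illustration; the osmosis options are just the ones from the export command above. The pipe names default to ‘pg’ and ‘dd’, but any names work as long as each outPipe matches the inPipe of the next task.

```shell
#!/bin/sh
# Sketch only: build_export_cmd is a hypothetical helper, not part of osmosis.
# It prints the export command so the pipe wiring can be checked before running it.
build_export_cmd() {
  db_args=$1          # connection arguments for --read-pgsql
  out_file=$2         # PBF file to write
  pg_pipe=${3:-pg}    # name of the dataset pipe leaving --read-pgsql
  dd_pipe=${4:-dd}    # name of the entity stream pipe leaving --dd

  # Each task names its output pipe; the next task names the same pipe as input.
  printf '%s %s %s %s\n' \
    "osmosis" \
    "--read-pgsql $db_args outPipe.0=$pg_pipe" \
    "--dd inPipe.0=$pg_pipe outPipe.0=$dd_pipe" \
    "--write-pbf inPipe.0=$dd_pipe file=$out_file"
}

# Same pipeline as above, but with the first pipe renamed to 'giraffe'.
build_export_cmd 'host="localhost" database="foo" user="foo" password="foo"' \
  wherever_munged.osm.pbf giraffe dd
```

Once the printed command looks right, it can be pasted into the shell and run as usual.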
Easy when you know how – a few more examples and words on this in the osmosis wiki wouldn’t go amiss.
A Few More Osmosis Examples (Not Necessarily PostGIS Related!)
Clip St. Petersburg out from a large PBF data set of western Russia using a bounding box filter:
osmosis --rb russia-european-part-latest.osm.pbf outPipe.0=in --bb left=29.54 bottom=59.61 right=30.87 top=60.3 inPipe=in outPipe=bb --wb sp.out.pbf inPipe=bb omitmetadata=true