Eric Theise

Mapzen's OpenStreetMap Metro Extracts

Knowing how to get started with your own copy of OpenStreetMap data has always been a bit of a challenge. Will Skora’s tweet of a few days ago indicates it’s still an issue. Eagerly downloading an entire planet file usually leads, literally, to a world of hurt for first-timers. Selectively downloading an area of interest can be a good approach, although it, too, can end in failed or otherwise disappointing results.
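To see why selective downloads can go wrong, consider the OpenStreetMap editing API’s `map` call, which returns the data inside a `bbox=left,bottom,right,top` box. The sketch below assumes the server’s maximum request area is 0.25 square degrees, the figure advertised in the API’s capabilities response (re-check it before relying on it). `map_request_url` is a hypothetical helper that refuses a county-sized box before any request is sent.

```python
# Sketch: build an OSM API v0.6 "map" request URL for a bounding box,
# refusing boxes larger than the assumed server-side area limit.
OSM_MAP_ENDPOINT = "https://api.openstreetmap.org/api/0.6/map"
MAX_AREA_SQ_DEG = 0.25  # assumption: default limit advertised by the API


def map_request_url(left: float, bottom: float, right: float, top: float) -> str:
    """Return a map-call URL, or raise ValueError if the box is malformed or too large."""
    if left >= right or bottom >= top:
        raise ValueError("expected left < right and bottom < top")
    area = (right - left) * (top - bottom)
    if area > MAX_AREA_SQ_DEG:
        raise ValueError(
            f"bbox covers {area:.3f} sq. degrees; limit is {MAX_AREA_SQ_DEG}"
        )
    return f"{OSM_MAP_ENDPOINT}?bbox={left},{bottom},{right},{top}"


if __name__ == "__main__":
    # Bernalillo County, NM (bounds quoted later in this post), about 0.367 sq. degrees.
    try:
        map_request_url(-107.19617, 34.869024, -106.149575, 35.219639)
    except ValueError as err:
        print("refused:", err)
```

A box that small fails fast, which is a gentler introduction than a timeout halfway through a large download.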

Metro extracts are, perhaps, the best option for starting to work with your own copy of OSM data. For a long time, these have been provided by Geofabrik GmbH, BBBike.org, and Michal Migurski’s metro.teczno.com site. The metro.teczno baton has recently passed to Mapzen, and as I’m updating slides for upcoming presentations at FOSS4G and NACIS, it seems an opportune time to exercise and critique their offerings.

On the plus side, Mapzen is using a contemporary, widely used deployment tool (Chef) to automate the weekly creation of its metro extracts; this should boost reliability and reproducibility. They’ve added a few formats and continue to offer the ability to request extracts of new metro areas via a GitHub pull request.

Let’s detour and take a look at the request process. I’m going to need data covering the Duke City (better known as Albuquerque, NM) for a talk I’m giving at NACIS, and I’m happy to let Mapzen shoulder the burden of generating that extract, for me and others. An excerpt from their metroextractor-cities/cities.json file is shown below:

{
    "regions": {
        "africa": {
            "bbox": {
                "top": "37.630",
                "left": "-17.910",
                "bottom": "-35.180",
                "right": "63.647"
            },
            "cities": {
                "abuja": {
                    "bbox": {
                        "top": "9.246",
                        "left": "7.248",
                        "bottom": "8.835",
                        "right": "7.717"
                    }
                },
                "addis-abeba": {
                    "bbox": {
                        "top": "9.077",
                        "left": "38.651",
                        "bottom": "8.919",
                        "right": "38.920"
                    }
                },

...

                "trinidad-tobago": {
                    "bbox": {
                        "top": "11.442",
                        "left": "-63.136",
                        "bottom": "9.443",
                        "right": "-60.224"
                    }
                }
            }
        }
    }
}
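The excerpt shows the file’s shape: each region carries its own `bbox` plus a `cities` object whose entries each carry a `bbox`, with coordinates stored as strings. Here is a minimal sketch of a sanity check over that structure. The function `bbox_problems`, and the rule that a city’s box should sit inside its region’s box, are assumptions for illustration rather than Mapzen’s own validation. Boxes that cross the antimeridian would need special handling.

```python
def _floats(bbox: dict) -> dict:
    """Convert the string coordinates used in cities.json to floats."""
    return {edge: float(value) for edge, value in bbox.items()}


def bbox_problems(doc: dict) -> list:
    """List cities whose bbox is inverted or extends past its region's bbox."""
    problems = []
    for region_name, region in doc["regions"].items():
        rb = _floats(region["bbox"])
        for city_name, city in region["cities"].items():
            cb = _floats(city["bbox"])
            if cb["bottom"] >= cb["top"] or cb["left"] >= cb["right"]:
                problems.append(f"{region_name}/{city_name}: inverted bbox")
            elif (cb["top"] > rb["top"] or cb["bottom"] < rb["bottom"]
                  or cb["left"] < rb["left"] or cb["right"] > rb["right"]):
                problems.append(f"{region_name}/{city_name}: outside region bbox")
    return problems


# Two entries copied from the excerpt above.
sample = {"regions": {"africa": {
    "bbox": {"top": "37.630", "left": "-17.910", "bottom": "-35.180", "right": "63.647"},
    "cities": {
        "abuja": {"bbox": {"top": "9.246", "left": "7.248",
                           "bottom": "8.835", "right": "7.717"}},
        "addis-abeba": {"bbox": {"top": "9.077", "left": "38.651",
                                 "bottom": "8.919", "right": "38.920"}},
    }}}}

print(bbox_problems(sample))  # prints []
```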

A little googling turns up a file at the University of New Mexico that gives the Geographic Bounding Box of Bernalillo County as

West Bound -107.19617
East Bound -106.149575
North Bound 35.219639
South Bound 34.869024

so I’ll do a reality check via OpenStreetMap,

Bernalillo County

create a little chunk of JSON,

"bernalillo-county-nm": {
    "bbox": {
        "top": "35.219639",
        "left": "-107.19617",
        "bottom": "34.869024",
        "right": "-106.149575"
    }
},

insert it, alphabetically, into the "cities" hash of the "north_america" region, and submit a pull request via GitHub’s edit function with the comment “Bernalillo County contains Albuquerque plus smaller municipalities and unincorporated areas.” Once it’s merged, I’ll expect the Burque metro extract to appear in the next weekly batch.
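Those steps, turning the university’s West/East/North/South bounds into a cities.json entry and slotting it into a region alphabetically, can be sketched in Python. This is a minimal sketch that assumes the file parses into the nested regions/cities structure shown in the excerpt. `bbox_entry` and `add_city` are hypothetical helpers, not part of Mapzen’s tooling.

```python
import copy


def bbox_entry(west: float, east: float, north: float, south: float) -> dict:
    """Map W/E/N/S bounds onto the top/left/bottom/right string fields used in cities.json."""
    return {"bbox": {"top": str(north), "left": str(west),
                     "bottom": str(south), "right": str(east)}}


def add_city(doc: dict, region: str, name: str, entry: dict) -> dict:
    """Return a copy of the parsed cities.json with `name` added under `region`, cities sorted by key."""
    new_doc = copy.deepcopy(doc)  # leave the caller's data untouched
    cities = new_doc["regions"][region]["cities"]
    if name in cities:
        raise KeyError(f"{name!r} is already listed under {region!r}")
    cities[name] = entry
    # Python dicts (and json.dump) preserve insertion order, so rebuild in key order.
    new_doc["regions"][region]["cities"] = dict(sorted(cities.items()))
    return new_doc


bernalillo = bbox_entry(west=-107.19617, east=-106.149575,
                        north=35.219639, south=34.869024)
```

In use, you would `json.load` a local checkout of cities.json, call `add_city(doc, "north_america", "bernalillo-county-nm", bernalillo)`, and write the result back with `json.dump(new_doc, f, indent=4)` to match the excerpt’s four-space indentation. Sorting rebuilds the whole region, so any entries that were already out of order will move too. Check the diff before opening the pull request.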

(My request was rejected almost immediately! If I’d been paying more attention, I would have noticed that Mapzen has standardized on three decimal places for latitudes and longitudes. Thus my revised request

"bernalillo-county-nm": {
    "bbox": {
        "top": "35.220",
        "left": "-107.196",
        "bottom": "34.869",
        "right": "-106.150"
    }
},

has suitably rounded coordinates.)
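Rounding to three decimal places is easy to get subtly wrong with binary floats, so this sketch applies Python’s `decimal` module to the string values directly. `round_bbox` is a hypothetical helper. Its default mode reproduces the revised request. The `outward=True` option is an alternative worth considering: it rounds each edge away from the box’s center, so nothing near the county line gets clipped.

```python
from decimal import ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP, Decimal

THREE_PLACES = Decimal("0.001")

# Outward rounding: push top/right up (toward +infinity) and bottom/left down.
OUTWARD = {"top": ROUND_CEILING, "right": ROUND_CEILING,
           "bottom": ROUND_FLOOR, "left": ROUND_FLOOR}


def round_bbox(bbox: dict, outward: bool = False) -> dict:
    """Round string coordinates to three decimal places.

    outward=False rounds to nearest (ties away from zero);
    outward=True rounds each edge away from the box's center so it never shrinks.
    """
    result = {}
    for edge, value in bbox.items():
        mode = OUTWARD[edge] if outward else ROUND_HALF_UP
        result[edge] = str(Decimal(value).quantize(THREE_PLACES, rounding=mode))
    return result
```

Rounding to nearest trims a sliver, well under a thousandth of a degree, off the west and east edges of the Bernalillo box: -107.19617 becomes -107.196 and -106.149575 becomes -106.150. Outward rounding gives -107.197 and -106.149 instead.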

Two critiques at this juncture. First, there’s no standard naming scheme or guide, so cities.json contains both “portland-me” and “portland” (presumably Oregon). Second, if cities are rendered in alphabetical order rather than by decreasing area, it can be impossible to select and identify a metro extract in the overview map on Mapzen’s main metro extract page. For example, the map can’t be used to identify the San Francisco extract, because Sf Bay Area sits on top of it, or the Portland extract, because it’s blocked by Washington County Or.
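If the overview map draws extracts in list order with later ones on top (an assumption about how the page renders), sorting by decreasing bounding-box area would keep small extracts clickable. Below is a minimal sketch. `draw_order` and the placeholder boxes are hypothetical, and square degrees are only a rough proxy for real area.

```python
def bbox_area(entry: dict) -> float:
    """Area of a cities.json bbox in square degrees (a rough size proxy, not true area)."""
    b = {edge: float(value) for edge, value in entry["bbox"].items()}
    return (b["right"] - b["left"]) * (b["top"] - b["bottom"])


def draw_order(cities: dict) -> list:
    """Largest boxes first, so smaller extracts are drawn last, i.e. on top."""
    return sorted(cities, key=lambda name: bbox_area(cities[name]), reverse=True)


# Hypothetical placeholder boxes: "a-inner" sits entirely inside "z-outer".
placeholder = {
    "a-inner": {"bbox": {"top": "1", "left": "0", "bottom": "0", "right": "1"}},
    "z-outer": {"bbox": {"top": "5", "left": "-2", "bottom": "-2", "right": "5"}},
}
print(sorted(placeholder))      # alphabetical: the outer box is drawn last and covers the inner one
print(draw_order(placeholder))  # by area: the inner box is drawn last and stays clickable
```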

Blocked Extract

A nitpick: the title capitalization of those labels (“Sf Bay Area”, “Washington County Or”) also leaves something to be desired.

A few other critiques of the interface: the search function is helpful, but organizing the growing list of cities by continent codes would restore some needed structure. A guide to the formats provided would also help new consumers of these extracts. A geo pro might immediately recognize which format suits which use, but that same geo pro might also have already forked Mapzen’s work and started using it to create and customize their own extracts, on their own schedule.
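Since cities.json already nests every city under a region, a listing page could reuse that grouping and filter it with the search box. Here is a minimal sketch with a hypothetical `search_by_region` helper and a small sample document built from names mentioned in this post.

```python
def search_by_region(doc: dict, query: str = "") -> dict:
    """Group city keys under their region, keeping only keys that contain `query`."""
    q = query.lower()
    grouped = {}
    for region_name, region in sorted(doc["regions"].items()):
        matches = sorted(name for name in region["cities"] if q in name.lower())
        if matches:  # hide regions with no matching cities
            grouped[region_name] = matches
    return grouped


sample = {"regions": {
    "africa": {"cities": {"addis-abeba": {}, "abuja": {}}},
    "north_america": {"cities": {"portland-me": {}, "portland": {}}},
}}
print(search_by_region(sample, "Portland"))
```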

Most of all, I miss the coastline shapefiles Mike Migurski provided alongside each extract, if only for pedagogical reasons. In my workshops, we devote a lot of energy to downloading metro extracts, importing them into participants’ own databases, and wiring them up to generate tiles using MapBox’s TileMill. It’s a long slog, and afterwards it’s refreshing to demonstrate how easy and useful it is to import shapefiles to improve a map’s appearance or to add administrative data that exists outside of OpenStreetMap. Those coastline shapefiles served that purpose well.
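For the shapefile step, PostGIS’s `shp2pgsql` loader piped into `psql` is one common route. The sketch below wraps that pipeline in Python. The helper names are hypothetical, as are the file, table, SRID, and database values you would pass in. It assumes both tools are on your PATH.

```python
import subprocess


def shp2pgsql_command(shapefile: str, table: str, srid: int) -> list:
    """shp2pgsql arguments: -s records the SRID, -I builds a spatial index."""
    return ["shp2pgsql", "-s", str(srid), "-I", shapefile, table]


def import_shapefile(shapefile: str, table: str, srid: int, dbname: str) -> None:
    """Equivalent of `shp2pgsql ... | psql -d dbname`, raising if either side fails."""
    loader = subprocess.Popen(shp2pgsql_command(shapefile, table, srid),
                              stdout=subprocess.PIPE)
    # ON_ERROR_STOP makes psql exit non-zero on the first SQL error.
    psql = subprocess.Popen(["psql", "-q", "-v", "ON_ERROR_STOP=1", "-d", dbname],
                            stdin=loader.stdout)
    loader.stdout.close()  # so shp2pgsql sees a broken pipe if psql exits early
    psql_status = psql.wait()
    loader_status = loader.wait()
    if loader_status != 0 or psql_status != 0:
        raise RuntimeError(
            f"import failed: shp2pgsql={loader_status}, psql={psql_status}"
        )
```

For example, `import_shapefile("coastline.shp", "coastline", 4326, "osm")` would load a hypothetical coastline shapefile into a table named `coastline`. Use whatever SRID the shapefile’s .prj actually declares.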

That said, kudos to Mapzen for stepping in and dedicating resources towards the preservation and creation of open tools, and open data, for geo professionals and enthusiasts.