# osmtogtfs

[![Build Status](https://travis-ci.org/hiposfer/osmtogtfs.svg?branch=master)](https://travis-ci.org/hiposfer/osmtogtfs) [![pypi](https://img.shields.io/pypi/v/osmtogtfs.svg)](https://pypi.python.org/pypi/osmtogtfs)

Extracts partial GTFS feed from OSM data.

OpenStreetMap data contains information about buses, trams, trains and other means of public transport. This information is not enough to provide a complete routing service, most importantly because it lacks timing data. However, it still contains routes, stop positions and some other useful data.

This tool takes an OSM file or URI and, using the [osmium](http://osmcode.org/) library, converts it to a partial [GTFS](https://developers.google.com/transit/gtfs/reference/) feed. GTFS is the de facto standard for sharing public transport information, and there are many tools built around it. Because the resulting feed is partial, it will not pass GTFS validation. Nevertheless, it is still valuable to us.

## Installation

This tool uses osmium, a C++ library built on Boost, so install that first. The best way is to install [pyosmium](https://github.com/osmcode/pyosmium) using the package manager of your OS.

Afterwards install the script from pypi:

    $ pip install osmtogtfs

Alternatively you can clone the repo and install it:

    $ git clone https://github.com/hiposfer/osmtogtfs && cd osmtogtfs
    $ python setup.py install

This installs the osmtogtfs command on your system. Alternatively, you can run osmtogtfs/cli.py directly.

Make sure to run these commands with Python 3.

## Usage

Run the tool over your OSM data source (or whatever osmium accepts):

    osmtogtfs <osmfile>

After a while, depending on the file size, a file named gtfs.zip will be produced in the working directory. Moreover, if you install the package, you will get a script called osmtogtfs on your path:

    $ osmtogtfs --help
    Usage: osmtogtfs [OPTIONS] INPUT

    Options:
      --outdir PATH   Store output in this directory.
      --zipfile PATH  Save as Zip file if provided.
      --loglevel      Set the logging level.
      --help          Show this message and exit.

--outdir defaults to the working directory. If --zipfile is provided, the feed is zipped and stored in _outdir_ under the given name; otherwise the feed is stored as multiple plain-text files.
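
To inspect the result, the generated feed can be read back with the Python standard library. The following is a minimal sketch and not part of osmtogtfs: the helper name read_feed_table is hypothetical, and it assumes the feed was written either as a zip file or as plain-text files in a directory, as described above.

```python
import csv
import io
import zipfile
from pathlib import Path


def read_feed_table(feed_path, table_name):
    """Return the rows of one GTFS table (e.g. 'stops.txt') as dicts.

    feed_path may be a zip file or a directory of .txt files.
    This helper is hypothetical and not part of osmtogtfs.
    """
    feed_path = Path(feed_path)
    if feed_path.is_dir():
        # Plain-text feed: one CSV file per table inside the directory.
        with open(feed_path / table_name, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    # Zipped feed: read the table straight out of the archive.
    with zipfile.ZipFile(feed_path) as archive:
        with archive.open(table_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

For example, read_feed_table("gtfs.zip", "stops.txt") would return one dict per stop, provided the feed contains a stops.txt table.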

### With Docker

If osmium is not available in your package manager, installing it manually can be troublesome. For that case there is an executable Docker image that can be used directly. The only caveat is passing the input file to the container and getting the results back. The containerized script writes its output to /data by default, so the only step necessary is to mount the folder containing the input OSM file to /data inside the container. The following command shows this; here the input file is called bremen-latest.osm.pbf and is located inside the /path/to/osm directory:

    $ docker run -v /path/to/osm/:/data hiposfer/osmtogtfs /data/bremen-latest.osm.pbf

The above command writes the output files to the /path/to/osm directory. The osmtogtfs Docker image is downloaded on first run.

## Tests

We use the pytest package for testing. Install pytest and run the tests:

    $ pip install pytest
    $ pytest

Running pytest -s disables output capturing and shows more output (such as print statements and log messages).

### Profiling

To profile the code we use cProfile:

    # For the osmtogtfs script
    $ python -m cProfile -s cumtime osmtogtfs/cli.py resources/osm/bremen-latest.osm.pbf --outdir output/bremen --dummy > output/benchmarks/bremen.txt
    $ python -m cProfile -s cumtime osmtogtfs/cli.py resources/osm/saarland-latest.osm.pbf --outdir output/saarland --dummy > output/benchmarks/saarland.txt

You will find the results in [output/benchmark.txt](output/benchmark.txt). These results were produced on an Arch Linux machine with an Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz and 16GB RAM.

### Dummy Feed Information

Not all data required by GTFS is available in OSM files. To fill the missing fields with dummy data, use the --dummy CLI option. This produces trips.txt, stop_times.txt and calendar.txt, all of which contain dummy data.

## Implementation Notes

In this section we describe important aspects of the implementation to help readers understand how the program works.

### Field Mapping

GTFS feeds can contain up to thirteen different CSV files with a .txt extension. Six of these files are required for a valid feed: _agency.txt_, _stops.txt_, _routes.txt_, _trips.txt_, _stop_times.txt_ and _calendar.txt_. Each file contains a set of columns, some required and some optional. Most importantly, not all the fields necessary to build a GTFS feed are available in OSM data, so we have to generate some fields ourselves or leave them blank. Below we describe how the value of each column is produced for the files we currently generate.

#### agency.txt

We use the _operator_ tag on OSM relations which are tagged as relation=route to extract agency information. However, some routes have no operator tag. In such cases we use a dummy agency:

    {'agency_id': -1, 'agency_name': 'Unkown agency', 'agency_timezone': ''}

  • agency_id: we use the _operator_ value to produce the _agency_id_: agency_id = int(hashlib.sha256(op_name.encode('utf-8')).hexdigest(), 16) % 10**8

  • agency_name: the value of the _operator_ tag

  • agency_timezone: we guess it based on the coordinates of the elements in the relation
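
The agency rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, the dummy agency values are copied from the text, and the timezone is passed in as an argument because guessing it from coordinates is outside the scope of the sketch.

```python
import hashlib

# Dummy agency used when a route relation has no operator tag
# (values copied from the text above).
DUMMY_AGENCY = {"agency_id": -1, "agency_name": "Unkown agency", "agency_timezone": ""}


def agency_id_from_operator(op_name):
    """Derive a stable numeric id (below 10**8) from the operator name."""
    digest = hashlib.sha256(op_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10**8


def build_agency(tags, timezone=""):
    """Build an agency record from a relation's tags (hypothetical helper)."""
    op_name = tags.get("operator")
    if not op_name:
        # No operator tag: fall back to a copy of the dummy agency.
        return dict(DUMMY_AGENCY)
    return {
        "agency_id": agency_id_from_operator(op_name),
        "agency_name": op_name,
        "agency_timezone": timezone,
    }
```

Hashing the operator name gives the same id for the same operator on every run without keeping a lookup table. Since the hash is reduced modulo 10**8, two different operators could in principle receive the same id; this sketch does not handle that case.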

#### stops.txt

  • stop_id: value of the node id from OSM

  • stop_name: value of _name_ tag or _Unknown_

  • stop_lon: longitude of the node

  • stop_lat: latitude of the node

#### routes.txt

  • route_id: id of the OSM relation element

  • route_short_name: value of _name_ or _ref_ tag of the relation

  • route_long_name: a combination of the _from_ and _to_ tags on the relation, otherwise empty

  • route_type: we map OSM route types to GTFS

  • route_url: link to the relation on openstreetmap.org

  • route_color: value of the _color_ tag if present, otherwise empty

  • agency_id: ID of the agency, otherwise -1
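
As an illustration of the list above, a routes.txt row could be assembled from a relation's id and tags roughly as follows. This is a sketch, not the project's code: build_route is a hypothetical name, the precedence of _name_ over _ref_ and the " - " separator between _from_ and _to_ are assumptions (the text does not specify them), and route_type is left out.

```python
def build_route(relation_id, tags, agency_id=-1):
    """Build a routes.txt row from an OSM route relation (hypothetical helper)."""
    # Assumption: prefer the name tag and fall back to ref.
    short_name = tags.get("name") or tags.get("ref", "")
    # Assumption: join from/to with " - "; the text only says "a combination".
    if "from" in tags and "to" in tags:
        long_name = "{} - {}".format(tags["from"], tags["to"])
    else:
        long_name = ""
    return {
        "route_id": relation_id,
        "route_short_name": short_name,
        "route_long_name": long_name,
        "route_url": "https://www.openstreetmap.org/relation/{}".format(relation_id),
        "route_color": tags.get("color", ""),
        "agency_id": agency_id,
    }
```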

### OSM to GTFS Route Type Mapping

Below is the mapping that we use. The left column is the OSM value and the right column is the corresponding value from the GTFS specification (make sure to check the code for any changes):

    tram:       0
    light_rail: 0
    subway:     1
    rail:       2
    railway:    2
    train:      2
    bus:        3
    ex-bus:     3
    ferry:      4
    cableCar:   5
    gondola:    6
    funicular:  7
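
Expressed in Python, the mapping above is a plain dictionary lookup. The sketch below copies the values from this section; the names OSM_TO_GTFS_ROUTE_TYPE and gtfs_route_type are hypothetical, and returning None for unmapped values is an assumption, since the text does not say how unknown route types are treated.

```python
# OSM route value -> GTFS route_type, copied from the mapping above.
OSM_TO_GTFS_ROUTE_TYPE = {
    "tram": 0,
    "light_rail": 0,
    "subway": 1,
    "rail": 2,
    "railway": 2,
    "train": 2,
    "bus": 3,
    "ex-bus": 3,
    "ferry": 4,
    "cableCar": 5,
    "gondola": 6,
    "funicular": 7,
}


def gtfs_route_type(osm_route, default=None):
    """Return the GTFS route_type for an OSM route value.

    Assumption: unmapped values yield `default` (None unless overridden).
    """
    return OSM_TO_GTFS_ROUTE_TYPE.get(osm_route, default)
```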

### namedtuples as the preferred data structure

To reduce memory usage, we mostly use namedtuples (which are basically tuples) to store data.

## License

MIT
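
A minimal sketch of this approach, assuming a record type for stops: the class name Stop and the helper write_stops are hypothetical, and the field names simply follow the stops.txt columns listed earlier.

```python
import csv
import io
from collections import namedtuple

# Field names follow the stops.txt columns described above;
# the class name Stop is an assumption for this sketch.
Stop = namedtuple("Stop", ["stop_id", "stop_name", "stop_lon", "stop_lat"])


def write_stops(stops, out):
    """Write Stop tuples as a stops.txt-style CSV to a text stream."""
    writer = csv.writer(out)
    writer.writerow(Stop._fields)  # header row taken from the field names
    writer.writerows(stops)        # each namedtuple is written as a plain row


stop = Stop(stop_id=1, stop_name="Unknown", stop_lon=8.8, stop_lat=53.1)
buffer = io.StringIO()
write_stops([stop], buffer)
```

Because a namedtuple is a tuple with named fields, it can be passed to csv.writer unchanged, and its instances do not carry a per-instance dictionary the way ordinary objects do, which is where the memory saving comes from.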
