stefan kaufmann
OPENING PUBLIC TRANSIT DATA IN GERMANY
A Status Quo
Diplom Informatik (Dipl.-Inf.)
Institut für Datenbanken und Informationssysteme
Fakultät für Ingenieurwissenschaften und Informatik
Universität Ulm
2014-05-23
Stefan Kaufmann: Opening Public Transit Data in Germany, A Status
Quo
Written as a requirement for the completion of the diploma course in
Media Informatics.
supervisors:
Prof. Dr. Manfred Reichert
Prof. Dr. Frank Kargl
advisor:
Dipl.-Inf. Rüdiger Pryss
location:
Institute of Databases and Information Systems
Faculty of Engineering and Computer Science
89081 Ulm, Germany
submitted:
2014-05-23
licensing information: This thesis is licensed under a Creative
Commons Attribution 3.0 Unported License.
This thesis is dedicated to my parents, who provided me with the
guidance I needed to grow as a child, and the freedom I needed to
grow as an adult—including the patience during my long years of
studying.
Also, this is to the one left behind. Thank you for letting me learn
from you, FloD.
ABSTRACT
Open data has been recognized as a valuable resource, and public
institutions, including those in Germany, have taken to publishing
their data under open licenses. However, German public transit
agencies are still reluctant to publish their schedules as open data.
In addition, two data exchange formats widely used in German transit
planning are proprietary, with no documentation publicly available.
In this work, one of the proprietary formats was reverse-engineered,
and a transformation process into the open GTFS schedule format was
developed. This process allowed a partnering transit operator to
publish its schedule as open data. Furthermore, a survey among German
transit authorities and operators was used to evaluate the prevalence
of transit data exchange formats and the reservations concerning open
transit data. The survey brought to light a series of issues that
stand in the way of opening up transit data. Addressing the issues
found through this work, and partnering with open-minded transit
authorities to further develop transit data publishing processes, can
serve as a foundation for wider adoption of open transit data
publishing in Germany.
SUMMARY (ZUSAMMENFASSUNG)
Open data can be regarded as a valuable resource. In Germany, too,
public institutions increasingly publish data under free licenses.
However, German transit authorities and operators are still reluctant
to publish their schedules as open data. In addition, two of the data
exchange formats widespread in German public transit are proprietary
formats for which no documentation is publicly available. In this
work, one of these formats was analyzed, and a transformation process
from this format into the open GTFS format was developed. This process
enabled a transit operator to publish its schedules as open data.
Furthermore, a survey among transit authorities and operators in
Germany examined the prevalence of various transit data formats and
the reservations about publishing open schedule data. The survey
identified a number of open issues that stand in the way of publishing
open schedule data. Addressing these issues, as well as cooperating
with open-minded transit authorities to further develop publishing
processes, can serve as a foundation for promoting the publication of
open schedule data.
ACKNOWLEDGMENTS
I would like to thank my advisor Rüdiger Pryss, and my principal
supervisor, Manfred Reichert, for allowing me to pursue this topic
and giving me the liberties to explore and learn. I also thank Frank
Kargl for agreeing to be secondary reviewer for this thesis, and for
being the academic host to the datalove open data student working
group at Ulm University.
Special thanks go to SWU Verkehr and DING, especially Christian
Burst and Martin Schiller, who were willing to spend a lot of their
time on patiently explaining to me the internals of their respective data
flows, and providing me with the data sets to work on.
Further thanks go to my proofreaders, in alphabetical order, Doreen
Butze, Benjamin Erb, Rens van der Heijden, Barbara Körner, Simon
Lüke, Nikola Mattschas, Juliane Wessalowksi and Björn Wiedersheim,
and everybody else who helped and supported me during my thesis
writing period. This also includes the members of the OKF Open
Transport group who inspired me to choose this topic for my thesis,
and Florian Schaub, who convinced me to actually do so.
Finally, I would like to thank my friends, who helped to shape who
I am today.
Thank you!
CONTENTS
1 Introduction
1.1 Turning Government into a Platform through Open Data
1.2 Open Data in Public Transit
1.3 Aims of this work
1.4 Outline
2 Transit Data
2.1 Usage of data during operation
2.2 Data Acquisition
2.3 Excursus: Real-time Data
3 Data Models
3.1 The VDV Transit Data Model (ÖPNV-Datenmodell)
3.1.1 VDV-Schrift 451: File Layout
3.1.2 VDV-Schrift 452: Standard Interface for Network Plans and Schedules
3.2 Uniting European Standardisation Efforts: Transmodel
3.3 German Industry Standards: HAFAS
3.3.1 Features
3.3.2 HAFAS Exchange Format
3.4 German Industry Standards: DIVA
3.4.1 Features and Data Exchange Compatibility
3.4.2 Internal Raw Data Format
3.4.3 File headers
3.4.4 Line Definition Files
3.5 Along Comes Google: GTFS
3.5.1 File format
3.5.2 Features
3.6 Comparison of Models
3.7 Challenges of Transforming and Merging Transit Datasets
3.7.1 Transit Vocabulary
3.8 Example Data Flow at SWU Verkehr and DING
3.8.1 Planning Stage
3.8.2 Operational Stage
3.8.3 Data Handling by DING
4 Exporting DIVA Data: A First Approach through CSV Exports
4.1 Exporting Data from CSV Timetables
4.1.1 Data Layout
4.1.2 Deciphering A Journey Timing Pattern Column
4.1.3 Programmatical Transformation
4.2 Creating stops.txt from KML stop locations
4.2.1 Programmatical transformation
4.3 Transforming the type of day calendar
4.4 Optional: Matching Route Shapes
4.5 Drawbacks of this Approach
4.6 Conclusion
5 Exporting Directly from DIVA Data
5.1 File Structure and Layout
5.2 Importing Tables Into An Intermediary Database
5.3 Setting up a Target Database
5.4 Transforming the Line Information Files
5.4.1 Choosing Relevant Line Definition Files
5.4.2 Journey Patterns
5.4.3 Stop Points
5.4.4 Timing Patterns
5.4.5 Journey Definition
5.4.6 Headsigns
5.4.7 Line Name and Description
5.5 Transforming Stop Structures and Coordinates
5.5.1 Querying Stop Areas and Stop Points
5.5.2 Coordinate Transformation
5.6 Importing Service Types and Dates
5.6.1 Determining Local Holidays
5.6.2 Importing All Other Service Types
5.7 Handling Transfers
5.8 Exporting the GTFS Feed from the Database
5.9 Issues
5.10 Conclusion
6 What Is Holding Back Open Transit Data in Germany?
6.1 Legal Matters
6.1.1 Transit Legislation
6.1.2 Intellectual Property Rights
6.2 Evaluation: The Status Quo
6.2.1 Findings of the Evaluation
6.3 Outlook
7 Conclusion
A Appendix
A.1 Evaluation Questionnaire
A.1.1 Page 1: Basic Questions
A.1.2 Page 2: Schedule exchange
A.1.3 Page 3: Open Transit Data
A.1.4 Page 4: Personal judgements, Part 1
A.1.5 Closing questions
A.2 DIVA Exchange Format
A.2.1 Folder Structure and Naming Conventions
A.2.2 DIVA Coordinate Model
A.3 Scripts Reference
A.3.1 GTFS Target Database Creation Statements
A.3.2 Converting DIVA Journeys To GTFS
A.3.3 Transfer Handling Script
A.3.4 Transforming Stop information from DIVA to GTFS
Bibliography
LIST OF FIGURES
Figure 1: Mapnificent screenshot
Figure 2: IBIS On-board computers
Figure 3: Infrared beacons for enhanced positional accuracy
Figure 4: SWU test car
Figure 5: Data flow model using VDV-452 as exchange format
Figure 6: Entity relationship diagram of VDV-452
Figure 7: GTFS Class Diagram
Figure 8: SWU stop geodata example
Figure 9: “Had you heard about the term ‘open data’ before this survey?”
Figure 10: “Are the necessary tools and technical expertise for exporting open transit data available in your institution?”
Figure 11: “Is the legal expertise necessary for publishing open transit data available within your institution?”
Figure 12: “Free exporting tools for the data formats we use could help us in publishing open transit data”
Figure 13: “A step-by-step manual, including an explanation of the legal framework, would help us in publishing open transit data”
Figure 14: “Third-parties developing new, innovative applications based on schedule data can help improve attractiveness of public transit”
Figure 15: “If third parties develop solutions based on schedule data, transit authorities can save money since they do not have to develop applications themselves”
Figure 16: “Providing online services based on schedule data is primarily the responsibility of transit authorities or their service contractors”
Figure 17: “If third-party developers make profit off applications based on schedule data, they should pay fees to the schedule publishers”
Figure 18: “If third-party applications give false information based on correct schedule data, riders will seek the fault at the transit authority”
Figure 19: Example list of DIVA files
LIST OF TABLES
Table 1: Data tables in VDV-452
Table 2: DIVA files and their contents
Table 3: GTFS feed overview
Table 4: Comparison of data models
Table 5: Example CSV timetable for SWU line number 15 from Willy-Brandt-Platz to Science Park on a weekday
Table 6: More complex CSV timetable for SWU line number 5 from Neu-Ulm to Science Park
Table 7: Type of day calendar matrix
Table 8: Input elements for schedule-planning activities according to Ceder
Table 9: DIVA Coordinate Reference Systems
LISTINGS
Listing 1: Example file header in VDV-451 format
Listing 2: Example table in VDV-451 format
Listing 3: Excerpt from the stop coordinate example file provided by SBB
Listing 4: Excerpt from the schedule example file provided by SBB
Listing 5: Typical DIVA file header
Listing 6: Excerpt of DIVA line definition file for SWU bus No. 15
Listing 7: Excerpt of SWU’s GTFS stops.txt
Listing 8: DIVA KML geolocation excerpt
Listing 9: Transformed type of day calendar and the resulting calendar_dates.txt entries
Listing 10: Example SQL table creation statement
Listing 11: Selecting relevant line definition file information
Listing 12: Regular expression matching the relevant parts of a journey definition line
Listing 13: Regular expression matching the relevant parts of a journey definition line
Listing 14: Regular expression for relevant parts of a line name line
Listing 15: SQL query for DIVA stop areas containing stop points
Listing 16: SQL query for DIVA stop points which are part of a stop area
Listing 17: SQL query for DIVA stop points which are not part of a stop area
Listing 18: Querying service exceptions from DIVA
Listing 19: Querying DIVA transfers
Listing 20: Exporting the GTFS database into a text file feed
Listing 21: SQL Create statements for setting up a target GTFS database
Listing 22: Transforming DIVA journeys into GTFS
Listing 23: Excerpt: DIVA transfer information transformation to GTFS
Listing 24: Transforming DIVA stops into GTFS
ACRONYMS
AEG Allgemeines Eisenbahngesetz
API Application Programming Interface
AVM Automated Vehicle Management
AVL Automated Vehicle Location
BART Bay Area Rapid Transit
CASSIOPE Computer-Aided System for Scheduling Information and Operation of Public Transport in Europe
CEN European Committee for Standardization
CRS Coordinate Reference System
CSA Connection Scan Algorithm
CSV Comma Separated Values
CF Compact Flash
DOM Document Object Model
DELFI Durchgängige Elektronische Fahrplaninformation
DING Donau-Iller-Nahverkehrsverbund GmbH
DINO DIVA Datenpool Nord
DIVA Dialoggesteuertes Verkehrsmanagement- und Auskunftssystem
DRIVE Dedicated Road Infrastructure for Vehicle Safety in Europe
EC European Community
EFA Elektronische Fahrplanauskunft
ERM Entity Relationship Model
GDF Geographic Data File
GIS Geographic Information System
GPS Global Positioning System
GSM Global System for Mobile Communications
GTFS General Transit Feed Specification
HAFAS HaCon Fahrplan-Auskunfts-System
HTML Hypertext Markup Language
IATA International Air Transport Association
IBIS Integriertes Bus-Informations-System
IFOPT Identification of Fixed Objects in Public Transport
ITCS Intermodal Transport Control System
IP Intellectual Property
KML Keyhole Markup Language
MDT Mobile Data Terminal
NVBW Nahverkehrsgesellschaft Baden-Württemberg
O-D Origin-Destination
ODbL Open Database License
OKF Open Knowledge Foundation
ÖPNV Öffentlicher Personennahverkehr
PersBefG Personenbeförderungsgesetz
PDF Portable Document Format
RAPTOR Round-bAsed Public Transit Optimized Router
SIRI Service Interface for Real Time Information
SQL Structured Query Language
SWU Stadtwerke Ulm
VBB Verkehrsverbund Berlin-Brandenburg
VDV Verband Deutscher Verkehrsunternehmen
VVS Verkehrs- und Tarifverbund Stuttgart
VÖV Verband öffentlicher Verkehrsbetriebe
WGS84 World Geodetic System 1984
XML Extensible Markup Language
1
INTRODUCTION
In 2010, publisher Tim O’Reilly coined the expression “Government
as a Platform” [68], arguing for a shift in how government services
are provided. Instead of what Donald Kettl had earlier called “vending
machine government” [52], in which tax money is inserted and services
pop out of the metaphorical machine, O’Reilly proposes a government
model in which “[one] thought of government as the manager of a
marketplace”. Just as the open source software development community
is likened to a bazaar in The Cathedral & the Bazaar [72], government,
in his opinion, should be thought of as a place where “the community
itself exchanges goods and services”.
Taking the success of computer platforms as an example, O’Reilly
advocates, among other steps, for (1) the implementation of open
standards, (2) using simple systems and allowing them to evolve, (3)
designing for participation and allowing for the adoption of outside
expertise, and (4) lowering the barriers to experimentation.
An often cited example of a well-executed implementation of this
platform concept is Apple’s iPhone. Unlike other phone ecosystems,
which only included applications designed by the phone vendor and
chosen partners, Apple expressly allowed anybody to develop apps for
its platform and published the necessary tools for doing so. The
result was the first line of smartphones that became an application
platform, spawning a cottage industry of application developers that
had published over a million apps in Apple’s App Store by 2013 [60].
[Margin note: Note that Apple has rejected apps from its store for a
number of not always transparent reasons, including the app of a
Pulitzer-winning political cartoonist [81].]
1.1 Turning Government into a Platform through Open Data
Adapting Apple’s principle to public administrations, proponents ar-
gue that governments should not use the data at their disposal as in-
put to finished services they deliver to citizens, but should see open
data as the finished product to serve to their end-users instead[49].
Definitions of open data have been proposed in the Open Knowledge
Foundation (OKF)’s Open Definition [35] and the Sunlight Foundation’s
Ten Principles for opening up Government Information [37].
For works to be “open” according to these definitions, they need
to be available to everybody and as a whole for no more than their
reproduction costs, and re-distribution, modification and the creation
of derivatives need to be permitted. Furthermore, open file formats
must be used, the license must not discriminate against anybody, and
alternative use must be permitted [35].
Von Lucke and Geiger summarize these definitions in the context
of open data stemming from the public sector[39]:
Open Government Data are all stored data of the public
sector which could be made accessible by government in
the public interest without any restrictions on usage and
distribution.
Such data, when opened, can be used by innovators in order to
implement new solutions, and to re-engineer processes and services.
On their basis, engaged citizens and developers can contribute new
applications and services, even for rather specialized use cases[39,
22].
1.2 Open Data in Public Transit
In this work, the author focuses on transit data as an example of open
government data, which Headd describes as “the clearest example of how
open government data can be used to encourage the development of
useful new applications” [49]. In the United States, governments
released transit data in the General Transit Feed Specification (GTFS)
format, which sparked the development of numerous new transit apps.
Since those applications all hinge on the same, freely available
transit data specification, they can be re-deployed to any given
transit system whenever a transit agency decides to release its
schedule in the GTFS format. Headd argues that, while many transit
agencies continued to design, develop and deliver their own transit
apps, this development allowed agencies to economize on this task.
Instead, transit agencies could choose to focus on their task of
efficiently transporting their ridership, while riders could pick a
solution that meets their specific requirements from the new app
market.
[Margin note: Even though public transit in Germany was de-regulated
in the 1990s to allow for open competition between privately and
publicly owned transit operators [11], it is (a) a public service
required by law [1, Sec. 1][5, Sec. 8] and (b) heavily subsidized by
federal, state and municipal governments [71, P. 287-290].]
[Margin note: The history and structure of GTFS are further elaborated
on in Section 3.5.]
More flexibility. Any given app that relies on a common data
format could work in all transit systems for which open transit
data has been released. In contrast, journey planning applications
released by individual agencies tend to work only for the specific
transit agency they were released for, or for a limited set of transit
systems.
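To illustrate why a common format makes applications portable, the following minimal sketch reads the stop list of an arbitrary GTFS feed. It assumes that the feed provides a stops.txt file with the standard GTFS columns stop_id, stop_name, stop_lat and stop_lon; the function name load_stops and the sample rows are made up for illustration and do not stem from any real feed.

```python
import csv
import io


def load_stops(stops_file):
    """Read GTFS stops from an open text file and return a dict keyed by stop_id."""
    stops = {}
    for row in csv.DictReader(stops_file):
        stops[row["stop_id"]] = {
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
    return stops


# Made-up sample standing in for any agency's stops.txt (hypothetical values).
SAMPLE = """stop_id,stop_name,stop_lat,stop_lon
A,Example Central Station,48.3994,9.9829
B,Example University,48.4222,9.9556
"""

stops = load_stops(io.StringIO(SAMPLE))
print(sorted(stops))
```

Since the function only depends on the column names defined by the specification, the same code could in principle be pointed at the stops.txt of any agency that publishes a GTFS feed.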
While relying on online journey planner interfaces instead of open
transit data, the Android application Öffi[78] serves as a case in point
for many German transit users. With Öffi, riders can use the same
user interface to plan their public transit journey for 23 local tran-
sit systems in Germany, as well as for 13 local transit systems in
other countries. Additionally, it allows journey planning for long-
distance public transit in Germany, Switzerland, Austria, Belgium,
Luxembourg, Denmark, Sweden, Norway, Poland, and the United
Kingdom. Thus, users do not have to care about what agency will
provide their public transit when travelling to other cities—instead,
the same, familiar workflow can be used in all cities covered by Öffi,
although only separately for each transit authority.
Specialized solutions and accessibility. Publicly available
transit data also allows for the development of applications for very
specific use cases. Transit agencies usually need to deploy
generalized solutions that meet the demand of the majority of their
customer base, and they generally do not have the resources to also
release applications relevant only to small subsets of their riders.
One example is the author’s university campus in Ulm. As of 2014,
the campus is served by no fewer than seven bus routes and an
additional night bus service, spread over five bus stops; however, not
all routes serve all five stops. Walking from the southernmost bus
stop on the campus to the bus stops along the northern campus, using
the public sidewalks around the university buildings, takes more than
nine minutes. Depending on the bus stop chosen in the trip planner,
the transit agency’s algorithm will therefore often dismiss
connections departing from bus stops in the other part of the campus.
However, most riders departing from the university will start their
trip from within the university complex, from where both parts of the
campus are roughly the same walking distance away. Riders therefore
need to check connections between multiple bus stops and their desired
destination to actually find the most time-efficient bus route.
While students and university employees would benefit from an
application that takes this fact into account, the transit agency can
neither simply change the routing algorithm for their official trip
planner—since it would require casual riders from outside the uni-
versity to have access to the university buildings during transit op-
erating hours—nor can it publish a customized trip planner for the
university by itself, since the small set of affected riders would not
justify such an investment.
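What such a campus-specific application could do can be sketched in a few lines: for every candidate stop, the walking time from the rider's current position is added, and the departure leading to the earliest arrival is chosen. This is only a sketch under stated assumptions; the stop names, walking times and departures below are invented, and the function best_option is hypothetical rather than part of any existing trip planner.

```python
def best_option(now, walk_minutes, departures):
    """Return (arrival, stop, departure) with the earliest arrival, or None.

    now          -- current time in minutes after midnight
    walk_minutes -- dict: stop name -> walking time from the rider's position
    departures   -- list of (stop, departure_time, arrival_time) tuples
    """
    reachable = [
        (arrival, stop, departure)
        for stop, departure, arrival in departures
        # Only keep departures the rider can still catch on foot.
        if stop in walk_minutes and now + walk_minutes[stop] <= departure
    ]
    return min(reachable) if reachable else None


# Invented example data: two campus stops at similar walking distance.
walk = {"Campus South": 5, "Campus North": 6}
deps = [
    ("Campus South", 604, 630),
    ("Campus South", 614, 640),
    ("Campus North", 607, 622),
]
print(best_option(600, walk, deps))
```

In this invented example, the rider misses the 604 departure at the southern stop, but reaches the northern stop in time for a bus that arrives earlier than the next southern departure, which a planner fixed to a single departure stop would not suggest.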
Machine-readable transit data can also allow for more accessible
trip planning services for riders with disabilities. Although German
law requires transit authorities and operators to provide completely
accessible transit services by 2022 [5, Sec. 8], a 2012 study in the
German states of Niedersachsen and Bremen found only 10% of all
railway stations to be accessible to persons with disabilities [62].
While transit data alone will not remedy structural barriers, such as
missing lifts, specialized applications can, for instance, notify
visually impaired riders about upcoming stops where loudspeaker
announcements are not available, or guide their transfers through
audible instructions [56]. Other existing applications include
GPS-equipped mobile devices with braille displays, which allow users
to find the nearest transit stop in a series of US transit
systems [10]. Despite current efforts not only to let riders plan
accessible itineraries, but also to offer accessible web sites for
doing so [61, P. 8], information continues to be “trapped” within
Portable Document Format (PDF) documents, which are often inaccessible
to customers relying on screen readers to access the information [55].
Figure 1: Screenshot of Mapnificent [88], showing the intersection of public
transit reach from Ulm central station and the university within 15
minutes each. The underlying data is the open GTFS data provided
by Stadtwerke Ulm.
Mapnificent [88] is another example of a use case that is usually
not covered by transit authorities’ applications or trip planners, but
can easily be implemented for any transit system offering its schedule
in GTFS format. It shows the area reachable by transit from a given
start point within a maximum travel time. It also allows placing
multiple starting points and highlights the areas in which the travel
radii intersect. Thus, several riders spread throughout an area could
choose a venue that each of them can reach within a given time frame,
using public transit from their respective starting locations.
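The idea of intersecting travel radii can be sketched as follows. The sketch deliberately simplifies the problem: it treats the network as a graph with fixed ride times in minutes and ignores timetables, waiting and walking, so it is not a description of how Mapnificent itself works. The graph, the stop names and the functions reachable_within and meeting_points are assumptions made for illustration.

```python
import heapq


def reachable_within(graph, start, max_minutes):
    """Return {stop: minutes} for all stops reachable from start within max_minutes."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        minutes, stop = heapq.heappop(queue)
        if minutes > best.get(stop, float("inf")):
            continue  # stale queue entry, a faster way was found already
        for neighbour, ride in graph.get(stop, []):
            total = minutes + ride
            if total <= max_minutes and total < best.get(neighbour, float("inf")):
                best[neighbour] = total
                heapq.heappush(queue, (total, neighbour))
    return best


def meeting_points(graph, starts, max_minutes):
    """Stops that every start location can reach within max_minutes."""
    sets = [set(reachable_within(graph, s, max_minutes)) for s in starts]
    return set.intersection(*sets)


# Invented network: stop -> list of (neighbouring stop, ride time in minutes).
GRAPH = {
    "A": [("B", 5), ("C", 12)],
    "B": [("A", 5), ("C", 4), ("D", 9)],
    "C": [("A", 12), ("B", 4), ("D", 3)],
    "D": [("B", 9), ("C", 3)],
}
print(sorted(meeting_points(GRAPH, ["A", "D"], 10)))
```

With this invented network, riders starting at A and D can both reach B and C within ten minutes, so either stop would qualify as a meeting point.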
Improving convenience. Online trip planning services allow riders
to make optimal decisions concerning their public transit itineraries
without even having to consult printed timetables [84], and one can
argue that this is another form of giving relevant information to
riders that serves to improve ridership [33]. Still, it proves hard to
plan door-to-door itineraries spanning several transit systems or
countries, or including more than one mode of transportation.
This is all the more important in rural areas, where citizens heavily
rely on their cars as a means of transport, and unwieldy itineraries
make riders perceive their cars to be more convenient than the often
sparse public transit coverage. While standardization efforts are
underway to facilitate seamless journey planning through the DELFI¹
and EU-Spirit² initiatives, today’s online journey planners often
cannot even calculate fares for itineraries that involve journeys
in multiple transit systems.
Also, it is still not possible for most journey planning services of-
fered by transit authorities to optimally take riders’ existing mobility
options into account. For instance, students of Ulm University could
book a round-trip from Ulm to Stuttgart and back by supplementing
their DING semester ticket with an inexpensive Verkehrs- und Tarif-
verbund Stuttgart (VVS) day pass, which includes the usage of all VVS
lines for that day. This would allow stop-to-stop trips from anywhere
in Ulm to any place in Stuttgart.
However, riders have to be aware of this fact when booking their
ticket. The journey planner of Deutsche Bahn offers a two-way ticket,
valid only for the train to and from Stuttgart, and, alternatively, a state
day pass, which is valid for all regional trains and select Verbund lines
in the state of Baden-Württemberg during that day. Both options are,
however, more expensive than the aforementioned VVS ticket, and
both come with different feature sets. To make matters worse, riders
can buy the cheaper VVS ticket only through the VVS journey planner;
neither Deutsche Bahn’s nor DING’s journey planner can even display
the price for individual trip fares within the VVS area.
The same problem applies to new forms of multi-modal travel, e.g.,
using a private car to get from a smaller town to the next train
station, riding a train into a larger city, and switching to a
car-sharing service in that city. Ideally, all conceivable modes of
transport could be integrated into a single journey planning
application, including privately owned vehicles, public transit,
car- or bike-sharing solutions, taxis, or any other mode.
Advancing scientific progress. While the problem of finding
the shortest path between two nodes in a graph has been solved
efficiently in principle since Dijkstra published his now classical
algorithm in 1959 [30], Sanders and Schultes argue in [77] that, until
2005, speedup techniques for road network routing algorithms were
difficult to compare:
¹ http://www.delfi.de/
² http://eu-spirit.eu/
[S]tudies were either based on very small graphs or on
proprietary data that could not be used by other groups.
In particular, for ‘newcomers’ it was almost impossible to
start working in this field.
After the road network of the United States was extracted from US
Census data [20], and the road network of Western Europe was made
available to the scientific community by the German company PTV
AG, Sanders and Schultes found [77] that variants of these graphs
had been used for most comparison studies, for instance in the
2005 DIMACS implementation challenge [8]. Sanders and
Schultes “view it as likely that the sudden availability of data and the
fast rate of innovation since then are not a coincidence”.
However, as pointed out by Bast, “the algorithmic problem of com-
puting the fastest way to get from A to B is [. . . ] surprisingly differ-
ent on road networks than on public transportation networks.”[12],
making public transit routing significantly slower than road network
routing:
[A] change of vehicle takes time, and we want to penalize
paths with many changes of vehicle—two issues that do
not arise in road networks.
It was not until Delling’s Round-bAsed Public Transit Optimized
Router (RAPTOR) in 2012 [28] and the more recent Connection Scan
Algorithm (CSA) [29] that public transit routing algorithms came
on a par with road network routing, without requiring substantial
pre-computation as an earlier approach by Bast had [13].
[Margin note: A free/open source implementation of RAPTOR has since
been published by bliksem labs [17].]
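The basic idea behind scanning connections can be illustrated with a short sketch. It is a simplified earliest-arrival scan written for this illustration, not the implementation published in [29]: footpaths are left out, a single change time applies at every stop, and the names Connection, earliest_arrivals and change_time as well as the sample timetable are assumptions. The change time reflects the observation quoted above that a change of vehicle takes time.

```python
from collections import namedtuple

Connection = namedtuple("Connection", "trip dep_stop arr_stop dep_time arr_time")


def earliest_arrivals(connections, source, start_time, change_time=0):
    """Scan connections in departure order and return the earliest arrival per stop."""
    INF = float("inf")
    arrival = {source: start_time}
    boarded = set()  # trips the rider is already sitting in
    for c in sorted(connections, key=lambda c: c.dep_time):
        at_stop = arrival.get(c.dep_stop, INF)
        # Boarding at the source costs no change time; elsewhere it does.
        ready = at_stop if c.dep_stop == source else at_stop + change_time
        if c.trip in boarded or ready <= c.dep_time:
            boarded.add(c.trip)
            if c.arr_time < arrival.get(c.arr_stop, INF):
                arrival[c.arr_stop] = c.arr_time
    return arrival


# Invented timetable, times in minutes after midnight.
TIMETABLE = [
    Connection("T1", "A", "B", 600, 610),
    Connection("T1", "B", "C", 611, 620),
    Connection("T2", "B", "D", 612, 625),
    Connection("T3", "B", "D", 615, 630),
]
print(earliest_arrivals(TIMETABLE, "A", 595, change_time=3))
```

In the invented timetable, a rider arriving at B at 610 stays seated on trip T1 to reach C, but with a change time of three minutes misses trip T2 and has to take T3 to reach D. With a change time of zero, T2 would be caught instead.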
However, the scope of scientific research into public transit routing
does not end with the routing algorithms per se. Colpaert argues that,
using linked open data, a multitude of other factors could be included
in route planning, such as weather conditions, street construction,
etc. [23]. Brosi presented a transit live map based on GTFS data, which
he claims is able to display vehicle movements for the whole world [19].
Analogous to the case presented by Sanders and Schultes, it can be
argued that easy access to open transit data could allow more
researchers to apply themselves to the advancement of this topic,
leading to faster innovation.
1.3 Aims of this work
Despite all the presented cases, as of mid-2014, Stadtwerke Ulm (SWU)
Verkehr and Verkehrsverbund Berlin-Brandenburg (VBB) are the only
German transit institutions offering open transit data to interested
developers, and one more transit authority stated to the author they
would follow suit in the near future. Meanwhile, German railway
operator Deutsche Bahn and a series of transit authorities have been
implementing processes to export their schedules into the GTFS for-
mat, but provide them exclusively to Google [73].
This work aims to provide an overview of the data and data
models commonly found in German public transit, as well as a
process description for transforming one proprietary German industry
format into GTFS. It also aims to provide a better understanding of
the export capabilities of German transit authorities, their attitudes
towards publishing their schedules as open data, and the common
obstacles to doing so identified by stakeholders.
1.4 Outline
After analyzing the data involved in the planning and operating of
public transit in Chapter 2, transit data models commonly used in
Germany are presented and compared in Chapter 3. The author
then presents two approaches he developed together with local
transit providers on how to transform transit data from a proprietary
format into GTFS, in Chapter 4 and Chapter 5. Finally, special
consideration to the reasons for the reluctant adoption of open transit
data in Germany is given in Chapter 6, starting with a look at the
legal ramifications of doing so, and analyzing current hurdles and
problems through a survey undertaken with 47 transit institutions in
Germany.
2
TRANSIT DATA
The planning and operation of public transport necessitates, gener-
ates and uses a plethora of data. Roach describes in [75] and [74] a
number of domains relevant for transit data processing in the early
1990s, namely scheduling, passenger information, Automated Vehicle
Management (AVM), fare collection as well as personnel disposition or
driver management.
A process incorporating all these domains is described by Ceder
in [21, Ch. 1.2] as a sequence of the following steps (a tabular
overview of planning activities according to Ceder is included in
Table 8):
Network design. Land-use characteristics, authority constraints
and current ridership are taken into account, the latter broken down
to time-of-day and day-of-week. As a result, the names and physi-
cal locations of stops and stations are defined, often including more
detailed information concerning specific platforms within each stop.
Pairs of stops are connected to make up a transit network, with accu-
rate descriptions of the distances between each pair of stops, both in
a spatial as well as in a temporal sense, depending on the way taken
between stops and the mode of transport used.
Timetable development. Using the previously defined transit
network, lines are created to group transit services using a distinctive
set and sequence of stops serviced by all vehicles within that group.
Each line is assigned an identifier, e.g., Bus #8, or the Jubilee Line.
[Margin note: Depending on the vocabulary used, a line might be called
a “journey pattern” or a “route”. Even more confusingly, those names
might also be used to describe subsets of what is called “line” here.
See Section 3.7.1 for more information on this.]
Deviations within lines are possible. For instance, a line might en-
counter a high average ridership between a series of inner-city stops,
while ridership from there onwards is comparatively low. In order
to tackle this problem, every other vehicle might short turn [21, P. 56],
i.e., turn around at the last of the high-ridership stops, while other ve-
hicles follow the complete stop pattern. Another approach might be a
design branching after the last high-volume stop, with vehicles
alternating between two low-volume branches after that stop. Riders
should be made aware of short turns, and of the branch serviced by an
individual vehicle, through exterior displays on the vehicles and
published timetables.
vehicle scheduling Taking into account the times necessary to
prepare a vehicle for another journey once it has reached the final
stop, deadhead times from garages and depots to journey start lo-
cations and from journey end locations to garages and depots, and
deadhead times between journey end and start locations, vehicles can
be assigned to blocks or chains (German: Umlauf, see [21, Ch. 7]). For
instance, a vehicle could be assigned to follow a certain line end-to-
end for a number of times, then deadhead to a depot for refueling
and start service on another line afterwards. This process can be opti-
mized towards a balance of keeping the fleet size as small as possible
and minimizing deadheading distances. At the end of the process is
a working timetable with all on-duty journeys and deadheads, defining
which vehicle is where at what time. Note that the timetables
made available to the public are usually sub-sets of these working
timetables, excluding deadheads and not showing blocking.
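The relationship between a working timetable and the published
timetable can be sketched as follows. This is only an illustration
under stated assumptions: the Journey record, its field names and the
sample block are invented for this example and are not taken from any
scheduling product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Journey:
    """One entry of a (hypothetical) working timetable."""
    block: str            # vehicle block (Umlauf) the journey belongs to
    line: Optional[str]   # None marks a deadhead without passengers
    origin: str
    destination: str
    departure: str        # "HH:MM"
    arrival: str          # "HH:MM"


def public_timetable(working_timetable):
    """Return only passenger journeys, without block information."""
    return [
        (j.line, j.origin, j.departure, j.destination, j.arrival)
        for j in working_timetable
        if j.line is not None
    ]


# Invented sample block: pull-out, two service journeys, pull-in.
WORKING_TIMETABLE = [
    Journey("block-1", None, "Depot", "Station", "05:40", "05:55"),
    Journey("block-1", "8", "Station", "University", "06:00", "06:25"),
    Journey("block-1", "8", "University", "Station", "06:35", "07:00"),
    Journey("block-1", None, "Station", "Depot", "07:05", "07:20"),
]

for entry in public_timetable(WORKING_TIMETABLE):
    print(entry)
```

The filter drops both deadheads and the block column, which mirrors
the statement that public timetables are sub-sets of the working
timetable.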
crew scheduling After vehicle blocks have been created, crews
(drivers) have to be assigned to them. Since, from the transit agency’s
perspective, drivers’ wages are usually the largest single cost item,
commercially available transit scheduling software often focuses
heavily on crew-scheduling activities [21, Ch. 10]. This personnel
disposition data is usually restricted to use by the transit operator
itself and will not be shared with other agencies and even less with
the public.
fare structure While not explicitly covered by Ceder, a fare
structure is usually established, either just for the individual transit
operator or, more commonly in Germany, for a complete linked transit
system. The fare is based on travel classes, usually derived from
the travelled distance or the number of predefined spatial zones the
journey passes through. (For instance, German railway operator
Deutsche Bahn calculates the standard fare based on the product class,
i.e., regional trains, InterCity or InterCityExpress trains, and the
traversed distance, usually defining a corridor between certain
stations within which any possible path can be chosen.) Additionally,
surcharges may be raised for certain product classes, e.g., express
lines.
This usually results in an Origin-Destination (O-D) matrix defining
the transit classes for each O-D pair, and a pricing index defining the
tariffs per transit class for single fare tickets, transit passes and, if
applicable, special fares (e.g., group passes or tickets for elderly
riders).
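A fare lookup based on an O-D matrix and a pricing index could look
roughly like the following sketch. All stops, transit classes, ticket
types and prices are invented for illustration, and the assumption
that the matrix is symmetric (A to C costs the same as C to A) is
ours, not a property stated by any tariff.

```python
# Hypothetical O-D matrix: (origin, destination) -> transit class.
TRANSIT_CLASS = {
    ("A", "B"): 1,
    ("A", "C"): 2,
    ("B", "C"): 1,
}

# Hypothetical pricing index: transit class -> ticket type -> euro cents.
PRICE_INDEX = {
    1: {"single": 210, "day_pass": 420},
    2: {"single": 320, "day_pass": 640},
}

# Hypothetical surcharge for an express product class, in euro cents.
EXPRESS_SURCHARGE = 100


def fare(origin, destination, ticket="single", express=False):
    """Look up the price in euro cents for one O-D pair."""
    key = (origin, destination)
    if key not in TRANSIT_CLASS:
        key = (destination, origin)  # assumed symmetry of the matrix
    transit_class = TRANSIT_CLASS[key]
    price = PRICE_INDEX[transit_class][ticket]
    return price + (EXPRESS_SURCHARGE if express else 0)


print(fare("A", "C"))                # 320
print(fare("C", "A", express=True))  # 420
```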
2.1 usage of data during operation
Once the schedule, vehicle and crew disposition as well as fares have
been defined, subsets of this data are usually provided to diverse
technical systems.
printed timetables The information necessary for typesetting
printed timetables can be exported from the planning software, for in-
stance in some kind of spreadsheet or Comma Separated Values (CSV)
format. This data can be used in desktop publishing software to cre-
ate line-based or stop-based printed timetables, or both. Line-based
timetables list all departures and/or arrivals at all stops serviced by a
certain line, while stop-based timetables list all departures and/or ar-
rivals of all lines servicing a certain stop. Alternatively, the planning
software automatically typesets line- or (mostly) stop-based timeta-
bles in a ready-to-print format that requires no or little subsequent
manual treatment.
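The difference between line-based and stop-based timetables can be
illustrated with a small grouping step over a CSV-like export. The
column names (line, stop, departure), the semicolon delimiter and the
sample rows are assumptions for this sketch; actual planning software
exports will differ.

```python
import csv
import io
from collections import defaultdict

# Invented export with one row per departure.
EXPORT = """line;stop;departure
8;Hauptbahnhof;06:00
8;Theater;06:04
3;Theater;06:02
3;Hauptbahnhof;06:07
"""


def group_departures(csv_text, key):
    """Group departures by 'line' or by 'stop', sorted by time."""
    other = "stop" if key == "line" else "line"
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=";"):
        grouped[row[key]].append((row["departure"], row[other]))
    return {name: sorted(entries) for name, entries in grouped.items()}


line_based = group_departures(EXPORT, "line")  # one table per line
stop_based = group_departures(EXPORT, "stop")  # one table per stop

print(line_based["8"])
print(stop_based["Theater"])
```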
(a) Testing setup employed by SWU. The IBIS rack is at the bottom; from left to right:
GSM module, IBISplus computer with connected CF storage and GPS antenna,
power supply, and RF transceiver. The upper rack emulates vehicle functions,
e.g., the door opening mechanism. On top sits the driver’s MDT user interface.
(b) Installation in a bus, above the
driver’s seat. The IBISplus computer
rack is the lower left part, complete
with voice radio equipment. Above
sits the destination display controller.
On the right side, controllers for the
bus’s CCTV and passenger entertain-
ment system are installed.
(c) The MDT user interface located within
the driver’s view, currently showing
deviation from the scheduled depar-
ture time, and upcoming stops. At
stops, guaranteed transfers can re-
quest the driver to wait for delayed
vehicles.
Figure 2: IBIS on-board computers.
on-vehicle computers On many German public transit buses,
on-board Integriertes Bus-Informations-System (IBIS) computers link
various systems that make use of the stop pattern, e. g., destination signs,
inner signs (showing the next stop or stops), automated recorded stop
announcements, ticket validators, switch control, and automated pas-
senger counting systems[32, P. 8]. After their vehicle has been sup-
plied with the appropriate data, drivers can cycle through all the jour-
neys within their individual vehicle block by means of a user inter-
face terminal located next to their seat (see Figure 2). Making use
of the established distance between stops, and monitoring the bus’s
odometer and door-opening mechanism, it is possible to determine
the remaining distance to a stop, or whether one journey on a block
has been finished. Thus, the next stop can be announced via inner
signs and recorded announcements, and matching destination sign
information for the journey can be displayed. Positional precision
throughout the journey can be enhanced through infrared beacons
located at known positions—usually, lamp posts—and GPS receivers
(see Figure 3). Odometer inaccuracies or smaller detours, e. g.,
because of construction sites, can be compensated for by relying on the
door-opening mechanism, which resets the distance counter to zero each
time it is operated. Combining the odometer approach with GPS
positioning also allows such deviations to be detected. The driver can
also correct errors manually through the IBIS computer’s user interface.
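The odometer-based logic described above can be sketched in a few
lines. This is a simplified model under our own assumptions, not a
description of actual IBIS firmware: the class name, its methods and
the premise that doors are only opened at scheduled stops are all
hypothetical.

```python
class StopTracker:
    """Tracks progress along one journey from odometer and door events."""

    def __init__(self, link_lengths_m):
        # Planned distances between consecutive stops of the journey.
        self.link_lengths_m = list(link_lengths_m)
        self.links_done = 0            # links already driven
        self.since_last_stop_m = 0.0   # odometer distance since last reset

    @property
    def journey_finished(self):
        return self.links_done >= len(self.link_lengths_m)

    def odometer_advance(self, metres):
        self.since_last_stop_m += metres

    def remaining_to_next_stop_m(self):
        """Planned link length minus driven distance, never below zero."""
        if self.journey_finished:
            return 0.0
        planned = self.link_lengths_m[self.links_done]
        return max(planned - self.since_last_stop_m, 0.0)

    def doors_opened(self):
        """Reset the distance counter; return True if the journey is done.

        Assumes doors are only opened at scheduled stops, so odometer
        drift or a small detour does not accumulate across links.
        """
        self.since_last_stop_m = 0.0
        if not self.journey_finished:
            self.links_done += 1
        return self.journey_finished


tracker = StopTracker([400.0, 650.0])
tracker.odometer_advance(150.0)
print(tracker.remaining_to_next_stop_m())  # 250.0
```

A journey with two links has three stops; the second door opening
marks the end of the journey, at which point the next journey of the
block could be loaded.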
(a) On-board infrared transceiver. (b) Roadside infrared beacon.
Figure 3: On-board and roadside infrared transceivers and beacons for en-
hanced positional accuracy
If applicable, passenger count information can be extracted from
the IBIS system and be fed back to the scheduler in order to evaluate
the passenger load and adjust the schedule accordingly, if necessary
(see Section 2.2 for more on this).
Furthermore, buses can interact with junction processors via in-
frared or radio transmission in order to request a green light from
the signal controller. If an authorized vehicle approaches the intersec-
tion, it registers with the appropriate junction processor, which then
decides whether to prioritize the bus over “ordinary” traffic. After the
bus has passed the intersection and deregistered, the signal controller
returns to its normal signal phasing [32, P. 4].
ticket vending machines If the transit operator intends to sell
tickets based on origin and destination of their passengers, ticket
vending machines—both stationary ones at stops or other points of
sale, and in-vehicle units—need to be supplied with the necessary
fare structure data in order to calculate the matching price. If prices
need to be determined for specific connections (i.e., a specific
journey on a specific line that is served by different product classes
with different pricing), the necessary routing data must be supplied as
well.
routing Using the transit network data, electronic journey plan-
ners can provide riders with an itinerary for their intended journey.
This service can be a web application, a telephonic interactive voice
response system, or a sub-set of a ticket vending machine’s capabil-
ities. Depending on the type of journey planner used, an itinerary
can include only one or several modes of transport, the latter being
called an intermodal journey planner. Also, journey planners differ in
whether they allow routing from any given point to any other given
point or whether start and destination locations have to be stops on a
network.
To allow longer journeys to be routed, transit operators and
authorities may exchange their network and schedule data with others.
For instance, transit authorities covering a larger geographical area
might partner and aggregate all their data, either for use within
their own electronic journey planners, where these exist, or in order
to let others provide an electronic journey planner for the complete,
aggregated area.
2.2 data acquisition
For the scope of this thesis, we will assume an already established and
running transit system, not one built from scratch. This means that
data concerning land use, authority constraints, ridership, and pre-
vious vehicle and crew disposition is based on empirical knowledge,
leaving only “data maintenance”, for instance, establishing new or
changed lines or stops.
obtainment by test car and avm Any data referring to the
physical locations of stops, depots, relief-points and garages, as well
as the layout of lines can be obtained by transit agencies’ test cars
equipped with highly accurate GPS receivers. In Germany, transit
agencies usually employ test vehicles outfitted with an IBIS computer,
including an infrared transceiver for interacting with roadside bea-
cons. O-D matrices between stops subsequently allow for the planning
of journey patterns for individual lines.
Additionally, AVM systems can contribute to this data by providing
information on, for instance, systematic and unanticipated delays
during traffic peak times, or long layover times during off-peak
hours. If present, ridership can also be estimated, either using optical
sensors located at the vehicle doors, or by measuring the vehicle’s
payload. The analysis of this statistical data can help identify
unexpected side effects of, e. g., badly-timed traffic lights, as well
as changes in ridership patterns.
Figure 4: Test car used by SWU Verkehr. The vehicle is outfitted with two
IBIS computers in the back, one in the version used for city buses,
one in the version for regional buses (lower left image). In order to
accurately measure positions of, and distances between stops and
infrared beacons, an IR transceiver sits on the vehicle’s roof (lower
right image).
2.3 excursus: real-time data
Automated Vehicle Location (AVL) systems can also contribute their
data in order to inform a control centre and riders about deviations
from the planned schedule. While such data was originally intended to
improve operational efficiency, transit agencies later began presenting
it to passengers, e.g., through real-time information displays at
transit stops, or through Application Programming Interfaces (APIs).
Dziekan and Kottenhoff argue that “the mere existence of such a system
creates a general sense of trust in the public transport system”, and
that real-time information significantly reduces the wait time
perceived by riders at transit stops [31]. Ferris later extended this
work, finding comparable results with real-time information presented
to riders on mobile devices instead of fixed displays [87].
While this extension to the described transit data is therefore inter-
esting and relevant to transit users, the focus of this work is on bulk
target schedules, i. e., complete schedules as they were planned by a
transit agency.
In the following chapter, relevant data models for the exchange of
this data will be examined.
3
DATA MODELS
In Europe, the development of data models for data exchange within
and in between transit operators began in the late 1980s, resulting
in the ÖPNV-Datenmodell in Germany and CASSIOPE out of the EC
DRIVE 1 research initiative. Both influenced the pan-European
Transmodel as part of the EC DRIVE 2 initiative, which was ultimately
standardized as EN 12896:2006 [4]. Transmodel as a reference data model
served as a basis for European standard implementations, such as the
TransXChange standard used for bus schedules in the United King-
dom, and the Service Interface for Real Time Information (SIRI)
standard, which, however, deals with real-time schedule information.
Despite all these different standards, vendor-specific data models
still play a prominent role when encountering route network and
schedule data. On the German market, HAFAS and DIVA are two of
the major software suites used by public transit agencies—one using
a documented exchange format, the other a format with no publicly
available documentation whatsoever.
Although no single transit data model has ever been adopted by a
world-wide regulatory body, GTFS has in recent years become a de facto
standard widely used within the open data community. First developed
by Portland’s TriMet transit agency together with Google, it is now
used by Google’s Transit Planner as well as a growing number of
third-party transit applications.
3.1 the vdv transit data model (öpnv-datenmodell)
In the late 1980s, the German association of transport companies,
Verband Deutscher Verkehrsunternehmen (VDV)—which went by the
name Verband öffentlicher Verkehrsbetriebe (VÖV) back then—started
standardizing data layouts and formats in order to facilitate the ex-
change of schedules within and between transit authorities in West
Germany.
The stated goal was to increase efficiency in transit planning and
cost savings, as pointed out by Goetz in [42]. The Bison research
project—started in 1980 and funded by the German Federal Min-
istry of Research and Technology (see [51])—defined domains which
were found to be suitable for computer-aided planning and management
processes. Subsequently, different vendors started implementing
software systems for individual segments. Thus, different products
emerged, each covering only a slice of all required functionality,
with little or no interoperability between products from different
vendors. As transit operators called for a modular concept with
compatibility between individual parts regardless of the software’s
manufacturer [42], VÖV started developing a data model out of the
Bison data model, which was originally meant to be implemented as
a central data storage within a relational database [41, P. 1]. The re-
sulting ÖPNV-Datenmodell was subsequently standardized within
the VDV-Schriftenreihe.
Today, the ÖPNV-Datenmodell encompasses several data standards
within the VDV-45x series, including the management of duty rosters
(VDV-455), realtime schedule data (VDV-453, -454 and -459), infras-
tructural data (VDV-456) and automatic passenger counting systems
(VDV-457 and -458). We will limit our attention to the exchange for-
mat first standardized in 1991 with VDV-451, as well as the network
and target schedule description defined in VDV-452 since 1999.
3.1.1 VDV-Schrift 451: File Layout
The goals of the data exchange format definition according to VDV-
451 [85] are:
importing or exporting data from or to another hardware or
software platform
editing, inspecting or evaluating data, using standard software
such as text editors or spreadsheet software
The format is based on plain text files, which are named according
to DOS specifications (8+3 letters in lower case) and a defined naming
pattern:
Prefix:
  “i” (for interface file)
  a three-digit number designating the data table contained
  within the file
  the creation date as a three-digit Julian day number
  a trailing zero
Suffix:
  “.x10”
Following this pattern, an example exchange file name would be
i4853580.x10 for a file containing table #485 according to VDV-451,
created on the 358th day of the year.
Alternatively, a file can be named after the table it contains,
e. g., rec_frt.x10 for a file containing the table named
REC_FRT.
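The numeric naming pattern can be expressed as a pair of small helper
functions. The function names are our own and purely illustrative;
the pattern itself (prefix “i”, three-digit table number, three-digit
day of the year, trailing zero, suffix “.x10”) is the one described
above.

```python
from datetime import date


def vdv451_file_name(table_number, created):
    """Build an exchange file name such as i4853580.x10."""
    day_of_year = created.timetuple().tm_yday
    return f"i{table_number:03d}{day_of_year:03d}0.x10"


def parse_vdv451_file_name(name):
    """Return (table_number, day_of_year) for a numeric file name."""
    stem, _, suffix = name.lower().partition(".")
    valid = (
        suffix == "x10"
        and len(stem) == 8
        and stem[0] == "i"
        and stem[1:].isdigit()
        and stem[7] == "0"
    )
    if not valid:
        raise ValueError(f"not a numeric VDV-451 file name: {name!r}")
    return int(stem[1:4]), int(stem[4:7])


# 24 December 2013 is the 358th day of that year.
print(vdv451_file_name(485, date(2013, 12, 24)))
print(parse_vdv451_file_name("i4853580.x10"))
```

Note that the pattern carries no year, so the creation date cannot be
fully recovered from the file name alone.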
Every line within the file starts with a three-letter descriptor
indicating the meaning of that line, separated from the remaining
content by a semicolon.
Each file contains a file header, the table definition (table header,
records, table trailer) and a file trailer. (The “file header” must
not be confused with a file signature, which describes a file type by
using a magic number.)
file header The header part describes conventions necessary
for correctly parsing the body of the file, i. e., the table records.
It specifies the date and time format and whether the record
columns follow a fixed-width (“aligned”) or free format (mod),
information about the data source (src), the character set (chs), as
well as version numbers of the source software (ver), the interface
being used (ifv), and the data set itself (dve). A file format field
(fft) is specified, but not used as of version 5.0; it is left empty
(""). (As of ÖPNV-Datenmodell 5.0, only ASCII and ISO-8859-1 can be
used as character sets.)
Imports must accept both the fixed-width as well as the free for-
mat. The fixed-width format was supposed to allow for easier visual
inspection of raw exchange files by use of a text editor, since the
columns line up on the screen.
Listing 1: Example file header in VDV-451 format
mod; DD.MM.YYYY; HH:MM:SS; aligned
src; "OneBusAway VDV Exporter"; "01.01.2014"; "00:00:00"
chs; "ISO8859-1"
ver; "0.1"
ifv; "1.0"
dve; "201401010000"
fft; ""
table definition The table itself is represented as CSV with a
semicolon (;) as field delimiter and double quotation marks (") as text
delimiters. Apart from the semicolon delimiter, this follows the
conventions of RFC 4180 [79], e. g., a quotation mark within a text
field is escaped by doubling it. The table is introduced by stating
the table name (tbl), column designators (atr) and data types (frm),
followed by a set of records (rec), and concluded by a table trailer
specifying the number of records within the table (end). Comments
(com) can be placed between the table name definition and the table
trailer.
Listing 2: Example table in VDV-451 format
tbl; EXAMPLE_TABLE
atr; ID; TEXT; NUMBER
frm; num[6.0]; char[8]; num[3.0]
rec; 1;"Bake 1" ; 22
rec; 2;"Bake ""2""a"; 8
com; This is a comment
end; 2
file trailer A simple eof, followed by the number of tables
within the file, marks the end of the exchange file. Since exchange
files according to VDV-451 must contain exactly one table each, this
line always reads eof; 1.
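Reading such a file can be sketched with Python's standard csv module.
This is a minimal sketch, not a complete VDV-451 importer: it assumes
a single table per file, treats every unknown descriptor as a header
entry, ignores the data types in the frm line, and strips padding
whitespace from all values (which would also remove intentional
trailing blanks). The sample text combines Listing 1 and Listing 2.

```python
import csv
import io

SAMPLE = '''mod; DD.MM.YYYY; HH:MM:SS; aligned
src; "OneBusAway VDV Exporter"; "01.01.2014"; "00:00:00"
chs; "ISO8859-1"
ver; "0.1"
ifv; "1.0"
dve; "201401010000"
fft; ""
tbl; EXAMPLE_TABLE
atr; ID; TEXT; NUMBER
frm; num[6.0]; char[8]; num[3.0]
rec; 1;"Bake 1" ; 22
rec; 2;"Bake ""2""a"; 8
com; This is a comment
end; 2
eof; 1
'''


def parse_vdv451(text):
    """Parse one VDV-451 exchange file into a dictionary."""
    table = {"header": {}, "name": None, "columns": [],
             "formats": [], "records": []}
    reader = csv.reader(io.StringIO(text), delimiter=";",
                        quotechar='"', skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # skip blank lines
        key = row[0].strip()
        values = [value.strip() for value in row[1:]]
        if key == "tbl":
            table["name"] = values[0]
        elif key == "atr":
            table["columns"] = values
        elif key == "frm":
            table["formats"] = values
        elif key == "rec":
            table["records"].append(dict(zip(table["columns"], values)))
        elif key == "end":
            # The table trailer states the expected number of records.
            if int(values[0]) != len(table["records"]):
                raise ValueError("record count does not match trailer")
        elif key in ("com", "eof"):
            continue
        else:
            table["header"][key] = values  # mod, src, chs, ver, ...
    return table


parsed = parse_vdv451(SAMPLE)
print(parsed["name"], parsed["records"])
```

Real files would additionally need to be decoded with the character
set announced in the chs line before being passed to such a function.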
3.1.2 VDV-Schrift 452: Standard Interface for Network Plans and Schedules
The VDV Standard Interface for Network Plans and Schedules
(VDV-Standard-Schnittstelle Liniennetz/Fahrplan) as standardized in
VDV-452 [86] describes an Entity Relationship Model (ERM) for a route
network and the corresponding schedule. It is meant to be implemented
in a Structured Query Language (SQL) database, but can also be exported
or imported by means of exchange files according to VDV-451. (Within
this model, “schedule” describes the employee timetable, i.e., the
complete schedule of any vehicle from the time it leaves a depot to
its return, including layovers, dead mileage, etc.)
[Diagram boxes: Source System, Exporter, Exchange Files, Importer,
Target System; Vendor-Specific Database, VDV-452 Database,
Vendor-Specific Database]
Figure 5: Data flow model using VDV-452 as exchange format between
systems using vendor-specific databases. Modelled after [86, P. 14].
The standard encompasses the following items in its current
version 1.5:
Calendar dates (what service ID runs on which dates and when
they are in service)
Operational data (vehicle types, recorded stop announcements,
headsign texts)
Location data (stop areas, stop points, beacons, depots)
Transport network data (lines, distances, journey time groups,
journey times, stops)
Route data (routes and route shapes)
Schedule data (journeys and stop times dependent on journey
types, blocks)
Connection definition data for guaranteed transfers
The standard’s main goal is the seamless exchange of route and
schedule information between different software and hardware
platforms, e. g., exporting a working schedule for usage within an
Intermodal Transport Control System (ITCS), or supplying ticket vending
machines or IBIS computers with data sets necessary for their
operation.
table name            description
BASE_VERSION_VALID    Validity starting date of referenced version
BASE_VERSION          Version identifier
PERIOD                Assignment of day types to calendar dates
DAY_TYPE              List of all day types (i.e., weekday, day before a holiday, etc.)
POINT_TYPE            List of point (location) types, e. g., stops, stations, traffic lights, depots, etc.
STOP_TYPE             List of location groups, i.e., stops and depots
STOP_POINT            Definition of location points where passengers usually (dis)embark. These points can be grouped within STOPs
ACTIVATION_POINT      Definition of other locations, e. g., beacons. Has a 1:1 relation to a STOP
STOP                  All locations within the transit network, i. e., stations, depots or activation points
VEHICLE               Description of vehicles
TRANSPORT_COMPANY     Lists all transport companies
OPERATING_DEPARTMENT  Operating department (e.g., Bus, Metro, Subway, etc.)
VEHICLE_TYPE          Description of vehicle types, i. e., articulated or standard buses
ANNOUNCEMENT          List of announcement texts
DESTINATION           List of possible destination signs
LINK                  Directional edges between pairs of points, with distance in meters
POINT_ON_LINK         Definition of intermediary points. Allows for a geographically accurate display of a route.
TIMING_GROUP          Definition of timing pattern groups
WAIT_TIME             Waiting times per timing pattern group and location
TRAVEL_TIME           Scheduled travel time per link and timing pattern group (might differ over the course of a day)
DEAD_RUN              Directional edges between pairs of points for deadheads
DEAD_RUN_TIME         Driving time per deadheading edge and timing group (might differ over the course of a day)
JOURNEY_TYPE          List of journey types, e.g., on-duty trip, deadheads from a depot, to a depot, or between stops
ROUTE_SEQUENCE        Sequence of stops serviced by a route
LINE                  Definition of a route and assignment to an operating department
JOURNEY               Definition of a trip
JOURNEY_WAIT_TIME     Trip-specific wait times per stop
BLOCK                 Vehicle blocks, from leaving a depot until arriving at a depot
Table 1: List of data tables specified in VDV-452 [86].
3.2 uniting european standardisation efforts: transmodel
Starting in 1988 with the Dedicated Road Infrastructure for Vehicle
Safety in Europe (DRIVE) programme, the Commission of the European
Community (EC) endeavoured to improve road safety, improve
transport efficiency and reduce environmental pollution through a
series of research projects [14]. Within DRIVE, the Computer-Aided
System for Scheduling Information and Operation of Public Transport
in Europe (CASSIOPE) project provided a first specification of
functional requirements for an Integrated Road Transport Environment
and defined the “approach to standards and protocols for the higher
communication levels” within this environment [84].
Before the end of the DRIVE programme in 1992, its Requirements
Board recommended extending the findings, e.g., through field
trials [14, P. 252], resulting in the DRIVE 2 programme. Within DRIVE
2, the EuroBus (1992-1994) and Harpist (1995) projects further
developed the CASSIOPE findings and included the experiences with VDV’s
ÖPNV-Datenmodell. In order to standardize the previously parallel
development of CASSIOPE and ÖPNV-Datenmodell, both project working
groups were considered subgroups of Working Group 3 within
the technical committee 287 of the European Committee for
Standardization (CEN) [18]. CEN/TC287 is “the technical committee
responsible for the development of standards related to Public
Transports” [80, P. 4], and the findings of EuroBus and Harpist were
presented as a pre-standard to the committee at the end of 1995 [18].
After incorporating feedback, version 4.1 of Transmodel was published
in 1996 and formalized into European prestandard ENV 12896 in August
1997 [16,18]. Since then, Transmodel has served as “a pan-european
reference model for public transport operating companies” [34, Ann. 2].
Figure 6: Entity relationship diagram of the VDV-452 data model.
features In its revised version V5.1 adopted in 2001 [18] and
standardized as EN 12896:2006 [4], the data model includes the following
elementary data and domains:
Network description
Versions management
Tactical planning (Vehicle scheduling, driver scheduling, rostering)
Personnel disposition
Operations monitoring and control (including AVM)
Passenger information (offline and online)
Fare collection
Management information and statistics
The standard also takes multi-modal transport and the modelling
of multiple operators into account. While it is focused mainly on bus
operation, trolley bus and light rail (i.e., tramway or metro) transport
modes are also addressed[18].
Earlier versions of the standard also included references to the
Geographic Data File (GDF)[3] standard and “link[s] with traffic and
road data”[34].
implementations Since Transmodel is only a reference data
model[15], it has led to the development of several concrete Exten-
sible Markup Language (XML)-based implementations, including the
TransXChange standard in the United Kingdom[66], the Identifica-
tion of Fixed Objects in Public Transport (IFOPT) standard for identi-
fying fixed, transport-related objects[6], and SIRI as a standard to ex-
change real time information [7]. Additionally, a Transmodel-based
XML schema for exchanging GTFS stop and schedule information ex-
ists[53].
3.3 german industry standards: hafas
The origins of the HaCon Fahrplan-Auskunfts-System (HAFAS), de-
veloped and distributed by HaCon Ingenieursgesellschaft mbH in Han-
nover, date back to 1988. Back then, HaCon developed a first elec-
tronic journey planner as part of Deutsche Bahn’s Kurs 90 project,
which was able “to load the complete [Deutsche Bahn] schedule onto
a screen within six seconds”[65].
HAFAS became the electronic journey planner of choice for Deutsche
Bahn, first as stand-alone software that customers could download
and use on their Personal Computers, later as an online web service.
Today, according to HaCon, HAFAS software is used throughout “more
than 100 installations in nearly 20 countries” [47].
3.3.1 Features
HAFAS depends on a server installation which processes the relevant
information and offers interfaces to compatible output channels. Ha-
Con advertises the following data exchange features:
Scheduling information, including network and geography data
Real-time data
Data exchange through matching modules for ticketing, GIS ser-
vices and fare calculation
Possible interfaces for delivering the data include Internet fron-
tends, mobile clients for transit personnel, print output clients, offline
clients and smartphone applications.
3.3.2 HAFAS Exchange Format
HaCon developed their own proprietary data format for exchanging
data between HAFAS installations, the HAFAS Rohdatenformat. Until
the Swiss SBB decided to publish their HAFAS-based raw schedule
information in 2014 [50], no specification of the format was publicly
available, apart from occasional excerpts that had found their way
onto the Internet and were exchanged between open transit data
developers [89]. While it cannot be ascertained whether the
documentation provided by SBB covers the general workings of the HAFAS
Rohdatenformat or just a variant used by SBB, it offers some insights
into the principal data layout.
Data is stored in text files encoded according to ISO-8859-1 with
CRLF line terminators, with no file or column headers introducing
the data that follows. Instead, each line represents a record, and
follows a fixed-column format where fields are padded with spaces,
if necessary. Comments are introduced by a % sign, and all input after
it is ignored.
As an example, the BFKOORD_GEO file defines stop coordinates in the
WGS84 Coordinate Reference System (CRS), and uses the following
layout:
a seven-digit integer, identifying the stop, in columns 1–7,
the X coordinate, in World Geodetic System 1984 (WGS84) format
(NNN.nnnnnn), in columns 9–18,
the Y coordinate, in WGS84 format (NNN.nnnnnn), in columns 20–29,
the Z coordinate, in metres above sea level, in columns 31–36
(left justified),
a comment with the stop name (only for improved legibility), in
columns 38ff.
As can be seen in the example excerpt in Listing 3, the X and Y
coordinates are justified as if the leading zeroes were present.
Listing 3: Excerpt from the stop coordinate example file provided by SBB
0000168   7.589551  47.547405 277    % Basel SBB
0001522   8.310170  47.050170 436    % Luzern
0001560   9.529195  47.003835 504    % Maienfeld
0001714  11.558334  48.140232 0      % München Hbf
8503424   8.632728  47.698282 404    % Schaffhausen
The practice of explaining each line’s contents with a comment at
its end is used throughout many of the files provided by SBB, although
the documentation suggests this is merely an option for better
legibility, and not expressly required.
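Reading one record of this file then amounts to slicing fixed column
ranges. The following sketch assumes the column layout given above
(1–7, 9–18, 20–29, 31–36, comment from column 38) and a sample line
padded accordingly; the function name and the returned field names
are our own, and the sketch has not been checked against the complete
SBB export.

```python
# Sample record padded to the fixed column layout described above.
SAMPLE_LINE = "0000168   7.589551  47.547405 277    % Basel SBB"


def parse_bfkoord_geo(line):
    """Split one BFKOORD_GEO record into its fields."""
    # Everything after the % sign is a comment (here: the stop name).
    data, _, comment = line.rstrip("\r\n").partition("%")
    return {
        "stop_id": data[0:7],       # columns 1-7, kept as text
        "x": float(data[8:18]),     # columns 9-18
        "y": float(data[19:29]),    # columns 20-29
        "z": int(data[30:36]),      # columns 31-36, left justified
        "name": comment.strip() or None,
    }


print(parse_bfkoord_geo(SAMPLE_LINE))
```

The stop identifier is kept as a string so that leading zeroes, which
are part of the seven-digit identifier, are preserved.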
While most of the files provided by SBB define stops, their
coordinates, vehicle types, transfer information, etc., the actual
schedule information resides in a single FPLAN file (see Listing 4 for
an example). Journeys are defined by a series of initialization lines,
e.g., an initial line starting with *Z, defining the journey number,
operator, variant of the journey and, optionally, journey frequencies.
After the journey has been parametrized through these lines, the
journey pattern is defined line by line, each line starting with the
stop identifier, followed by the stop’s name (optional, for better
legibility), arrival and departure times, and, optionally, new
journey or operator identifiers if they should change at that stop.
Listing 4: Excerpt from the schedule example file provided by SBB
*Z 19704 000065 001 % 19704 000065 001 (001)
*G SN 8503424 8014558 00110 00130 % 19704 000065 001 (002)
*A VE 8503424 8014558 348970 00110 00130 % 19704 000065 001 (003)
*A Z 8503424 8014558 00110 00130 % 19704 000065 001 (004)
*R % 19704 000065 001 (005)
8503424 Schaffhausen 00110 % 19704 000065 001 (006)
8014487 Herblingen 00113 00113 % 19704 000065 001 (007)
8014490 Thayngen 00118 00119 % 19704 000065 001 (008)
8014491 Bietingen 00121 00121 % 19704 000065 001 (009)
8014492 Gottmadingen 00124 00124 % 19704 000065 001 (010)
8014558 Singen (Hohentwiel) 00130 % 19704 000065 001 (011)
While the SBB export does not provide any information on several
items found in VDV-452 or Transmodel, the functionalities advertised
on HaCon’s website and the widespread use of HAFAS throughout German
public transit authorities make it reasonable to assume that HAFAS is
capable of modelling and exporting a number of them, and that the SBB
export is merely a limited subset of HAFAS’s capabilities. However,
due to the lack of publicly available documentation, no definitive
conclusion can be drawn on this.
3.4 german industry standards: diva
The Dialoggesteuertes Verkehrsmanagement- und Auskunftssystem
(DIVA) schedule management system was first developed by Munich-based
Mentz Datenverarbeitung GmbH in 1979 and has been marketed
to transport operators and authorities ever since [58]. Today, it is
used by operators and authorities throughout Europe, Australia and the
USA [59]. In Baden-Württemberg, 9 out of 22 transit authorities are
using DIVA, including Nahverkehrsgesellschaft Baden-Württemberg
(NVBW), a subsidiary of the State’s Ministry for Transportation and
Infrastructure acting as a coordinator between the state’s
Verkehrsverbünde. (See [71] for details on the Verkehrsverbund
principle.)
It is used both by SWU Verkehr, the transit branch of the municipal works run by the twin cities of Ulm (Baden-Württemberg) and Neu-Ulm (Bavaria), and by Donau-Iller-Nahverkehrsverbund GmbH (DING), the transit authority responsible for the linked transport system encompassing Ulm and Neu-Ulm as well as the surrounding districts.
An electronic journey planning system, Elektronische Fahrplanaus-
kunft (EFA), is also marketed by Mentz. It is tailored to receive data
exported from DIVA in a vendor-specific format that is not publicly
documented.
3.4.1 Features and Data Exchange Compatibility
The main features of DIVA, as advertised by Mentz, are:
Timetable management
Timetable and vehicle scheduling
Optimisation of vehicle schedules
Duty scheduling
Duty schedule optimisation
Personnel deployment
Geography
Cartography
Generation of timetable books and posters
Transfer optimisation
Import and export plugins for a variety of exchange formats
Data supply for onboard systems (e.g., IBIS computers)
Data export for Mentz’s journey planner, EFA
Not all versions of DIVA offer all these features. For instance, Mentz offers a variety of data importers and exporters, e.g., for GTFS, VDV-452, and VICOS LIO. However, each of these import and export plugins must be licensed separately. In 2013, a single license for the diva2vdv exporter was quoted at 14 100 EUR, with the 2nd to 5th license costing 7 700 EUR each¹. Thus, DIVA and EFA in their basic variant form a closed, mutually dependent ecosystem: operators must either use both of Mentz’s products or license additional interfaces in order to export the data for third-party route planning applications.
3.4.2 Internal Raw Data Format
Note: Since Mentz offered no documentation of the format, all findings within this chapter are based on the author’s analysis of schedule exports made available by DING and SWU. This analysis is not necessarily a comprehensive documentation of the format, as it may differ between individual deployments of DIVA.

For data exchange between DIVA installations, raw data can be exported and then re-imported into another DIVA installation through a “DIVA2DIVA” plugin. In contrast to the exchange format used by HAFAS and described in Section 3.3, this exchange format is not documented and not intended to be used by third parties². The data formatting used in most of the files is akin to the exchange format of VDV-451: data is stored in plain text files, with tabular definitions following the syntax of VDV-451. However, each text file may contain several tables, some with primary and foreign keys, but no information whatsoever about the relationship between these tables. For instance, DING provided the author with a file export containing all stop data, haltestellen.ding, which defines 32 distinct tables, 11 of which contain no records at all. Of the remaining 21 tables, 19 have a foreign key defined and 4 have a primary key. While primary keys are always called _AutoKey_, and foreign keys _FK__AutoKey_, the file contains no further explanation or clues as to the relationship of the tables.
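The kind of inventory described above (tables per file, empty tables, key columns) can be sketched in a few lines. This is only an illustration, assuming the files follow the usual VDV-451 keywords (tbl, atr, frm, rec, end) with a semicolon separator; the table and column names in the sample are invented and do not come from haltestellen.ding.

```python
from collections import OrderedDict


def summarize_tables(text, sep=";"):
    """Map each table name to its column names and record count."""
    tables = OrderedDict()
    current = None
    for raw in text.splitlines():
        # Naive split: sufficient for keywords and column names, but it would
        # mis-split record values containing the separator inside quotes.
        fields = [field.strip() for field in raw.split(sep)]
        keyword = fields[0]
        if keyword == "tbl" and len(fields) > 1:
            current = {"columns": [], "records": 0}
            tables[fields[1]] = current
        elif current is not None and keyword == "atr":
            current["columns"] = fields[1:]
        elif current is not None and keyword == "rec":
            current["records"] += 1
    return tables


# Hypothetical sample in VDV-451 style; names and values are made up.
SAMPLE = """\
tbl; TableA
atr; _AutoKey_; name
frm; num[9.0]; char[40]
rec; 1; "Arena"
end; 1
tbl; TableB
atr; _FK__AutoKey_; platform
frm; num[9.0]; char[10]
rec; 1; "A"
rec; 1; "B"
end; 2
tbl; TableC
atr; _AutoKey_; unused
frm; num[9.0]; char[10]
end; 0
"""

summary = summarize_tables(SAMPLE)
empty = [name for name, t in summary.items() if t["records"] == 0]
with_primary = [name for name, t in summary.items() if "_AutoKey_" in t["columns"]]
with_foreign = [name for name, t in summary.items() if "_FK__AutoKey_" in t["columns"]]
```

Such a summary shows which tables carry keys, but, as noted above, it cannot reveal which primary key a given foreign key refers to.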
File naming pattern. Most of the files are named after the German term for their content, or an abbreviation thereof, e.g., tgtyp for Transportgefäßtyp³. In contrast, files starting with numerical identifiers appear to exclusively model information concerning a specific line. The naming pattern of those line descriptions is similar to the pattern described in the CSV exports used in Section 4.1.

    99073_.j13
    99    Operator
    073_  Line
    j13   Project

The first two digits of the file name are the numerical identifier of the operating department responsible for the respective line. The following four characters are the line number itself. (Line numbers may be followed by distinguishing suffixes, e.g., differentiating between seasonal variants of the same line.)

¹ Reference: E-mail exchange between DING and Mentz DV, October 2013.
² Reference: E-mail exchange between the author and Mentz DV, 2013-10-07: “The [DIVA] data format is way too complex and a purely internal format. For data exchange with other systems, we use [...] standardized formats such as VDV-452 [...].”
³ Vehicle type
In most cases, when a file extension is being used, it describes a
logical assignment. For instance, the extensions of the lines desig-
nate the Project they are part of—namely, a specific schedule with a
validity start and end date.
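The naming scheme for line files can be expressed as a small parser. The following is a sketch under the assumptions stated in the text (two digits for the operating department, four characters for the line, the extension for the project); the function and field names are illustrative and not part of DIVA.

```python
import re
from typing import NamedTuple, Optional


class LineFileName(NamedTuple):
    operator: str  # two-digit operating department identifier
    line: str      # four characters: line number plus optional suffix
    project: str   # file extension naming the project (schedule period)


# Two digits, four non-dot characters, a dot, then the project extension.
_LINE_FILE = re.compile(r"^(?P<operator>\d{2})(?P<line>[^.]{4})\.(?P<project>[^.]+)$")


def parse_line_file_name(name: str) -> Optional[LineFileName]:
    """Split a line definition file name; return None for other files."""
    match = _LINE_FILE.match(name)
    if match is None:
        return None
    return LineFileName(match["operator"], match["line"], match["project"])


parsed = parse_line_file_name("99073_.j13")
other = parse_line_file_name("haltestellen.ding")
```

Files such as haltestellen.ding do not start with a numerical identifier and are therefore not treated as line files.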
A comparable pattern can be found for the teilstrecken⁴ files, which use their corresponding operating department identifiers as file extensions. Within the SWU exports, a combination can even be found: the files defining Fahrzeitprofile⁵, Fahrzeitgruppen⁶, and Haltezeiten⁷ are suffixed with both the operating department and the project they describe or are part of, respectively.
3.4.3 File headers
Files containing tabular data begin with a header similar to that of VDV-451 (see Section 3.1.1 for comparison). Several header entries appear to be directly based on their
respective VDV-451 counterparts, although with a minor change in
one instance: The mod entry denotes only the modification time, not
the formatting of the file. As in VDV-451, a chs field denotes the
character set used within the file. The usr field appears to make use
of the domain and user name of the system the data is exported from.
In place of the src field found in VDV-451, an exe entry seems to
denote the data source software, and ver should, again, refer to the
source software version.
Listing 5: Typical DIVA file header
FORMAT000005;
mod;15.08.2012;15:41:26
usr;\\DING-ULM\Diva
cpt;NTDIVAAPP
exe;DIVA Grunddatenserver (3, 5, 23, 2)
lib;14
mrv;1
dvn;7
dfm;5
ver;0
chs;ISO_LATIN_1
sep;0x3b
eoh;

⁴ Partial sections
⁵ Travel time profiles
⁶ Travel time groups
⁷ Layover times
The author was not able to make more than educated guesses about the rest of the header entries. Across files, the first line varied between versions from FORMAT000001 to FORMAT000050, although no meaning could be derived from this entry so far. The remaining entries appear to refer to version numbers of libraries, and eoh marks the end of the file header.
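Reading such a header is straightforward once its layout is known. The sketch below assumes that header entries are always separated by a semicolon (the sep entry, 0x3b, names that same character) and that eoh terminates the header; the function name is illustrative.

```python
def parse_header(lines):
    """Collect header entries up to the eoh marker into a dictionary."""
    header = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("FORMAT"):
            # First line: a format version of unknown meaning.
            header["FORMAT"] = line[len("FORMAT"):].rstrip(";")
            continue
        key, _, value = line.partition(";")
        if key == "eoh":
            break  # end of header; table data would follow
        header[key] = value
    return header


# Header entries taken from Listing 5.
SAMPLE = r"""FORMAT000005;
mod;15.08.2012;15:41:26
usr;\\DING-ULM\Diva
cpt;NTDIVAAPP
exe;DIVA Grunddatenserver (3, 5, 23, 2)
lib;14
mrv;1
dvn;7
dfm;5
ver;0
chs;ISO_LATIN_1
sep;0x3b
eoh;
"""

header = parse_header(SAMPLE.splitlines())
separator = chr(int(header["sep"], 16))
```

The decoded separator could then be used to split the records that follow the header.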
3.4.4 Line Definition Files
Within DIVA’s exchange format, each transit line is completely defined in a single text file that follows a formatting pattern unique to those files. As can be seen in Listing 6, after an initial header, each row starts with a two-letter combination which appears to serve as an identifier of the dataset type in this row. The text files containing line definition data use a different format altogether (see Chapter 5 for further reference).
Listing 6: Excerpt of DIVA line definition file for SWU bus No. 15
FORMAT002009000000000023 "\\DING-ULM2\\Doe" 27.03.13 14:49:220
    BW00000000000000000000
EMYYYYYYYYYq08YY0YYYYNY
FW012H105010581048105910601061107910781087116012411240
WdH S S
WaH
WzH 000000000
WlH
MSH01 01 01 01
STH0120001 0011 0021 0031 0041 0051 0061
    0071 0081 0091 0102 0112
ZZH
PFH 02
FTH00001120002*0601020504020101N0265000 -099900000000000000
FTH00002120002*0601020607020101N0265000 -099900000000000000
FAH00515000000001051510000110 0 000000000
    0000150033000150100000000000026415001-0999N 0
    000
FAH00615000000001061510000110 0 000000000
    0000150048000150100000000000026415001-0999N 0
    000
FAH00725000000001072510000210 0 N 000000000
    0000150018000150100000000000026415001-0999N 0
    000
FAH00825000000001082510000110 0 N 000000000
    0000150025000150100000000000026415001-0999N 0
    000
FAH00925000000001092510000110 0 N 000000000
    0000150057000150100000000000026415001-0999N 0
    000
UEHO1 000000 "" 08126L03 Y00000[...]
UEHO2 000000 "" 08245L03 N00000[...]
EEH "Universität Süd" 0051500000000000000_00000000000000000000
EEH "Universität Süd" 0061500000000000000_00000000000000000000
EEH "Universität Süd" 0072500000000000000_00000000000000000000
EEH "Universität Süd" 0082500000000000000_00000000000000000000
EEH "Universität Süd" 0092500000000000000_00000000000000000000
BUH "15" "bus" "SWU_Verkehr" "Willy-Brandt-Platz - Universität Süd" "" "" "" 11111600N ""N ""
LAH00000000000010000000012
La000001050000000000
La000001058000000000
La000001048000000000
La000001059000000000
La000001060000000000
La000001061000000000
La000001079000000000
La000001078000000000
La000001087000000000
La000001160000000000
La000001241000000000
La000001240000000000
The author reverse-engineered most of this (also undocumented) format in order to transform it into GTFS, first through intermediary CSV files of SWU’s schedule, and then directly out of DIVA, with the complete schedule of the DING transit system. The findings of this process are laid out in Chapter 4 and Chapter 5, with the latter explaining the workings of the DIVA data format.
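A natural first step when approaching such a file is to tally the two-letter prefixes, which shows what record types occur and how often. The following is a minimal sketch of that idea, not the author's tooling; it assumes that wrapped records have already been joined into single rows and treats the prefix as case-sensitive, since both LA and La occur in Listing 6.

```python
from collections import Counter


def record_types(rows):
    """Count rows per two-letter record prefix, skipping the FORMAT header."""
    counts = Counter()
    for row in rows:
        if len(row) < 2 or row.startswith("FORMAT"):
            continue  # blank row or initial header
        counts[row[:2]] += 1
    return counts


# A few rows taken from Listing 6.
ROWS = [
    'FORMAT002009000000000023 "\\\\DING-ULM2\\\\Doe" 27.03.13',
    "EMYYYYYYYYYq08YY0YYYYNY",
    "FW012H105010581048105910601061107910781087116012411240",
    "WdH S S",
    "FTH00001120002*0601020504020101N0265000 -099900000000000000",
    "FTH00002120002*0601020607020101N0265000 -099900000000000000",
    "LAH00000000000010000000012",
    "La000001050000000000",
    "La000001058000000000",
    "",
]

counts = record_types(ROWS)
```

The tally says nothing about the meaning of each record type; that interpretation is the subject of Chapter 5.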
3.5 along comes google: gtfs
What was to become GTFS started out in mid-2005 as a side project of Google employee Chris Harrelson, who “monkeyed around with ways to incorporate transit data into Google Maps [...] when he heard from Tim and Bibiana McHugh, married IT managers at TriMet, the transit agency for Portland, Ore[gon]”[76, P. 3]. McHugh found it “very frustrating to try and find transit directions in [...] unfamiliar cities”, it being “much easier at that time to get driving directions from popular online mapping services” [57, P. 126]. Having recognized that this status quo potentially encouraged people to choose the car over public transit, the McHughs eventually got in contact with Google. TriMet provided Google with CSV files from their existing enterprise database, based on TriMet’s database schema, and in December 2005, Portland became the first city to be featured in the first version of Google’s “Transit Trip Planner”[38]. This extension of Google Maps allowed querying bus and light rail schedules from
within the service, akin to the journey planning service for drivers, pedestrians and, since 2010, bicyclists [46]. In September 2006, five more US cities were added to the Google Transit Trip Planner, and the data format was released as the Google Transit Feed Specification [48].

File                    Describes                                       SWU / DING
anschlb                 Transfer definition
Aushangbeschreibungen   Public notices
bzw                     Operating branches
fahrzeitprofil          Journey time patterns
ferien                  Holidays
fz_gruppe               Journey time pattern group
haltestellen            Stops
haltezeit               Stopping times
hinweise                Footnotes, e.g., for transfers
hstattr                 Operators responsible for specific stops        ()
hst_liste               unknown
hz_gruppe               Stopping time group
lnrlit                  Lines and their operators, per project
mastmat                 Unknown                                         () ()
pkbez                   Project definition
streckenzeit            unknown
tarifz                  Fare zones and their adjacency
teilstrecken            Partial sections of journeys                    ()
tgtyp                   Vehicle type
umstmat                 Transfer matrix with minimum transfer times
unter                   Transit operators
unt_addr                Operators’ addresses
uvz_texte               Subdirectory names
vbesch                  Service restrictions
vmtext                  Means of transport description                  ()

Table 2: File names encountered, their presence in SWU and/or DING exports, and what is defined within them. In some instances, files were present but appeared to be legacy files only; in these cases, the checkmark is enclosed within brackets.
[Figure 7: class diagram of the GTFS tables agency, routes, trips, stop_times, stops, calendar, calendar_dates, frequencies, shapes, transfers, fare_attributes, fare_rules and feed_info, linked through the keys agency_id, route_id, service_id, trip_id, stop_id, shape_id and fare_id.]
Figure 7: GTFS class diagram. The relationship between the transfers and trips table is a feature of the Google Transit extension to GTFS.
In the United States, unlike Germany, there had not been any stan-
dard for public transit timetables prior to the advent of GTFS, not
even a de-facto standard. According to long-time Bay Area Rapid
Transit (BART) website manager Timothy Moore, before the advent of
GTFS, BART had to provide different data consumers with different
formats, making a standardized transit format very desirable[76, P.
23]. The publicly and freely available format specification, as well
as the availability of GTFS schedules, quickly made developers base
their transit-related software on the format. This resulted in “hun-
dreds of useful and popular transit applications”[57, P. 129] as well
as catalogues listing available GTFS feeds, such as the GTFS Data Ex-
change[26]. Due to the common data format those applications ad-
here to, solutions do not need to be custom-tailored to one transit
operator, but can easily be extended to any region where a GTFS feed is available.

filename              required  defines
agency.txt            yes       One or more transit agencies that provide the data in this feed.
stops.txt             yes       Individual locations where vehicles pick up or drop off passengers.
routes.txt            yes       Transit routes. A route is a group of trips that are displayed to riders as a single service.
trips.txt             yes       Trips for each route. A trip is a sequence of two or more stops that occurs at specific time.
stop_times.txt        yes       Times that a vehicle arrives at and departs from individual stops for each trip.
calendar.txt          yes       Dates for service IDs using a weekly schedule. Specify when service starts and ends, as well as days of the week where service is available.
calendar_dates.txt    no        Exceptions for the service IDs defined in the calendar.txt file. If calendar_dates.txt includes ALL dates of service, this file can fully replace calendar.txt.
fare_attributes.txt   no        Fare information for a transit organization’s routes.
fare_rules.txt        no        Rules for applying fare information for a transit organization’s routes.
shapes.txt            no        Rules for drawing lines on a map to represent a transit organization’s routes.
frequencies.txt       no        Headway (time between trips) for routes with variable frequency of service.
transfers.txt         no        Rules for making connections at transfer points between routes.
feed_info.txt         no        Additional information about the feed itself, including publisher, version, and expiration information.

Table 3: Overview of necessary and optional files of a GTFS feed, and their content according to the specification[44]. The GTFS vocabulary is used.
3.5.1 File format
A GTFS feed consists of at least six and up to 13 plain text files, preferably in UTF-8 text encoding with LF or CRLF line endings[44]; see Table 3 for a complete listing of all files cited from the GTFS reference. Each file contains tabular data as CSV, formatted according to RFC 4180 [79]. The first line of each file must specify the column names
used within the file, with each file having a set of mandatory and
another set of optional columns. Field values are case-sensitive, must
not contain Hypertext Markup Language (HTML) tags, comments or
escape sequences, and may not contain tabs, carriage returns or new-
lines.
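Whether a feed contains the six required files can be checked mechanically. The sketch below assumes the feed is delivered as a ZIP archive, which is the common distribution form but not something this section prescribes; the function name and the demo archive are illustrative.

```python
import io
import zipfile

# The six files listed as required in Table 3.
REQUIRED_FILES = (
    "agency.txt",
    "stops.txt",
    "routes.txt",
    "trips.txt",
    "stop_times.txt",
    "calendar.txt",
)


def missing_required_files(feed):
    """Return the required GTFS files absent from a ZIP archive."""
    with zipfile.ZipFile(feed) as archive:
        # Ignore directory prefixes inside the archive.
        present = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return [name for name in REQUIRED_FILES if name not in present]


# Build an incomplete feed in memory for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("agency.txt", "stops.txt", "routes.txt", "trips.txt"):
        archive.writestr(name, "")
buffer.seek(0)

missing = missing_required_files(buffer)
```

A stricter check would also account for the case noted in Table 3, where calendar_dates.txt fully replaces calendar.txt.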
In its minimal form with only the required files, GTFS can model the schedules of one or more transit operators, as well as the geolocation of stops. Making use of the optional files, the feed can be enriched with fare information, exact line shapes, transfer rules between lines, as well as service exceptions (i.e., journeys which run or don’t run on specific dates).
Listing 7: Excerpt of SWU’s GTFS stops.txt
"stop_id","stop_code","stop_name","location_type","parent_station","stop_lon","stop_lat"
9001745,"Arena","Arena",1,,10.00564,48.38168
900174501,"Arena","Arena",0,9001745,10.00577,48.38237
900174502,"Arena","Arena",0,9001745,10.00564,48.38168
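The excerpt shows how GTFS expresses the relation between a station and its stop points: rows with location_type 1 are stations, and rows with location_type 0 refer to their station through parent_station. A short sketch of reading this relation with Python's csv module follows; the function name is illustrative.

```python
import csv
import io
from collections import defaultdict

# Content of Listing 7.
STOPS_TXT = '''"stop_id","stop_code","stop_name","location_type","parent_station","stop_lon","stop_lat"
9001745,"Arena","Arena",1,,10.00564,48.38168
900174501,"Arena","Arena",0,9001745,10.00577,48.38237
900174502,"Arena","Arena",0,9001745,10.00564,48.38168
'''


def stop_points_by_station(handle):
    """Group stop ids (location_type 0) under their parent station id."""
    stations = []
    children = defaultdict(list)
    for row in csv.DictReader(handle):
        if row["location_type"] == "1":
            stations.append(row["stop_id"])
        elif row["parent_station"]:
            children[row["parent_station"]].append(row["stop_id"])
    return {station: children.get(station, []) for station in stations}


grouped = stop_points_by_station(io.StringIO(STOPS_TXT))
```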
3.5.2 Features
As outlined in this chapter’s introduction, GTFS is aimed towards pro-
viding bulk schedule data for applications tailored towards the needs
of riders for journey planning purposes. Therefore, the format lacks
operational information provided by other standards that are used for
internal purposes, such as anything relating to personnel, or parts only relevant for a working timetable, e.g., deadheads.
In its basic form, GTFS allows modelling a transit schedule, both timetable-based and frequency-based, for light rail⁸, subway or metro lines, rail, bus, ferry, cable car, gondola, and funicular service. This does not suffice to model the subtleties of, e.g., rail services, which usually differentiate between a multitude of train classes, ranging from regional service over inter-regional and long-distance trains to high-speed rail service. Also, the basic GTFS allows only for stop-based transfers, i.e., whether transfers are possible from one stop to another, not for line-to-line or journey-to-journey transfers. Therefore, transfers which are only guaranteed during certain times or between specific lines cannot be modelled in the basic version of GTFS.

⁸ “Light rail”, “Tram” and “Streetcar” all fall under the same identifier in GTFS.
The Google Transit Extension to GTFS resolves these issues and allows for a series of other features commonly found in transit modelling: station entrances can be defined, as well as stop point names (e.g., Track #4), or stop points only served by a particular set of vehicles[70]. This extension already partly implements the Extended Route Types proposal[43], which expands on the possible line types and allows differentiating between a number of different rail, coach, bus, urban railway, and taxi services, right up to horse-drawn carriages.
3.6 comparison of models
The feature sets and modelling capabilities of the presented data models vary widely: While some allow for the modelling of many (if not all) aspects listed in Table 8, GTFS is only fit for the data targeted towards the end-users of public transit. As VDV-451/VDV-452 and Transmodel were the results of a long-lasting development in conjunction with European transit agencies, they have over time evolved to incorporate many necessities in modelling and exchanging the complex data encountered in public transit. Or, as Kizoom and Miller put it, Transmodel “has already encountered and addressed many of the additional requirements that GTFS is encountering piecemeal as it follows a path already well trodden in Europe”[53].
While, due to the lack of publicly available documentation, no
definitive assessment can be made about the capabilities of the DIVA
and HAFAS data formats, it is safe to assume they have been follow-
ing the development of Transmodel and VDV-452 and offer compa-
rable feature sets—especially so since the VDV-452 specification lists
interfaces distributed by each of the two vendors quoted as being
inter-operable with the VDV-452 data model.
GTFS, on the other hand, makes it hard to model some of the conditions commonly encountered in public transit in Germany, for instance when multiple transit operators offer transit services within a Verkehrsverbund. GTFS offers the choice of either assigning all lines to the transit authority, losing information as to which operator offers which line, or assigning lines to the respective operator, and consequently losing the logical connection between those operators that is offered by the transit system. Also, some concepts prevalent in European public transit (especially concerning railway systems) prove difficult to implement in GTFS. As an example, the S1 line of the Munich S-Bahn incorporates portion working, i.e., upon arriving in Neufahrn, the train is divided, with the front part continuing its journey to Freising, while the back part of the train goes to Munich Airport.

item                                      vdv / transm. / hafas / diva / gtfs
Specification publicly available          ()
Freely licensed specification             ()
Schedule information
Fare information                          ?
Operational data (vehicle sizes etc)      ?
Location data (stops etc)
Line- and stop-specific URI fields
Connections and guaranteed transfers
Rosters and duty information              ?

Table 4: Comparison of data models. Note that “VDV”, in this context, refers to VDV-452 only. Additional VDV standards are available, e.g., for rostering. Only the publicly available feature information on the HAFAS-Rohdatenformat is used.
Furthermore, in his process description of transforming the schedules of Stadtwerke Münster into GTFS, Müller points out that the source working timetable included operational data not intended for publishing, e.g., information on vehicle blocks and deadheads, or rostering information[63]. While the rostering information cannot even be modelled in GTFS, vehicle blocks including deadheads must be removed from the working timetable in order to arrive at a resulting GTFS schedule fit for public release. (In interviews and the evaluation described in Section 6.2, complete vehicle blocks with pull-outs and deadheads were examples of data which must not be made known to potential competitors.) Apart from this internal information concerning deadheads, GTFS constitutes a clearly specified subset of transit data that can safely be distributed to the public without accidentally disclosing internal data that might be considered a corporate secret.
3.7 challenges of transforming and merging transit datasets
Establishing a process to create open transit data poses a series of
challenges. For one, even if transit authorities use a standardized
format to model their schedules, e. g., VDV-452, there are still differ-
ent approaches on how to actually model the data. Also, the previ-
ously addressed issue of creating presentational timetables out of the
working timetables, e. g., removing pull-outs, deadheads and pull-
ins, must be addressed if one wants to create a schedule that can be
publicly distributed.
Another challenge, especially when merging datasets from different transit authorities, lies in different identifiers used for the same items throughout different transit operators or authorities. For instance, while Deutsche Bahn uses its own system of identifying train stations[64], railway operators in other countries or regional transit authorities may integrate the same station, albeit in their own identification system (see the HAFAS excerpt in Listing 3, where SBB models Deutsche Bahn stations with SBB identifiers). Some data models allow identifying stops by different identifiers: DIVA, as an example, has multiple ways of addressing stop areas, including optional fields for Deutsche Bahn identifiers and IATA codes, where applicable.
To address this issue, OKF’s Open Transport working group has
proposed building a referential database for stop identifiers, which
might serve as a translation aid between different nomenclatures[45].
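The idea of such a referential database can be illustrated with a small in-memory registry that maps each (scheme, local identifier) pair to one canonical stop. This is a sketch of the concept only and not the working group's design; the class, the scheme names, the canonical key and the second identifier are all hypothetical, while 8014558 is the identifier used for Singen (Hohentwiel) in the SBB export.

```python
class StopRegistry:
    """Translate operator-specific stop identifiers to a canonical one."""

    def __init__(self):
        self._canonical = {}  # (scheme, local_id) -> canonical id

    def register(self, canonical_id, **aliases):
        """Record the identifiers various schemes use for the same stop."""
        for scheme, local_id in aliases.items():
            self._canonical[(scheme, str(local_id))] = canonical_id

    def translate(self, scheme, local_id):
        """Return the canonical id, or None if the identifier is unknown."""
        return self._canonical.get((scheme, str(local_id)))

    def same_stop(self, first, second):
        """Check whether two (scheme, local_id) pairs name the same stop."""
        a, b = self.translate(*first), self.translate(*second)
        return a is not None and a == b


registry = StopRegistry()
# "regional" and "S-0042" are invented for illustration.
registry.register("stop:singen-hohentwiel", sbb="8014558", regional="S-0042")

canonical = registry.translate("sbb", 8014558)
unknown = registry.translate("sbb", "0000000")
```

When merging two schedules, stops could then be compared by their canonical identifier instead of by name or by operator-specific number.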
The same problem arises with journeys that span the area of multi-
ple transit authorities and will subsequently show up in the schedule
data of more than one authority. This could be mitigated by either
cross-checking the operator identifiers in each schedule and merg-
ing overlapping journeys accordingly, or by only using the schedule
of the transit operator as a reference for these journeys and ignoring
them in the transit authorities’ schedules. Nonetheless, manual inspection and correction of the resulting schedules is most likely necessary.
3.7.1 Transit Vocabulary
Another problem is the result of the differing capabilities in between
transit data models explained in Section 3.6. While VDV-452 and
Transmodel differentiate between stop points as locations where pas-
sengers can embark and disembark, and stop areas which aggregate
groups of stop points, this distinction is less pronounced in GTFS,
where everything is a “stop”—although logical connections can be
assigned to model the stop point/stop area dichotomy.
The data models also use different terminology when defining the same principle. For instance, what is known as a “route” in GTFS is called a “line” in VDV-452 (see the VDV-452 table reference in Table 1), and a GTFS “trip” corresponds to a VDV-452 “journey”. Transmodel uses an even more refined vocabulary,
distinguishing, for example, between a “trip” as a journey made by
a passenger, and a “vehicle journey” as the journey undertaken by a
vehicle.
These differences between the different data models can be a source
of confusion when converting from one format to another. Open transit data developers have made efforts to build transit vocabularies that allow the mapping of format semantics, e.g., [27] or [36].
For the scope of this document, the author closely follows the vo-
cabulary proposed by the OKF’s Open Transit group[36]:
stop point: A point where a vehicle stops to let passengers embark and disembark
stop area: A collection of stop points
journey pattern: An ordered list of stop points
journey: A single run of a vehicle along a journey pattern
route: A collection of journey patterns following the same commercial direction
line: A collection of (usually two) routes
stop time: The time when a vehicle is scheduled at a stop point. Extending on the Open Transit group’s vocabulary, a stop time for a scheduled arrival is called an arrival time within this work, and a stop time for a scheduled departure is called a departure time.
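To make the relationships between these terms concrete, the vocabulary can be written down as a handful of data classes. This is an illustrative sketch, not a schema used in this work; the class and field names are chosen here, and the example values are taken from the first two stops of SWU line 15, assuming arrival and departure coincide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StopPoint:
    stop_id: str
    name: str


@dataclass
class StopArea:
    name: str
    stop_points: List[StopPoint]


@dataclass
class JourneyPattern:
    stop_points: List[StopPoint]  # ordered


@dataclass
class StopTime:
    stop_point: StopPoint
    arrival_time: str
    departure_time: str


@dataclass
class Journey:
    pattern: JourneyPattern
    stop_times: List[StopTime]


@dataclass
class Route:
    patterns: List[JourneyPattern]  # same commercial direction


@dataclass
class Line:
    name: str
    routes: List[Route]


# Rough GTFS counterparts of the terms used in this work.
GTFS_TERM = {"line": "route", "journey": "trip", "stop point": "stop", "stop area": "stop"}

wbp = StopPoint("1050", "Willy-Brandt-Platz")
ost = StopPoint("1058", "Ostplatz")
pattern = JourneyPattern([wbp, ost])
journey = Journey(pattern, [StopTime(wbp, "05:23", "05:23"), StopTime(ost, "05:24", "05:24")])
line_15 = Line("15", [Route([pattern])])
```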
3.8 example data flow at swu verkehr and ding
SWU Verkehr is the public transit subsidiary of Stadtwerke Ulm, the
municipal works owned by the twin cities of Ulm (in the state of
Baden-Württemberg) and Neu-Ulm (in the state of Bavaria). It is re-
sponsible for most bus and tram public transit within the city bound-
aries, making it the public transit operator offering the second-most
transit services within the DING linked transport system, surpassed
only by the bus and rail services offered by Deutsche Bahn and its
subsidiaries. In turn, DING is the transit authority responsible for co-
ordinating the 33 transit operators offering services within the city of
Ulm and the districts Alb-Donau-Kreis, Biberach and Neu-Ulm.
Through interviews with employees of SWU Verkehr and DING, the
following process was identified, from the planning stage to supply-
ing the vehicles with operational data.
3.8.1 Planning Stage
In 2001, SWU procured DIVA by Mentz Datenverarbeitung as a planning tool. Pre-existing stop names and identifiers were entered into DIVA and supplemented with additional information, such as ICTS identifiers. Based on the stop locations, route segments were created between stop pairs to create an O-D matrix, and distances were acquired through SWU’s test car and on-board IBIS computers. Finally, the pre-existing timetables were entered into DIVA, completing the software’s roll-out.
If new stops or new O-D edges are introduced, distances are usually
first estimated in a Geographic Information System (GIS) software,
and later refined through test car measurements. Note that the spatial
shapes of the O-D edges are not recorded, and actually never show
up in SWU’s data store—all of the measurements only concern the
accurate distance between stops.
After lines and journey patterns have been entered into DIVA, the Bildfahrplan⁹ is created. In this step, pre-determined guaranteed transfers
serve as guidelines as to where vehicles must meet at certain stops.
Building up from these anchors, the rest of the working timetable
and blocks are determined. After this step, variants are set up for
different operating day types.
If the resulting working timetable has passed all plausibility checks, it is exported to the ICTS system, and to the personnel disposition department, where matching service shifts are created based on it. Afterwards, the working timetable is reduced to representational timetables showing only on-duty journeys. This representational timetable is then exported for further use through DING, and serves as the base for typesetting line- and stop-based printed timetables.
3.8.2 Operational Stage
After transferring the planning data into an operational exchange for-
mat for export onto SWU’s vehicles and its ICTS system, the distances
between stops are re-measured, if necessary. Afterwards, matching
destination texts are added for each journey pattern, both for the
vehicles’ internal signs, as well as for on-stop information displays,
and the appropriate recorded messages for each stop. Finally, opera-
tional details are added, such as roadside infrared beacons, activation
points for registering with junction processors, and the necessary data
for displaying protected transfers between journeys on the driver’s
Mobile Data Terminal (MDT) display.
After this operational dataset is complete, it is rolled out through a Depot Data Manager, which distributes it to all affected vehicles through the wireless LAN throughout the depot, and it is imported into the ICTS.
3.8.3 Data Handling by DING
Upon every schedule change, DING requests all the transit operators within its area of responsibility to send in their respective changes from last year’s schedule. (Within the DING linked transit system, schedule rollover occurs once a year in December.) In SWU’s case, the data can be directly imported from SWU’s DIVA data export; after importing the data through the DIVA2DIVA interface, only minor manual corrections are necessary. Other operators usually submit the changes from last year’s schedule as spreadsheet documents or PDF files with relevant schedule changes highlighted within the document. The deviations from last year’s schedule are then manually entered into the DIVA installation maintained by DING.

⁹ A graphical working timetable in the form of a time-distance diagram.
Interestingly, though one other operator within DING’s area of responsibility also uses DIVA in their planning stage, their data has proven unfit for direct import into DING’s DIVA system. This is a result of that operator using a different modelling approach; i.e., even if two parties use the same planning software and the same underlying data model, data compatibility is not automatically achieved.
After having finalized the aggregated schedule, timetables in poster
and book form are typeset automatically, and PDF files for each route
and line are exported for upload on DING’s website. The DIVA files are
then imported into DING’s online journey planner EFA, from where
the data is exchanged on a daily basis with the EFA installation main-
tained by NVBW. Through this data exchange system, NVBW’s EFA
system is able to plan journeys in and through all the transit systems
in Baden-Württemberg contributing to their database, and additional
transit systems bordering on the state. In return, the aggregated data
is re-distributed to the partnering transit authorities.
4 EXPORTING DIVA DATA: A FIRST APPROACH THROUGH CSV EXPORTS
The author first attempted in cooperation with SWU to make their
timetable data available in GTFS format in 2012. Neither SWU nor DING owns licenses for bulk schedule export into any standardized exchange format other than the undocumented DIVA exchange format.
However, schedule information can be exported through the follow-
ing interfaces:
Timetables can be exported from DIVA as CSV tables, with each file containing a tabular schedule for one service type (weekdays, Saturdays or Sundays) per route. These tables are meant to be used for typesetting printed timetables in desktop publishing software. An example table is listed in Table 5.
Stop information and geolocation can be exported from VICOS LIO as a Keyhole Markup Language (KML) file.
A Tagesartenkalender (type of day calendar) is maintained by SWU Verkehr as an Excel spreadsheet, allocating a numeric type of day key to each calendar day.
Additionally, shapes for each route were hand-drawn by an SWU employee once and provided in KML format.
Since, at this point, the author considered the DIVA exchange format
to be too obscure to be used as a base for a GTFS transformation, he
chose to analyze whether the data provided by SWU was sufficient in
order to create at least a basic GTFS feed.
4.1 exporting data from csv timetables
The data provided by SWU consists of a series of CSV files which are
named according to a common pattern (10|11)[0-9]{3}z[HR][023]\.xls.
For example, the schedules of line number 3 would be found in the
following files:
10003zH0.xls
10003zH2.xls
10003zH3.xls
10003zR0.xls
10003zR2.xls
10003zR3.xls
0 1 2 3 4 5 6 7 8 9 10 11 12
Fahrtenschlüssel 05:23 06:23 07:22 07:32 08:23 08:33 09:23 09:33
Verkehrsbeschränkung
Hinweis Su Su Su
1 1050 Willy-Brandt-Platz 1 05.23 06.23 07.22 07.32 08.23 08.33 09.23 09.33
2 1058 Ostplatz 1 05.24 06.24 07.23 07.33 08.24 08.34 09.24 09.34
3 1048 Örlinger Straße 1 05.25 06.25 07.25 07.35 08.25 08.35 09.25 09.35
4 1059 Steinhövelstraße 1 05.26 06.26 07.26 07.36 08.26 08.36 09.26 09.36
5 1060 Safranberg 1 05.27 06.27 07.27 07.37 08.27 08.37 09.27 09.37
6 1061 Albecker Steige 1 05.28 06.28 07.28 07.38 08.28 08.38 09.28 09.38
7 1079 Eichenplatz 1 05.29 06.29 07.29 07.39 08.29 08.39 09.29 09.39
8 1078 Ludwig-Beck-Straße 1 05.30 06.30 07.30 07.40 08.30 08.40 09.30 09.40
9 1087 Egertweg 1 05.32 06.32 07.32 07.42 08.32 08.42 09.32 09.42
10 1160 Hörvelsinger Weg 1 05.37 06.37 07.38 07.48 08.37 08.47 09.37 09.47
11 1244 Staudingerstraße 1 05.41 06.41 07.48 07.58 08.41 08.51 09.41 09.51
12 1245 Kliniken Wissenschaftsstadt 1 05.42 06.42 07.49 07.59 08.42 08.52 09.42 09.52
13 1246 Universität West 1 05.43 06.43 07.50 08.00 08.43 08.53 09.43 09.53
14 1247 Manfred-Börner-Straße 1 05.44 06.44 07.51 08.01 08.44 08.54 09.44 09.54
Table 5: Example CSV timetable for SWU line number 15 from
Willy-Brandt-Platz to Science Park on a weekday. The first line with
column names is not part of the CSV file but was manually inserted
for better legibility.
For the files provided by SWU Verkehr, operating branch, line number,
direction of travel and weekday validity can be inferred from the
file names, according to the following pattern.
The first two digits of the file name define the operating branch.
For SWU Verkehr, this is either 10 for all bus services or 11 for
tram service. In other contexts, different branches can be used
to differentiate between operators.
The following three digits describe the line number. This is
not necessarily the line number usually used for customers.
For instance, SWU night bus service is marketed as lines N1,
N2, and so on; within the aforementioned naming convention,
however, the matching timetables can be found under the names
10901zH0.xls, 10902zH0.xls, etc.
After the letter z, the direction of travel is specified. This is
either H (German hin: towards) or R (German rück: back).
Finally, the last digit describes the service period. In the context
of SWU Verkehr, all tables ending in 0 describe weekday service
from Monday through Friday, 2 is for Saturday service, and 3
marks the Sunday schedule.
Note that, though all files end with .xls, the data is stored as CSV
data with tab stops as delimiters and no text string escaping.
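The naming convention above can be decoded mechanically. The following Python sketch is only an illustration and not the author's original tooling; the function names parse_filename and parse_table are hypothetical, and the mapping of H/R to direction ids 0/1 follows the pseudocode in Section 4.1.3.

```python
import csv
import io
import re

# Pattern from Section 4.1: branch (10 = bus, 11 = tram), three-digit line
# number, the letter z, direction (H or R), service digit (0, 2 or 3).
FILENAME_RE = re.compile(r"^(10|11)([0-9]{3})z([HR])([023])\.xls$")


def parse_filename(filename):
    """Split a timetable file name into its components (hypothetical helper)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    branch, line_nr, direction, service = match.groups()
    return {
        "branch": branch,
        "line_nr": line_nr,
        "direction_id": 0 if direction == "H" else 1,
        "service_id": service,
    }


def parse_table(text):
    """Parse tab-delimited timetable text into a list of rows.

    QUOTE_NONE mirrors the 'no text string escaping' property of the files.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader]
```

Calling parse_filename("10003zH0.xls") yields branch 10, line number 003, direction id 0 and service id 0. Rejecting unexpected names with an exception, rather than skipping them silently, is a design choice of this sketch.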
4.1.1 Data Layout
In Table 5 and Table 6, two example timetables are shown. The data
representation works as follows:
a. Column 0 enumerates the stop arrivals or departures. This is not
necessarily the journey pattern followed by all the vehicles on
that route, as can be seen in Table 6, and can safely be ignored.
b. Column 1 is the identifier of the stop area. In SWU’s case, this is
a four-digit numeric value that is also found in the KML file for
the stop location (see Section 4.2) and can be used as a unique
identifier for the stop area. The four-digit identifier of a stop
area matches the identifier used by DING’s EFA, although the latter
prefixes 900 to all identifiers.
c. In column 2, the name of the stop area, as it would be displayed
on the physical stop itself, is saved.
d. Column 3 is only used for layovers at stops. See Table 6 for an
example at “Kliniken Wissenschaftsstadt”: The line marked an (1)
holds the arrival time, while the line marked ab (0) holds the
departure time for the same stop. If no such differentiation is
present, all times are departure times only.
e. In column 4, below the row labels that occupy this column in the
header rows, SWU Verkehr exports the identifier suffix for the
exact stop point the vehicle services within this journey pattern.
For instance, in Table 6, the vehicle services stop area number
1010 Theater, a stop area with four different stop points for bus
and tram service, at stop point number 3. Stop area identifier and
stop point suffix can be concatenated, i.e., stop point number 3
at stop area number 1010 can be identified through the unique stop
point id 101003.
f. In the following columns, the timing patterns for the individual
journeys are defined.
4.1.2 Deciphering A Journey Timing Pattern Column
Each column represents one journey, with row 0 serving as a
unique identifier within the scope of the current file, i.e., within the
same route and operating day. This identifier is frequently, but not
necessarily, the departure time from the first stop. Since this
identifier is not unique once the context of this particular route and
day is left, it can be made unique within the scope of all journeys by
being prefixed with line name, route direction and day. The first
journey in Table 6 could consequently be designated 005H0-0435.
Arrival and departure times greater than 24.00 are used for
departure/arrival times past midnight. A vehicle starting a journey
at 11:50 P.M. and arriving at the final destination at what is
technically 1:13 A.M. on the following day would start its journey at
23.50 and arrive at 25.13 of the same schedule day.
In row 2, scheduling exceptions and guaranteed transfers are saved,
if applicable. For SWU Verkehr, guaranteed transfers are assigned
alphanumeric identifiers: [A-D]{1}[0-5]{1}. These are ignored for
the approach described in this chapter. Scheduling exceptions are
two-letter designators, and their meaning has to be delivered along
with the CSV files by SWU Verkehr or deduced by comparing the
footnotes of the finished print-ready timetable for a given journey.
For instance, Su describes journeys only occurring on weekdays during
lecture periods of Ulm University, while Fb journeys only take place
on Sundays before a holiday.
If row 3 is not the row of the first stop within the pattern, the
vehicle size can optionally be specified there. In the data provided
by SWU, a distinction is made between single buses and articulated
buses, as well as between own vehicles and vehicles provided by
contractors. For the scope of this transformation process, this
information is ignored.
If vehicle size information is not present, row 3 marks the departure
time for the first stop point. If this journey should start at a later
stop point, or if it short turns, a single hyphen (-) will appear at all
stops before the starting point or after the final stop. Otherwise, all
departure times follow the pattern hh.mm.
If a route branches in a way that, after a series of stops has been
serviced, another series of stops is not serviced by a particular
journey, before yet another series of stops is serviced again, the
omitted stops are marked with a dollar ($) symbol instead of a
departure time. This kind of branching can be observed in Table 6:
The journeys in columns 5, 7, 106, 108 and 110 all start at Hasenweg
stop, servicing all stops until Fachoberschule and then proceeding
directly to ZUP. In contrast, the journeys in columns 6, 107 and 109
start only at Washingtonallee, servicing a different series of stops
until following the same stop pattern after also arriving at ZUP.
Note that the journeys in columns 107, 108, 109 and 110 also short
turn.
0 1 2 3 4 5 6 7 106 107 108 109 110
Fahrtenschlüssel 04:35 05:20 05:35 23:39 24:05 24:07 24:09 24:20
Verkehrsbeschränkung
Hinweis A5A5A5Fa A5Fa A5Fa
1 1765 Hasenweg 1 04.35 - 05.05 23.20 - 23.50 - 24.20
...
12 1741 Fachoberschule 1 04.47 - 05.17 23.32 - 24.02 - 24.32
13 1744 Washingtonallee 1 $ 04.53 $ $ 23.33 $ 24.03 $
14 1743 Neue Hochschule 1 $ 04.53 $ $ 23.33 $ 24.03 $
...
21 1719 Waldeck 1 $ 05.01 $ $ 23.41 $ 24.11 $
22 1729 Meininger Allee 1 $ 05.02 $ $ 23.42 $ 24.12 $
23 1700 ZUP 3 04.48 05.03 05.18 23.33 23.43 24.03 24.13 24.33
24 1710 Rathaus Neu-Ulm 1 04.50 05.05 05.20 23.35 23.45 24.05 24.15 24.35
...
29 1008 Hauptbahnhof 1 05.00 05.15 05.30 23.45 23.55 24.15 24.25 24.45
30 1010 Theater 3 05.02 05.17 05.32 23.47 - 24.17 - 24.47
...
46 1245 Kliniken Wissenschaftsstadt an (1) 2 05.21 05.36 05.51 24.06 - - - -
47 1245 Kliniken Wissenschaftsstadt ab (0) 2 05.25 05.40 05.55 24.25 - - - -
...
50 1240 Universität Süd 2 05.29 05.44 05.59 24.29 - - - -
Table 6: Excerpt of a more complex CSV timetable for SWU line number 5
from Neu-Ulm to Science Park, including different journey patterns.
The first line with column names is not part of the CSV file but was
manually inserted for better legibility. Rows omitted from the
excerpt are marked with “...”.
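The cell notation described in this section (hh.mm times, a hyphen for stops outside a short-turning journey, a dollar sign for a skipped branch) can be normalised with a small helper. The Python sketch below is illustrative only; the name cell_to_gtfs_time is hypothetical, and treating an empty cell like an unserved stop is an assumption that the source does not state.

```python
import re

TIME_RE = re.compile(r"^([0-9]{2})\.([0-9]{2})$")


def cell_to_gtfs_time(cell):
    """Return 'HH:MM:00' for a served stop, or None for an unserved one."""
    cell = cell.strip()
    # '-' = before the first or after the last stop of the journey,
    # '$' = stop on a branch this journey skips.
    # Assumption: an empty cell is treated the same way.
    if cell in ("-", "$", ""):
        return None
    match = TIME_RE.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell content: {cell!r}")
    hours, minutes = match.groups()
    # Hours of 24 and above are kept unchanged: GTFS expresses times after
    # midnight of the same service day in the same manner.
    return f"{hours}:{minutes}:00"
```

Because times past midnight are already counted beyond 24.00 in the export, no date arithmetic is needed; the value 25.13 simply becomes 25:13:00.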
4.1.3 Programmatical Transformation
The following pseudo-code suffices for transforming DIVA CSV exports
into GTFS. The ProcessFile procedure is called for each file,
extracts line number, type, direction and service ID (in this case, 0
for Monday through Friday, 2 for Saturdays and 3 for Sundays) and
makes use of a library function ParseCSV in order to parse the file’s
contents into a two-dimensional array structure.
For the initial transformation from SWU’s files to GTFS, the author
used an object-oriented approach using Java. In it, a hashmap of
lines was populated with journeys, each with the according stop
times. Thus, journeys could be checked online against DING’s journey
planner EFA, also making up for missing headsign information in the
CSV files.
ProcessFile(filename)
    // GTFS route_type: 0 = tram, 3 = bus
    if Substring(filename, 0, 2) = "11"
        then lineType ← 0
        else lineType ← 3
    lineNr ← Substring(filename, 2, 5)
    if Substring(filename, 6, 7) = "H"
        then routeDirection ← 0
        else routeDirection ← 1
    serviceId ← Substring(filename, 7, 8)

    table[ ][ ] ← ParseCSV(filename)        // indexed as table[line][column]

    PrintRoutes(lineNr, lineNr, lineNr, lineType)

    // find the line holding the first stop (sequence number 1 in column 0)
    i ← 0
    startLine ← nil
    while startLine = nil
        do if table[i][0] = 1
               then startLine ← i
               else i ← i + 1
    TripIterator(table, lineNr, routeDirection, serviceId, startLine)
The PrintRoutes, PrintTrips and PrintStopTimes routines (the latter
abbreviated as PrintST in the pseudocode) can be used to append any
input passed to them onto a routes.txt, trips.txt and stop_times.txt
file respectively. This will result in duplicate lines ending up in
routes.txt (up to six occurrences per unique line), which can,
however, be filtered out by using GNU sort(1) or by manually editing
the file in a spreadsheet program.
Iterating over the first column of the table array until the first
stop is found is not an elegant approach, but it suffices in
determining the start line in which all journeys of the current table
have their first departure entry.
Afterwards, the array and the line information are passed to
TripIterator, which will parse all journey columns, i.e., all columns
starting with index 5 in the layout of Table 5 and Table 6. The first
cell of each column is used as a journey identifier, which is then
made unique within the whole GTFS dataset’s scope by prepending it
with line number, direction and service identifier.
TripIterator(table, lineNr, routeDirection, serviceId, startLine)
    columns ← getColumns(table)
    lines ← getLines(table)
    for column ← 5 to columns
        do journId ← table[0][column]
           uJournId ← Concat(lineNr, routeDirection, serviceId, journId)
           // use a per-journey copy so that an exception does not
           // leak into the following columns
           tripServiceId ← serviceId
           if Contains(table[2][column], "Fa")
               then tripServiceId ← "Fa"
           // proceed accordingly for all service exceptions
           PrintTrips(lineNr, tripServiceId, uJournId)
           for line ← startLine to lines
               do A ← table[line][column]
                  if A ≠ "$" and A ≠ "-"
                      then A ← Replace(A, "\.", ":")
                           A ← Concatenate(A, ":00")
                           if table[line][3] = "an (1)"
                               then line ← line + 1
                                    D ← table[line][column]
                                    D ← Replace(D, "\.", ":")
                                    D ← Concatenate(D, ":00")
                               else D ← A
                           stopId ← table[line][1]
                           sequence ← table[line][0]
                           PrintST(uJournId, A, D, stopId, sequence)
Service exceptions are expected in row 2 of the table and handled by
superseding the current service identifier with the service
exception’s identifier. In the pseudocode example, only the “Fa”
exception is handled; in reality, this would be solved by means of an
if-ladder. Once this is taken care of, the current journey can be
appended to trips.txt.
Afterwards, iterating over all lines of the current column, the time
A is extracted, and, if it belongs to an actually serviced stop (i.e.,
it is neither - nor $), its decimal point is replaced with a colon,
and a zero-seconds suffix is attached to the end.
Should column 3 at the current line contain the string an (1), the
departure time D is not identical to the arrival time, but will be
found in the following line. Thus, the line counter is increased, the
departure time D extracted from that next line, and transformed and
appended accordingly. In all other cases, D gets the same value as A.
The current stop identifier is then extracted from column 1 of the
current line, and the stop sequence number can be extracted from
column 0 of the current line. Using the PrintST procedure, the
current unique journey id, arrival and departure times, the identifier
of the current stop and the sequence number are then written into
stop_times.txt. Note that in some instances this will result in gaps
in the sequence numbering. This is acceptable in accordance with the
GTFS specification.
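The column walk described above can be sketched in Python as follows. This is not the author's Java implementation; the names to_gtfs_time, unique_trip_id and stop_times_for_column are hypothetical, and the table is assumed to be a list of rows laid out as in Table 5 and Table 6 (column 0 = sequence, column 1 = stop area id, column 3 = an/ab marker).

```python
def to_gtfs_time(cell):
    """'04.35' -> '04:35:00'; '-', '$' and empty cells -> None."""
    cell = cell.strip()
    if cell in ("-", "$", ""):
        return None
    return cell.replace(".", ":") + ":00"


def unique_trip_id(line_nr, direction, service_id, journey_key):
    """Build an identifier of the form used in Section 4.1.2."""
    return f"{line_nr}{direction}{service_id}-{journey_key.replace(':', '')}"


def stop_times_for_column(table, column, start_line, trip_id):
    """Collect (trip_id, arrival, departure, stop_id, sequence) tuples."""
    rows = []
    line = start_line
    while line < len(table):
        arrival = to_gtfs_time(table[line][column])
        if arrival is not None:
            departure = arrival
            # An "an (1)" row holds the arrival; the departure follows in
            # the next row ("ab (0)") of the same stop.
            if table[line][3].strip() == "an (1)" and line + 1 < len(table):
                line += 1
                departure = to_gtfs_time(table[line][column]) or arrival
            rows.append((trip_id, arrival, departure,
                         table[line][1], int(table[line][0])))
        line += 1
    return rows
```

As in the pseudocode, the stop identifier and sequence number are read from the line the counter points at after a layover has been consumed, so a layover stop appears once, carrying the sequence number of its departure row.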
4.2 creating stops.txt from kml stop locations
SWU Verkehr was able to export stop geolocation information as a
KML file out of VICOS LIO (see Section 3.8 for more information on
SWU’s data flow). This file is formatted according to the following
pattern:
Listing 8: DIVA KML geolocation excerpt
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
  <name>Haltepunkte.kml</name>
  <!-- Omitted: A series of placemark styling definition -->
  <Folder id="layer fmain">
    <name>Haltepunkte</name>
    <!-- Omitted: Styling definition -->
    <Folder id="layer217">
      <name>Haßlerstraße</name>
      <Placemark id="layerP407">
        <name>HASS - 01</name>
        <Snippet maxLines="0" id="s407"></Snippet>
        <description>OLIFID: 135901</description>
        <LookAt>
          <longitude>9.97761777777778</longitude>
          <latitude>48.3896536111111</latitude>
          <altitude>0</altitude>
          <heading>0</heading>
          <tilt>0</tilt>
          <range>1000</range>
          <altitudeMode>relativeToGround</altitudeMode>
        </LookAt>
        <styleUrl>#msn_placemark_circle30</styleUrl>
        <Point>
          <coordinates>9.97761777777778,48.38965361111111,0</coordinates>
        </Point>
      </Placemark>
      <Placemark id="layerP408">
        <name>HASS - 02</name>
        <Snippet maxLines="0" id="s408"></Snippet>
        <description>OLIFID: 135902</description>
        <LookAt>
          <longitude>9.97882166666667</longitude>
          <latitude>48.3904411111111</latitude>
          <altitude>0</altitude>
          <heading>0</heading>
          <tilt>0</tilt>
          <range>1000</range>
          <altitudeMode>relativeToGround</altitudeMode>
        </LookAt>
        <styleUrl>#msn_placemark_circle30</styleUrl>
        <Point>
          <coordinates>9.97882166666667,48.3904411111111,0</coordinates>
        </Point>
      </Placemark>
      <Placemark id="layerP409">
        <name>HASS - 03</name>
        <Snippet maxLines="0" id="s409"></Snippet>
        <description>OLIFID: 135903</description>
        <LookAt>
          <longitude>9.97862138888889</longitude>
          <latitude>48.3893241666667</latitude>
          <altitude>0</altitude>
          <heading>0</heading>
          <tilt>0</tilt>
          <range>1000</range>
          <altitudeMode>relativeToGround</altitudeMode>
        </LookAt>
        <styleUrl>#msn_placemark_circle30</styleUrl>
        <Point>
          <coordinates>9.97862138888889,48.3893241666667,0</coordinates>
        </Point>
      </Placemark>
      <Placemark id="layerP410">
        <name>HASS - 04</name>
        <Snippet maxLines="0" id="s410"></Snippet>
        <description>OLIFID: 135904</description>
        <LookAt>
          <longitude>9.97806722222222</longitude>
          <latitude>48.3884594444444</latitude>
          <altitude>0</altitude>
          <heading>0</heading>
          <tilt>0</tilt>
          <range>1000</range>
          <altitudeMode>relativeToGround</altitudeMode>
        </LookAt>
        <styleUrl>#msn_placemark_circle30</styleUrl>
        <Point>
          <coordinates>9.97806722222222,48.3884594444444,0</coordinates>
        </Point>
      </Placemark>
    </Folder>
    <!-- Omitted: More folders, one for each stop -->
  </Folder>
</Document>
</kml>
As can be seen, within the KML Document Object Model (DOM),
a single Document node contains, besides a document name and a
series of styling descriptions used by the individual points, a
single Folder. Within this folder resides a series of further
Folders, each describing a stop area with all its stops. Each stop is
assigned a Placemark node, with the stop id (i.e., the four-digit
identifier for the stop area and the two-digit suffix for the
platform) in its description node. Latitude and longitude are encoded
twice: once as longitude and latitude nodes within a LookAt object,
and once as a comma-separated concatenation of longitude, latitude
and altitude within a Point/coordinates node. Longitude and latitude
use the WGS84 coordinate reference system, and are in decimal degree
notation.
Note that the stop name can be prepended with a municipality
name. For instance, the Lindenstraße stop in the Blaustein munic-
ipality west of Ulm would show up in the KML file as Blaustein,
Lindenstraße.
4.2.1 Programmatical transformation
Transforming the KML file into a stops.txt adhering to the GTFS
specification can be achieved by simple XPath selections and
subsequent text replacement. The following pseudocode example will
make use of a Find procedure returning XML nodes matching an XPath
expression, ignoring namespaces. Actual implementations might need
to prepend a matching kml: namespace prefix. Again, the output is
abstracted by a PrintStops procedure that will take care of opening
a file, deleting its contents if not empty and redirecting all output
into the file after writing the header row according to GTFS
specification.
The outer loop iterates over all Folders containing stop sets,
extracts the name and retains two forms of it: The full GTFS
stop_code, including the municipality prefix, if present; and, by
removal of any such prefix, the stop_name.
The inner loop iterates over all the Placemarks within the Folder,
locating and appropriately truncating the stop identifier, latitude,
and longitude. Since each Placemark defines a stop point, the first
four digits are extracted in order to obtain the parent_station, i.e.,
the stop area.
After printing all thusly obtained “stops located within a stop”, the
stop area is output as a matching parent station.
ProcessFile(kml)
    for each stopfolder ∈ Find(kml, "/kml/Document/Folder/Folder")
        do code ← Find(stopfolder, "./name")
           // strip a leading municipality prefix such as "Blaustein, "
           name ← Replace(code, "^.*, ", "")
           for each stop ∈ Find(stopfolder, "./Placemark")
               do id ← Find(stop, "./description")
                  id ← Replace(id, "OLIFID: ", "")
                  parent ← Substring(id, 0, 4)
                  lat ← Find(stop, "./LookAt/latitude")
                  lon ← Find(stop, "./LookAt/longitude")
                  PrintStops(id, \"code\", \"name\", 0, parent, lat, lon)
           // the stop area reuses the coordinates of the last stop point
           PrintStops(parent, \"code\", \"name\", 1, , lat, lon)
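A runnable counterpart to the pseudocode, using only Python's standard library, might look like the sketch below. It is an illustration under assumptions: the function name stops_from_kml and the tuple layout (stop_id, stop_code, stop_name, location_type, parent_station, stop_lat, stop_lon) are chosen here for demonstration, and the namespace URI is the one declared in Listing 8.

```python
import re
import xml.etree.ElementTree as ET

# Namespace as declared in the KML export (Listing 8).
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def stops_from_kml(kml_text):
    """Return stops.txt rows (tuples) for all stop points and stop areas."""
    root = ET.fromstring(kml_text)
    rows = []
    for folder in root.findall("kml:Document/kml:Folder/kml:Folder", KML_NS):
        code = folder.findtext("kml:name", default="", namespaces=KML_NS).strip()
        # Drop a leading municipality prefix such as "Blaustein, ".
        name = re.sub(r"^.*, ", "", code)
        parent = lat = lon = None
        for placemark in folder.findall("kml:Placemark", KML_NS):
            description = placemark.findtext(
                "kml:description", default="", namespaces=KML_NS)
            stop_id = description.replace("OLIFID:", "").strip()
            parent = stop_id[:4]  # four-digit stop area identifier
            lat = placemark.findtext("kml:LookAt/kml:latitude", namespaces=KML_NS)
            lon = placemark.findtext("kml:LookAt/kml:longitude", namespaces=KML_NS)
            rows.append((stop_id, code, name, 0, parent, lat, lon))
        if parent is not None:
            # As in the pseudocode, the stop area simply reuses the
            # coordinates of the last stop point of the folder.
            rows.append((parent, code, name, 1, "", lat, lon))
    return rows
```

Reusing the last stop point's coordinates for the parent station is kept deliberately, so that the sketch reproduces the behaviour of the pseudocode rather than improving on it.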
4.3 transforming the type of day calendar
The type of day calendar was provided by SWU Verkehr as a spread-
sheet with several different versions of the calendar. This spread-
sheet was apparently not meant for automated processing, but as a
(printed) visual reference for employees—for instance, as a wall cal-
endar. In SWU’s case, three major distinctions are being made:
School days vs. school holidays
University session vs. semester break
Days with night bus operation vs. days without night bus service
Apart from this distinction, service days are categorized into the
following groups:
Monday through Thursday
Friday
Saturday
Sunday
Day before a holiday
Special holidays¹
Combining these two categorizations results in the types of day
matrix laid out in Table 7.
Weekday          ID   School Day   University   Night service
Mon–Thu          10
                 11
                 12
                 13
Friday           50
                 51
                 52
                 53
Saturday         60
Christmas Eve    61                             *
New Year’s Eve   62                             *
Sunday           70
                 71
Preholiday       80
                 81
                 82
                 83
Table 7: Type of day calendar matrix used by SWU Verkehr. Note that on
Christmas Eve and New Year’s Eve, night bus service is customized
and differs from the usual night service during the year.
manual exception handling In the author’s first approach,
a GTFS calendar.txt was prepared in which the obtained trips from
Section 4.1.3 were automatically in service Mondays through Fridays
(service id 0), Saturdays (service id 2), or Sundays (service id 3).
Night service was set by default to Fridays and Saturdays. Afterwards,
exceptions were manually looked up in the type of day calendar and
inserted into calendar_dates.txt.
For instance, a holiday occurring in the middle of a week (see
Table 7) meant disabling the regular weekday service for that date;
enabling Sunday/holiday service; optionally enabling night bus service
(e.g., if the specified date was a Friday, or if the following date
was also a holiday); and enabling pre-holiday service for the previous
day. This quickly proved cumbersome and error-prone, requiring time
and attention to detail, so that all service types were correctly
enabled or disabled for each exception from the regular calendar.
¹ Additional night service is offered on Christmas Eve and New Year’s
Eve. Also, Ulm has a special “holiday” in July, dating back from
Imperial City days. During the week surrounding this holiday,
additional service is offered to cope with increased ridership. The
same applies to larger cultural and sports events. These additional
services are, however, not present in the CSV exports mentioned in
this chapter and therefore omitted here.
programmatical alternative Alternatively, the GTFS specification
allows omitting the calendar.txt file altogether, relying only
on calendar_dates.txt to enable all service types in use at a
certain date. (In order to maintain compatibility with GTFS consuming
applications that rely on a calendar.txt being present, one might
also consider providing one with all services being disabled on each
and every day of the week.) Thus, after manually preparing the
calendar spreadsheet, it was possible to programmatically transform
the type of day calendar into a working calendar_dates.txt file.
First, using spreadsheet software, the spreadsheet layout of the most
compact calendar version was manually adjusted. In the beginning,
each month used two columns, the first one being the date, the second
the day type. After arranging the data of all calendar months
consecutively in the first two columns and changing the date
formatting to YYYYMMDD, the data was saved as a CSV file. By means of
a Perl script, each line was then split and services enabled
accordingly for each day. An example for the transformation can be
seen in Listing 9.
Listing 9: Transformed type of day calendar and the resulting
calendar_dates.txt entries
Source CSV:
20131227,51
20131228,60
20131229,70
20131230,11
20131231,62
Result:
Sf,20131227,1
Na,20131227,1
Yr,20131227,1
Fa,20131227,1
2,20131228,1
3,20131229,1
Nb,20131229,1
0,20131230,1
Sf,20131230,1
Na,20131230,1
2,20131231,1
Yu,20131231,1
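The transformation shown in Listing 9 amounts to a table lookup per calendar day. The author used a Perl script for this step; the Python sketch below is merely an illustration of the idea. The mapping contains only the five day types that occur in Listing 9, copied from its result section; the services enabled by the remaining day types of Table 7 are not stated in this chapter and are therefore left out rather than guessed. The names SERVICES_BY_DAY_TYPE and calendar_dates_rows are hypothetical.

```python
import csv
import io

# Day type -> enabled service ids, limited to what Listing 9 documents.
# Other day types from Table 7 would have to be added from the operator's
# documentation.
SERVICES_BY_DAY_TYPE = {
    "51": ["Sf", "Na", "Yr", "Fa"],
    "60": ["2"],
    "70": ["3", "Nb"],
    "11": ["0", "Sf", "Na"],
    "62": ["2", "Yu"],
}


def calendar_dates_rows(source_csv):
    """Turn 'YYYYMMDD,daytype' lines into calendar_dates.txt rows."""
    rows = []
    for date, day_type in csv.reader(io.StringIO(source_csv)):
        for service_id in SERVICES_BY_DAY_TYPE[day_type]:
            # exception_type 1 means: service is added on this date
            rows.append((service_id, date, 1))
    return rows
```

An unknown day type raises a KeyError in this sketch, which makes gaps in the mapping visible instead of silently dropping a day from the feed.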
4.4 optional:matching route shapes
Having completed all previous steps results in a fully functional and
working GTFS data set with all required data for usage, e. g., by a
journey planning software. However, there is no information on the
actual ways vehicles take in between stops.
SWU Verkehr provided the author with a KML file containing all
the shapes of all variants of all SWU lines. This file, however, turned
out not to be an export from one of SWU’s data sources, but was in-
stead hand-traced by an employee in a GIS software. Thus, this input
file can neither be procured by an automated process, nor do the
shapes match any of the O-D distances within either DIVA or VICOS
LIO. While it would have been possible to match the shape variants
to the journey pattern variants exported through the previously de-
scribed steps, the author chose to disregard this approach and focus
on alternative means of assigning shapes to routes.
online route planner variant Another option for obtaining
shapes was to group all journeys with identical journey patterns, and
query DING’s EFA with one arbitrary journey of each set. The resulting
XML output of EFA contains WGS84 coordinate pairs that can be parsed
through XPath queries and converted into GTFS notation. The resulting
shapes were then assigned to all journeys of each group accordingly.
Note that, when requesting a shape for a journey from its origin
to its destination, journeys other than the intended one might be
returned. This occurs mostly when other lines present a more direct
alternative of getting from origin to destination; therefore, the
intended journey might not even be returned by the online route
planner.
A heuristic to mitigate this effect is to request a journey from its
origin to its destination via a stop area in the middle of the planned
journey pattern. Nonetheless, the resulting shapes have to be checked
for their correctness. This is all the more true for journeys
occurring only on special dates, e.g., on Christmas Eve. These
journeys might not be available through the journey planner outside a
certain period surrounding this date.
4.5 drawbacks of this approach
The approach described in this chapter has some shortcomings.
timetable extraction One problem with extracting GTFS data
as described is that guaranteed transfers cannot be extracted without
further information that is not present in the CSV timetables. This is
all the more unfortunate in cases like the presented SWU bus line 5,
which turns into bus line 3 within Ulm’s Science Park, allowing riders
to “transfer” from line 5 to line 3, and vice versa, without changing
the vehicle.
To make matters worse, the described process uses the representational
timetables, which make use of a peculiar notation when it comes
to describing the change from line 3 to line 5. While a vehicle
approaching “Hochschule Eselsberg” from the Eselsberg quarter does
so as a No. 5 bus, it departs from there onwards as a No. 3 bus
back towards the city center. The printed timetable for the No. 5
bus, however, continues the schedule until “Universität Süd”. In the
other direction, a No. 3 bus approaches “Universität Süd”, and
subsequently leaves the stop area as a No. 5 bus; the printed
timetable, however, continues the No. 3 bus schedule until
“Hochschule Eselsberg”. While this overlapping line display may be
helpful for riders, it results in a GTFS file in which not only are
transfers not displayed correctly, but phantom buses appear to
accompany each other alongside, when they are, in fact, one and the
same vehicle.
Also, no predictions can be made on what headsign a certain vehi-
cle uses. While the headsign column is optional as per GTFS specifica-
tion, it can help riders identify whether a certain vehicle short turns,
or which of the journey patterns within a route it follows. While jour-
ney planners using GTFS data can use the name of the last stop area as
destination information, this is not necessarily the headsign actually
used on vehicles by the transit operator.
Figure 8: Geodata example of Haßlerstraße, plotted in QGIS onto map ma-
terial from OpenStreetMap. The green markers are the stop point
positions from the KML file, while the red marker is the stop area lo-
cation as found in DIVA’s internal exchange format and lies approx-
imately between all four stop points. Map background © Open-
StreetMap contributors[67].
geolocation representation The stop location conversion
algorithm arbitrarily chooses one stop point to represent the whole
stop area; in this case, the last one. In many cases, this will not be
a good choice. For instance, the four stop points of Haßlerstraße, the
example used in the KML excerpt and visualized in Figure 8, are spread
over three road segments around a junction, and are up to about 180
meters apart. The code example would use stop point number 4 (the
lowermost green marker in Figure 8) to represent Haßlerstraße, which
rather misrepresents the actual location of all its other stop points.
A more refined approach would be averaging the locations of all
stop points in order to represent the stop area. However, keep in
mind that averaging coordinates is not identical with calculating the
arithmetic mean of, respectively, all decimal latitude and longitude
values. While a simple arithmetic mean of coordinates might suffice
for small distances between stop points, and all the more in lower
latitudes, it produces a noticeable distortion for larger distances be-
tween stop points. This distortion becomes more prominent in higher
latitudes. A coordinate conversion into Cartesian coordinates,
subsequent averaging of their (x, y, z) components and re-conversion
into longitude and latitude would result in a more representative
“average” coordinate.
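The Cartesian averaging mentioned above can be sketched in a few lines of Python. This is only one possible illustration: it assumes a spherical Earth (so each coordinate becomes a unit vector), which is an approximation of WGS84, and the function name average_position is hypothetical.

```python
import math


def average_position(points):
    """Average (lat, lon) pairs in decimal degrees via unit vectors.

    Assumption: the Earth is treated as a sphere, which is sufficient for
    the short distances between the stop points of one stop area.
    """
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Convert the mean vector back into latitude and longitude.
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat), math.degrees(lon)
```

Applied to the four Haßlerstraße stop points from Listing 8, the result agrees with the plain arithmetic mean to well within a meter, which illustrates why the simpler mean is acceptable at this scale; the difference only becomes relevant for points that lie far apart.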
However, since a representational coordinate for each stop area
already exists within DING’s DIVA data set—which is, however, not
present in VICOS LIO and therefore also not in the KML export—
one might as well choose between the approach described within this
chapter, or directly exporting DIVA data as described in Section 5.5.
4.6 conclusion
The approach outlined in this chapter suffices for exporting schedule
data from DIVA into the GTFS format, and SWU subsequently decided
to publish this data under the Open Database License (ODbL),
allowing interested developers to freely use and share this data.
During the schedule change in December 2013, the author was able to
use the process described in this chapter to export the new schedule
to GTFS within only a few hours and with little manual correction.
However, the shortcomings of this process led the author to re-
inspect the DIVA exchange data to find out whether the problems
could be solved by directly exporting a GTFS feed from this data.
5
EXPORTING DIRECTLY FROM DIVA DATA
Both DING and SWU provided the author with datasets of their sched-
ule in the DIVA exchange format. In SWU’s case, the dataset matched
the set they had regularly shared with DING; in DING’s case, it was
directly copied from DIVA’s working directory.
general approach to exporting diva to gtfs By reverse-
engineering the datasets provided by both parties, the author estab-
lished an export process using a source and target database. The
process consists of the following steps:
1. importing tabular DIVA data into a database,
2. extracting and converting stop and service information from the
database,
3. parsing and converting the line description files (optionally by
means of a DIVA line list table), and,
4. optionally: Extracting and converting transfer information from
the DIVA database.
This will result in a generally complete GTFS feed, although without
route shapes.
5.1 file structure and layout
The export consists of a series of plain text files which appear to
closely follow the plain text files within DIVA’s working directory.
As mentioned in Section 3.4, the internal data structure of the files
closely resembles that of VDV-451, with some notable exceptions. All
relationships between tables presented within this work were reverse
engineered by the author by comparing the data with printed timeta-
bles, or by plotting coordinates onto OpenStreetMap material in GIS
software¹.
Also, the character encoding of the files could not be determined
with certainty. While several files contain a chs; entry akin to VDV-
451, which usually claimed the file encoding to be either Windows-
1252 or ISO Latin-1, several files contained characters illegal to both
character sets, e.g., pairs of null bytes. The author suspects that
these character sequences might be leftovers from earlier versions of
the respective files that were originally encoded using Code Page 437.
¹ In this case, the free and open source Quantum GIS [82].
Since simply transforming the text files from Code Page 437 to, e.g.,
UTF-8 through GNU iconv(1) would lose special characters such as
umlauts, a character translation was used as a workaround. Using
tr(1) from the GNU coreutils package, the offending character \000
was substituted by a blank space.
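The same workaround can be sketched without external tools. The following Python fragment is not the method used by the author, who relied on tr(1); it merely illustrates the idea. The function name is hypothetical, and decoding as Windows-1252 is an assumption based on the chs; entries mentioned above.

```python
def clean_diva_bytes(raw: bytes, encoding: str = "cp1252") -> str:
    """Replace null bytes by blanks, then decode the remaining bytes."""
    cleaned = raw.replace(b"\x00", b" ")
    # errors="replace" keeps the import going if other undecodable bytes remain.
    return cleaned.decode(encoding, errors="replace")


# Hypothetical sample: a name containing umlauts and a pair of null bytes.
sample = b"Universit\xe4t\x00\x00S\xfcd"
print(clean_diva_bytes(sample))
```

Each null byte becomes exactly one blank, so the length of fixed-width fields is preserved.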
Both SWU’s and DING’s exports included a number of files whose
content structure differed from that of all other files. Since the
last modification timestamps of those files dated back up to 21 years,
it can be assumed that they are legacy files from either earlier DIVA
versions or from manual edits. Why the export, which was done
directly out of DIVA by its export function, would include those files
is not clear.
5.2 importing tables into an intermediary database
DIVA exchange files containing tabular data can easily be imported
into a SQL database for further processing. Since they all follow a pat-
tern akin to that of VDV-451—and assuming they are all well-formed
according to this specification—the conversion is merely a matter of
text transforming the tabular entries into SQL statements.
Diva2SQL(filename)
  for each line in filename
  do
    if line =~ s/tbl;//
    then
      tableName ← line
    elseif line =~ s/atr;//
    then
      columns[ ] ← split(line, ";")
    elseif line =~ s/frm;//
    then
      formats[ ] ← split(line, ";")
    elseif Contains(line, "num;")
    then
      CreateTable(tableName, columns, formats)
    elseif line =~ s/rec;//
    then
      InsertValue(tableName, line)
As outlined in the Diva2SQL pseudo-code example, it suffices to
parse each tabular data file line by line. In this example, creating the
actual SQL statements is abstracted into a CreateTable and an
InsertValue procedure, respectively. The CreateTable procedure has
little left to do, apart from iterating over the column names and their
data types, resulting in a table creation statement like the example
laid out in Listing 10. Depending on the type of database being used,
column data types might need to be adapted to the database’s
requirements.
Each table entry is then transformed into a SQL insert statement
by an InsertValue procedure. In this step, each rec line from the
source file is split along the semicolons not enclosed in quotes (the
author used Perl’s quotewords function). Afterwards, true and false
text entries might need to be replaced with 1 and 0, respectively.
Finally, single quotes, which might be used as apostrophes, for
instance, need to be properly escaped.
Listing 10: Example SQL table creation statement
CREATE TABLE IF NOT EXISTS StopAreaKoord (
  _FK__AutoKey_ INTEGER,
  _FK_ARR_IDX INTEGER,
  plan VARCHAR(4),
  status CHAR,
  x INTEGER,
  y INTEGER,
  z INTEGER,
  disp_x INTEGER,
  disp_y INTEGER,
  text_x INTEGER,
  text_y INTEGER
);
Note that this resulting SQL import will not convey any relational
information at all—thus, update or delete actions on the database
will not propagate correctly. However, since it is only meant to be a
reference from which data is extracted for conversion to GTFS, this is
of no further relevance.
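The import described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the author’s Perl implementation: the function names, the mapping of frm entries to SQLite types, and the sample records are hypothetical. As a simplification, the table is created as soon as the frm line has been read, and parameter binding takes the place of manual quote escaping.

```python
import csv
import sqlite3


def sql_type(frm: str) -> str:
    """Map a frm entry to an SQLite column type (assumed mapping)."""
    frm = frm.lower()
    if frm.startswith("num"):
        return "INTEGER"
    if frm.startswith("char"):
        return "TEXT"
    return ""  # unknown format: leave the column untyped


def quote_ident(name: str) -> str:
    """Quote a table or column name for use in an SQL statement."""
    return '"' + name.replace('"', '""') + '"'


def diva_to_sql(lines, db: sqlite3.Connection) -> None:
    """Import one tabular exchange file, given as an iterable of lines."""
    table, columns = None, []
    for line in lines:
        keyword, _, rest = line.strip().partition(";")
        fields = [f.strip() for f in rest.split(";")]
        if keyword == "tbl":
            table = fields[0]
        elif keyword == "atr":
            columns = fields
        elif keyword == "frm":
            # Simplification: create the table as soon as the formats are known.
            cols = ", ".join(
                f"{quote_ident(c)} {sql_type(f)}".strip()
                for c, f in zip(columns, fields)
            )
            db.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols})")
        elif keyword == "rec":
            # csv splits along semicolons that are not enclosed in quotes.
            row = next(csv.reader([rest.strip()], delimiter=";", skipinitialspace=True))
            row = [{"true": 1, "false": 0}.get(v.strip().lower(), v.strip()) for v in row]
            marks = ", ".join("?" for _ in row)
            # Parameter binding makes manual escaping of single quotes unnecessary.
            db.execute(f"INSERT INTO {quote_ident(table)} VALUES ({marks})", row)


# Hypothetical sample data, not taken from a real export.
sample = [
    "tbl; Stop",
    "atr; hstnr; hstname; aktiv",
    "frm; num[5.0]; char[40]; boolean",
    'rec; 1050; "Ulm; Hauptbahnhof"; true',
    "rec; 1058; \"Ehinger Tor's\"; false",
]
db = sqlite3.connect(":memory:")
diva_to_sql(sample, db)
print(db.execute("SELECT * FROM Stop").fetchall())
```

Like the SQL import described above, this sketch conveys no relational information; it only makes the records available for querying.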
5.3 setting up a target database
In order to save the converted GTFS information, a target database
proved useful. The author used SQLite[83], both for the intermediary
DIVA database, as well as for the target GTFS database. Since SQLite
stores data in a single file, this solution allowed the easy querying,
insertion and modification of data, while not requiring a full-fledged
database installation, which was not available on all the computers
used by the author throughout the prototyping process.
The utilized SQL table structure followed the GTFS specification with
the Google Transit extension [70] to allow for journey-specific trans-
fers. Furthermore, indices were added at relevant points to speed up
database transactions (see Listing 21 for the statements used).
5.4 transforming the line information files
The line information files can be parsed line by line in order to gather
all necessary data to transform it into GTFS compliant notation (see
Listing 6 as a reference for the transformation outlined in this
section). Each file describes exactly one line, with up to two routes
per line. All parameters are identified by a two-letter designator at
the beginning of a line and are assigned to one route of the line by
either the letter H or R.
A Perl script used by the author to transform the DIVA data accord-
ing to the description in this section is included in Listing 22.
5.4.1 Choosing Relevant Line Definition Files
Analyzing the exports provided by SWU and DING, it appeared as
if some of the line definition files referred to variants that were no
longer active. Furthermore, some lines are described within several
files, each specifying certain variants of the line.
Therefore, it is not sufficient to simply parse any and all files
delivered within a DIVA export; instead, one has to rely on the
TabelleLnrlit table within the lnrlit file. All active lines are
referenced here through one or more records, specifying the file name,
a (non-unique!) line short identifier, the schedule period they are
part of, etc.
Through a SQL SELECT statement, a complete list of relevant files
with their path (relative to the export’s base directory) can be con-
structed:
Listing 11: Selecting relevant line definition file information
SELECT
uvz,
lierg,
kbez,
textpfp,
TextBAlang
FROM TabelleLnrlit;
A file list of the relevant files can be built from this query’s result
according to the following pattern:
[base directory]/[uvz]/[lierg].[kbez]
Note that some characters need to be replaced. In the DIVA database,
the lierg field was found to be a six-character string, padded with
spaces at the end. The associated file name is identical, albeit with
underscores (_) in place of the whitespace(s).
Furthermore, it was found that, in the analyzed exports, two files
reference the same line if
1. the first five characters of the lierg field are identical, and
2. the textpfp and TextBAlang fields are identical, respectively.
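The file name pattern and the two conditions above can be expressed in a few lines of Python. This is a minimal sketch: the function names and the sample values are hypothetical, and padding lierg to six characters before replacing the blanks is an assumption for the case that a database driver strips trailing spaces.

```python
from pathlib import PurePosixPath


def line_file_path(base_dir: str, uvz: str, lierg: str, kbez: str) -> str:
    """Build [base directory]/[uvz]/[lierg].[kbez] with underscores for blanks."""
    name = lierg.ljust(6).replace(" ", "_")
    return str(PurePosixPath(base_dir, uvz.strip(), f"{name}.{kbez.strip()}"))


def same_line_key(lierg: str, textpfp: str, text_ba_lang: str) -> tuple:
    """Files sharing this key were found to reference the same line."""
    return (lierg[:5], textpfp, text_ba_lang)


# Hypothetical values, for illustration only.
print(line_file_path("export", "swu", "87005 ", "j14"))
```

Grouping the query results by same_line_key yields one group of definition files per line.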
Due to the first two characters of each file name being the operator
identifier (see Section 3.4.2), the relevant routes.txt entries can
already be made from the information available at this point. A
combination of the operator identifier and the textpfp line name proved
sufficient to uniquely identify each line within one dataset, with the
operator identifier by itself serving as the agency ID foreign key for
the line. The textpfp field entry can serve as the line’s short name,
and the TextBAlang field, if different from textpfp, as the line’s long
name.
After inserting this information into the target database, each line
definition file is parsed line-by-line according to the patterns outlined
in the following subsections.
5.4.2 Journey Patterns
Each journey pattern description starts with FW², followed by three
digits specifying the number of stop area identifiers that follow, and
the route direction it applies to. Taking the example from Listing 6,
the journey pattern can be deciphered as follows:
FW   012        H   1050 1058 1048 1059 1060 1061 1079 1078 1087 1160 1241 1240
     ID count       1    2    3    4    5    6    7    8    9    10   11   12
In the exports provided by both DING and SWU, all station area
identifiers had four-digit numerical values. Since this might differ for
other transit agencies, one can obtain the list of station identifiers by
splitting the rear of the FW line into n equal parts, with n being the
three-digit ID count after the initial FW.
If the same stop area is referenced twice in a row within a journey
pattern, this is in order to model layovers, in conjunction with the
timing pattern explained in Section 5.4.4.
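The splitting rule can be sketched as follows. The function name is hypothetical, and the sketch assumes that whitespace within an FW line is not significant, so that the fields can be treated as one concatenated string.

```python
def parse_journey_pattern(line: str):
    """Split an FW line into its direction and its stop area identifiers."""
    compact = "".join(line.split())  # assumption: whitespace is not significant
    if not compact.startswith("FW"):
        raise ValueError("not a journey pattern line")
    count = int(compact[2:5])        # three-digit ID count
    direction = compact[5]           # H or R
    rest = compact[6:]
    if count == 0 or len(rest) % count:
        raise ValueError("identifier block cannot be split into equal parts")
    width = len(rest) // count
    return direction, [rest[i * width:(i + 1) * width] for i in range(count)]


line = "FW012H105010581048105910601061107910781087116012411240"
print(parse_journey_pattern(line))
```

Because the identifier width is derived from the ID count, the same function would also handle exports with identifiers of a different length.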
5.4.3 Stop Points
Stop point lines start with ST, followed by the direction and three
digits specifying the number of stop points. Example: STH012,
followed, in this case, by twelve three-digit enumerators for the
previously defined stop areas, each with a platform suffix.
STH   012     000 1   001 1   002 1   003 1   004 1   005 1   006 1   ...
      Count   ID 0    ID 1    ID 2    ID 3    ID 4    ID 5    ID 6
      (in each pair: the three-digit ID, followed by the platform, here 1)
Each of the three digit identifiers references the respective stop area
identifier previously defined. The five following characters specify
the suffix for identifying the specific stop platform and are padded
with spaces, if necessary.
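A fixed-width reading of this description can be sketched as follows. The function name and the sample line are hypothetical, and the sketch assumes that the entries follow the six-character header without separators, each consisting of three digits plus a five-character platform suffix.

```python
def parse_stop_points(line: str):
    """Split an ST line into its direction and (index, platform) pairs."""
    if not line.startswith("ST"):
        raise ValueError("not a stop point line")
    direction = line[2]              # H or R
    count = int(line[3:6])           # three-digit number of stop points
    body = line[6:]
    entries = []
    for i in range(count):
        chunk = body[i * 8:(i + 1) * 8]    # 3 digits + 5 characters
        entries.append((int(chunk[:3]), chunk[3:].strip()))
    return direction, entries


# Hypothetical line with three stop points.
print(parse_stop_points("STH003" + "0001    " + "0011    " + "002Zug  "))
```

The first element of each pair is the enumerator referencing the journey pattern of the same direction, the second the platform suffix without its padding.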
As already mentioned in Section 4.1.1, the stop area and stop platform
identifiers can be concatenated in order to uniquely identify stop
points. Thus, the journey pattern of this example journey would
start at stop point 105001, servicing stop points 105801, 104801, etc.,
until ending at stop point 124002.
Note that the platform identifiers are not limited to numerical values
and might take the form Zug³ or Gleis⁴.
² Probable meaning: Fahrweg (German for “journey pattern”)
5.4.4 Timing Patterns
Timing patterns, i.e., series of time differences between arrivals and
departures at stops, are defined in lines starting with FT, again
followed by either an H or R specifying the direction of travel they
apply to.
FTH  00001  12  00 02  *0601  02 05 04 02  0001N0000000
     00001: timing pattern identifier
     *0601: 6 stops with 1 minute time difference each
FTH  00002  12  00 02  *0601  02 06 07 02  0001N0000000
     00: first stop, 0 minutes from starting time
Afterwards, a five-digit value serves as an identifier for this partic-
ular timing pattern within this direction of travel.
After two characters without any discernible meaning, the actual
timing pattern follows. In its most simple form, it is a succession of
two-digit numerical values, with each one defining the time differ-
ence in minutes between the last time reference point and the next
stop area. Thus, in the example above, timing pattern 00001 would
depart from stop point 105001 zero minutes after the journey’s start
time, departing from the next stop point (105801) two minutes later.
In order to condense identical time differences, a simple run-length
encoding is used. In the example above, stops three through eight
have a departure time difference of one minute each. This is encoded
as *0601, i.e., six times a one-minute difference.
By using this timing pattern string in conjunction with the start
time of a journey, the absolute arrival/departure times for the journey
can be deduced. Using a fictional starting time of 12:00 for a journey
following the 00001 pattern, the respective arrival/departure times
are 12:00, 12:02 (+2 minutes), 12:03 (+1 minute), 12:04 (+1 minute),
12:05 (+1 minute), 12:06 (+1 minute), 12:07 (+1 minute), 12:08 (+1
minute), 12:10 (+2 minutes), 12:15 (+5 minutes), 12:19 (+4 minutes)
and 12:21 (+2 minutes). Note that the timing pattern string is longer
than the twelve values required for this journey pattern.
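The decoding of such a string can be sketched as follows. The function names are hypothetical, and the sketch only covers two-digit values and the *CCVV run-length form described above; any other symbol is rejected rather than interpreted. The number of stops is taken from the journey pattern, so that surplus characters at the end of the string are ignored.

```python
def expand_timing_pattern(pattern: str, stops: int) -> list:
    """Return the minute offsets for the first `stops` stops of a pattern."""
    offsets, pos = [], 0
    while len(offsets) < stops:
        if pos >= len(pattern):
            raise ValueError("timing pattern is shorter than the journey pattern")
        if pattern[pos] == "*":
            chunk = pattern[pos + 1:pos + 5]   # two-digit count, two-digit value
            if len(chunk) != 4 or not chunk.isdigit():
                raise ValueError("malformed run-length entry")
            offsets.extend([int(chunk[2:])] * int(chunk[:2]))
            pos += 5
        else:
            chunk = pattern[pos:pos + 2]
            if len(chunk) != 2 or not chunk.isdigit():
                raise ValueError("symbol not handled by this sketch: " + chunk)
            offsets.append(int(chunk))
            pos += 2
    return offsets[:stops]


def absolute_times(start: str, offsets) -> list:
    """Add up the offsets, starting at a HH:MM journey start time."""
    hours, minutes = map(int, start.split(":"))
    current = hours * 60 + minutes
    times = []
    for offset in offsets:
        current += offset
        # Hours may exceed 23 for journeys running past midnight.
        times.append(f"{current // 60:02d}:{current % 60:02d}")
    return times


offsets = expand_timing_pattern("0002*0601020504020001N0000000", 12)
print(absolute_times("12:00", offsets))
```

Applied to the 00001 pattern with a start time of 12:00, this reproduces the twelve times listed above.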
Just as in the CSV export timing patterns described in Section 4.1.2,
apart from numerical values, the special symbols -, $ and | can occur.
Again, - defines stops at the start or end of an itinerary that are
not served by this particular timing pattern (short turns); $ specifies
branches within journey patterns not served by particular journeys,
and | is used for skipped stops. The | character was not encountered
in the CSV exports used in Chapter 4, as SWU does not use this
modelling approach.
³ Train
⁴ Railway platform
Note that, since stop areas can appear as pairs in the journey pat-
terns, layovers can be modeled by this method. If a layover occurs, a
non-zero timing value for the second occurrence of the stop describes
the time offset in minutes between arrival and departure. A value
of zero for the second occurrence marks a pattern where arrival and
departure time are identical, and a - can occur when this stop is the
final destination of a pattern that short turns.
5.4.5 Journey Definition
Each line starting with FA⁵ defines one specific journey. The FA lines
were found to follow a pattern that can be matched by means of the
regular expression outlined, with named capture groups, in Listing 12.
The observed pattern was found to be as follows.
fa: Marks this line as a journey definition.
route direction identifier: Either H or R.
day of week identifier: 0 for Mondays through Fridays, 2 for
Saturdays, 3 for Sundays⁶.
journey key: Four digits, mostly (but not necessarily) identical
to the journey start time⁷. To uniquely identify a journey within
a line, the combination of route direction identifier, day of week
identifier and journey key is sufficient.
five zeroes
four further characters (optional): In all of the provided
files, these were digits. No discernible meaning could be derived.
journey start time: Four digits, specifying the base time reference
point for a journey’s timing pattern in the local time zone.
single digit: No discernible meaning could be deduced.
timing pattern identifier: References the timing pattern that
applies to this journey.
whitespace (optional)
vehicle type identifier (optional): Two characters, alphanumeric,
which reference a vehicle type defined in the Transportgefaesse
table.
white space(s) (optional): The number of white spaces encountered
appears to be arbitrary from 0...n.
service restriction identifier (optional): References a service
type defined in the ServiceRestriction table through two
characters, alphanumeric.
whitespace(s): At least one.
train capture group: If the journey is part of a train line, the
following string is found: An alphanumeric train number, followed
by 0...n whitespace(s), an optional character with no discernible
meaning, and one or more characters referencing the train type.
Train types were found to follow the usual names encountered in
Germany, e.g., RE for “Regionalexpress”.
notice identifier (optional): After an arbitrary number of
characters, of which no meaning could be deduced, three optional
digits at the end of the line reference a “Notice”. This identifier
is encountered elsewhere in the DIVA data set; it does not seem
relevant for a transformation to GTFS, though.
⁵ Probably meaning Fahrt, i.e., journey.
⁶ This is identical to the type-of-day identifiers encountered in Section 4.1.
⁷ This is identical to the Fahrtenschlüssel in Section 4.1.
Listing 12: Regular expression matching the relevant parts of a journey def-
inition line
^FA(?<journeyid>(?<direction>[HR])(?<serviceid>[023])(?<
journeykey>[0-9]{4}))0{5}(.{4})?(?<starttime>[0-9]{4}).(?<
timingpattern>[0-9]{5})\s?(?<vehicletype>[A-Z0-9]{1,2})?\s
*(?<servicerestriction>[A-Za-z][a-z0-9]{1,2})?\s+((?<trainid
>[A-Z]?[1-9][0-9]{0,5})\s*[A-Z]?\s*(?<traintype>[A-Z]+))
?.*[0-9]{3}(?<notice>\".*\")*
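For readers working in Python, the leading part of this expression can be used as sketched below. This is a reduced variant and not the expression from Listing 12: it only covers the fields up to the timing pattern identifier, it uses Python’s (?P<name>...) syntax for named groups, and both the function name and the sample lines are hypothetical.

```python
import re

# Reduced variant of the journey definition pattern (leading fields only).
FA_PREFIX = re.compile(
    r"^FA(?P<direction>[HR])(?P<serviceid>[023])(?P<journeykey>[0-9]{4})"
    r"0{5}(?:.{4})?(?P<starttime>[0-9]{4}).(?P<timingpattern>[0-9]{5})"
)


def parse_journey(line: str):
    """Return the named fields of an FA line, or None if it does not match."""
    match = FA_PREFIX.match(line)
    return match.groupdict() if match else None


# Hypothetical lines, without and with the four optional characters.
print(parse_journey("FAH00515000000515100001"))
print(parse_journey("FAH005150000012340515100001"))
```

Note that the optional four-character block makes the pattern ambiguous if further digits directly follow the timing pattern identifier; the sample lines avoid this case.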
5.4.6 Headsigns
Information on what headsign is being used during a journey, or a
part thereof, is referenced in lines starting with EE, followed by the
route direction (R or H) it applies to.
EEH  "Universität Süd"          0  0515  00000000000  000  _000000 ...
EER  "Ulm ZOB über Einsingen"   0  0000  00000000000  017  _000000 ...
     Headsign text; day of week identifier (0); journey key (0515, or
     0000 for no journey key: valid for all journeys); starting stop
     (000: from beginning of journey, 017: from 17th stop point onwards)
Headsign texts can be defined
a. for an individual journey, referenced by its previously defined
direction and day of week identifier, and its journey key.
b. for all journeys of a route, if the journey key is set to 0000.
Furthermore, headsign texts can be defined
a. for a complete journey, i.e., from its origin to its destination.
b. from the nth stop point onward.
Any combination of these two modes is possible. A regular expres-
sion suitable for capturing the relevant information is referenced in
Listing 13.
Listing 13: Regular expression matching the relevant parts of a headsign
definition line
^EE(?<direction>[HR])\s\"(?<headsign>.*)\"\s+(?<journey>[0-9]{5})
.*(?<startingstop>[0-9]{3})_
5.4.7 Line Name and Description
Information on the line’s name and the mode used—i.e., whether it
is a bus, train, etc.—can be extracted from lines starting with BU.
BU  H  "15"  "bus"  "..."  "Willy Brandt Platz Uni Süd"  ""  ""  ""  ...
    H: direction; "15": line short name; "bus": line type;
    "Willy Brandt Platz Uni Süd" and the following "": line long name 1 and 2
After the direction identifier (H or R), a series of strings within
double quotes describe the following items:
• A short identifier, e.g., the line number.
• The line type. In DING’s example, those could be “bus”, “bahn”⁸,
  “strab”⁹, etc.
• A text string referring to the operator (omitted in the example
  above).
• One or two strings with the line’s long name, i.e., a short
  description of its itinerary from start to destination.
• A series of further characters whose purpose could not be deduced.
The following regular expression suffices to extract all relevant
information:
⁸ Train
⁹ Tram
Listing 14: Regular expression for relevant parts of a line name line
^BU(?<direction>[HR])\s\"(?<shortid>.*)\"\s\"(?<linetype>.*)\"\s
(\".*\")\s\"(?<longid1>.*)\"\s\"(?<longid2>.*)\"\s(\".*\")\s
(\".*\")\s[0-9]*[NY]
Since it is unclear whether there exists a standardized nomenclature
for the line types, one would probably have to create their own
transformation if-ladder to assign a correct line type identifier for the
specific DIVA export one would like to transform. Furthermore, DING
differentiates between regular buses and, e.g., “Anrufsammeltaxi”¹⁰.
For a precise transformation, one might consider using the extended
GTFS route type proposal [43] in order to map such subtleties
accordingly.
5.5 transforming stop structures and coordinates
Importing the haltestellen files into the intermediary database as
outlined in Section 5.2 should suffice for exporting all stop-relevant
information into GTFS. In the analyzed data sets, the haltestellen
file contained a total of 32 tables, 11 of which were completely empty.
Of the remaining 21 tables, five tables proved sufficient to extract all
necessary information regarding stop areas, stop points, their respec-
tive relationship and coordinates, as well as the fare zones they are
part of.
stop areas The Stop table is the main pivot around which all stop
information hinges. Apart from its _AutoKey_, which is referenced by
several other tables, it provides a numerical identifier for each stop
area (hstnr), validity start and end dates, and the name of the stop
prefixed with the name of the municipality it is located in.
fare zones Fare zones are assigned to stop areas in the Stop_tzonen
table. Stop areas can be part of more than one fare zone, e.g., if they
are located on a boundary between fare zones. Since, in GTFS, a stop
can be part of only one fare zone, the author formed “virtual”, new
fare zones for these occasions. For instance, if a stop was assigned to
both fare zones A and B, the stop was assigned to fare zone AB.
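The construction of such combined zones can be sketched as follows. The function name and the sample data are hypothetical; sorting the zone names is an assumption made here so that the combined identifier does not depend on the order of the records.

```python
from collections import defaultdict


def virtual_zone_ids(assignments) -> dict:
    """Map each stop to one zone_id built from all of its fare zones."""
    zones = defaultdict(set)
    for stop_id, zone in assignments:
        zones[stop_id].add(zone)
    # Sorting makes the combined identifier independent of the input order.
    return {stop: "".join(sorted(names)) for stop, names in zones.items()}


# Hypothetical assignments: stop 1050 lies on the boundary of zones A and B.
print(virtual_zone_ids([(1050, "B"), (1050, "A"), (1058, "A")]))
```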
stop area coordinates The Stop_hst_koord table provides coordinate
pairs for the stop areas defined in the Stop table. Each entry is
outfitted with an identifier for the custom CRS being used, as well as
an x and y coordinate. The coordinate model is analyzed in
Section A.2.2.
¹⁰ Dial-a-ride transit making use of small vans or a taxi
stop points A table named Stop_hst_steig defines all Steige¹¹
attributed to one stop area. Each stop point references the _AutoKey_
of one stop area, and is itself identified by both an integer (nummer)
and a five-character string (steig). While both identifiers would
suffice to identify individual platforms uniquely (within their stop
area), only the string matches the platform identifiers used in
Section 5.4.3.
stop point coordinates An entry exists within the StopPlatformKoord
table for each georeferenced stop point from Stop_hst_steig.
Again, each entry comes with a CRS identifier, and an x and y
coordinate.
5.5.1 Querying Stop Areas and Stop Points
The relevant queries from the intermediary database are similar, with
minor distinctions between the following:
• Stop areas containing two or more platforms are covered by the
  query in Listing 15.
• Stop “platforms” as parts of a stop area (i.e., stop points) can
  be obtained by means of the query in Listing 16.
• Solo stop points, i.e., stop points not affiliated to a stop area,
  will be the result of Listing 17’s query.
Through this distinction, the mapping to the three GTFS stop types—
parent stops or stations, child stops, and “solo” stations—is already
made.
Listing 15: SQL query for DIVA stop areas containing stop points
SELECT S.hstnr AS stop_id,
S.hstname AS stop_name,
group_concat(tz.tzonen,"") AS zone_id,
HK.x AS stop_lat,
(-1 *(HK.y - 6160000)) AS stop_lon,
HK.plan AS plan
FROM Stop AS S
LEFT OUTER JOIN Stop_hst_koord as HK
ON S._AutoKey_=HK._FK__AutoKey_
AND S.input=HK.input
LEFT OUTER JOIN Stop_tzonen as tz
ON S._AutoKey_=tz._FK__AutoKey_
WHERE S._AutoKey_ IN (
SELECT SHS._FK__AutoKey_
FROM Stop_hst_steig AS SHS
WHERE S.input = SHS.input)
GROUP BY stop_id, HK.x
11 Platforms
Listing 16: SQL query for DIVA stop points which are part of a stop area
SELECT S.hstnr AS stop_id,
S.hstname AS stop_name,
group_concat(tz.tzonen,"") as zone_id,
SHS.steig AS platform,
SPK.x AS stop_lat,
(- 1 *(SPK.y - 6160000)) AS stop_lon,
SPK.plan AS plan
FROM Stop AS S
LEFT OUTER JOIN Stop_tzonen as tz
ON S._AutoKey_=tz._FK__AutoKey_
LEFT OUTER JOIN Stop_hst_steig AS SHS
ON S._AutoKey_ = SHS._FK__AutoKey_
LEFT OUTER JOIN StopPlatformKoord AS SPK
ON SHS._AutoKey_= SPK._FK__AutoKey_
WHERE SHS.platform NOT LIKE "Eing%"
GROUP BY stop_id, platform, SPK.x
Listing 17: SQL query for DIVA stop points which are not part of a stop area
SELECT S.hstnr AS stop_id,
S.hstname AS stop_name,
group_concat(tz.tzonen,"") as zone_id,
HK.x AS stop_lat,
(-1 *(HK.y - 6160000)) AS stop_lon,
HK.plan AS plan
FROM Stop AS S
LEFT OUTER JOIN Stop_tzonen as tz
ON S._AutoKey_=tz._FK__AutoKey_
LEFT OUTER JOIN Stop_hst_koord as HK
ON S._AutoKey_=HK._FK__AutoKey_
WHERE S._AutoKey_ NOT IN (
SELECT SHS._FK__AutoKey_
FROM Stop_hst_steig AS SHS)
GROUP BY stop_id, HK.x
5.5.2 Coordinate Transformation
As the stop coordinates are in a DIVA specific format, they need to
be transformed to the WGS84 CRS (see Section A.2.2 for details on the
DIVA coordinate model). In the dataset provided by DING,
all relevant coordinates were in Gauss-Krüger Zone 3 with a custom
offset. Coordinate transformation was simply achieved by passing
the x and y coordinates to cs2cs¹² with the matching source and
target CRS identifiers. Note that the northing offset compensation had
already been taken care of in the SQL queries beforehand.
Finally, the thusly obtained stops and their coordinates are written
back into the stops table within the target GTFS database.
Listing 24 in the Appendix shows the Perl script used to extract
and transform the stop data.
12 Available through the proj(1) package.
5.6 importing service types and dates
Service types and their validity dates are defined in the
ServiceRestriction table, with the exception of the “regular” weekday
types 0, 2 and 3 for Monday through Friday, Saturday, and Sunday,
which need to take the current year’s holidays into account.
Therefore, the definition of the GTFS calendar and calendar_dates
consists of two steps:
• Determining local holidays and changing those days’ service to
  Sunday service within calendar_dates
• Inserting the matching validity dates for all other service types
  into calendar_dates and defining them in calendar.
5.6.1 Determining Local Holidays
This task can be taken care of by standard calendar libraries, such as
Perl’s Date::Holidays::DE package for German holidays.
First, regular services for Monday through Friday, Saturday and
Sunday are inserted into calendar, with the corresponding weekdays
set to 1.
Afterwards, exceptions to this rule are determined by iterating over
the list of nation-wide and state holidays, obtained by a matching
library (date and time libraries like Perl’s DateTime are helpful
here). If a holiday falls on any day from Monday through Friday,
the 0 service is disabled in calendar_dates, and a Sunday service (3)
is enabled instead. In case of a Saturday, the 2 service is disabled
and, again, a Sunday service is enabled. If the holiday falls on a
Sunday, no action is necessary.
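The exception logic can be sketched as follows. The function name is hypothetical, and the list of holidays is passed in by the caller rather than obtained from a holiday library, which keeps the sketch self-contained. In GTFS, an exception_type of 1 adds a service for a date, while 2 removes it.

```python
from datetime import date


def holiday_exceptions(holidays) -> list:
    """Build calendar_dates rows (service_id, date, exception_type)."""
    rows = []
    for day in sorted(holidays):
        weekday = day.weekday()            # Monday is 0, Sunday is 6
        if weekday == 6:
            continue                       # Sunday service runs anyway
        regular = "0" if weekday < 5 else "2"
        stamp = day.strftime("%Y%m%d")     # GTFS date format
        rows.append((regular, stamp, 2))   # disable the regular service
        rows.append(("3", stamp, 1))       # enable Sunday service instead
    return rows


# A Thursday, a Sunday and a Saturday in 2014.
print(holiday_exceptions([date(2014, 5, 1), date(2014, 6, 8), date(2014, 11, 1)]))
```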
5.6.2 Importing All Other Service Types
All other services are defined in the ServiceRestriction table and can
be obtained by the SQL query in Listing 18.
Listing 18: Querying service exceptions from DIVA
SELECT anfjahr,
code,
vbt_von,
vbt_bis,
vt
FROM ServiceRestriction
anfjahr is a year in YYYY format that sets the reference point
against which the matching service dates are calculated; i.e., if
anfjahr is 2013, all dates are calculated from 2013-01-01. code is the
equivalent to GTFS’s service_id. vbt_von and vbt_bis are offsets in
months from the anfjahr reference date, defining the validity start and
end date. Therefore, if anfjahr is 2013, and vbt_von is 23, the current
service validity starts on December 1st, 2014. The same logic applies
to vbt_bis, as the service validity end date.
The code field serves as the identifier for the service in question,
and is inserted into the GTFS calendar table with all day types from
monday through sunday set to zero, and the validity start and end
dates from the SQL result. The dates on which the respective service
types are valid can be inserted into the GTFS calendar_dates table
afterwards.
Finally, vt defines the individual days on which a given service is
valid. To achieve this, it assigns a logical 0 or 1 value to each day of
the month, depending on whether the service is valid for that day or
not, resulting in a 32-bit binary pattern for each month. The
least significant bit of each month’s binary pattern is the first day of
the month, and the leading bits are padded with as many zeroes as
necessary to arrive at a 32-bit value. A month with 31 days will
always start with one zero in this notation; a month with 30 days will
have two leading zeroes, etc. The vt column for each service
comes with one such 32-bit value, in hexadecimal notation, for each
month it is valid, separated by a whitespace.
Programmatically, this day-service-assignment was solved by iter-
ating over the vt column, one month at a time. First, the hexadeci-
mal value was converted into an array with 32 fields of one bit each,
and the first and last day of the month in question were calculated
through the DateTime function. All that was left to do now was to
iterate over the array, starting with the last field and the first day of
the month, and incrementing the date on each iteration, until the last
day of the month was reached. Whenever the array held a 1 value,
the current day and the service identifier were inserted into the GTFS
calendar_dates table with the exception_type set to 1, thus enabling
the service for that particular day.
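The decoding of the vt column can be sketched as follows. The function names are hypothetical. Following the worked example above (anfjahr 2013 and vbt_von 23 yield December 2014), the sketch assumes that offsets are counted from January of anfjahr, and it further assumes that the first hexadecimal value in vt belongs to the month given by vbt_von. Instead of building a bit array, it tests each bit directly.

```python
import calendar
from datetime import date


def month_from_offset(anfjahr: int, offset: int) -> tuple:
    """Translate a month offset into (year, month), counted from January."""
    return anfjahr + offset // 12, offset % 12 + 1


def service_days(anfjahr: int, vbt_von: int, vt: str) -> list:
    """List all dates on which a service is valid according to vt."""
    days = []
    for index, word in enumerate(vt.split()):
        year, month = month_from_offset(anfjahr, vbt_von + index)
        bits = int(word, 16)
        last_day = calendar.monthrange(year, month)[1]
        for day in range(1, last_day + 1):
            # The least significant bit stands for the first day of the month.
            if (bits >> (day - 1)) & 1:
                days.append(date(year, month, day))
    return days


# Hypothetical vt value: December 1st and 3rd, 2014, and January 31st, 2015.
print(service_days(2013, 23, "00000005 40000000"))
```

Each resulting date would then be written to calendar_dates with the service identifier and an exception_type of 1.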
5.7 handling transfers
Protected transfers are modelled in the TransferProtection table, and
the relevant information can be extracted through the query in List-
ing 19.
The individual fields are listed below. The table offered more fields,
which are, however, apparently not being used by the DING and SWU
modelling approach.
• linie_erg_an and linie_erg_ab, the line identifiers for the
  arriving (an) and departing (ab) lines between which transfers are
  guaranteed
• richt_an and richt_ab, the route direction identifiers for the
  respective lines
• wttyp_an and wttyp_ab identify the type of day for which this
  transfer is valid; the usual DIVA 0, 2 and 3 distinction applies, as
  well as an A for all types of day
• zeit_von_an, zeit_bis_an, zeit_von_ab and zeit_bis_ab, the
  time period for which this transfer is valid, with a start and end
  time for both the arriving and the departing line. Time points
  are formatted in minutes since midnight.
• hst_nr_an and hst_nr_ab, the stop area identifiers to which this
  protected transfer applies
• sitz_blb, which is Y if a transfer can be made by staying in the
  same vehicle
Listing 19: Querying DIVA transfers
SELECT hst_nr_an,
linie_erg_an,
richt_an,
wttyp_an,
zeit_von_an,
zeit_bis_an,
hst_nr_ab,
linie_erg_ab,
richt_ab,
wttyp_ab,
zeit_von_ab,
zeit_bis_ab,
sitz_blb
FROM TransferProtection
The validity timestamps can easily be translated into GTFS-style
timestamps by simple modulo and string manipulation operations;
e.g., an input value of 1530 results in an output value of 25:30.
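This conversion can be sketched in a single function. The function name is hypothetical; the seconds component is appended because GTFS time fields use the HH:MM:SS form.

```python
def gtfs_time(minutes_since_midnight: int) -> str:
    """Format minutes since midnight as a GTFS time string."""
    hours, minutes = divmod(minutes_since_midnight, 60)
    # Hours are not wrapped at 24, as GTFS allows times past midnight.
    return f"{hours:02d}:{minutes:02d}:00"


print(gtfs_time(1530))
```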
The author used a handler that distinguished between transfers
being made by staying on a vehicle or not, and assigned either the
same block identifier in the GTFS trips table or modelled the transfer
through the transfers table, accordingly. A Perl script that takes care
of this handling is included in Listing 23.
5.8 exporting the gtfs feed from the database
As the resulting GTFS feed already exists in the required structure, all
that is left is to export the individual tables into the matching CSV files.
In order to export only stops that are actually serviced by journeys,
and only service ids to which journeys are assigned, the series of
commands in Listing 20 was used.
Listing 20: Exporting the GTFS database into a text file feed
sqlite3 -header -csv diva2gtfs.db "select * from stops AS s where
s.stop_id in (select distinct parent_station from stops AS
st where location_type = 0 and st.stop_id in (select distinct
stop_id from stop_times)) UNION select * from stops where
stop_id in (select distinct stop_id from stop_times);" >
stops.txt
sqlite3 -header -csv gtfs.db "select * from calendar where
service_id in (select distinct service_id from trips);" >
calendar.txt
sqlite3 -header -csv gtfs.db "select * from calendar_dates where
service_id in (select distinct service_id from trips);" >
calendar_dates.txt
sqlite3 -header -csv gtfs.db "select * from trips;" > trips.txt
sqlite3 -header -csv gtfs.db "select * from routes;" > routes.txt
sqlite3 -header -csv gtfs.db "select * from stop_times;" >
stop_times.txt
5.9 issues
The GTFS feed resulting from this process resolved the drawbacks of
the first approach outlined in Chapter 4: it offers a better
representation of stop area coordinates, journey headsigns, and
transfers resulting from vehicle blocks. It even expanded the feed’s
capabilities by providing guaranteed transfers, and required no manual
work on the service exception calendar.
However, the resulting feed still lacks journey shapes, as the re-
quired data was also not found within the DIVA data files. Further-
more, the way lines are modelled in DING’s data results in a GTFS feed
that represents single lines (as they appear in the official schedule)
through several distinct lines, albeit with similar names. This is usu-
ally the case if variants of a line exist for very specific dates, e.g., on
Christmas Eve, or during sports events; or when a line is served by
more than one operator.
Another issue arises from the variety of line types offered in German
public transit. Similar to the problem described by Müller in his
transformation process description for Münster[63], DING does not
model less frequented suburban lines with precise stop times:
series of stops are assigned the same departure time, which serves
as an approximation of the actual stop times. While this seems
reasonable, given that the lines in question are served by taxi-like
minibuses, and passengers have to request a pick-up in advance via
telephone, the GTFS validation tools will treat such a schedule as
erroneous. Furthermore, this pickup-on-request pattern for specific
lines and operating hours has proven difficult to transform
automatically.
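A check of the following kind could locate the affected journeys
before a feed is handed to a validator. It is a minimal sketch under
stated assumptions: the field names trip_id, stop_sequence and
departure_time are those of the GTFS stop_times.txt file, whereas the
function name and the sample rows are hypothetical and not taken from
DING's data.

```python
from itertools import groupby


def identical_time_runs(stop_times):
    """Return (trip_id, departure_time, [stop_sequence, ...]) for every run
    of two or more consecutive stops of a trip sharing one departure time."""
    ordered = sorted(stop_times,
                     key=lambda row: (row["trip_id"], row["stop_sequence"]))
    runs = []
    # groupby only merges neighbouring rows, so sorting by trip and stop
    # sequence first makes each group a run of consecutive stops.
    for (trip_id, departure), rows in groupby(
            ordered, key=lambda row: (row["trip_id"], row["departure_time"])):
        sequence = [row["stop_sequence"] for row in rows]
        if len(sequence) > 1:
            runs.append((trip_id, departure, sequence))
    return runs


# Hypothetical on-request journey: stops 2 to 4 share one departure time.
SAMPLE_STOP_TIMES = [
    {"trip_id": "T1", "stop_sequence": 1, "departure_time": "08:00:00"},
    {"trip_id": "T1", "stop_sequence": 2, "departure_time": "08:10:00"},
    {"trip_id": "T1", "stop_sequence": 3, "departure_time": "08:10:00"},
    {"trip_id": "T1", "stop_sequence": 4, "departure_time": "08:10:00"},
    {"trip_id": "T1", "stop_sequence": 5, "departure_time": "08:25:00"},
]
```

Runs found this way could then be reviewed manually or handled by a
dedicated transformation rule.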
The journey shapes can still be extracted from the EFA system by
the process described in Section 4.4. One way of addressing the
remaining issues would be to provide a GTFS feed editor with an
easy-to-use user interface, which would allow manual editing of
the feed without intimate knowledge of the data format and without
having to manually edit the source database or text files.
5.10 conclusion
The author implemented the approach outlined in this chapter in a
series of Perl scripts handling the individual parts. At the end of
the implementation process, he was able to transform the DIVA data
sets supplied by DING into a working GTFS feed, including transfers,
service exceptions, and the assignment of stop areas to fare zones,
automatically. Only a limited set of issues remained, which could be
addressed by further expansion of the process.
While the lack of data from other DIVA users leaves open the question
whether this process could be easily adapted to other DIVA
deployments, it should have lowered the bar to transforming the data
into open transit datasets.
6
WHAT IS HOLDING BACK OPEN TRANSIT DATA IN
GERMANY?
While governments aim to implement open data strategies at national,
state and municipal level in Germany, and processes for transforming
schedule data to GTFS exist, the question remains why, as of mid-2014,
only two German transit authorities have chosen to release their
schedules as open data, with a third one having announced that it
will follow suit. In this chapter, the author analyzes legal
obligations, as well as a survey taken with transit authorities and
operators throughout Germany.
6.1 legal matters
Public transit and the publishing of schedule data are subject to a
series of legal considerations.
One issue that could potentially prohibit the sharing of transit data
is contractual obligations between transit operators and authorities,
and with service contractors. If data is aggregated subject to
specific restrictions on its usage, e.g., if participating operators
provide their data only for integration into existing journey
planning systems, such contracts might need to be adjusted
accordingly. Such specifics need to be addressed on an individual
basis, and no general statement can be made.
6.1.1 Transit Legislation
Through the Personenbeförderungsgesetz (PersBefG), regional transit
operators are required to create schedules which include the journey
patterns, with origin and destination, stops en route and travel times
between routes, for each line; to issue these schedules in the custom-
ary manner; to post the valid schedules in designated waiting areas;
and, upon request, to provide the schedule data to the transit permit
authority in a suitable electronic format [5, Sec. 40, Sec. 45].
Railway undertakings—which are not covered by PersBefG but All-
gemeines Eisenbahngesetz (AEG)—are required to include informa-
tion about possible transfers to trains of all other operators in their
publicised schedule information[9, Sec. 12]. Gennaro et al. point
out that no requirement for the provision of schedules for passen-
gers or competing railway undertakings can be found in AEG [40, P.
52]. Additionally, [25, Annex II] requires railway undertakings to in-
form their passengers, before a journey, about the time schedules and
conditions for the fastest trip and for the lowest fares as a minimum
requirement, therefore also not requiring the publication of complete
schedules.
6.1.2 Intellectual Property Rights
In the 2010 final report of a research project by transit authorities and
transit service providers, Gennaro et al. analyze the legal framework
conditions as to whether schedule data can be considered protected
through German Intellectual Property (IP) legislation, and what other
legislation applies to schedule data [40, P. 46–66].
Personal, intellectual creations are protected through the German
Gesetz über Urheberrecht und verwandte Schutzrechte[2, Sec. 2], i.e., a
creation has to be the work of a human person, and a certain
threshold of originality must be crossed for IP rights to apply. Since
the implementation of the EU Database Directive into German copyright
law, databases can constitute a protected work if they, “by reason of
the selection or arrangement of their contents, constitute the
author’s own intellectual creation”[24, Art. 3][2, Sec. 4]. German IP
legislation also grants sui generis database rights[2, Sec. 87 a pp.]
in order to protect databases whose creation, verification or
presentation required a qualitatively or quantitatively substantial
investment by the creators. These sui generis rights also extend to
databases which do not meet the originality criteria to be a
protected work by themselves. Lastly, computer software constitutes
another class of works to which IP rights can be applied[2, Sec. 69 a
pp.].
Gennaro et al. differentiate on this basis between four classes of
databases[40, P. 51]:
if the database is an intellectual creation due to the selection and
  arrangement of its contents, and a substantial investment was
  made, the database is a protected creative work, and sui generis
  database rights apply
if the database is an intellectual creation due to the selection and
  arrangement of its contents, but no substantial investment was
  made, the database is a protected creative work
if the database cannot be considered an intellectual creation, but
  a substantial investment was made, only sui generis database
  rights apply
if the database cannot be considered an intellectual creation, and
  no substantial investment was made, the database is not covered
  by IP protection
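The four classes follow from two independent yes/no criteria, which
can be written down as a small decision function. This is only an
illustration of the case distinction above; the function name and the
returned labels are hypothetical and carry no legal weight.

```python
def database_protection(is_intellectual_creation, substantial_investment):
    """Return the kinds of protection that apply under the four-way
    classification summarised above (an empty list means no IP protection)."""
    protections = []
    if is_intellectual_creation:
        # Selection or arrangement of the contents is the author's own creation.
        protections.append("protected creative work")
    if substantial_investment:
        # Investment in creation, verification or presentation of the database.
        protections.append("sui generis database rights")
    return protections
```

Each of the two criteria contributes its own kind of protection, so
the four combinations of answers reproduce the four classes.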
Gennaro et al. further elaborate on the different stages of collecting
and processing transit data, from the raw data¹, through
content-related editing of the data², and technical conversions³ to
the final schedule that can be used to offer schedule information to
riders.
They argue that the database does not constitute a protected work
or database work from the raw data stage through the technical con-
version stage, since the originality criteria are usually not met. In-
stead, the mere purpose of aggregating and editing the data drasti-
cally reduces the leeway in which the data can be ordered in a logical
and purposeful manner. Only in exceptional cases could schedules
differ from other schedules as a result of personal creative work by a
schedule creator. While, in the final stage, the typeset schedule might
be considered a result of a personal and creative process, and there-
fore a protected work, they argue that IP protection is usually not
extended to the schedule data itself.
Sui generis database rights were also found unlikely by Gennaro et
al. to apply for the databases in the different stages, with the notable
exception of the final journey planning system, since a substantial
investment is usually made to create and maintain the system.
6.2 evaluation: the status quo
In order to better understand German transit authorities’ position
concerning the handling and publishing of schedule data, the author
designed a questionnaire in Ulm University’s online evaluation
system. Subsequently, institutions responsible for regional public
transit were invited via e-mail to participate in this evaluation. The
list of 136 e-mail recipients encompassed all Verkehrsverbünde,
Tarifgemeinschaften, Nahverkehrszweckverbände, and Landratsämter.
Recipients were invited to share the link with their respective
operators and service contractors, if applicable.
Participants were asked to give basic information on the type of
their institution and its area of operation, as well as whether it
handled schedule data and, if yes, whether the schedule data could be
queried through an electronic journey planner. (The complete
questionnaire is appended in Section A.1.)
If the participants had stated that they handled schedule
information, they were subsequently asked with whom they shared the
data, and what kind of exchange formats they used. Afterwards, the
open data concept and criteria were briefly explained, and
participants were asked whether they had heard of the term before
taking the survey, and whether their institution’s distribution of
schedules matched a series of criteria concerning open data,
including license information.
¹ In this context: The acquisition of arrival and departure times.
² In this context: Linking to stop identifiers and coordinates,
entering foot paths, and linking with data from other transit
operators.
³ E.g., format transformations.
A series of approval questions closed the survey. On a scale from
1 (agree strongly) to 5 (disagree strongly), participants were asked
for their personal judgement concerning open transit data: whether
their institution possessed the technical and legal expertise to
publish open transit data, and whether free and open conversion tools
and step-by-step instructions, including legal advice, would help
them in doing so. Furthermore, participants were asked whether they
thought new, innovative transit apps could help improve the
attractiveness of public transit, and whether transit authorities
could save money through third parties developing transit apps.
Finally, participants were asked whether third-party developers
should be obligated to pay money to schedule distributors if they
made money off their applications, whether the provision of online
journey planners was primarily the responsibility of transit
authorities, and whether riders would hold transit authorities
responsible for third-party apps giving faulty information based on
correct schedules.
As a last question, participants were asked for a short statement
about what, in their opinion, is currently the greatest obstacle in pub-
lishing open transit data.
Participants were able to give feedback and remarks in a final step,
and leave their E-Mail address if they were interested in the results
of the study or in participating in further studies concerning open
transit data.
6.2.1 Findings of the Evaluation
Up until the submission date of this work, 47 surveys were completed
and evaluated. Institutions from all German states except for Hesse
and Mecklenburg-Western Pomerania participated in the survey, with
the majority of them from Bavaria and Baden-Württemberg. This is
not surprising, since both states cover a large area, with each of
their transit systems spanning only a few counties, whereas Berlin
and Brandenburg are together covered by only one linked transit
system.
With the exception of one single institution⁴, all participants
handled schedule data in one form or another. Only in the case of a
single county administration in Bavaria is schedule data not
contributed to an electronic journey planning software. All other
participants allow riders to query their schedule through electronic
journey planners, be they run by themselves (46.8%), by a third party
(29.8%), and/or by partner institutions (55.3%).
⁴ A group of transit operators; the individual transit operators can be
assumed to handle (their) schedules, though.
6.2.1.1 Data Exchange Formats
When receiving schedule data, spreadsheets, PDF and free-form text
together form the most popular class of exchange data (28 mentions,
63.3% aggregated). The proprietary DIVA (n=14, 31.82%) and HAFAS
(n=17, 38.64%) exchange formats were more frequently mentioned
than VDV-452 (n=12, 27.3%). This is especially true for the state of
Baden-Württemberg, where practically all of the Verkehrsverbünde and
their state-level coordination agency NVBW exchange data in DIVA
format. Four institutions (9.09%) receive their data in the DIVA
Datenpool Nord (DINO) format, two each (4.55%) mentioned the IVU.pool
and ISA formats, and a series of formats was mentioned only once
(2.27% each), including TransXChange, RailML and formats encountered
in other European countries. One Verkehrsverbund also claimed to
receive schedule data in GTFS format.
When sharing schedule data with others, free-form text and
spreadsheets played a much smaller role (16 mentions, 36.36%
aggregated), surpassed by the HAFAS (n=18, 40.91%) format, and followed
by DIVA (n=14, 31.82%) and VDV-452 (n=12, 27.27%). Five institutions
share schedule data in the GTFS format (11.36%), one uses a
Transmodel-based format (2.27%), and eight noted different formats
(18.18%), including, again, RailML, DINO, IVU.pool and ISA.
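The per-format percentages in this subsection can be re-derived from
the mention counts. The text does not state the denominator; the
sketch below assumes 44 answering institutions, a figure inferred
solely from the fact that it reproduces the reported per-format
percentages, and the names used are hypothetical.

```python
# Assumption: 44 institutions answered the format questions. This number is
# inferred from the reported percentages and is not stated in the text.
ASSUMED_RESPONDENTS = 44


def share(mentions, respondents=ASSUMED_RESPONDENTS):
    """Percentage of respondents who mentioned a format, to two decimals."""
    return round(100 * mentions / respondents, 2)


# Mention counts as given in the text above.
RECEIVED = {"HAFAS": 17, "DIVA": 14, "VDV-452": 12, "DINO": 4}
SHARED = {"HAFAS": 18, "DIVA": 14, "VDV-452": 12, "GTFS": 5}

received_shares = {name: share(count) for name, count in RECEIVED.items()}
shared_shares = {name: share(count) for name, count in SHARED.items()}
```

Under this assumption, for example, 17 mentions correspond to 38.64%
and 5 mentions to 11.36%, matching the values reported above.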
6.2.1.2 Open Transit Data
Figure 9: “Had you heard about the term ‘open data’ before this survey?”
(Bar chart, cumulative % of answers; n=47; answers: yes / no / yes,
but meaning was unknown)
72.3% of the participants had previously known the term “open
data”, and 4.3% had at least heard of the term. Unfortunately, the
questions whether schedules published by the respective institutions
meet open data criteria might have been worded ambiguously, as
many participants considered the schedules they published as PDF to
be a machine-readable exchange format, which is technically true,
but not in the sense of open data. Others regarded APIs to their
online journey planner as a means of downloading the schedule data.
One participant noted that their agency places emphasis on
application developers only using their own journey planner API to
allow for consistent journey results, regardless of the frontend
being used. Another comment stated that third parties have to sign a
user agreement both for the raw data and for the API.
Of the 23 participants who had earlier stated that their institution
provided schedule data to download for anyone, even if only in PDF
format, 82.6% stated that there was no explicit license attached
to the data, or that they did not know the license terms it was made
available under.
Through the free-text comments, three participants noted that they
planned to publish open transit data in the near future, or were
currently preparing the necessary steps to do so, with one of them
pursuing the release for non-commercial purposes only. One institution
commented that they considered providing Google with a GTFS schedule
feed, but that distribution to others was not their goal.
6.2.1.3 Personal Opinions: Capabilities
Figure 10: “Are the necessary tools and technical expertise for exporting
open transit data available in your institution?”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=41, avg=2.7, s=1.6, NA=3)
Figure 11: “Is the legal expertise necessary for publishing open transit data
available within your institution?”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=40, avg=3.3, s=1.3, NA=4)
The self-assessment whether the technical capabilities and the legal
expertise for providing open transit data were present in the partic-
ipants’ institutions varied widely. Similarly, no general consensus
could be found whether free export tools could facilitate the provi-
sion of open transit data.
However, the proposal of a step-by-step instruction “manual” for
providing open transit data, including the necessary legal advice, was
seen favourably by a majority of the participants (54.54% aggregated
agreement and strong agreement), while only an aggregated 25.72%
disagreed, or disagreed strongly.
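The figures in this section report n, avg and s for each approval
question. How such values can be derived from the answer counts on the
five-point scale is sketched below. The survey's raw counts are not
given in the text, so the function only takes counts as input; its
name is hypothetical, and treating s as the sample standard deviation
is an assumption about the evaluation system.

```python
from statistics import mean, stdev


def likert_summary(counts):
    """Return (n, avg, s) for answer counts keyed by scale value, where
    1 = agree strongly and 5 = disagree strongly. 'No answer' responses
    are expected to be left out of the counts."""
    # Expand the counts into one list entry per answer.
    answers = [score
               for score, count in sorted(counts.items())
               for _ in range(count)]
    # Assumption: s denotes the sample standard deviation.
    return len(answers), mean(answers), stdev(answers)
```

A widely split set of answers, e.g., half of the participants
agreeing strongly and half disagreeing strongly, yields a neutral
average together with a large s, which is why both values need to be
read together.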
Figure 12: “Free exporting tools for the data formats we use could help us
in publishing open transit data”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=33, avg=3.2, s=1.4, NA=11)
Figure 13: “A step-by-step manual, including an explanation of the legal
framework, would help us in publishing open transit data”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=35, avg=2.7, s=1.4, NA=9)
6.2.1.4 Personal Opinions: Opportunities
A majority of the survey participants (69%) agreed or strongly agreed
that innovative third-party applications built on schedule data could
help improve the attractiveness of public transit. However, opinions
were split on the question whether transit authorities could save
money by not needing to develop their own transit applications if
third parties offered such solutions based on open transit data.
Figure 14: “Third-parties developing new, innovative applications based on
schedule data can help improve attractiveness of public transit”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=45, avg=2.3, s=1.2, NA=2)
6.2.1.5 Personal Opinions: Obstacles
The scepticism about being able to save money through third-party
solutions might be linked to a sense that transit authorities them-
selves are obligated to provide an “official” journey planning service.
Figure 15: “If third parties develop solutions based on schedule data, transit
authorities can save money since they do not have to develop
applications themselves”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=46, avg=3.2, s=1.4, NA=1)
Asked whether providing online services based on schedule data is
primarily the responsibility of transit authorities or their service
partners, two thirds of the participants responded positively, with
only 11.63% and 2.33% disagreeing or disagreeing strongly,
respectively. It can therefore be assumed that even if transit
authorities publish their schedule as open data and third parties
publish applications based on the data, the transit authorities might
still want to supplement their online journey planners with, e.g.,
“official” smartphone applications.
Figure 16: “Providing online services based on schedule data is primarily
the responsibility of transit authorities or their service
contractors”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=47, avg=2.1, s=1.1)
Figure 17: “If third-party developers make profit off applications based on
schedule data, they should pay fees to the schedule publishers”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=44, avg=2.3, s=1.1, NA=3)
Also, an aggregated 58.54% of the participants agreed or strongly
agreed that third-party application developers should pay fees to
schedule creators if they made money off their product.
One interesting finding concerned the question whether the
participants thought users would hold transit authorities responsible
if third-party applications gave false advice based on the correct
schedules provided by the respective authorities. 48.84% of the
participants strongly agreed with this idea, and a further 37.21%
agreed, with only 11.63% being neutral and a single participant
disagreeing strongly.
Figure 18: “If third-party applications give false information based on
correct schedule data, riders will seek the fault at the transit
authority”
(Bar chart, cumulative % of answers on a scale from 1 = agree strongly
to 5 = disagree strongly; n=47, avg=1.7, s=0.9)
The free text answers reinforced this issue with examples in which,
for instance, schedule updates did not propagate into partners’
journey planning services for weeks, leading to customers complaining
to the respective authority instead of to the partner with the
outdated schedule information. In total, nine free text answers
related to third-party journey planners possibly giving false or
suboptimal results, often with remarks about metadata concerning
footpath routing between stop points, or about how up to date the
schedule data is, especially as far as short-notice schedule changes
are concerned. Another seven comments pointed out legal concerns,
mostly relating to intellectual property rights, but also with
regard to possible liability for false information being given. A
total of four comments were concerned with possible malicious use of
transit data, mostly by competing operators to whom sensitive
information must not be disclosed, but also by third parties
compiling delay statistics, or maliciously altering the schedule
data. One comment explicitly pointed out a general uneasiness among
transit stakeholders about a perceived loss of control over their
data once it is made publicly available, and one comment each
mentioned a lack of understanding of the sense and purpose of open
transit data, and a perceived lack of willingness to develop an
understanding of the topic. Another eight comments mentioned a lack
of resources (both in personnel and financially), a lack of
know-how, or missing interfaces or software for opening transit
data.
Through the closing comments, the emphasis on correct journey
planning results through the transit authorities’ own services and
APIs was pointed out five more times, again with emphasis on result
quality, and once pointing out that VBB’s API was in higher demand by
developers than the raw open transit schedules. One participant
mentioned the risk of becoming dependent on third parties providing
journey planning services, which could “blackmail” transit
authorities into paying them money for correctly providing results to
riders, or for not disadvantaging them compared to competing modes of
transport, e.g., on-demand car-sharing solutions. Lastly, one
participant welcomed open transit data as a counterweight to being
dependent on, e.g., Google, since everybody was able to get access
to the data on the same terms, and no individual contracts needed to
be negotiated.
6.3 outlook
While many transit authorities appear to be sceptical about publish-
ing open transit data, the agencies willing to implement such a pro-
cess could serve as torch-bearers for a wider adoption. By making
open transit data available at least in limited regions in Germany, tran-
sit developers and researchers can be incited to redeploy or freshly de-
velop applications that solve specific problems not addressed by the
current “official” planning services. These solutions could, in turn,
serve as positive examples for the actual usefulness of publishing the
data in the first place.
The legal framework concerning IP rights and liability issues also
deserves closer analysis, since these were the second-most frequently
cited concerns, but literature seems sparse apart from the work by
Gennaro et al.[40]. An easy-to-follow explanation of the most
important legal concerns and how they might be addressed could remove
reservations as to possible liability claims. Furthermore, an analysis
of whether open transit data feeds could be published as “unofficial”,
experimental schedules for developing the proper publishing process
without invoking liability issues might prove interesting. This is all
the more true since there is no real way to prevent open transit
enthusiasts from accessing the data anyway in one form or another,
even if only by scraping the existing PDF schedules, as McHugh
pointed out had been done in the US[57].
The problem of transit authorities unjustifiably getting the blame
for third-party journey planners giving false information based on
correct schedules also deserves further exploration. User studies
might provide insights on how well users can differentiate between
false advice being given due to the routing software, and false advice
based on incomplete or erroneous transit data.
Finally, process definitions for exporting only the schedule data fit
for public dissemination might address the reservations concerning
possible competitive disadvantages if trade secrets are accidentally
published.
One possible approach to implement an open transit data strategy
without having to license new export interfaces is to export the rel-
evant schedule data through the already existing interfaces. Ideally,
this would be VDV-452 or a Transmodel-based standard, but the
publishing of SBB’s HAFAS data and the DIVA transformation process
introduced by the author in this work might serve as foundations for
also unlocking these proprietary industry standards. This is also the
strategy taken by OVapi [69], which takes the official, Transmodel-
based open transit data feeds by Dutch transit authorities, and trans-
forms them into GTFS—even supplying them with shape information
in the process if the original feed does not contain it[54]. This ap-
proach would allow developers to directly work with the data mod-
els the transit authorities use themselves, which might lead to the
creation of a suite of conversion tools from those standards to GTFS.
7
CONCLUSION
The situation in Germany concerning transit data appears to be pro-
foundly different from the situation that led US transit agencies to
open their transit data to interested developers. While GTFS was the
first standardized transit data exchange format in the USA and has
become the de-facto standard for open transit data throughout the
world, transit data model standardization is much further advanced
in Europe. German transit agencies rely on a series of established
data exchange standards, two dominant ones of which are propri-
etary with little or no documentation publicly available to open tran-
sit developers. Also, in contrast to the United States, where providing
developers with transit data was the first step in arriving at online
journey planning services for a series of transit agencies, such ser-
vices already exist literally nationwide in Germany. Transit authori-
ties seem furthermore reluctant to let third parties use their schedules
to provide what they see as potentially inferior services. As such, the
emphasis for most agencies in Germany appears to be on providing
interfaces to their own planning services, which can then, in turn, be
used by third party applications—that is, the route computation is
done by the “official” route planning service, and third parties can
integrate the results in their custom user interface.
The obstacles for open transit data found through the survey
evaluation seem to be predominantly of a structural and cultural
rather than a technical nature. They closely follow the weaknesses
and threats Geiger and von Lucke outlined for open government data
[39]. Letting third parties do their own routing on transit data is a
cultural shift away from the model where transit agencies control the
schedules and are the sole providers of definitive information; and it
is also a danger to existing business models, where service
contractors are charged with providing and maintaining journey
planning systems. This reservation about losing interpretive
authority is also in tune with the perceived fear of misinterpretation
and a possible populist mobilisation of the masses through maliciously
misinterpreted data. The uncertainty about existing copyright laws
appears to be a further deterrent, even for agencies willing in
principle to open their data. Also, existing standardization processes
have led to a situation where the data format of choice for developers
is different from the data models prevalent in German public transit.
This includes the problem of clearly defining the data subset that is
fit for publishing without inadvertently disclosing internal data.
However, the survey also found a number of German transit au-
thorities to be actively pursuing open data strategies, and others who
were interested in starting to do so. Cooperating with these agencies
in order to develop the required processes for releasing experimental
transit data sets seems to be a promising approach. This could both
provide a data basis for transit developers in Germany—potentially
leading to showcase projects that could serve to illustrate the mean-
ingfulness of open transit data—and show by example that no nega-
tive repercussions need to be feared.
The author has contributed to such a process by dedicating two
months of this work’s timeframe to analyzing the proprietary and
previously undocumented DIVA data format, and subsequently defin-
ing transformation processes in order to extract the relevant data for
publishing a GTFS data feed from a transit authority’s DIVA data set.
Further work can build upon these findings by contributing to bet-
ter GTFS editors, and implementing processes to make the resulting
transit data feeds even more complete.
Also, since the author made the decision of focusing on the design
and execution of the survey only after this transformation process
was developed, survey results continued arriving until the finalizing
stage of this work. For this reason, the survey evaluation could not
include more sophisticated analysis and deserves closer inspection in
later work.
By following up on the findings of this work, researchers and open
transit developers can further contribute to implementing transit data
publishing processes, together with those agencies identified as will-
ing to also contribute to such processes—thus serving as examples
for the spirit of government as a platform.
A
APPENDIX
a.1 evaluation questionnaire
a.1.1 Page 1: Basic Questions
Which type of institution do you work for?
  Transit operator (Verkehrsunternehmen)
  Verkehrsverbund
  Tarifgemeinschaft
  Service contractor
  Nahverkehrszweckverband
  Landratsamt
  Municipality/city/town
  other (see free text)
Additional remarks on the type of institution
Optional: What is the name of your institution?
Optional: What is your role within your institution?
My institution is active in the following state or states: (multiple
answers possible)
  Baden-Württemberg
  Bavaria
  Berlin
  Brandenburg
  Bremen
  Hamburg
  Hesse
  Mecklenburg-Western Pomerania
  Lower Saxony
  North Rhine-Westphalia
  Rhineland-Palatinate
  Saarland
  Saxony
  Saxony-Anhalt
  Schleswig-Holstein
  Thuringia
  others (see free text)
Additional remarks on the region of operation
Postal code of my institution
Is your schedule data fed into an electronic journey planner?
  Yes, into a journey planner we operate ourselves
  Yes, into a journey planner operated by a third party on our behalf
  Yes, into the journey planner of one or more partners
  No
  I do not know/no answer
  My institution does not maintain schedule data
  Something completely different! (see free text)
Additional remarks on the journey planner
a.1.2 Page 2: Schedule exchange
My institution receives schedule data from the following other
institutions (multiple answers possible)
  Subsidiaries or sister companies
  Railway undertakings (e.g., DB)
  other transit operators
  other Verkehrsverbünde (or comparable)
  other institutions (please elaborate in the free text)
  my institution does not receive schedule data from other
  institutions
If you receive data: The following exchange formats occur when
receiving schedule data (multiple answers possible)
  Free-form text (“ruled paper”)
  Excel or OpenOffice spreadsheets (or comparable)
  ÖPNV-Datenmodell (VDV-451/-452)
  Transmodel
  Real-time schedules (VDV-454, SIRI or similar)
  DIVA exchange format
  HAFAS exchange format
  GTFS
  other format (please specify in the free text)
Additional remarks on receiving schedule data:
My institution passes schedule data on to the following institutions
(multiple answers possible)
  Parent or sister companies
  Railway undertakings (e.g., DB)
  other transit operators
  other Verkehrsverbünde (or comparable)
  Google Transit
  other institutions (please elaborate in the free text)
  my institution does not pass schedule data on to other institutions
If you pass data on: The following formats occur when passing on
schedule data (multiple answers possible)
  Free-form text (“ruled paper”)
  Excel or OpenOffice spreadsheets (or comparable)
  ÖPNV-Datenmodell (VDV-451/-452)
  Transmodel
  Real-time schedules (e.g., VDV-454 or SIRI)
  DIVA exchange format
  HAFAS exchange format
  GTFS
  other format (please specify in the free text)
Additional remarks on passing on schedule data:
a.1.3 Page 3: Open Transit Data
Were you already familiar with the term "Open Data" before this
survey?
Yes
No
I had heard the term but did not know what it meant
My organization makes the planned schedule (Soll-Fahrplan) available
to everyone (e.g. by download)
Yes
No
Don't know/no answer
The published planned schedule is available without registration,
activation, or agreement
Yes
No
Don't know/no answer
The published planned schedule is provided in a machine-readable
exchange format
Yes
No
Don't know/no answer
The published planned schedule is licensed as follows:
No explicit license stated
A license that permits use, redistribution, and reuse for
non-commercial purposes only
A license that permits use, redistribution, and reuse for commercial
purposes as well
Other license (please add in the remarks)
Don't know/no answer
Remarks and additions:
a.1.4 Page 4: Personal judgements, Part 1
My organization has the technical knowledge and the necessary tools
to export a complete planned-schedule data set for publication as
open data
The legal knowledge (for example on licensing rights) required to
publish a planned-schedule data set as open data is available in my
organization
Free export software for the data formats we use would help us to
provide schedule data as open data
A step-by-step guide including an explanation of the legal framework
would help us to provide schedule data as open data
Your assessment of the use of open schedule data
If third parties develop new, innovative solutions based on schedule
data, this can make public transit more attractive.
If third parties offer online services, apps, and the like based on
schedule data, public transit providers can save costs, as they no
longer have to develop them (or have them developed) from their own
budget.
If third parties earn money with products based on schedule data,
they should also pay the publishers of the schedules for this data
Public transit providers, or companies commissioned by them, are
primarily responsible for providing online services based on schedule
data
If third parties give incorrect information based on (correct)
schedule data, users will blame the publisher of the schedule data
What do you consider the biggest hurdle(s) in providing schedule data
as open data?
a.1.5 Closing questions
Remarks, additions, and feedback
I would like to be informed about the results of the study
I agree to be contacted by e-mail for follow-up questions
I would like to support further studies on open transit data
E-mail address (optional, will not be published. Your answers cannot
be linked to you when the results are published.)
a.2 diva exchange format
a.2.1 Folder Structure and Naming Conventions
Figure 19 shows an abbreviated directory tree of DING's DIVA export.
Note the distinction between operational files in the root directory
and line definition files ending in .j13 within the din folder. A file
uvz_texte¹ appears to assign a longer description to the three-letter
directory name, in this case Donau-Iller-Nahverkehrsverbund GmbH.
In comparison, the exports provided by SWU lacked subfolders,
placing all files within the root directory. Furthermore, DING's
export includes a series of files related to transit operators that
are lacking in SWU's version. In return, the SWU exports include
operational tables not present in DING's version (see Table 2 for
details).

Input element
  Timetable
     1  Line number
     2  Nodes (stops and stop times on a line)
     3  Pattern (sequence of nodes on a route)
     4  Avg. passenger loads between adjacent stops
     5  Load factor (desired number of passengers on board the
        transit vehicle)
     6  Policy headway (the inverse of the minimum frequency standard)
     7  Vehicle type
     8  Vehicle capacity
     9  Avg. running time (travel time between stops/timepoints)
  Vehicles
     1  Journey recovery-time tolerances (maximum and minimum time to
        be prepared for next journey)
     2  Journey departure-time tolerances (maximum departure delay and
        maximum advance departure)
     3  List of garages (names and locations)
     4  List of start and end locations
     5  Average deadhead times from garage locations to each journey
        start location (pull-outs)
     6  Average deadhead times from journey end locations to garage
        locations (pull-ins)
     7  Average deadhead time matrix between all journey end and start
        locations (by time-of-day)
  Personnel
     1  Relief-point location (stops, start and end points, garages)
     2  Average travel times between relief points
     3  Journey-layover time (minimum and maximum rest times between
        two adjacent journeys)
     4  Type of duty (early, late, split, full, tripper, etc.)
     5  Duty length (maximum spread time)
     6  Number of vehicle changes on duty
     7  Meal breaks
     8  Duty composition
     9  Other work rules
    10  List of drivers by name and type
    11  Driver priority and equality rules
    12  One-day-on, one-day-off work pattern
Table 8: Input elements for schedule-planning according to Ceder [21, p. 8]
a.2.2 DIVA Coordinate Model
The coordinate model used within DIVA's data model took some
comparing and puzzling to make sense of. As a reference, take this
abbreviated record:
"NAV4";E;4352240;794216;
At first glance, the author took this to be a fixed-digit coordinate in
WGS84 decimal degree notation, specifying a place at 43.52240°
northern latitude, 7.94216° eastern longitude. This idea quickly turned
out to be wrong. While the number of decimal places would have
matched, this location marks a spot in the Mediterranean, roughly
30 kilometers off the coast of Sanremo, instead of anything even
remotely near Ulm, which lies at approximately 48.4° N, 9.98° E.
(Margin note: The Gauss-Krüger CRS makes use of zones spanning 3° of
longitude each. GK zone 4 uses 12° of longitude as its central
meridian.)
The “NAV4” identifier and the leading digit 4 of the supposed easting
value hinted towards a Gauss-Krüger coordinate located within zone 4
of the GK CRS. However, a valid Gauss-Krüger coordinate would consist
of two seven-digit values, while the supposed northing value is
specified to only six digits. Plotted directly in the Gauss-Krüger
CRS, this coordinate pair resolves to only 7.18423° N latitude and
10.66115° E longitude. While the longitude looks plausible, the
latitude places this coordinate in Cameroon, far closer to the
Nigerian border than to any means of public transit provided by DING.
(Margin note: Note that DIVA uses x to specify the easting and y for
the northing, which deviates from usual cartographic practice.)
Further comparison of coordinates suggested that MentzDV had, for
reasons unknown, introduced a custom offset to the northing value of
their coordinates, which do, in fact, rely on the Gauss-Krüger CRS.
The actual northing is the absolute value of the result of subtracting
6 160 000 from the northing specified in DIVA. For the record above,
this yields |794 216 − 6 160 000| = 5 365 784, a seven-digit northing
as expected.
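The offset correction described above can be illustrated with a short sketch. It is written in Python rather than the Perl used in the listings of this appendix, and it only covers the arithmetic, not the subsequent projection to WGS84. The function names are hypothetical, the second field of the record ("E") is not interpreted, and the offset of 6 160 000 is the value stated in the text, which may not hold for other data sets.

```python
# Offset as stated in the text above; an assumption for any other export.
NORTHING_OFFSET = 6_160_000


def parse_diva_coordinate(record: str) -> tuple[str, int, int]:
    """Split a record such as '"NAV4";E;4352240;794216;' into (crs, x, y).

    The second field ("E") is skipped because its meaning is not
    covered here.
    """
    fields = record.strip().rstrip(";").split(";")
    crs = fields[0].strip('"')
    return crs, int(fields[2]), int(fields[3])


def diva_to_gauss_krueger(x: int, y: int,
                          offset: int = NORTHING_OFFSET) -> tuple[int, int]:
    """Return (easting, northing) in the Gauss-Krueger CRS.

    The easting is used as-is; the northing is |y - offset|.
    """
    return x, abs(y - offset)


crs, x, y = parse_diva_coordinate('"NAV4";E;4352240;794216;')
easting, northing = diva_to_gauss_krueger(x, y)
print(crs, easting, northing)  # NAV4 4352240 5365784
```

The resulting pair is a regular seven-digit Gauss-Krüger coordinate in zone 4, which can then be handed to a projection tool.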
Apart from NAV4 coordinates, DIVA appears to make use of several
other custom reference systems listed in Table 9. These reference
systems and their parameters were acquired by analyzing the
parametrization files of transit applications provided by MentzDV
through transit agencies to Android handset users. A number of the
identifiers used suggest the agency they might have been developed
for: The TFLV set makes use of the Ordnance Survey Great Britain
datum, suggesting a relation to Transport for London (TFL), which
uses EFA. Also, the identifiers making use of the Austrian
Bundesmeldenetz (ÖBMN) datum bear resemblance to the abbreviations
of the Verkehrsverbünde of Tirol and Styria, which, respectively, also
lie within the matching meridian bands.

¹ Probably an abbreviation for “Unterverzeichnis_Texte”, which
translates to “subdirectory texts”

/
hstattr
anschlb.j13
Aushangbeschreibungen.bnv
Aushangbeschreibungen.ding
auto_keys
bes_tage.BW
bzw
Bzwfarben.txt
ferien.B
ferien.BW
haltestellen.bnv
haltestellen.ding
haltestellen.format32.bnv
haltestellen.format32.ding
hinweise
hst_liste
Linien.Praes
Linienfarben.txt
mastmat
mastmat.bnv
mastmat.ding
num_ber_hst
pkbez
tarifz
teilstrecken.01
teilstrecken.02
...
teilstrecken.99
tgtyp
tsp
umstmat
unter
unt_adr
uvz_texte
vbesch.1213
vmtext
zwgruppe
din
01001e.j13
01001R.j13
...
9905cN.j13
99073_.j13
99077_.j13
99078_.j13
lnrlit
Figure 19: Example list of DIVA files provided by DING. Some files were
omitted.

DIVA  Ellipsoid    Reference System         Offset
NAV2  Bessel 1841  Gauss-Krüger Zone 2   6 160 100
NAV3  Bessel 1841  Gauss-Krüger Zone 3   6 160 100
NBWT  Bessel 1841  Gauss-Krüger Zone 3   6 160 100
NAV4  Bessel 1841  Gauss-Krüger Zone 4   6 160 100
MVTT  Bessel 1841  Gauss-Krüger Zone 2   6 160 100
NAV5  Bessel 1841  Gauss-Krüger Zone 5   6 160 100
GIP1  Bessel 1841  ÖBMN M34              6 000 000
VVTT  Bessel 1841  ÖBMN M28              1 000 000
STVH  Bessel 1841  ÖBMN M34              1 000 000
TFLV  Airy 1830    OSGB                  1 000 000
ITMR  GRS 80       ITM                   1 000 000
MTCV  WGS 1984     UTM Zone 10N          5 000 000
GDAV  WGS 1984     UTM Zone 55S         10 000 000
Table 9: Identifiers of Coordinate Reference Systems used by DIVA and their
meaning. The offset is subtracted from the northing and the
absolute value of the result is used.
a.3 scripts reference
a.3.1 GTFS Target Database Creation Statements
Listing 21: SQL Create statements for setting up a target GTFS database
CREATE TABLE IF NOT EXISTS stops (
stop_id TEXT,
stop_code TEXT,
stop_name TEXT,
stop_lat REAL,
stop_lon REAL,
zone_id TEXT,
location_type INTEGER,
parent_station TEXT,
wheelchair_boarding TEXT
);
CREATE TABLE IF NOT EXISTS routes (
route_id TEXT PRIMARY KEY,
agency_id TEXT,
route_short_name TEXT,
route_long_name TEXT,
route_type TEXT,
route_color TEXT,
route_text_color TEXT
);
CREATE TABLE IF NOT EXISTS trips (
route_id TEXT,
service_id TEXT,
trip_id TEXT PRIMARY KEY,
trip_headsign TEXT,
trip_short_name TEXT,
direction_id INTEGER,
block_id TEXT,
shape_id TEXT
);
CREATE INDEX IF NOT EXISTS tr_rid ON trips(route_id);
CREATE TABLE IF NOT EXISTS stop_times (
trip_id TEXT,
arrival_time TEXT,
departure_time TEXT,
stop_id TEXT,
stop_sequence INTEGER,
stop_headsign TEXT,
pickup_type INTEGER,
drop_off_type INTEGER,
shape_dist_traveled REAL
);
CREATE INDEX IF NOT EXISTS st_trid ON stop_times(trip_id);
CREATE INDEX IF NOT EXISTS st_stid ON stop_times(stop_id);
CREATE INDEX IF NOT EXISTS st_starrtime ON stop_times(
arrival_time);
CREATE INDEX IF NOT EXISTS st_stdeptime ON stop_times(
departure_time);
CREATE TABLE IF NOT EXISTS calendar_dates (
service_id TEXT,
date TEXT,
exception_type INTEGER,
PRIMARY KEY (service_id, date)
);
CREATE INDEX IF NOT EXISTS cd_service ON calendar_dates(
service_id);
CREATE INDEX IF NOT EXISTS cd_date ON calendar_dates(date);
CREATE TABLE IF NOT EXISTS calendar (
service_id TEXT PRIMARY KEY,
monday INTEGER,
tuesday INTEGER,
wednesday INTEGER,
thursday INTEGER,
friday INTEGER,
saturday INTEGER,
sunday INTEGER,
start_date TEXT,
end_date TEXT
);
CREATE TABLE IF NOT EXISTS fare_attributes (
fare_id TEXT PRIMARY KEY,
price REAL,
currency_type TEXT,
payment_method INTEGER,
transfers INTEGER,
transfer_duration INTEGER
);
CREATE TABLE IF NOT EXISTS fare_rules (
fare_id TEXT,
route_id TEXT,
origin_id TEXT,
destination_id TEXT,
contains_id TEXT
);
CREATE TABLE IF NOT EXISTS agency (
agency_id TEXT PRIMARY KEY,
agency_name TEXT,
agency_url TEXT,
agency_timezone TEXT,
agency_lang TEXT,
agency_phone TEXT,
agency_fare_url TEXT
);
CREATE TABLE IF NOT EXISTS shapes (
shape_id TEXT,
shape_pt_lat REAL,
shape_pt_lon REAL,
shape_pt_sequence INTEGER,
shape_dist_traveled REAL
);
CREATE TABLE IF NOT EXISTS transfers (
from_stop_id TEXT,
to_stop_id TEXT,
transfer_type INTEGER,
min_transfer_time INTEGER,
from_route_id TEXT,
to_route_id TEXT,
from_trip_id TEXT,
to_trip_id TEXT
);
CREATE TABLE IF NOT EXISTS feed_info (
feed_publisher_name TEXT,
feed_publisher_url TEXT,
feed_lang TEXT,
feed_start_date INTEGER,
feed_end_date INTEGER,
feed_version TEXT
);
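The statements of Listing 21 can be tried out without touching a real feed. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database instead of the Perl DBI setup used in the later listings. Only two of the tables are reproduced, and the sample rows (route "87-5", service "0", the date) are made up for illustration.

```python
import sqlite3

# Two statements taken from Listing 21; the remaining tables are omitted.
SCHEMA = """
CREATE TABLE IF NOT EXISTS routes (
    route_id TEXT PRIMARY KEY,
    agency_id TEXT,
    route_short_name TEXT,
    route_long_name TEXT,
    route_type TEXT,
    route_color TEXT,
    route_text_color TEXT
);
CREATE TABLE IF NOT EXISTS calendar_dates (
    service_id TEXT,
    date TEXT,
    exception_type INTEGER,
    PRIMARY KEY (service_id, date)
);
CREATE INDEX IF NOT EXISTS cd_date ON calendar_dates(date);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(SCHEMA)  # a second run is harmless due to IF NOT EXISTS

# Hypothetical sample rows. INSERT OR REPLACE overwrites a row that has the
# same primary key instead of raising a constraint error.
insert_route = (
    "INSERT OR REPLACE INTO routes (route_id, agency_id, route_short_name) "
    "VALUES (?, ?, ?)"
)
conn.execute(insert_route, ("87-5", "87", "5"))
conn.execute(insert_route, ("87-5", "87", "5"))
insert_date = "INSERT OR REPLACE INTO calendar_dates VALUES (?, ?, ?)"
conn.execute(insert_date, ("0", "20140523", 1))
conn.execute(insert_date, ("0", "20140523", 2))

table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
route_count = conn.execute("SELECT COUNT(*) FROM routes").fetchone()[0]
exception_types = [
    row[0] for row in conn.execute("SELECT exception_type FROM calendar_dates")
]
print(table_names, route_count, exception_types)
```

Because the conversion scripts rely on INSERT OR REPLACE, the primary keys declared in the schema decide which repeated rows are silently overwritten; tables without a primary key, such as stop_times, accumulate duplicates instead.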
a.3.2 Converting DIVA Journeys To GTFS
Listing 22: Transforming DIVA journeys into GTFS
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Switch;
use DBI;
use Getopt::Long;
use open ':encoding(cp850)';
# take care of windows newlines
$/ = "\r\n";
my $divadbh;
my $dbh;
my $basename;
my $tripname;
my $operator;
my $textpfp;
my $textbalang;
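my $basepath = '';
# The listing prints to $log without declaring it. As an assumption,
# diagnostics are sent to a duplicate of STDERR here.
open(my $log, '>&', \*STDERR) or die("Could not open log: $!");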
dbconnect();
GetOptions ( "path=s" => \$basepath)
or die("Error in command line arguments\n");
my $sth = $divadbh->prepare('SELECT uvz,lierg,kbez,textpfp,
TextBAlang FROM TabelleLnrlit');
$sth->execute();
while (my $row = $sth->fetchrow_hashref()) {
# tripname: Everything, e.g. 11310a or 11310_
# basename: Just the operator and route, e.g. 11310
$tripname = $row->{lierg};
$tripname =~ /(?<basename>(?<operator>.{2}).{2}[^_]?).+/;
$operator = $+{operator};
$basename = $+{basename};
# replace the first whitespace character in the trip name with an
# underscore (e.g. 11310_)
$tripname =~ s/\s/_/;
# trim trailing spaces
$basename =~ s/\s+$//;
$textpfp = $row->{textpfp};
$textpfp =~ s/\s+$//;
$textbalang = $row->{TextBAlang};
$textbalang =~ s/\s+$//;
# build the path to each file. Pattern is uvz/lierg.kbez with trimmed spaces
my $path = $basepath . $row->{uvz} . "/" . $tripname . "." . $row->{kbez};
print "Route: $basename, tripname $tripname, Path: $path\n";
if($textpfp eq $textbalang) {undef $textbalang;}
my $newroute = $dbh->prepare('INSERT OR REPLACE INTO routes
(route_id, agency_id, route_short_name,
route_long_name) VALUES (?, ?, ?, ?)');
$newroute->execute($operator."-".$textpfp,$operator,
$textpfp,$textbalang);
$dbh->commit;
my %job = (
  'path' => $path,
  'tripname' => $tripname,
  'operator' => $operator,
  'textpfp' => $textpfp,
  'textbalang' => $textbalang,
  'route' => $operator . "-" . $textpfp,
);
process(%job);
}
# ----------------------------------------
# SUBROUTINE TO EXPAND TIMING PATTERNS
sub expandtimes {
my @timearray;
# push minutes, -, |, $ to array
foreach (@_) {
# Expand *sequences. First capture is the number of occurrences, second
# the content.
if ($_=~ /\*([0-9]{2})(\-|\||\$|[0-9]{2})/) {
for (my $i = 0; $i < $1; $i++) {
push @timearray, $2;
}
}
# Deal with single occurrences
else {
push @timearray, $_;
}
}
# Done pushing the timing pattern to the array
return @timearray;
}
# -----------------------------------------
# ------------------------------------------
# PROCESS FILE
# -----------------------------------------
sub process {
my %process = @_;
my $file = $process{path};
open (FILE, "<", "$file") or die("Could not open inputfile:
$!");
my $line;
my @stops;
my %platforms;
my $route_type;
my $direction;
my $route_long_name;
my %FT;
foreach $line (<FILE>) {
chomp $line;
# ----------------------------------------------------------
# HEADERS FOR EACH DIRECTION TO BE TAKEN CARE OF
# These are: Journey Patterns, Stop Points, Timing Patterns
# ----------------------------------------------------------
# Recognize Fahrwege (Journey patterns)
if ($line =~ s/FW[0-9]*[H,R]//) {
@stops = ();
push @stops, substr $line, 0, 4, '' while $line;
print $log " FW recognized: ";
print $log "$_"for @stops;
print $log "\n";
}
# Recognize Stop Platforms
elsif ($line =~ s/ST[H,R][0-9]{3}//) {
while ($line =~ /([0-9]{3})(.{5})/g) {
my $stid = $1;
my $plat = $2;
$plat =~ s/\s+$//; # trim trailing spaces
if ($plat ne '-' and $plat ne '0') {
$stops[$stid] = $stops[$stid] . $plat;
}
}
}
# Recognize Timing Patterns
elsif ($line =~ /FT(?<ftid>[HR][0-9]{5}).{2}(?<pattern>.*)[ ,N].*.*/) {
# create identifier for current pattern
my $ftid = $+{ftid};
my $pattern = $+{pattern};
# match: *00-, *0000, *00|, *00$ or 00 or - or | or $
# write everything in temporary tmparray for later expansion of *sequences
my @tmparray = $pattern =~ /(\*[0-9]{2}\-|\*[0-9]{4}|\*[0-9]{2}\||\*[0-9]{2}\$|[0-9]{2}|\-|\||[\$])/g;
# expand *sequences
@{ $FT{$ftid} } = expandtimes(@tmparray);
}
# Done with timing pattern
# -------------------------------
# HEADERS BEEN TAKEN CARE OF HERE
# -------------------------------
# -----------------------
# HERE COME ACTUAL TRIPS
# -----------------------
elsif ($line =~ s/^FA//) {
if ($line =~ /(?<tripid>(?<direction>[H,R])(?<serviceid>[0,2,3])(?<tripkey>[0-9]{4}))0{5}(.{4})?(?<starttime>[0-9]{4}).(?<timingpattern>[0-9]{5})\s?(?<vehicletype>[A-Z0-9]{1,2})?\s*(?<servicerestriction>[A-Za-z][a-z0-9]{1,2})?\s+((?<trainid>[A-Z]?[1-9][0-9]{0,5})\s*[A-Z]?\s*(?<traintype>[A-Z]+))?.*[0-9]{3}(?<notice>\".*\")*/) {
#" regular expressions detailed in the DIVA transformation chapter
my $tripid;
my $trip_short_name;
if (defined $+{servicerestriction}) {
$tripid = $+{direction}.$+{servicerestriction}.$+{
tripkey};
}
else {
$tripid = $+{tripid};
}
my $timingpattern = $+{direction} . $+{timingpattern};
# Taking care of directions
if ($+{direction} eq "H") {
$direction = 0;
}else {
$direction = 1;
}
# If train, use train number as trip id
if (defined $+{trainid}) {
$trip_short_name = $+{traintype} . $+{trainid};
}
# Take care of service restriction. If a restriction is defined, the
# previous service id is replaced
my $service_id = $+{serviceid};
if (defined $+{servicerestriction}) {
$service_id = $+{servicerestriction};
}
my $sth = $dbh->prepare('INSERT OR REPLACE INTO trips (
route_id, service_id, trip_id, trip_short_name,
direction_id, shape_id) values (?, ?, ?, ?, ?, ?)');
$sth->execute($process{route},$service_id,$process{
tripname}.$tripid,$trip_short_name,$direction,
$process{route}.$timingpattern);
# Analyze timing pattern for trip and save stop times
my $hours = substr($+{starttime},0,2);
my $minutes = substr($+{starttime},2,2);
my $arrival_time;
my $departure_time;
for my $i (0 .. $#stops) {
if ($FT{$timingpattern}[$i] ne '|' and $FT{
$timingpattern}[$i] ne '$' and $FT{$timingpattern}[
$i] ne '-') {
my $sth = $dbh->prepare('INSERT OR REPLACE INTO
stop_times (trip_id, arrival_time, departure_time,
stop_id, stop_sequence) values (?, ?, ?, ?, ?)');
$minutes = $minutes + $FT{$timingpattern}[$i];
if ($minutes > 59) {
$minutes -= 60;
$hours++;
}
$minutes = sprintf("%02d", $minutes);
$arrival_time = "$hours:$minutes:00";
# Handle departures/arrivals at same stop: Take time of next stop and use
# it as departure
if ($i < $#stops and $stops[$i] eq $stops[$i+1] and
$FT{$timingpattern}[$i+1] ne '-' and $FT{
$timingpattern}[$i+1] ne '$' and $FT{
$timingpattern}[$i+1] ne '|') {
my $dep_hours = $hours;
my $dep_minutes = $minutes + $FT{$timingpattern}[$i
+1];
if ($dep_minutes > 59) {
$dep_minutes -= 60;
$dep_hours++;
}
$dep_minutes = sprintf("%02d", $dep_minutes);
$departure_time = "$dep_hours:$dep_minutes:00";
# If the above procedure has been performed, the next iteration is skipped
}elsif ($i > 1 and $stops[$i] eq $stops[$i-1] and $FT
{$timingpattern}[$i-1] ne '-' and $FT{
$timingpattern}[$i-1] ne '$' and $FT{
$timingpattern}[$i-1] ne '|') {
next;
# regular arrival/departure handling
}else {
$departure_time = $arrival_time;
}
$sth->execute($process{tripname}.$tripid,$arrival_time
,$departure_time,$stops[$i],$i);
}
}
}
}
# -------------------------
# END OF TRIPS
# -------------------------
# -------------------------
# HEADSIGN HANDLING
# -------------------------
elsif ($line =~ s/^EE//) {
if ($line =~ /(?<direction>[HR])\s\"(?<headsign>.*)\"\s+(?<tid>[0-9]{5}).*(?<startingstop>[0-9]{3})_/) {
#" regular expressions detailed in the DIVA transformation chapter
my $tripid;
if ($+{tid} == 0) {
$tripid = "$process{tripname}$+{direction}%";
# Discriminate: If startingstop is 1 (first stop), set headsign for
# routeuid within TRIPS table. Otherwise, update STOP_TIMES table
if ($+{startingstop} == 1) {
my $sth = $dbh->prepare('UPDATE trips set
trip_headsign = ? where trip_id LIKE ?');
$sth->execute($+{headsign},$tripid);
}
else {
my $sth = $dbh->prepare('UPDATE stop_times set
stop_headsign = ? where trip_id LIKE ? and
stop_sequence >= ?');
$sth->execute($+{headsign},$tripid, $+{startingstop
}-1);
}
}
else {
$tripid = $process{tripname} . $+{direction} . $+{tid};
if ($+{startingstop} == 0) {
my $sth = $dbh->prepare('UPDATE trips set
trip_headsign = ? where trip_id = ?');
$sth->execute($+{headsign},$tripid);
}
else {
my $sth = $dbh->prepare('UPDATE stop_times set
stop_headsign = ? where trip_id = ? and
stop_sequence >= ?');
$sth->execute($+{headsign},$tripid, $+{startingstop
}-1);
}
}
}
}
# -------------------------
# END OF HEADSIGN HANDLING
# -------------------------
# ---------------------------------
# BUS NAME AND DESCRIPTION PARSING
# ---------------------------------
elsif ($line =~ s/^BU//) {
if ($line =~ /(?<direction>[HR])\s\"(?<shortid>.*)\"\s\"(?<routetype>.*)\"\s(\".*\")\s\"(?<longid1>.*)\"\s\"(?<longid2>.*)\"\s(\".*\")\s(\".*\")\s[0-9]*[NY]/) {
#" regular expressions detailed in the DIVA transformation chapter
if (not defined $process{textbalang}) {
$route_long_name = $+{longid1} . $+{longid2};
}
else {
$route_long_name = $process{textbalang};
}
# take care of route types
switch ($+{routetype}) {
case "bus" { $route_type = 3 }
case "bahn" { $route_type = 2 }
case "strab" { $route_type = 0 }
case "SAM" { $route_type = 99999 } # This needs to be corrected manually
case "AST" { $route_type = 99998 } # This, too
case "Fahrradbus" { $route_type = 99997 } # And this!
else { $route_type = 99 } # This probably, too.
}
}
}
# -------------------------------------
# END OF BUS NAME/DESCRIPTION PARSING
# -------------------------------------
}
my $sth = $dbh->prepare('UPDATE routes SET route_type = ?,
route_long_name= ? where route_id IS ?');
$sth->execute($route_type,$route_long_name,$process{route})
;
$dbh->commit;
close FILE;
}
# ---------------------------------
# END OF FILE PROCESSING SUBROUTINE
# ---------------------------------
# ---------------------------------
# CLEANING UP!
# ---------------------------------
close $log;
$dbh->disconnect();
$divadbh->disconnect();
print "Database closed. ";
print "Everything done. Bye!\n";
sub dbconnect {
# --------------------
# CONNECT TO DATABASE
# --------------------
my $driver = "SQLite";
my $database = "gtfs.db";
my $dsn = "DBI:$driver:dbname=$database";
my $userid = "";
my $password = "";
$dbh = DBI->connect($dsn, $userid, $password, { RaiseError
=> 1 })
or die $DBI::errstr;
my $divadatabase = "divadata.db";
my $divadsn = "DBI:$driver:dbname=$divadatabase";
$divadbh = DBI->connect($divadsn, $userid, $password, {
RaiseError => 1 })
or die $DBI::errstr;
# sacrificing security for speed
$dbh->{AutoCommit} = 0;
$dbh->do( "PRAGMA synchronous=OFF" );
print "Opened database successfully\n";
# --------------------------
# END OF DATABASE SETUP
# --------------------------
}
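The handling of timing patterns in Listing 22 can be condensed into a small sketch. It is written in Python rather than Perl and assumes the token grammar implied by the regular expressions of the listing: two-digit minute offsets, the markers -, | and $ (for which the listing writes no stop time), and repeat groups of the form *NN followed by one token. The function names and the sample pattern string are made up for illustration and are not taken from a real DIVA file.

```python
import re
from typing import List, Optional

# Same alternatives, in the same order, as the tokenizing regex in Listing 22.
TOKEN = re.compile(
    r"\*[0-9]{2}-|\*[0-9]{4}|\*[0-9]{2}\||\*[0-9]{2}\$|[0-9]{2}|-|\||\$"
)
# A repeat group: *NN followed by the token that is repeated NN times.
REPEAT = re.compile(r"\*([0-9]{2})(-|\||\$|[0-9]{2})")
MARKERS = ("-", "|", "$")


def expand_times(pattern: str) -> List[str]:
    """Tokenize a timing pattern and unroll its *NN repeat groups."""
    expanded: List[str] = []
    for token in TOKEN.findall(pattern):
        repeat = REPEAT.fullmatch(token)
        if repeat:
            expanded.extend([repeat.group(2)] * int(repeat.group(1)))
        else:
            expanded.append(token)
    return expanded


def stop_times(start: str, offsets: List[str]) -> List[Optional[str]]:
    """Add up minute offsets from an 'HHMM' start time.

    Marker entries produce None, i.e. no stop time for that position.
    """
    total = int(start[:2]) * 60 + int(start[2:])
    times: List[Optional[str]] = []
    for offset in offsets:
        if offset in MARKERS:
            times.append(None)
            continue
        total += int(offset)
        times.append(f"{total // 60:02d}:{total % 60:02d}:00")
    return times


offsets = expand_times("00*0302|05")
print(offsets)  # ['00', '02', '02', '02', '|', '05']
print(stop_times("0655", offsets))
# ['06:55:00', '06:57:00', '06:59:00', '07:01:00', None, '07:06:00']
```

Unlike the listing, this sketch keeps a running total in minutes, so an offset that crosses more than one hour boundary needs no separate carry handling. Hours are not wrapped at midnight, which matches the GTFS convention of writing times past 24:00:00 for trips that run into the next day.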
a.3.3 Transfer Handling Script
Listing 23: Excerpt: DIVA transfer information transformation to GTFS
# Database functions have been previously established; $dbh is the
# database handle for the target GTFS database, and $divadbh the database
# handle for the source DIVA database.
sub findtransfers {
my $sth = $divadbh->prepare('SELECT hst_nr_an, linie_erg_an
, richt_an, wttyp_an, zeit_von_an, zeit_bis_an,
hst_nr_ab, linie_erg_ab, richt_ab, wttyp_ab,
zeit_von_ab, zeit_bis_ab, sitz_blb FROM
TransferProtection');
$sth->execute();
while (my $row = $sth->fetchrow_hashref()) {
# In DIVA, the time frames are given in minutes from midnight. The
# conversion is encapsulated in a subroutine that essentially does nothing
# else than
# return (sprintf("%02d", int($_/60)) . ":" . sprintf("%02d", $_%60) . ":00")
my $from_starttime = min2gtfs($row->{zeit_von_an});
my $from_endtime = min2gtfs($row->{zeit_bis_an});
my $to_starttime = min2gtfs($row->{zeit_von_ab});
my $to_endtime = min2gtfs($row->{zeit_bis_ab});
# Startroutes look like "87005" or "87004e", or "219E_e" in DIVA. They
# translate to 87005_ or 87004e or 219E_e in the GTFS trips table
my $startroute = $row->{linie_erg_an};
$startroute =~ /(?<basename>.{2}.{2}[^_]?)(?<suffix>.*)/;
if ($+{suffix} eq '') {
$startroute = $startroute . '\_';
}
my $endroute = $row->{linie_erg_ab};
$endroute =~ /(?<basename>.{2}.{2}[^_]?)(?<suffix>.*)/;
if ($+{suffix} eq '') {
$endroute = $endroute . '\_';
}
my $from_stop = $row->{hst_nr_an};
my $to_stop = $row->{hst_nr_ab};
# If wttyp_an is A, this transfer is valid for _all_ day types starting
# with this route/stop combination within the given time frame.
if ($row->{wttyp_an} eq "A") {
# ALL day types! First, from day type 0.
my %job = (
  'starttrip' => $startroute . $row->{richt_an} . "0%",
  'from_stop' => $from_stop,
  'from_starttime' => $from_starttime,
  'from_endtime' => $from_endtime,
  'to_stop' => $to_stop,
  'to_starttime' => $to_starttime,
  'to_endtime' => $to_endtime,
  'block' => $row->{sitz_blb},
);
if ($row->{wttyp_ab} eq "A") {
$job{endtrip} = $endroute.$row->{richt_ab}."0%";
blockhandler(%job);
$job{endtrip} = $endroute.$row->{richt_ab}."2%";
blockhandler(%job);
$job{endtrip} = $endroute.$row->{richt_ab}."3%";
blockhandler(%job);
}
else {
$job{endtrip} = $endroute.$row->{richt_ab}.$row->{
wttyp_ab};
messyblockhandler(%job);
}
# Now, from day type 2, and then from day type 3 (omitted for brevity)
}
else {
# Handle individual days
my $starttrip = $startroute . $row->{richt_an} . $row->{wttyp_an} . "%";
my %job = (
  'starttrip' => $starttrip,
  'from_stop' => $from_stop,
  'from_starttime' => $from_starttime,
  'from_endtime' => $from_endtime,
  'to_stop' => $to_stop,
  'to_starttime' => $to_starttime,
  'to_endtime' => $to_endtime,
  'block' => $row->{sitz_blb},
);
# Again, handling of day type A for the departing trips
if ($row->{wttyp_ab} eq "A") {
$job{endtrip} = $endroute.$row->{richt_ab}."0%";
blockhandler(%job);
$job{endtrip} = $endroute.$row->{richt_ab}."2%";
blockhandler(%job);
$job{endtrip} = $endroute.$row->{richt_ab}."3%";
blockhandler(%job);
}
# Otherwise, the transfer applies just to the individual day type
else {
$job{endtrip} = $endroute.$row->{richt_ab}.$row->{
wttyp_ab}."%";
blockhandler(%job);
}
}
}
}
sub blockhandler {
my %params = @_;
# This subroutine is called to transform transferring information from
# $params{starttrip} to $params{endtrip} at stop area $params{from_stop}
# to $params{to_stop}. The transfer protection is valid from
# $params{from_starttime} to $params{from_endtime} for the origin trip,
# and from $params{to_starttime} to $params{to_endtime} for the
# destination trip. If $params{block} is "Y", the transfer is achieved by
# staying on the vehicle.
my $sth = $dbh->prepare('SELECT trips.trip_id AS trip_id,
arrival_time, stop_id, block_id from trips join
stop_times on trips.trip_id = stop_times.trip_id where
trips.trip_id like ? ESCAPE "\" and arrival_time >= ?
and arrival_time <= ? and stop_id LIKE ?');
$sth->execute($params{starttrip}, $params{from_starttime},
$params{from_endtime}, $params{from_stop}."%");
my %block_identifier;
my %triptransfer;
while (my $arrival_triprow = $sth->fetchrow_hashref()) {
my $current_arrival_time = $arrival_triprow->{arrival_time
};
my $current_arrival_trip = $arrival_triprow->{trip_id};
my $current_arrival_stop = $arrival_triprow->{stop_id};
# Handling transfers by staying on the vehicle
if ($params{block} eq "Y") {
# Does the inbound trip already have a block ID? If yes, we'll use that
# later on!
if (defined $arrival_triprow->{block_id}) {
$block_identifier{$current_arrival_trip} = $arrival_triprow->{block_id};
}
# If not, we will just use the current trip as a block identifier
else {
$block_identifier{$current_arrival_trip} = $current_arrival_trip;
}
}
# Let's find matching departure trips for this arrival trip! Look at all
# departures between the inbound trip's arrival time and the end of the
# transfer time frame.
my $sth = $dbh->prepare('select trip_id, arrival_time, stop_id from stop_times
where trip_id like ? ESCAPE "\" and arrival_time >= ? and arrival_time <= ? and stop_id LIKE ?
order by arrival_time asc limit 1;
');
$sth->execute($params{endtrip}, $current_arrival_time,
$params{to_endtime}, $params{from_stop}."%");
while (my $departure_triprow = $sth->fetchrow_hashref()) {
my $current_departure_trip = $departure_triprow->{trip_id
};
my $current_departure_stop = $departure_triprow->{stop_id
};
# Transfer by staying on the vehicle
if ($params{block} eq "Y") {
$block_identifier{$current_departure_trip} =
$block_identifier{$current_arrival_trip};
}
# Else: Write a transfer
elsif ($params{block} eq "N") {
my $transfersth = $dbh->prepare('INSERT INTO TRANSFERS (from_stop_id, to_stop_id, transfer_type, from_trip_id, to_trip_id) VALUES (?, ?, ?, ?, ?)');
# from_trip_id is the arriving trip, to_trip_id the departing trip
$transfersth->execute($current_arrival_stop, $current_departure_stop, 1, $current_arrival_trip, $current_departure_trip);
}
}
}
# Finally, if the current request was for block transfers, use the
# temporary hash to write everything to the GTFS database!
if ($params{block} eq "Y") {
for (keys %block_identifier) {
my $updatesth = $dbh->prepare('UPDATE trips SET block_id =
? WHERE trip_id = ?');
$updatesth->execute($block_identifier{$_}, $_);
}
}
$dbh->commit();
}
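Two details of Listing 23 are only described in comments: the time conversion and the way trip identifiers are turned into LIKE patterns. The following Python sketch illustrates both under stated assumptions. It assumes that DIVA stores the time frames as minutes from midnight, as the formula quoted in the listing implies. min2gtfs is the name used in the listing, while trip_like_pattern and like are hypothetical helpers, and the sample trip identifiers are made up.

```python
import re
import sqlite3


def min2gtfs(minutes_from_midnight: int) -> str:
    """Format minutes from midnight as a GTFS HH:MM:SS time string."""
    hours, minutes = divmod(minutes_from_midnight, 60)
    return f"{hours:02d}:{minutes:02d}:00"


def trip_like_pattern(line: str, direction: str, day_type: str) -> str:
    """Build a LIKE pattern for all trips of a line, direction and day type.

    Lines without a suffix (e.g. 87005) are stored with a trailing
    underscore in the trips table. Since _ is a wildcard in LIKE, it is
    escaped with a backslash, mirroring the listing.
    """
    match = re.match(r"(.{2}.{2}[^_]?)(.*)", line)
    if match and match.group(2) == "":
        line += r"\_"
    return f"{line}{direction}{day_type}%"


conn = sqlite3.connect(":memory:")


def like(value: str, pattern: str) -> bool:
    """Evaluate `value LIKE pattern ESCAPE '\\'` in SQLite."""
    row = conn.execute("SELECT ? LIKE ? ESCAPE '\\'", (value, pattern))
    return bool(row.fetchone()[0])


pattern = trip_like_pattern("87005", "H", "0")
print(min2gtfs(415))  # 06:55:00
print(pattern)  # 87005\_H0%
print(like("87005_H00001", pattern), like("87005eH00001", pattern))
# True False
```

Without the escape, the underscore would match any single character, so a pattern meant for line 87005 would also select trips of a line such as 87005e.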
a.3.4 Transforming Stop information from DIVA to GTFS
Listing 24: Transforming DIVA stops into GTFS
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use DBI;
my $line;
# Database connection subroutine omitted here, see previous examples.
my %CS2CS_params = (
NBWT => '+init=epsg:31467 +to +init=epsg:4326'
# further CRS could be taken care of here
);
my $stop_id ="";
my $stop_name = "";
my $stop_lat = "";
my $stop_lon = "";
my $zone_id = "";
# Handling stop areas
# Note: x (easting) and the offset-corrected y (northing) are aliased as
# stop_lat and stop_lon; they only become latitude and longitude after the
# cs2cs conversion below.
my $sth = $divadbh->prepare('SELECT S.hstnr AS stop_id, S.hstname AS stop_name, group_concat(tz.tzonen,"") AS zone_id, HK.x AS stop_lat, (-1 *(HK.y - 6160000)) AS stop_lon, HK.plan AS plan
FROM Stop AS S
LEFT OUTER JOIN Stop_hst_koord as HK ON S._AutoKey_=HK._FK__AutoKey_ AND S.input=HK.input
LEFT OUTER JOIN Stop_tzonen as tz ON S._AutoKey_=tz._FK__AutoKey_ AND S.input=tz.input
WHERE S._AutoKey_ IN (SELECT SHS._FK__AutoKey_ FROM Stop_hst_steig AS SHS WHERE S.input = SHS.input)
GROUP BY stop_id, HK.x');
$sth->execute();
while (my $row = $sth->fetchrow_hashref()) {
$stop_id = $row->{stop_id};
if (defined $row->{stop_name}) { $stop_name = $row->{
stop_name}; }
if (defined $row->{zone_id}) { $zone_id = $row->{zone_id};
}
if (defined $row->{stop_lat} and defined $row->{stop_lon})
{
$stop_lat = $row->{stop_lat};
$stop_lon = $row->{stop_lon};
my @coords1=split(/\s+/, `echo $stop_lat $stop_lon |
cs2cs -f "%.8f" $CS2CS_params{$row->{plan}}`);
$stop_lon = $coords1[0];
$stop_lat = $coords1[1];
my $insertsth = $dbh->prepare('INSERT OR REPLACE INTO
stops VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)');
$insertsth->execute($stop_id,undef,$stop_name,$stop_lat,
$stop_lon,$zone_id,"1",undef,undef);
}else {
# In some instances, stops did not have coordinates attached to them;
# this calls for manual re-inspection
$stop_name = $stop_name . "FIXME!";
my $insertsth = $dbh->prepare('INSERT OR REPLACE INTO
stops VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)');
$insertsth->execute($stop_id,undef,$stop_name,undef,undef
,$zone_id,"1",undef,undef);
}
}
# Handling stop places. Entries whose steig value starts with "Eing"
# (presumably Eingang, i.e. entrances) are skipped.
$sth = $divadbh->prepare('SELECT S.hstnr AS stop_id, S.hstname AS stop_name, group_concat(tz.tzonen,"") AS zone_id, SHS.steig AS steig, SPK.x AS stop_lat, (-1 * (SPK.y - 6160000)) AS stop_lon, SPK.plan AS plan
FROM Stop AS S
LEFT OUTER JOIN Stop_tzonen AS tz ON S._AutoKey_ = tz._FK__AutoKey_ AND S.input = tz.input
LEFT OUTER JOIN Stop_hst_steig AS SHS ON S._AutoKey_ = SHS._FK__AutoKey_ AND S.input = SHS.input
LEFT OUTER JOIN StopPlatformKoord AS SPK ON SHS._AutoKey_ = SPK._FK__AutoKey_ AND SHS.input = SPK.input
WHERE SHS.steig NOT LIKE "Eing%"
GROUP BY stop_id, steig, SPK.x');
$sth->execute();
while (my $row = $sth->fetchrow_hashref()) {
    # The platform identifier is appended to the number of the enclosing stop
    $stop_id = $row->{stop_id} . $row->{steig};
    # Fall back to empty strings so that values from the previous row are not reused
    $stop_name = defined $row->{stop_name} ? $row->{stop_name} : "";
    $zone_id = defined $row->{zone_id} ? $row->{zone_id} : "";
    if (defined $row->{stop_lat} and defined $row->{stop_lon}) {
        $stop_lat = $row->{stop_lat};
        $stop_lon = $row->{stop_lon};
        # cs2cs prints the converted pair as longitude first, then latitude
        my @coords2 = split(/\s+/, `echo $stop_lat $stop_lon | cs2cs -f "%.8f" $CS2CS_params{$row->{plan}}`);
        $stop_lon = $coords2[0];
        $stop_lat = $coords2[1];
        my $insertsth = $dbh->prepare('INSERT OR REPLACE INTO stops VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)');
        # Stop places are written with "0" and refer to the enclosing stop as their parent
        $insertsth->execute($stop_id, undef, $stop_name, $stop_lat, $stop_lon, $zone_id, "0", $row->{stop_id}, undef);
    } else {
        # No coordinates available; mark the entry for manual re-inspection
        $stop_name = $stop_name . "FIXME!";
        my $insertsth = $dbh->prepare('INSERT OR REPLACE INTO stops VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)');
        $insertsth->execute($stop_id, undef, $stop_name, undef, undef, $zone_id, "0", $row->{stop_id}, undef);
    }
}
# Handling solo stops
$sth = $divadbh->prepare('SELECT S.hstnr AS stop_id, S.hstname AS stop_name, group_concat(tz.tzonen,"") AS zone_id, HK.x AS stop_lat, (-1 * (HK.y - 6160000)) AS stop_lon, HK.plan AS plan
FROM Stop AS S
LEFT OUTER JOIN Stop_tzonen AS tz ON S._AutoKey_ = tz._FK__AutoKey_ AND S.input = tz.input
LEFT OUTER JOIN Stop_hst_koord AS HK ON S._AutoKey_ = HK._FK__AutoKey_ AND HK.plan = "NBWT" AND S.input = HK.input
WHERE S._AutoKey_ NOT IN (SELECT SHS._FK__AutoKey_ FROM Stop_hst_steig AS SHS WHERE S.input = SHS.input)
GROUP BY stop_id, HK.x');
$sth->execute();
while (my $row = $sth->fetchrow_hashref()) {
    if (defined $row->{stop_lat} and defined $row->{stop_lon}) {
        $stop_lat = $row->{stop_lat};
        $stop_lon = $row->{stop_lon};
        # cs2cs prints the converted pair as longitude first, then latitude
        my @coords1 = split(/\s+/, `echo $stop_lat $stop_lon | cs2cs -f "%.8f" $CS2CS_params{$row->{plan}}`);
        $stop_lon = $coords1[0];
        $stop_lat = $coords1[1];
        my $insertsth = $dbh->prepare('INSERT OR REPLACE INTO stops VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)');
        $insertsth->execute($row->{stop_id}, undef, $row->{stop_name}, $stop_lat, $stop_lon, $row->{zone_id}, "0", undef, undef);
    } else {
        # No coordinates available; mark the entry for manual re-inspection
        $stop_name = $row->{stop_id} . "FIXME";
        my $insertsth = $dbh->prepare('INSERT OR REPLACE INTO stops VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)');
        $insertsth->execute($row->{stop_id}, undef, $stop_name, undef, undef, $row->{zone_id}, "0", undef, undef);
    }
}
$dbh->commit;
$divadbh->commit;
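The coordinate handling in Listing 24 can be tried out without a DIVA database or an installed cs2cs binary. The following is a minimal sketch under stated assumptions, not the script used for the thesis: the helper names diva_to_projected, parse_cs2cs_line and make_stop are hypothetical, the sample values are made up, and the hash keys simply follow the GTFS field names for stops. It shows the y-axis offset from the SQL statements, the parsing of one line of cs2cs output (x, a tab, then y and z), and the marking of stops without usable coordinates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Offset taken from the SQL in Listing 24: (-1 * (y - 6160000)).
my $DIVA_Y_OFFSET = 6160000;

# Hypothetical helper: turn a DIVA (x, y) pair into the easting/northing
# pair that is piped to cs2cs. Only the y axis is flipped and shifted.
sub diva_to_projected {
    my ($x, $y) = @_;
    return ($x, -1 * ($y - $DIVA_Y_OFFSET));
}

# Hypothetical helper: parse one line of cs2cs output ("x<TAB>y z")
# into (lon, lat). Returns an empty list if the first two fields are not
# plain decimal numbers, e.g. for the default cs2cs error marker "*\t*".
sub parse_cs2cs_line {
    my ($line) = @_;
    return () unless defined $line;
    $line =~ s/^\s+//;    # avoid an empty leading field after split
    my ($lon, $lat) = split /\s+/, $line;
    my $number = qr/^-?\d+(?:\.\d+)?$/;
    return () unless defined $lon and defined $lat;
    return () unless $lon =~ $number and $lat =~ $number;
    return ($lon, $lat);
}

# Hypothetical helper: assemble the GTFS stop fields for one row.
# A stop without usable coordinates keeps undef coordinates and gets a
# marker appended to its name, similar to the listing.
sub make_stop {
    my (%args) = @_;
    my ($lon, $lat) = parse_cs2cs_line($args{cs2cs_line});
    my $name = defined $args{name} ? $args{name} : "";
    $name .= " FIXME!" unless defined $lat;
    return {
        stop_id        => $args{id},
        stop_name      => $name,
        stop_lat       => $lat,
        stop_lon       => $lon,
        location_type  => $args{is_station} ? 1 : 0,
        parent_station => $args{parent},
    };
}

# Example with made-up values.
my ($easting, $northing) = diva_to_projected(3573000, 797000);
print "$easting $northing\n";    # prints "3573000 5363000"

my $platform = make_stop(
    id         => "1008" . "A",
    name       => "Example stop",
    parent     => "1008",
    is_station => 0,
    cs2cs_line => "9.98765432\t48.39876543 0.00000000\n",
);
print "$platform->{stop_id}: $platform->{stop_lat}, $platform->{stop_lon}\n";
```

Validating the cs2cs output before it is written to the stops table would keep a failed conversion from silently producing undefined or non-numeric coordinates; whether this is needed depends on the quality of the source data.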
BIBLIOGRAPHY
[1] Regionalisierungsgesetz. (Cited on page 2.)
[2] Gesetz über Urheberrecht und verwandte Schutzrechte. (Cited
on page 78.)
[3]Geographic data files ISO Standard 14825. (Cited on page 24.)
[4]Reference data model for public transport EN Standard 12896:2006,
2005. (Cited on pages 17 and 23.)
[5] Personenbeförderungsgesetz. (Cited on pages 2,3, and 77.)
[6] Identification of fixed objects in public transport (IFOPT), March
2013. URL http://www.dft.gov.uk/naptan/ifopt/. (Cited on
page 24.)
[7]Service interface for real time information. CEN Standard OO278181.
URL http://user47094.vs.easily.co.uk/siri/. (Cited on
page 24.)
[8]9th DIMACS Implementation Challenge. Shortest path,
2005. URL http://www.dis.uniroma1.it/challenge9/data/
tiger/. (Cited on page 6.)
[9] Allgemeines Eisenbahngesetz. (Cited on page 77.)
[10] Aaron Antrim and Sean Barbeau. The many uses of GTFS
data—opening the door to transit and multimodal applications.
Location-Aware Information Systems Laboratory at the University of
South Florida,2013. (Cited on page 4.)
[11] Sibylle Barth. Nahverkehr in kommunaler Verantwortung:
Der öffentliche Personennahverkehr nach der Regionalisierung.
Schriftenreihe für Verkehr und Technik, (90), 2000. (Cited on page 2.)
[12] Hannah Bast. Car or public transport—two worlds. In Efficient
Algorithms. Springer, 2009. (Cited on page 6.)
[13] Hannah Bast, Erik Carlsson, Arno Eigenwillig, Robert Geis-
berger, Chris Harrelson, Veselin Raychev, and Fabien Viger. Fast
routing in very large public transportation networks using trans-
fer patterns. In 18th Annual European Symposium on Algorithms,
2010. (Cited on page 6.)
[14] Michael Bell and Ian Catling. The EC DRIVE programme
halfway through. Computing & Control Engineering Journal,1(6):
247253,1990. (Cited on page 22.)
121
122 bibliography
[15] Zs Berki. The implementation of public transport data models in
Hungary. Acta Technica Jaurinensis,4(2), 2011. (Cited on page 24.)
[16] Bruno Bert, Kasia Bouree, and Lutz Staub. Transmodel, reference
data model for public transport (European prestandard). 1996.
(Cited on page 23.)
[17] Bliksem Labs. Rrrr rapid real-time routing, 2014. URL https:
//github.com/bliksemlabs/rrrr. (Cited on page 6.)
[18] Kasia Bourée, Bruno Bert, Pierre Pietri, and Bert Vervoort. Trans-
model, June 2001. URL http://www.transmodel.org/en/cadre1.
html. (Cited on pages 23 and 24.)
[19] Patrick Brosi. Real-time movement visualization of public transit
data. Master’s Thesis, 2014. (Cited on page 6.)
[20] US Census Bureau. TIGER/Line files, 2013. URL http://www.
census.gov/geo/maps-data/data/tiger-line.html. (Cited on
page 6.)
[21] Avishai Ceder. Public transit planning and operation: theory, model-
ing and practice. Elsevier, Butterworth-Heinemann, 2007. (Cited
on pages 9,10, and 96.)
[22] IG Collaboratory. Offene Staatskunst Bessere Politik durch
Open Government. Internet & Gesellschaft Co:llaboratory, Berlin,
2010. (Cited on page 2.)
[23] Pieter Colpaert. Route Planning Using Linked Open Data. In
The Semantic Web: Trends and Challenges, pages 827833. Springer,
2014. (Cited on page 6.)
[24] European Commission. Directive 96/9/EC of the European Par-
liament and of the Council of 11 March 1996 on the legal protec-
tion of databases, March 1996. (Cited on page 78.)
[25] European Commission. Regulation (EC) no 1371/2007 of the
European Parliament and of the Council of 23 October 2007 on
rail passengers’ rights and obligations, October 2007. (Cited on
page 77.)
[26] Jehiah Czebotar. GTFS data exchange. URL http://www.
gtfs-data-exchange.com/. (Cited on page 33.)
[27] Ian Davis. Transit: A vocabulary for describing transit sys-
tems and routes. URL http://vocab.org/transit/terms/.html.
(Cited on page 39.)
[28] Daniel Delling, Thomas Pajor, and Renato Fonseca Werneck.
Round-based public transit routing. In Sixth Annual Symposium
on Combinatorial Search,2013. (Cited on page 6.)
bibliography 123
[29] Julian Dibbelt, Thomas Pajor, Ben Strasser, and Dorothea Wag-
ner. Intriguingly simple and fast transit routing, pages 4354. Ex-
perimental Algorithms. Springer, 2013. (Cited on page 6.)
[30] Edsger W. Dijkstra. A note on two problems in connexion with
graphs. Numerische Mathematik,1(1):269271,1959. (Cited on
page 5.)
[31] Katrin Dziekan and Karl Kottenhoff. Dynamic at-stop real-time
information displays for public transport: Effects on customers.
In Transportation Research Part A: Policy and Practice,41(6):489501,
2007. (Cited on page 15.)
[32] Peter J. Elkins. Service management systems for public
transport—the German approach. In Vehicle Location and Fleet
Management Systems, IEE Colloquium on, pages 4/1410. IET, 1993.
(Cited on page 12.)
[33] PB Ellson and RGP Tebb. Benefits and costs of providing additional
information about urban public transport services,1981. (Cited on
page 5.)
[34] B. Finn and K. Holmes. Drive 2programme—area group 7re-
port. Public transport management and information. In Towards
an intelligent transport system. Proceedings of the first world congress
on applications of transport telematics and intelligent vehicle-highway
systems, volume 6, November 30–December 3 1994. (Cited on
pages 23 and 24.)
[35] Open Knowledge Foundation. Open Definition, November 2009.
URL http://opendefinition.org/od/. (Cited on page 1.)
[36] Open Knowledge Foundation. The Open Transport Vocabulary,
2014. URL https://github.com/opentransport/vocabulary.
(Cited on page 39.)
[37] Sunlight Foundation. Ten principles for opening up government
information, August 2010. URL http://sunlightfoundation.
com/policy/documents/ten-open-data-principles/. (Cited on
page 1.)
[38] Avichal Garg. Public transit via google, December
2005. URL http://googleblog.blogspot.de/2005/12/
public-transit-via-google.html. (Cited on page 31.)
[39] Christian P. Geiger and Jörn von Lucke. Open government data.
In Conference for E-Democracy and Open Government, page 183,
2011. (Cited on pages 2and 89.)
[40] Marco Felice Gennaro. Eigentums- und Nutzungsrechte im öf-
fentlichen Verkehr. 2010. (Cited on pages 77,78, and 86.)
124 bibliography
[41] R. Goetz. Verbesserung der Leistungsfähigkeit von Systemen des
öffentlichen Personennahverkehrs durch das Betriebsführungs-
und Informationssystem BISON. Straßen und Verkehr 2000 In-
ternationale Straßen- und Verkehrskonferenz Berlin, September 1988.
(Cited on page 18.)
[42] R. Goetz and R. Kirwa. Standardisiertes Datenmodell für den
ÖPNV. Heureka 90 Optimierung in Verkehr und Transport, April
1990. (Cited on pages 17 and 18.)
[43] Google. GTFS best practices—extended GTFS route types.
URL https://support.google.com/transitpartners/answer/
3520902. (Cited on pages 36 and 68.)
[44] Google. General transit feed specification reference. URL https:
//developers.google.com/transit/gtfs/reference. (Cited on
pages 34 and 35.)
[45] Open Transport Working Group. stations.io, 2014. URL http:
//stations.io/. (Cited on page 38.)
[46] Shannon Guymon. Biking directions added to Google
Maps, 2010. URL http://googleblog.blogspot.de/2010/03/
biking-directions-added-to-google-maps.html. (Cited on
page 33.)
[47] HaCon. Hafas Die perfekte Verbindung zum Kunden, 2014.
URL http://www.hacon.de/hafas. (Cited on page 24.)
[48] Chris Harrelson. Happy trails with Google Transit,
2006. URL http://googleblog.blogspot.de/2006/09/
happy-trails-with-google-transit.html. (Cited on page 33.)
[49] Mark Headd. New thinking in how governments deliver ser-
vices. In: Beyond Transparency, pages 277287. Code for America
Press, 2013. (Cited on pages 1and 2.)
[50] SBB Infrastruktur. Download der öffentlichen Fahrplansamm-
lung der Schweiz, 2014. URL http://www.fahrplanfelder.ch/
fahrplandaten/. (Cited on page 25.)
[51] H. Kaufhold. Bison: Management and information system for
mass transit authorities. Glasers Annalen ZEV,110(12), 1986.
(Cited on page 17.)
[52] Donald F. Kettl. The transformation of governance: Public adminis-
tration for twenty-first century America. JHU Press, 2002. (Cited
on page 1.)
[53] Nick Kizoom and Peter Miller. A Transmodel based XML
schema for the Google transit feed specification—with a GTF-
S/Transmodel comparison. 2008. (Cited on pages 24 and 36.)
bibliography 125
[54] Thomas Koch. GTFS feed for the Netherlands. Google Groups
discussion thread, 01 2014. URL https://groups.google.com/
d/msg/transit-developers/MbGRNM9keJ8/Z9ExR65YZHsJ. (Cited
on page 87.)
[55] Jonathan Lazar, Aaron Allen, Jason Kleinman, and Chris
Malarkey. What frustrates screen reader users on the web: A
study of 100 blind users. International Journal of Human-Computer
Interaction,22(3):247269,2007. (Cited on page 4.)
[56] Felix Mata, Andres Jaramillo, and Christophe Claramunt. A
mobile navigation and orientation system for blind users in a
metrobus environment. In Web and Wireless Geographical Informa-
tion Systems, pages 94108. Springer, 2011. (Cited on page 3.)
[57] Bibiana McHugh. Pioneering open data standards: The GTFS
story. In: Beyond Transparency, pages 125135. Code for America
Press, 2013. (Cited on pages 31,33, and 86.)
[58] MentzDV. About us, 2013. URL http://www.mentzdv.de/
englisch/company/about-us/. (Cited on page 27.)
[59] MentzDV. References, 2013. URL http://www.mentzdv.de/
englisch/company/references/. (Cited on page 27.)
[60] Ted Miller and Tom Neumayr. App store sales top $10 billion in
2013,2014. URL http://www.apple.com/pr/library/2014/01/
07App-Store-Sales-Top-10-Billion-in-2013.html. (Cited on
page 1.)
[61] Facharbeitsgruppe Mobilitätskonzept. Mobilitätskonzept für
Menschen mit Behinderung. 2009. (Cited on page 4.)
[62] Sebastian Mygo. Barrierefreiheit im öffentlichen Nahverkehr: Eine
Voraussetzung fuer die gesellschaftliche Teilhabe. AV Akademikerver-
lag, 2012. (Cited on page 3.)
[63] Bengt Müller. Visualisierung von Fahrplandaten in Kartenan-
wendungen. Der Nahverkehr, (1):1921,2014. (Cited on pages 37
and 74.)
[64] DB Netze. Übersicht der Betriebsstellen und deren Abkürzun-
gen aus der Richtlinie 100, February 2014. URL http://fahrweg.
dbnetze.com/file/2361656/data/betriebsstellen.pdf. (Cited
on page 38.)
[65] n.n. Bahn will offenen Rechner-Verbund schaffen. Com-
puterwoche,11/1988. URL http://www.computerwoche.de/
a/bahn-will-offenen-rechner-verbund-schaffen,1157261.
(Cited on page 24.)
126 bibliography
[66] HM Department of Transportation. TransXChange. URL https:
//www.gov.uk/government/collections/transxchange. (Cited
on page 24.)
[67] OpenStreetMap. Copyright and license. URL http://www.
openstreetmap.org/copyright/en. (Cited on page 57.)
[68] Tim O’Reilly. Government as a platform. Innovations,6(1):1340,
2011. (Cited on page 1.)
[69] OVapi. Dutch GTFS repository. URL http://gtfs.ovapi.nl/.
(Cited on page 87.)
[70] Google Transit Partners. Google transit extensions to
GTFS. URL https://support.google.com/transitpartners/
answer/2450962. (Cited on pages 36 and 61.)
[71] John Pucher and Stefan Kurth. Verkehrsverbund: the success of
regional public transport in Germany, Austria and Switzerland.
Transport Policy,2(4):279291,1995. (Cited on pages 2and 27.)
[72] Eric Raymond. The cathedral and the bazaar. Knowledge, Technol-
ogy & Policy,12(3):2349,1999. (Cited on page 1.)
[73] Ole Reißmann. Routenplaner im Web: Google
sichert sich den Nahverkehr, September 2012.
URL http://www.spiegel.de/netzwelt/web/
google-bekommt-fahrplaene-fuer-den-nahverkehr-a-856802.
html. (Cited on page 7.)
[74] Helen Roach. Public transport data modelling. In Public Trans-
port Information and Management Systems, pages 6/16/3. IET,
1993. (Cited on page 9.)
[75] Helen Roach. The example of Eurobus/Transmodel in integrat-
ing applications for public transport. In Vehicle Navigation and
Information Systems Conference, pages 497502. IEEE, 1994. (Cited
on page 9.)
[76] Wade Roush. Welcome to Google transit: How (and why) the
search giant is remapping public transportation. Community
Transportation,2012. (Cited on pages 31 and 33.)
[77] Peter Sanders and Dominik Schultes. Engineering fast route
planning algorithms, pages 2336. In: WEA07 Proceedings of
the 6th international conference on Experimental algorithms. Springer,
2007. (Cited on pages 5and 6.)
[78] Andreas Schildbach. Öffi. URL http://oeffi.schildbach.de/.
(Cited on page 2.)
bibliography 127
[79] Y. Shafranovich. RfC 4180. Common format and mime type for
comma-separated values (CSV) files. URL http://tools.ietf.
org/html/rfc4180. (Cited on pages 19 and 35.)
[80] Inês Soares and Paulo Matos Martins. Public transport standard-
ization: A contribution to the state of the art review. Rio de
Janeiro, July 15-18 2013. (Cited on page 23.)
[81] Brian Stelter. A Pulitzer winner gets Apple’s reconsideration.
The New York Times, 2010. URL http://www.nytimes.com/
2010/04/17/books/17cartoonist.html. (Cited on page 1.)
[82] QGIS Development Team. QGIS, 2013. URL http://qgis.org/.
(Cited on page 59.)
[83] SQLite Development Team. SQLite. URL http://sqlite.org/.
(Cited on page 61.)
[84] WMK Tizani. A review of trip planning systems. 1992. (Cited
on pages 4and 22.)
[85] Verband Deutscher Verkehrsunternehmen. VDV-451: Dateifor-
mat für die Datenübertragung zwischen ÖPNV-Anwendungen,
1999. URL http://www.vdv.de/oepnv-datenmodell.aspx. (Cited
on page 18.)
[86] Verband Deutscher Verkehrsunternehmen. VDV-452: Stan-
dardschnittstelle Liniennetz, 2013. URL http://www.vdv.de/
oepnv-datenmodell.aspx. (Cited on pages 20 and 22.)
[87] Kari Edison Watkins, Brian Ferris, Alan Borning, G. Scott Ruther-
ford, and David Layton. Where is my bus? impact of mobile
real-time information on the perceived and actual wait time of
transit riders. Transportation Research Part A: Policy and Practice,
45(8):839848,2011. (Cited on page 15.)
[88] Stefan Wehrmeyer. Mapnificent. URL http://www.mapnificent.
net. (Cited on page 4.)
[89] Stefan Wehrmeyer. HAFAS raw data format to GTFS conversion.
Mailinglist Discussion, 05 2013. URL https://lists.okfn.org/
pipermail/open-transport/2013-May/000221.html. (Cited on
page 25.)
COLOPHON
This document was typeset using the typographical look-and-feel
classicthesis developed by André Miede. The style was inspired
by Robert Bringhurst’s seminal book on typography “The Elements of
Typographic Style”. classicthesis is available for both LaTeX and LyX:
http://code.google.com/p/classicthesis/
Happy users of classicthesis usually send a real postcard to the
author, a collection of postcards received so far is featured here:
http://postcards.miede.de/
Final Version as of May 22, 2014 (classicthesis version 1.0).
DECLARATION
I hereby certify that this diploma thesis is my original work and has
been written by me in its entirety. I have faithfully and properly cited
all sources used in the thesis.
Ulm, 2014-05-23
Stefan Kaufmann