2 RELATED WORK
The study by (As and Mine, 2016) argues that the most valuable resource that can be given to bus passengers is the estimated time of arrival. This argument is supported by the observation that, if bus passengers know “their departure time and arrival time at the destination”, they can “reduce their waiting time at the bus stop”.
On the same topic, (Uno et al., 2009) argue that the rapid progress of information technology is leading to new insights into traffic phenomena and into problems that can now be solved. They also note that identifying the particles moving through the city is one of the key application areas in traffic and transportation studies. Their work presents a methodology for using GPS data so that it becomes meaningful for transportation analysis, and it also summarizes methodologies and transformation techniques that can be applied to GPS data. The final aim of their study is to propose an approach for evaluating the quality of service of bus lines from the point of view of travel time stability and reliability.
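To make the notion of travel time stability and reliability more concrete, the following is a minimal sketch that computes two commonly used reliability indicators for a set of observed travel times on one bus line: the coefficient of variation (standard deviation divided by the mean) and the buffer index (95th-percentile travel time minus the mean, divided by the mean). These indicators, the nearest-rank percentile and the function names are our own illustrative assumptions; they are not necessarily the metrics used by (Uno et al., 2009).

```python
import math
import statistics
from typing import Dict, Sequence


def nearest_rank_percentile(values: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


def reliability_indicators(travel_times_s: Sequence[float]) -> Dict[str, float]:
    """Compute simple stability/reliability indicators for one bus line.

    travel_times_s: observed travel times in seconds (hypothetical input format).
    """
    mean = statistics.mean(travel_times_s)
    cv = statistics.pstdev(travel_times_s) / mean  # dispersion relative to the mean
    p95 = nearest_rank_percentile(travel_times_s, 95)
    buffer_index = (p95 - mean) / mean  # extra time, as a fraction of the mean, to budget
    return {"mean": mean, "cv": cv, "p95": p95, "buffer_index": buffer_index}
```

Under this reading, a line whose indicators stay close to zero has travel times passengers can plan around, while large values point to unstable, unreliable service.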
Regarding travel time estimation and prediction, the review by (Mori et al., 2015) presents the state of the art on this topic and goes further by explaining the main definitions and how advanced traveller information systems work.
Finally, regarding methods for long-term travel time prediction, the study by (Mendes-Moreira et al., 2012) highlights the importance of this prediction and how it can be an important measure for public transportation companies. The authors also compare three popular non-parametric regression methods, namely Support Vector Machines, Random Forests and Projection Pursuit Regression, using data from the same public transportation company as our study. In their conclusions, Random Forests is elected the best method, and the use of ensemble learning methods is advised to improve prediction accuracy.
This work borrows some important ideas from the study in (Uno et al., 2009), such as how to separate the pipeline into stages (map-matching, data reduction, data processing and data reporting) and which functionalities are important in each stage.
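Separating the pipeline into stages can be sketched as a chain of functions, each consuming the output of the previous one. This is a minimal illustration under our own assumptions: the `Ping` record, the stage functions and the data reduction rule (dropping a report when the same bus repeats its previous position) are hypothetical, and the map-matching and data processing stages are only indicated in comments.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Ping:
    """One position report from a bus (hypothetical record layout)."""
    bus_id: str
    t: int      # timestamp in seconds
    lat: float
    lon: float


Stage = Callable[[Any], Any]


def reduce_stationary(pings: List[Ping]) -> List[Ping]:
    """Data reduction: drop a ping when the same bus reported the same position just before."""
    last_pos: Dict[str, Tuple[float, float]] = {}
    kept: List[Ping] = []
    for p in sorted(pings, key=lambda q: (q.bus_id, q.t)):
        pos = (p.lat, p.lon)
        if last_pos.get(p.bus_id) != pos:
            kept.append(p)
        last_pos[p.bus_id] = pos
    return kept


def report_counts(pings: List[Ping]) -> Dict[str, int]:
    """Data reporting: number of retained pings per bus."""
    counts: Dict[str, int] = {}
    for p in pings:
        counts[p.bus_id] = counts.get(p.bus_id, 0) + 1
    return counts


def run_pipeline(data: Any, stages: Sequence[Stage]) -> Any:
    """Apply each stage in order; map-matching and processing stages would slot into the list."""
    for stage in stages:
        data = stage(data)
    return data


if __name__ == "__main__":
    pings = [
        Ping("b1", 0, 41.15, -8.61),
        Ping("b1", 15, 41.15, -8.61),   # same position as before: removed by reduction
        Ping("b1", 30, 41.151, -8.61),
        Ping("b2", 0, 41.16, -8.60),
    ]
    print(run_pipeline(pings, [reduce_stationary, report_counts]))  # {'b1': 2, 'b2': 1}
```

Keeping each stage as an independent function lets a stage be tested or replaced without touching the others, which is the main benefit of the staged design.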
The related work also shows that some studies are very different in nature, such as the study conducted by (As and Mine, 2016), where the available data has a finer granularity (1 second) and the route, number of bus stops, travel direction and bus performance history are included in the probe data used for the study.
3 PROPOSED SYSTEM AND ARCHITECTURE
The proposed system architecture, represented in Figure 1, includes several modules and data sources that support the processing pipeline.
From the bottom of the diagram to the top, each element is described as follows:
• vanetV3 and the STCP website are the main raw data sources. vanetV3 contains the location of each bus every 15 seconds, gathered through the vehicular network, and the STCP website contains information on bus lines and schedules.
• The Extraction Scripts transform the raw data available on the website into a set of well-known, well-structured files containing all the context information, such as bus lines, bus stops, etc.
• The Matching Unit is a Python program that matches the logged position data against the bus network data.
• The Matches Database holds the Matching Unit results (bus line identification, matched bus stops, etc.).
• The Estimation Database helps to gather performance metrics on bus delays.
• The Synchronization Script synchronizes the Matches Database with the Estimation Database.
• The Bus Network Information Database holds information about the bus carrier, such as lines, bus stops and their relationships.
• The Bus Network Information API delivers information about the bus carrier infrastructure (the lines, the stops and the relations between them).
• The Matches API delivers information about the matches stored in the Matches Database.
• The Estimation API exposes a single endpoint for retrieving the estimated times of arrival for a given stop, line and date, using data stored in the Estimation Database.
• The Prediction API exposes an endpoint for making predictions using the prediction module.
• The Line Performance Dashboard allows the bus carrier manager to consult the performance of the carrier’s lines. It is a decision support dashboard because it provides hints about the health of the bus transportation system.
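As an illustration of the kind of work the Matching Unit performs, the following minimal sketch matches a single position report to the nearest bus stop using the haversine great-circle distance (a spherical approximation), accepting the match only within a distance threshold. The stop dictionary format, the 50 m threshold and the function names are hypothetical; the actual Matching Unit also identifies the bus line, which this sketch does not attempt.

```python
import math
from typing import Dict, Optional, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_to_stop(lat: float, lon: float,
                  stops: Dict[str, Tuple[float, float]],
                  max_dist_m: float = 50.0) -> Optional[str]:
    """Return the id of the nearest stop within max_dist_m, or None if no stop is close enough.

    stops: hypothetical mapping from stop id to (latitude, longitude).
    """
    best_id: Optional[str] = None
    best_d = float("inf")
    for stop_id, (s_lat, s_lon) in stops.items():
        d = haversine_m(lat, lon, s_lat, s_lon)
        if d < best_d:
            best_id, best_d = stop_id, d
    return best_id if best_d <= max_dist_m else None
```

Applied to every 15-second report of a bus, a function like `match_to_stop` would yield the sequence of stops the bus passed, from which arrival times at each stop could then be derived.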
VEHITS 2018 - 4th International Conference on Vehicle Technology and Intelligent Transport Systems
396