Forecasting Travel Times with Space Partitioning Methods

Jhonny Pincay [1, a], Alvin Oti Mensah, Edy Portmann and Luis Terán [1, 3, b]

[1] Human-IST Institute, University of Fribourg, Boulevard de Pérolles 90, Fribourg, Switzerland

[2] University of Bern, Hochschulstrasse 6, Bern, Switzerland

[3] Universidad de las Fuerzas Armadas ESPE, Av. General Rumiñahui S/N, Sangolquí, Ecuador

Keywords:

Travel Time, Spatio-temporal Data, Transportation, Smart Logistics, Geohash, Geogrid.

Abstract:

Roads and streets are increasingly crowded. For delivery companies that rely on road transportation, this is a concern, as longer times spent on the road mean higher operational costs and lower customer satisfaction. Nevertheless, the data captured during the operating hours of their vehicles can be leveraged to address such issues. This, however, is not a straightforward task, given the potentially low number of vehicles covering a route and the complexities introduced by the nature of the delivery business. The present research work proposes an approach to forecast travel time through the use of probe data from logistics vehicles and simple mathematical models. Five months of delivery operations of a vehicle from Swiss Post, the national postal service company of Switzerland, were studied in a segment-to-segment manner, following a four-step method. The forecasting results were evaluated by calculating the mean absolute percentage error and mean absolute error metrics. The results indicate that it is possible to achieve considerable forecasting accuracy without deploying a large number of vehicles or implementing complex algorithms.

1 INTRODUCTION

The number of vehicles on roads and streets has increased massively over the decades, which translates into more frequent traffic congestion. This has led to the need for people to plan their journeys pre-trip and en route. Moreover, for companies, considering vehicle transit and its impact on their supply chain has become vital to keeping operational costs as low as possible. Such factors have sparked a growing interest in traffic modeling and the forecasting of travel times. Technological advances and the data availability of recent years have also eased the development of systems that provide travel time information to commuters. Common data sources for such systems include sensors (e.g., point and loop detectors), on-site studies, and global positioning systems (GPS) installed in vehicles circulating on roads (Mori et al., 2015; Zhou et al., 2012).

Traffic and travel time modeling using GPS data has gained considerable attention as it is one of the less costly methods. A growing number of studies are devoted to developing such models with data that come from vehicles fitted for traffic data gathering, taxis, and public transportation, among others. Common methods to process such data are based on machine learning and deep learning, which, although they offer good results, require vast amounts of data and computational resources (Pu et al., 2009; Zhou et al., 2012; Yuan et al., 2010).

a https://orcid.org/0000-0003-2045-8820

b https://orcid.org/0000-0002-0503-511X

On the other hand, little to no attention is given to traffic data from logistics vehicles, given the complexities introduced by business operations. Road and speed restrictions, multiple delivery stops, and waiting on customers are some of the events recorded in delivery probe data that need to be properly handled when building travel time models. Another issue is the low sampling rate, as logistics companies might have a single vehicle covering certain routes; yet it is also possible that this vehicle travels the same route every day and thus produces large amounts of data. If the data collected by logistics vehicles is properly studied, important insights into enterprise supply chains can be obtained and used for strategic planning.

This research project proposes a novel approach for network-wide travel time estimation in light of the constraints highlighted above. In this context, a data-driven approach for travel time estimation is proposed that uses data from a company's probe vehicles and geospatial indexing. The expected outcome is a straightforward method that allows forecasting travel times with acceptable levels of accuracy. To this end, an artifact was developed following the principles of the design science methodology.

Pincay, J., Mensah, A., Portmann, E. and Terán, L. Forecasting Travel Times with Space Partitioning Methods. DOI: 10.5220/0009324601510159. In Proceedings of the 6th International Conference on Geographical Information Systems Theory, Applications and Management (GISTAM 2020), pages 151-159. ISBN: 978-989-758-425-1

This article is structured as follows: Section 2

presents the theoretical background on which this re-

search work is grounded. Then, the methods used in

this study are described in Section 3. Results are pre-

sented in Section 4. Section 5 ﬁnalizes the article with

a summary and concluding remarks.

2 THEORETICAL BACKGROUND

This section presents the concepts used in this re-

search work. Some previous research efforts that at-

tempted to achieve similar goals are also examined.

2.1 Trafﬁc and Travel Time Estimation

Lin et al. (2005) deﬁned the main components of a

road trafﬁc environment as humans, vehicles, and fa-

cilities (e.g., roads and signaling). Humans and ve-

hicles constitute trafﬁc demand, whereas the facilities

provide the supply. According to this notion, travel

time is dependent on the dynamism and interactions

between the demand and supply and the conditions

affecting any of them (e.g., road nature and weather).

Furthermore, road trafﬁc can be classiﬁed into two

states: (i) congested/jam (ii) uncongested/free ﬂow

(Treiber and Kesting, 2013). There is a set of mea-

surable trafﬁc characteristics or variables, capable of

describing the trafﬁc in any of these two states. These

variables are referred to as trafﬁc state variables.

The fundamental trafﬁc variables include ﬂow, vehi-

cle density, and speed.

Aside from these three variables, there are other

equally important trafﬁc variables such as the travel

time (Nanthawichit et al., 2003; Van Lint and

Van Hinsbergen, 2012). The majority of methods

dealing with trafﬁc and travel time analysis depend

on the full availability of the aforementioned vari-

ables. Data can be collected using externally localized

trafﬁc measuring instruments, which record a com-

prehensive state of the trafﬁc conditions within their

coverage range (Treiber and Kesting, 2013). Data

captured from these stationary devices is known as

trajectory data. Even though this approach allows

having a full picture of trafﬁc at any given point in

time, the number of devices that need to be deployed

is rather high and therefore expensive (Ruppe et al.,

2012; Yoon et al., 2007).

Nevertheless, some methods allow working with partially observed or incomplete data. These methods are known as traffic state estimation (TSE). According to Seo et al. (2017), TSE is the process of deducing traffic state variables on road segments (portions of a road) using partially observed data. Such methods can be model-driven, data-driven or streaming-data-driven. D'Andrea and Marcelloni (2017) and Wang et al. (2013) characterize the general approach of TSE methods to traffic data analysis in three phases:

• Segmentation. Divide roads into finer spatial and/or temporal units (segments).

• Annotation. Annotate segments with an expected behavior (e.g., vehicle density, travel time).

• Estimation. Make inferences with respect to the expected behavior for each segment.

In TSE methods where estimations of travel times

are performed at ﬁner spatio-temporal resolution,

travel time is deﬁned as the amount of time taken to

traverse a unit space of a road segment, usually mea-

sured in minutes per kilometer (min/km) (Seo et al.,

2017). At a micro-scale, travel time is calculated for

individual vehicles given their respective entry and

exit times in a segment. The travel time, therefore,

is calculated using Equation 1.

$$TT_i = \frac{T_{out} - T_{in}}{D} \quad \text{min/km} \tag{1}$$

where $TT_i$ is the microscopic-scale travel time for a vehicle $i$, $T_{out}$ is the timestamp at which the vehicle exits the segment, $T_{in}$ is the timestamp at which the vehicle entered the segment, and $D$ is the length of the segment. Aggregating individual travel times in a segment estimates the travel time for the segment at a macro-scale. A macro-scale segment's travel time $TT$ is computed using Equation 2, where $n$ is the number of individual travel times observed in the segment.

$$TT = \frac{1}{n} \sum_{i=1}^{n} TT_i \tag{2}$$
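As an illustration, the micro- and macro-scale travel times of Equations 1 and 2 can be sketched as follows. This is a minimal sketch, assuming that the macro-scale value is the arithmetic mean of the individual travel times; the function names and the sample timestamps are hypothetical and do not come from the studied dataset.

```python
from datetime import datetime
from statistics import mean


def micro_travel_time(t_in: datetime, t_out: datetime, length_km: float) -> float:
    """Travel time of one traversal in min/km (Equation 1)."""
    minutes = (t_out - t_in).total_seconds() / 60.0
    return minutes / length_km


def macro_travel_time(traversals, length_km: float) -> float:
    """Mean of the individual travel times in a segment (Equation 2, assumed to be a mean)."""
    return mean(micro_travel_time(t_in, t_out, length_km) for t_in, t_out in traversals)


# Hypothetical (entry, exit) timestamps for a 0.5 km segment on two days.
traversals = [
    (datetime(2018, 7, 2, 8, 0, 0), datetime(2018, 7, 2, 8, 1, 0)),  # 1 min over 0.5 km
    (datetime(2018, 7, 3, 8, 0, 0), datetime(2018, 7, 3, 8, 2, 0)),  # 2 min over 0.5 km
]
segment_tt = macro_travel_time(traversals, 0.5)
print(segment_tt)  # 3.0 (min/km)
```

Normalizing by the segment length makes segments of different sizes comparable, which is presumably why the travel time is expressed in min/km.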

Data-driven TSE approaches for travel time estima-

tion and prediction aim at leveraging the relationship

(model) between the supply and the trafﬁc demand at

various road segments, to approximate any of the traf-

ﬁc variables.

2.2 Geospatial Indexing

Geospatial data depict geographical information such

as longitudes and latitudes. Geospatial indexes are


data structures developed for efﬁcient handling, stor-

age, retrieval and processing of data with spatial at-

tributes, and they are developed from well-known

structures such as sorted arrays, binary trees, B-trees,

and hashing (Lu and Ooi, 1993).

One geospatial indexing approach is geohash. Geohash is a hierarchical spatial data structure that subdivides spatial regions into bounding boxes, or grid buckets, at different granularity and precision levels (Niemeyer, 2019; Vukovic, 2016). Geohash uses a base-32 alphanumeric encoding, in which each character represents five bits, to produce unique ASCII strings. Such a string serves as an identifier for the bounding box containing specific GPS coordinates. Moreover, geohash is a hierarchical hashing algorithm with twelve levels known as precision levels. Each level defines a bounding box size for a spatial region. The bigger the bounding box, the larger the number of GPS points it contains (La Valley et al., 2017). Table 1 presents the bounding box sizes for the first seven of the twelve geohash precision levels.

Table 1: Geohash Precision Levels and Their Bounding Box Size (Levels One (1) to Seven (7)).

Geohash spatial indexing

Precision level    Bounding box size (width × height)
1                  ≤ 5,000 km × 5,000 km
2                  ≤ 1,250 km × 625 km
3                  ≤ 156 km × 156 km
4                  ≤ 39.1 km × 19.5 km
5                  ≤ 4.89 km × 4.89 km
6                  ≤ 1.22 km × 0.61 km
7                  ≤ 153 m × 153 m

It is easy to move between precision levels. Higher precision levels have longer geohash codes, and a bounding box shares its geohash prefix with the larger boxes that contain it. For instance, the coordinates x = (46.9466, 7.4426) are mapped to the level six geohash u0m716 and to the level seven geohash u0m7167. Figure 1 depicts these coordinates at the two levels. Note the difference in the size of the bounding boxes: the smaller the bounding box, the higher the precision and the longer the geohash code.
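The prefix property can be checked with a small encoder. The following is a minimal sketch of the standard geohash encoding (interleaved longitude and latitude bisection, five bits per base-32 character); the function name geohash_encode is an illustrative choice and not the tooling used in this work.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet: digits plus letters without a, i, l, o


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1  # upper half: emit 1 and raise the lower bound
            rng[0] = mid
        else:
            value <<= 1               # lower half: emit 0 and lower the upper bound
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:            # five bits form one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


level6 = geohash_encode(46.9466, 7.4426, 6)
level7 = geohash_encode(46.9466, 7.4426, 7)
print(level6, level7)             # u0m716 u0m7167
print(level7.startswith(level6))  # True
```

Because each additional character only refines the bisection, moving to a coarser level amounts to truncating the string.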

2.3 Related Works

Previous research efforts addressing the task of es-

timating and predicting travel time using probe data

from logistic vehicles are presented in this section.

Figure 1: The Coordinates x = (46.9466,7.4426) Mapped to Geohash Codes in Levels Six (6) and Seven (7). (a) Level Six (6) Geohash for the Point x. (b) Level Seven (7) Geohash for the Point x.

Zhang et al. (2017) constructed multiday spatiotemporal speed diagrams with probe data collected from logistic vehicles in Beijing, China. They exploited the correlation of traffic features in space and time by constructing a gray-level co-occurrence matrix (GLCM). A similarity measure based on normalized square differences (NSD) between current and historical GLCMs was calculated to select candidate traffic patterns. Future travel times were then estimated by comparing current conditions with similar previously experienced travel times.

Another related initiative is that of Yoon et al. (2007), who introduced a novel approach for temporally aggregating data for an identified spatial region. Relying on probe data from a single taxi on a fixed road segment, their approach attempted to characterize traffic patterns and identify traffic states, as well as to address the low sampling rate problem caused by the limited number of vehicles per road segment. The authors found that the traffic patterns obtained by studying the behavior of the taxi were consistent over time, thanks to the segment-oriented analysis that they performed. Even though estimating and forecasting travel time was not the intention of that research work, it contributes to finding a solution to the low penetration/sampling rate problem of delivery vehicles.

In the work of Wang et al. (2014), the devel-

opment of a real-time model for estimating travel

time within a city was proposed. The researchers

addressed the problems of data sparsity that work-

ing with probe data brings and responding quickly to

users, by modeling travel times in different road seg-

ments and making use of three-dimensional tensors.


In contrast to the aforementioned research efforts,

this work proposes a scalable approach to estimate

and calculate near future travel times by using logistic

probe data, pattern searching and geospatial indexing

on temporally aggregated data, while having accept-

able levels of accuracy.

3 METHODOLOGY AND USE CASE

The guidelines of the design science research for in-

formation systems methodology were followed in the

development of this work. This research methodol-

ogy was selected because its application entails the

development of an artifact while extending existing

knowledge (Hevner and Chatterjee, 2010). Moreover,

the fact that this project was developed in collabo-

ration with Swiss Post, the national postal company

of Switzerland, eased the selection of the research

methodology, as this company supports investigation

but is also interested in obtaining practical solutions

in the process.

The method in this study encompassed four main phases: i) data cleaning and selection; ii) travel time approximation; iii) travel time prediction; and iv) evaluation. Figure 2 depicts these phases as well as the intermediate steps.

Figure 2: Methodology.

3.1 Data Cleaning and Selection

Data from five months of operations of Swiss Post logistics trucks was used. The initial database consisted of 353.1 million records, and each record was described in terms of twenty-six fields. The records contained the GPS location of the vehicles along their delivery routes, speed, mileage, events (e.g., parked, motor on and off), driver and vehicle identification, and street name, among others. The operations took place from July to November 2018. Only the data points registered in the area of Bern-Ostermundingen (Switzerland) were considered, given the limitations of our computational resources and because it was a familiar area for the Swiss Post representatives supporting this project. Moreover, duplicate, inconsistent and invalid records were removed.
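A cleaning pass of this kind could look roughly like the sketch below. The exact rules applied in this work are not detailed here, so the field names (vehicle_id, timestamp, lat, lon, speed_kmh) and the validity checks are assumptions chosen purely for illustration.

```python
def clean_records(records):
    """Drop invalid and duplicate records, keeping the first occurrence (illustrative rules)."""
    seen = set()
    cleaned = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # incomplete position
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue  # coordinates outside the valid range
        if rec.get("speed_kmh", 0.0) < 0.0:
            continue  # a negative speed is treated as inconsistent
        key = (rec["vehicle_id"], rec["timestamp"], lat, lon)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Hypothetical records; the values are made up for illustration.
raw = [
    {"vehicle_id": "v1", "timestamp": "2018-07-02T08:00:00", "lat": 46.9466, "lon": 7.4426, "speed_kmh": 30.0},
    {"vehicle_id": "v1", "timestamp": "2018-07-02T08:00:00", "lat": 46.9466, "lon": 7.4426, "speed_kmh": 30.0},
    {"vehicle_id": "v1", "timestamp": "2018-07-02T08:00:05", "lat": 146.0, "lon": 7.4426, "speed_kmh": 30.0},
    {"vehicle_id": "v1", "timestamp": "2018-07-02T08:00:10", "lat": 46.9470, "lon": 7.4430, "speed_kmh": -5.0},
    {"vehicle_id": "v1", "timestamp": "2018-07-02T08:00:15", "lat": 46.9472, "lon": 7.4433, "speed_kmh": 28.0},
]
cleaned = clean_records(raw)
print(len(cleaned))  # 2
```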

3.2 Travel Time Approximation

The goal of this phase was to calculate travel time

through the sum of cruising time within segments that

constitute a journey.

The analysis carried out in this project was based

solely on probe data from delivery vehicles. The

data contained detailed timestamped location records,

which are useful for deducing travel times between

any two arbitrary points. However, interruptions

caused by business activities (i.e., delivering a pack-

age) need to be properly handled as well as the low

penetration rate of delivery vehicles in estimating

travel times.

Regarding the usage of historical trafﬁc data, two

assumptions about road trafﬁc and travel time were

made:

1. Historical data contains latent traffic relationships that are valid for current and future traffic conditions. This assumption follows the general approach of pattern matching methods, which holds that traffic patterns are recurrent in nature and that similar historical events can therefore be used to provide estimates of current conditions (Zhang et al., 2017).

2. With a large amount of data from any given segment, the expected value of the travel time can be approximated by the average obtained from past trips in that segment. This assumption follows the strong law of large numbers (see Equation 3) (Loève, 1997).

$$\Pr\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1 \tag{3}$$

which asserts that the probability Pr that the average of the observations, $\bar{X}_n$, converges to the expected value $\mu$ as the number of points $n$ becomes larger is equal to one.
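The second assumption can be illustrated numerically. The sketch below draws synthetic travel times from a uniform distribution, which is an arbitrary choice made only for this demonstration, and tracks the running average as the number of observations grows.

```python
import random


def running_means(samples):
    """Return the average of the first n samples for every n."""
    total = 0.0
    means = []
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic segment travel times in seconds, uniform on [3, 7], so the true mean is 5.
samples = [rng.uniform(3.0, 7.0) for _ in range(100_000)]
means = running_means(samples)

for n in (10, 1_000, 100_000):
    print(n, round(means[n - 1], 3))  # the average settles near 5 as n grows
```

With only a handful of observations the average can be far from the expectation, which is why the temporal aggregation over many days matters for segments covered by a single vehicle.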

An estimation model was derived based on the three

phases of TSE methods (refer to Sec. 2.1).


3.2.1 Geohash Segmentation

The low penetration rate of probe vehicles per delivery route (specifically, one per route) in the area of Bern-Ostermundingen made it necessary to find ways to overcome this limitation; thus, temporal aggregation and spatial segmentation were applied. Temporal aggregation meant studying the behavior of the vehicle circulating on different days, whereas spatial segmentation entailed dividing the space into segments and analyzing the traffic data on a segment-to-segment basis, following common practices of TSE methods.

Geohash was the approach employed to segment the space, as it allows grouping points into a common spatial bin (or bounding box) of a fixed size. Geohash level eight was chosen, meaning that the bounding boxes for the segmentation had a size of ∼ 38 m × 19 m. This level was selected given the distribution of the GPS points and the level of detail that the researchers hoped to achieve.
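The combination of spatial segmentation and temporal aggregation can be sketched as a grouping step. The example assumes that each record already carries a geohash string encoded at precision nine or finer, so that the level-eight segment is simply its first eight characters; the record layout and the geohash strings are made up for illustration.

```python
from collections import defaultdict

SEGMENT_LEVEL = 8  # geohash precision used for the segmentation


def bin_by_segment(records, level=SEGMENT_LEVEL):
    """Group records from all days by the geohash prefix of the given length."""
    segments = defaultdict(list)
    for rec in records:
        segments[rec["geohash"][:level]].append(rec)
    return dict(segments)


# Hypothetical records from different days falling into two level-eight cells.
records = [
    {"geohash": "u0m7167b2", "day": "2018-07-02", "speed_kmh": 31.0},
    {"geohash": "u0m7167b9", "day": "2018-07-03", "speed_kmh": 28.0},
    {"geohash": "u0m7167c4", "day": "2018-07-03", "speed_kmh": 35.0},
]
segments = bin_by_segment(records)
for segment_id, recs in sorted(segments.items()):
    print(segment_id, [r["day"] for r in recs])
```

Pooling the observations of several days in one cell is what compensates for having a single vehicle per route.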

3.2.2 Annotation

The annotation consisted of approximating the expected speed for each segment. This speed was calculated as follows.

For a segment $r$ of length $l$ (deduced from the size of the geohashes), there is a travel time expectation $TT_r$ that can be calculated as the average time of past trips in the segment $r$. Thus, the expected (mean) speed $\bar{s}$ is calculated as expressed in Equation 4.

$$\bar{s} = \frac{l}{TT_r} \tag{4}$$

The speed $\bar{s}$ corresponds to the segment's mean speed; however, by the strong law of large numbers, this mean speed is assumed to be the expected speed for the segment $r$.
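A minimal sketch of the annotation step follows, reading Equation 4 as the segment length divided by the mean travel time of past trips. The units (meters and seconds) and the sample values are assumptions made for the example.

```python
from statistics import mean


def expected_speed(length_m: float, past_travel_times_s) -> float:
    """Expected segment speed: length divided by the mean past travel time (Equation 4)."""
    expected_tt = mean(past_travel_times_s)  # travel time expectation of the segment
    return length_m / expected_tt


# Hypothetical past traversal times (seconds) of a 38 m long segment.
s_bar = expected_speed(38.0, [4.0, 5.0, 3.0])
print(s_bar)  # 9.5 (m/s)
```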

3.2.3 Estimation

The expected travel time for a given segment was computed using the average travel time of past trips. To compute the actual travel time for a current trip, the real-time average speed $s$ in the segment needs to be used. Thus, considering the length $l$ of the segment, the actual travel time $TT_i$ of a vehicle $i$ can be determined using Equation 5.

$$TT_i = \frac{l}{s} \tag{5}$$

Moreover, unusual traffic conditions produce deviations from the expected mean speed. An average speed lower than the segment's expected speed signals unfavorable conditions, while a higher average speed suggests better traffic dynamics than usual. Thus, differences between the actual travel time and the expected travel time could be the result of changes in the traffic situation, stops due to pedestrian crossings, and driver behavior. This time difference is expressed in terms of $\varepsilon$ (segment delay), and therefore the actual travel time can be reformulated with Equation 6.

$$TT_i = TT_r + \varepsilon \tag{6}$$

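The segment delay $\varepsilon$ can be expressed using Equation 7.

$$\begin{aligned} \varepsilon &= TT_i - TT_r \\ \varepsilon &= \frac{l}{s} - \frac{l}{\bar{s}} \\ \varepsilon &= \frac{(\bar{s} - s) \times l}{s \times \bar{s}} \end{aligned} \tag{7}$$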
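The delay term can be checked numerically. The sketch below computes the delay as the difference between the actual and expected travel times and compares it with the closed form obtained by bringing both fractions to a common denominator; the function names and the sample values are illustrative assumptions.

```python
def segment_delay(length: float, speed: float, expected_speed: float) -> float:
    """Delay as actual minus expected travel time: l / s - l / s_bar."""
    return length / speed - length / expected_speed


def segment_delay_closed_form(length: float, speed: float, expected_speed: float) -> float:
    """Same quantity over a common denominator: (s_bar - s) * l / (s * s_bar)."""
    return (expected_speed - speed) * length / (speed * expected_speed)


# A 38 m segment with an expected speed of 10 m/s.
slow = segment_delay(38.0, 5.0, 10.0)   # slower than expected: positive delay
fast = segment_delay(38.0, 20.0, 10.0)  # faster than expected: negative delay
print(slow, fast)
print(abs(slow - segment_delay_closed_form(38.0, 5.0, 10.0)) < 1e-9)  # True
```

A speed below the expectation yields a positive delay and a speed above it a negative one, in line with the interpretation of favorable and unfavorable conditions.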

Since $l$, $TT_r$ and $\bar{s}$ are known values, the actual travel time depends solely on the current speed $s$, which captures other road conditions. At each point in time, the instantaneous speed or the average of the recorded speeds is used to calculate $\varepsilon$. It should be pointed out that negative error terms may occur as a consequence of favorable traffic conditions, which yield travel times shorter than expected.

3.3 Travel Time Prediction

Predicting the arrival time at delivery targets epitomizes travel time prediction in the delivery business. Existing prediction models fall into four groups (Mori et al., 2015): i) naive methods, which do not model traffic data but make diverse assumptions to deliver a fast prediction; ii) traffic flow-based techniques, which rely on mathematical relations between traffic flow, density, and speed; iii) data-based approaches, which rely on historical data to find relationships between traffic variables; and iv) hybrid methods, which combine concepts from the aforementioned groups.

The complexities of predicting travel time with probe data are further aggravated by the irregular and complex business activities, which involve numerous external waiting times besides the usual traffic behavior. As no particular model was found suitable for our case study, a hybrid approach was adopted, consisting of the following steps.

For pre-processing purposes, a non-parametric pattern searching approach was used to filter out external waiting times. Estimations of the segment delay ($\varepsilon$), the expected travel time ($E[TT]$) and the current travel time $TT$ were obtained using the expressions derived in Section 3.2.3.

To predict the travel time from a current segment $r$ to a target segment $t$, two scenarios were considered. The first scenario (see Figure 3) illustrates the case where multiple vehicles are present in segments of the same route. With live data, the $\varepsilon$ terms are calculated and a dynamic travel time prediction can be computed using Equation 8.

$$TT(r \to t) = \sum_{i=1}^{k} \left( E[TT_i] + \varepsilon_i \right) \tag{8}$$

where $k$ is the number of segments within a prediction horizon from the current segment to the target. In simple terms, the individual travel times and delays for each vehicle and segment are computed and summed to predict the travel time to the target $t$.
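For the first scenario, the prediction reduces to a sum over the segments of the horizon. The sketch below assumes that the expected travel times and the live delays are available as two equally long sequences ordered from the current segment to the target; the names and values are hypothetical.

```python
def predict_travel_time(expected_tt, delays) -> float:
    """Sum of expected travel time plus live delay over the k segments of the horizon."""
    if len(expected_tt) != len(delays):
        raise ValueError("one delay term is required per segment")
    return sum(e + d for e, d in zip(expected_tt, delays))


# Hypothetical values in seconds for three segments between r and t.
expected = [4.0, 6.0, 5.0]
delays = [1.0, -0.5, 0.0]  # a negative delay means faster than expected
prediction = predict_travel_time(expected, delays)
print(prediction)  # 15.5
```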

Figure 3: Prediction Scenario with Real-Time Information

from Constituent Segments.

The second scenario (see Figure 4) corresponds to the case where only one vehicle is present on a particular route. A multi-step gradual approach is undertaken to compensate for the unknown future delays within segments. The prediction can be framed in terms of the sum of the delays $\varepsilon$ within segments from a source to a destination, given each segment's $E[TT]$.

Figure 4: Multi-Step Gradual Prediction Approach. Delays from the Current Segment Are Used to Adjust Expected Travel Time to the Destination Segment.

At each current segment, real-time data is used to calculate the current $\varepsilon$ term, and the travel time is adjusted accordingly. The $\varepsilon$ terms for segments without information are assumed to be zero, and therefore the latest expectation values are used. Under these assumptions, the travel time is predicted using Equation 9.

$$TT(s \to t) = \varepsilon_s + \sum_{j=1}^{k} E[TT_j] \tag{9}$$

where $s$ indicates the current segment, $\varepsilon_s$ is its corresponding delay term, and $j = 1, \dots, k$ indexes the subsequent segments from the current segment to the destination segment. This approach, however, takes only cruising time into consideration. Additional waiting times in each segment are neglected; thus, the closer the vehicle gets to the target, the more accurate the prediction.
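The multi-step gradual approach can be sketched as a loop that re-issues the prediction at every segment the vehicle enters. The sketch assumes that the sum of expectations includes the current segment and that only the delay observed in the current segment is known; both points are interpretations made for this example.

```python
def predict_single_vehicle(current_delay: float, expected_tt_ahead) -> float:
    """Current delay plus the expected travel times from the current segment to the target."""
    return current_delay + sum(expected_tt_ahead)


def gradual_predictions(expected_tt, observed_delays):
    """Re-predict the remaining travel time each time the vehicle enters a new segment."""
    predictions = []
    for step, delay in enumerate(observed_delays):
        remaining = expected_tt[step:]  # expectations from the current segment onwards
        predictions.append(predict_single_vehicle(delay, remaining))
    return predictions


# Hypothetical expectations and delays (seconds) for a route of three segments.
expected = [4.0, 6.0, 5.0]
observed = [1.0, -0.5, 0.0]  # delay measured in the segment the vehicle is currently in
print(gradual_predictions(expected, observed))  # [16.0, 10.5, 5.0]
```

Each step replaces one assumed zero delay with an observed one, so fewer unknowns remain as the vehicle nears its target.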

3.4 Evaluation

The evaluation stage consisted of assessing the relia-

bility of the deduction of the segment speed expecta-

tion and the travel time estimation and prediction. For

the evaluation, the dataset was split into two parts:

• Historical Data. This is the dataset used for the

artifact prototyping and model reﬁnements. The

historical data consisted of data recorded during

a four-month delivery period, from July to Octo-

ber. The 4-month data contained 85% of the total

cleaned data (approx. 140,000 records).

• Test Data. A separate dataset was prepared for

testing purposes only and not used during the

modeling process. It consisted of delivery data

for the month of November, being approximately

15% of the total cleaned dataset.
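A chronological split of this kind can be sketched as follows. The record layout and the helper name split_by_date are assumptions for illustration; the cut-off date follows the description above, with November held out for testing.

```python
from datetime import date


def split_by_date(records, first_test_day: date):
    """Split records chronologically: earlier days are historical, later days are test data."""
    historical = [r for r in records if r["day"] < first_test_day]
    test_data = [r for r in records if r["day"] >= first_test_day]
    return historical, test_data


# Hypothetical records carrying only the day of recording.
records = [
    {"day": date(2018, 7, 15)},
    {"day": date(2018, 10, 31)},
    {"day": date(2018, 11, 1)},
    {"day": date(2018, 11, 12)},
]
historical, test_data = split_by_date(records, date(2018, 11, 1))
print(len(historical), len(test_data))  # 2 2
```

Splitting by time rather than at random keeps later observations out of the data used to build the model.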

3.4.1 Travel Time Estimation

Given that the records in the dataset contained a field with the instantaneous speed, the deduced expected segment speed was evaluated by comparing the accuracy of the travel time forecasting based on the instantaneous speed with that based on the deduced segment mean speed from our approach. The mean absolute percentage error (MAPE) was chosen for this comparison. MAPE is a dimensionless measure and a common approach for comparing different forecasting models (Zhang et al., 2017). MAPE expresses the magnitude of the error relative to the ground truth as a percentage and is defined using Equation 10.

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \times 100\% \tag{10}$$

where $n$ is the number of observations, $A_t$ is the actual value and $F_t$ is the forecasted value. Lewis (1982) defined four ranges with their interpretations for typical MAPE values found in industrial and business data: an error smaller than 10% indicates that the forecasting is highly accurate; values between 10% and 20% indicate that the forecasting is good; values between 20% and 50% indicate that the forecasting is reasonable; and a value greater than 50% indicates that the forecasting is inaccurate.
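The metric and its interpretation can be sketched as below. How the exact boundary values (10%, 20%, 50%) are assigned to a range is an assumption of this sketch, and the sample series are made up.

```python
def mape(actual, forecast) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def lewis_label(mape_value: float) -> str:
    """Interpretation ranges attributed to Lewis (1982); boundary handling is assumed."""
    if mape_value < 10.0:
        return "highly accurate"
    if mape_value < 20.0:
        return "good"
    if mape_value <= 50.0:
        return "reasonable"
    return "inaccurate"


# Hypothetical travel times in seconds.
error = mape([100.0, 200.0], [90.0, 230.0])  # errors of 10% and 15%
print(round(error, 2), lewis_label(error))   # 12.5 good
```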


3.4.2 Travel Time Prediction

To assess the accuracy of the proposed prediction model, a naive model was implemented using the test dataset and served as a baseline for comparison. Furthermore, MAPE and the mean absolute error (MAE) (see Equation 11) were used to measure the magnitude of the error in time of the prediction models.

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| A_t - F_t \right| \tag{11}$$

where, as in Equation 10, $A_t$ corresponds to the true value and $F_t$ to the prediction.
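A short sketch of the MAE computation is given below, together with a helper that renders an error in seconds as minutes and seconds. The helper name format_min_sec and the sample values are illustrative assumptions.

```python
def mae(actual, forecast) -> float:
    """Mean absolute error, in the unit of the inputs."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def format_min_sec(seconds: float) -> str:
    """Render a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs:02d} s"


# Hypothetical true and predicted travel times in seconds.
error_s = mae([600.0, 900.0], [660.0, 810.0])  # absolute errors of 60 s and 90 s
print(error_s, format_min_sec(error_s))        # 75.0 1 min 15 s
```

Unlike MAPE, the MAE keeps the unit of the travel time, which makes it easier to relate to delivery planning.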

4 RESULTS

4.1 Data Cleaning and Selection

Once the data cleaning took place, it was found that

45% of the records were duplicates or were not use-

ful for the goals of this work, resulting in a dataset

of 315,014 records. Moreover, after discarding ﬁelds

that were not relevant for analysis, each record was

described in terms of fourteen ﬁelds.

4.2 Travel Time Approximation

After the segmentation, annotation and estimation

steps were executed, additional ﬁelds were added to

the records in the dataset: i) geohash, the string refer-

encing the coordinates; ii) distance, in kilometers un-

til the next destination point in the next segment; iii)

duration, the time employed to reach the next segment

point; iv) waiting time, the time elapsed at a location

within a segment; and v) mean speed, the deduced

speed calculated applying the Equation 4.

Furthermore, the MAPE was calculated to compare the results of the travel time estimation using the instantaneous speeds (from the historical data) and the deduced expected mean speeds. Relying on the deduced mean speed, a MAPE value of 6.03 was obtained, which falls below the 10% threshold for accurate forecasting; relying on the instantaneous speed, a MAPE value of 128.68 was obtained, which signifies inaccurate estimates.

These results signal that instantaneous speed does

not provide an accurate generalization for the mean

speed within segments. As such, the estimation us-

ing point values results in overestimation or underes-

timation and consequently, in reduced accuracy in the

travel time estimates.

4.3 Travel Time Prediction

The travel time prediction accuracy was evaluated ap-

plying the MAPE and MAE metrics. The proposed

approach and the deﬁned baseline were compared to

the dataset values. As per the results presented in

Table 2, this approach had a reasonable forecasting

performance (MAPE value of 23.6), with a mean ab-

solute error of fourteen minutes and thirty-three sec-

onds. In contrast, the baseline (naive model) had a

poorer performance.

Table 2: MAE and MAPE Travel Time Prediction Accuracy Comparison between the Baseline and the Proposed Approach.

                 MAE            MAPE
Baseline         39 min 08 s    43.9
This Approach    14 min 33 s    23.6

Although the MAE of the proposed approach may seem large, the authors argue that it is an acceptable result considering the simple methods that were used. Moreover, the MAE values decreased when calculated from segments closer to the destination, meaning that the forecasting becomes more accurate as the vehicle approaches its target, as illustrated in Figure 5 for two randomly selected trips.

5 SUMMARY AND CONCLUSIONS

This research work proposes a data-driven approach

to forecasting travel times of en-route logistics ve-

hicles. Data on the operations of a Swiss Post de-

livery vehicle were analyzed. By means of the de-

sign science research methodology, an artifact was

implemented following a procedure that consisted of

four stages: i) data cleaning and selection, ii) travel

time approximation, iii) travel time prediction, and iv)

evaluation.

The data cleaning and selection stage allowed removing inconsistent records and disregarding fields that were not of interest. After this stage was completed, the dataset was composed of 315,014 records, which were studied. Later, the travel time approximation stage took place, which entailed aggregating the data of the different days and the intermediate steps of segmentation, annotation, and estimation. The segmentation step was conducted through the application of geohash to perform a segment-to-segment analysis of the behavior of the logistic vehicles over time; the annotation step then encompassed estimating the expected mean speed for each segment; lastly, the estimation step involved deriving the expressions that allowed estimating the travel time of a vehicle for a current trip. Afterward, the travel time prediction stage took place, whose purpose was to define the expressions that allowed forecasting a vehicle's travel time to a destination. Finally, the evaluation stage aimed at assessing the accuracy of the estimation and prediction phases by calculating the MAPE and MAE metrics.

Figure 5: Prediction Error Patterns in an Hour Interval Denoting Source and Destination Segments from Two Randomly Selected Trips on Different Days and Different Periods. (a) Date: 2018-11-01 from 07:00 to 08:00. (b) Date: 2018-11-12 from 12:00 to 13:00.

The results of the evaluation stage suggest that this approach is feasible, as the forecasts reached a reasonable accuracy and outperformed the baseline method. Even though the MAE value showed a difference of about fourteen minutes on average between the prediction and the historical data, the authors consider these results satisfactory given the low penetration rate of the probe vehicles studied.

Moreover, this initiative differentiates itself from other methods that rely on map-matching, as geohashing requires little computational power in comparison to the computationally intensive map-matching algorithms. In addition, geohash indexing allows concurrent modeling and analysis at different levels, in a coarse-to-fine manner and vice versa, which facilitates analysis tasks.

In terms of the computation of the travel time expectation and prediction, this approach employs rather simple models that are low in complexity and easily scalable, unlike other methods based on machine learning algorithms, which require massive resources. The authors argue that efforts such as this one are promising alternatives that deserve to be explored towards developing less complex and more efficient solutions.

Furthermore, the results obtained in this work could translate into improvements in the quality of delivery services and even the development of new ones. Besides, segment-to-segment and granular analyses provide detailed insights into what happens on the roads, which can serve as a basis for optimizing the routing of vehicles and, therefore, reducing resource consumption.

To conclude, it should be highlighted that the expectation of traffic conditions is time- and context-dependent. For example, during rush hours one expects less favorable traffic conditions; therefore, the expectations for trips during rush-hour and non-rush-hour periods should be modeled differently. Furthermore, weather conditions and special events (e.g., concerts and public demonstrations) influence the time needed to reach a destination. Future improvements to this initiative will be directed toward incorporating such contextual information to improve the developed models and provide more accurate results.
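One simple way to make the expectation time-dependent is to keep a separate expected travel time per segment and period of the day. The sketch below illustrates the idea under stated assumptions: the rush-hour windows, the segment identifier, and the observations are all hypothetical and are not taken from this work.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed rush-hour windows (hypothetical; not defined in this work).
RUSH_HOURS = {7, 8, 16, 17, 18}


def period_of(timestamp: datetime) -> str:
    """Label a timestamp as belonging to a rush-hour or off-peak period."""
    return "rush" if timestamp.hour in RUSH_HOURS else "off_peak"


def expected_travel_times(observations):
    """Mean travel time per (segment, period).

    `observations` is an iterable of (segment_id, timestamp, minutes) tuples.
    """
    grouped = defaultdict(list)
    for segment_id, timestamp, minutes in observations:
        grouped[(segment_id, period_of(timestamp))].append(minutes)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical observations for one geohash-like segment identifier.
observations = [
    ("u0m70", datetime(2018, 11, 1, 7, 30), 12.0),
    ("u0m70", datetime(2018, 11, 1, 8, 10), 14.0),
    ("u0m70", datetime(2018, 11, 1, 13, 0), 8.0),
]

expectations = expected_travel_times(observations)
for (segment_id, period), minutes in sorted(expectations.items()):
    print(segment_id, period, minutes)
```

The same grouping key could be extended with further contextual labels, such as a weather category or an event flag, once such data are available.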

ACKNOWLEDGEMENTS

The authors would like to thank the members of the Human-IST Institute at the University of Fribourg for contributing valuable thoughts and comments. We especially thank the Secretariat of Higher Education, Science, Technology, and Innovation (SENESCYT) of Ecuador and the Swiss Post for their support in conducting this research.

GISTAM 2020 - 6th International Conference on Geographical Information Systems Theory, Applications and Management