Forecasting Public Transportation Capacity Utilisation Considering External Factors

Fabian Ohler¹,², Karl-Heinz Krempels¹,² and Sandra Möbus

¹ Informatik 5 (Information Systems), RWTH Aachen University, Aachen, Germany

² CSCW Mobility, Fraunhofer FIT, Aachen, Germany

Keywords:

Passenger Demand Forecast, Public Transportation Forecast, Transit Passenger Volume Prediction.

Abstract:

Using a forecast of the public transportation capacity utilisation, the buses can be adapted to the demand to avoid overfull buses leading to delays. An efficient utilisation of the buses at disposal can improve customer satisfaction as well as economic efficiency. Our forecasts are based on fragmentary measurements of passengers boarding and alighting buses at stops over the year 2015. In an attempt to improve the accuracy of the forecast, several external factors (e. g. weather, holidays, cultural events) were incorporated. We tackle the problem of forecasting public transportation capacity utilisation by forecasting the number of boarding and alighting passengers. We then use these numbers to adjust the previous passenger count and use the result as input for the next forecast. Using multiple linear regression, support vector regression, and neural networks, we evaluate different ways to model the external factors. The best results were achieved by neural networks with a median absolute error of ≈4.16 in the forecast passenger count. They were able to keep more than 80% of the forecasts within a tolerance of 10 passengers. Since the error in the forecasts does not accumulate along the trips, chaining the forecasts in the described way is a viable approach.

1 INTRODUCTION

In many domains, forecasts are important for plan-

ning and optimization. For public transportation com-

panies, forecasts of passenger load may be used to op-

timize their service planning. Their customers often

complain about crammed buses leading to crowding

and bad air during travels. Overfull buses also lead to

customers not being able to board the bus and having

to wait for follow-up buses. Additionally, the duration

of stays at bus stops is prolonged potentially leading

to delays. Thus, people switch to alternate modes of

travel like using a bike or car. For the bus service

providers, a loss of customers usually results in a ﬁ-

nancial deﬁcit. On the other hand, a lack of infor-

mation about the transportation demand may lead to

wasted capacities during times of low utilisation. In-

formation about passenger demand is a basis for bus

scheduling (Salzborn, 1972) and can help avoid the

aforementioned problems improving customer satis-

faction (Eboli and Mazzulla, 2007).

In times of interconnected vehicles, automatic vehicle location systems, advanced traveller information systems, etc., customers are used to being presented with an expected time of arrival or departure for their means of transportation. The fidelity of this information might also be improved by a more accurate forecast of the passenger load, especially during demand peaks.

In the project Mobility Broker (https://mobility-broker.com/), multiple mobility services (e. g. bus, train, car-sharing, bike-sharing) were integrated into one platform (Beutel et al., 2016). For the sharing services, the limited availability of their resources can prevent users from satisfying their mobility needs, e. g. in case no bike is available at the time the user wants to rent it. Therefore, the user is informed about bike and vehicle availabilities at the corresponding sharing stations via the mobile app or the web browser. On the other hand, during demand peaks even buses with high passenger capacity reach their maximum load, leading to unsatisfied mobility needs. Including the passenger load in traveller information systems can thus improve the travelling experience by allowing the users of the system to make informed choices regarding the mobility modes. Therefore, in the context of the project Mobility Broker, we decided to investigate the possibility of forecasting the passenger load for buses.


Ohler, F., Krempels, K-H. and Möbus, S.

Forecasting Public Transportation Capacity Utilisation Considering External Factors.

DOI: 10.5220/0006345703000311

In Proceedings of the 3rd International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2017), pages 300-311

ISBN: 978-989-758-242-4

Owing to the benefits of predictions outlined above, many forecasting approaches have been developed, ranging from simple regression to time series models and data mining techniques. In this paper, we compare different methods to forecast the passenger load in buses. To do so, we consider the number of passengers in a bus as the previous passenger count plus the difference between the number of boarding and alighting people.

We develop a model to forecast the number of board-

ing and alighting people at a bus stop. The model

is trained using historical data exhibiting a relatively

low coverage compared to the amount of data that

could have been collected during the corresponding

time span. Additionally, we integrate multiple factors

that are likely to inﬂuence the passenger load into our

model.

After a short problem description in Section 2, we

survey the related work in Section 3. In Section 4,

we present our approach, which we thoroughly eval-

uate in Section 5. Section 6 gives a conclusion and

outlook.

2 PROBLEM DESCRIPTION

Forecasting means making a statement about future

events based on past observations. While deﬁnitive

statements about the future are rarely possible, math-

ematical models can be used to obtain an approxima-

tion. A forecasting model to determine the number

of passengers in a bus can be reduced to a model esti-

mating the number of people boarding a bus at a given

point in time. Using the same approach, one can fore-

cast the number of people alighting a bus at a given

point in time. Combining both information yields the

change in the passenger count at the considered stop

and tracking these changes starting from the ﬁrst stop

accounts for the absolute passenger count.
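The bookkeeping described in this paragraph can be written as a simple recurrence. The notation is an assumption introduced here for illustration and does not appear in the paper: p_s is the passenger count after stop s, p_0 the count on arrival at the first stop, and b̂_s and â_s the forecast numbers of boarding and alighting passengers at stop s.

```latex
% Assumed notation (not from the paper):
%   p_s        passenger count in the bus after stop s
%   p_0        passenger count on arrival at the first stop
%   \hat{b}_s  forecast number of boarding passengers at stop s
%   \hat{a}_s  forecast number of alighting passengers at stop s
p_s = p_{s-1} + \hat{b}_s - \hat{a}_s
\quad\Longrightarrow\quad
p_s = p_0 + \sum_{i=1}^{s} \bigl(\hat{b}_i - \hat{a}_i\bigr)
```

The unrolled form on the right shows why errors in the individual boarding and alighting forecasts could in principle add up along a trip.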

The number of passengers in a bus at a speciﬁc

point in time is inﬂuenced by many different aspects,

some of them speciﬁc to the means of transport (e. g.,

position in route), others rather speciﬁc to the circum-

stances like bad weather (Singhal et al., 2014) or big

events (Friedman et al., 2001).

Commuters lead to a high transportation demand

on work days at speciﬁc times of the day. Similarly

pupils lead to demand peaks outside of holidays. Es-

pecially at the start of each semester, college students

tend to use public transportation a lot. Over the course

of the semester, the demand may vary.

Nice weather often encourages people to reach their destination on foot or by bike instead of taking the bus. The opposite holds during rain or frost. Big events may generally lead to high demand and fluctuation, but also to traffic jams, both of which can delay buses. Delays in turn also influence the number of passengers in a bus, since some people might miss connections or use other buses while others reach the bus stop in addition to the usual demand. Finally, the number of people in a bus can influence the number of people able to board (since buses have limited capacity) and to exit.

The relevant factors have to be modelled in a way

compatible with the employed algorithms yet retain-

ing enough information to be of value.

To train the models, we use historic data collected

via sensors above the bus doors counting the number

of passengers boarding and alighting the buses. Yet,

since not all vehicles serving the observed routes were

equipped with those sensors, our data set is a random

sample. The problem of choosing suitable algorithms to work with these fragmentary observations is also tackled in this paper.

3 RELATED WORK

Forecasting the number of passengers in a bus is an example of demand forecasting. It is similar to forecasting energy demands in that both are time-variant, periodic, and influenced by weather and holidays. In the area of energy demand forecasting, many attempts have been made, since good forecasts can save huge amounts of money in that domain; we thus borrow a relevant part of the literature from it.

We briefly present well-known forecasting approaches before weighing them up against each other with respect to the problem instance. In (Alfares and Nazeeruddin, 2002), suitable approaches are divided into nine categories, augmented by two additional ones in (Mansouri et al., 2014).

Multiple Regression. Multiple Regression models

statistical relations between the demand and exter-

nal factors via a linear combination. The regres-

sion coefﬁcients can be determined using e. g., the

least squares method (Montgomery et al., 2015).

Yet, good results are usually only to be expected

in case of linear dependence.

Exponential Smoothing. Exponential Smoothing

relies on the assumption, that future observations

are more similar to observations of the recent past

than of those less recent. Based on historic data,

a function is modelled to predict future values

(Neusser, 2011).

Stochastic Time Series. Forecasting can also be

modelled using time series analysis. Here, the


prognosis is only based on past demand values

and external factors are not included into the

model. The most important model is the ARMA

model composed of the autoregressive (AR) and

the moving-average (MA) model. Using the MA

model to eliminate the white noise, the AR model

performs a regression based on the demand values

of the past. In case of non-stationary processes, the ARIMA (autoregressive integrated moving-average) model transforms the process into a stationary one by differencing (Neusser, 2011).

Iteratively Reweighted Least-Squares. The Iteratively Reweighted Least-Squares method is a

modiﬁcation of the least-squares method, simi-

larly applicable to determine model parameters.

In (Mbamalu and El-Hawary, 1993), the authors

used this method to compute the coefﬁcients of

an autoregressive model.

Adaptive Load Forecast. In Adaptive Load Forecast models, the model parameters are automatically adjusted to changing demand. A well-known example of this category is the Kalman filter (Bastian, 1985).

ARMAX Model based on Genetic Algorithms.

The ARMAX model is an extension of the

ARMA model including external factors via

exogenous variables. In (Yang et al., 1995),

evolutionary programming is used to identify

the parameters of the model. Evolutionary

programming simulates the natural evolutionary

process to heuristically minimize the error of the

model.

Knowledge-based Expert Systems. Knowledge-

based Expert Systems are an artiﬁcial intelligence

approach to bestow upon a system the ability to

reason on its own. Based on facts and if-then-

rules processing the facts, these systems are able

to deduce new information. These systems can

use their rule set to forecast information inferred

from the encoded knowledge (Ertel, 2013).

Fuzzy Logic. Fuzzy Logic systems can model un-

known dynamic systems similar to expert systems

based on rules. Yet, instead of mapping values to

true or false, a membership function assigns val-

ues between 0 and 1. Similarities in the input data

are identiﬁed using ﬁrst- and second-order differ-

ences (Adamy, 2007; Sachdeva and Verma, 2008).

Neural Networks. Neural Networks imitate the way

the human brain works. These networks consist of

nodes representing neurons and weighted edges.

Inputs are propagated through the network and the

output layer represents the result. Using (historic)

training data, the weights are adjusted to minimize

the deviation in the output – the network ‘learns’.

Afterwards, the network can be used for forecast-

ing. A downside of this approach is its black box

design – usually, the user is not able to recon-

struct how the network comes to its conclusions

(Adamy, 2007; Dai and Wang, 2007).

Support Vector Machines. Support Vector Ma-

chines are used for classiﬁcation as well as for

regression. For classiﬁcation purposes, the ma-

chine uses historic data to determine a hyperplane

that separates two classes as well as possible.

Regression is done by ﬁnding a region that is as

small as possible and concentrates all historic

data. Using a kernel function, even non-linear

regression is possible (Guo et al., 2006).

Hybrid Methods. Over the years, many of the afore-

mentioned approaches have been combined into

hybrid systems. Particularly successful were ap-

proaches combining neural networks and fuzzy

logic to so-called neuro-fuzzy systems (Jang,

1993). Furthermore, combinations of neural net-

works and support vector machines (Niu et al.,

2005) or fuzzy logic and expert systems have been

used for demand forecasting.

The methods presented are also used to forecast

demand for public transportation. For example, in

(Zhou et al., 2013) the ARIMA model and in (Xue

et al., 2015) the Kalman ﬁlter is used to forecast pas-

senger demand for buses. Both models are geared to-

wards time series and analyse stationarity, periodicity,

and volatility.

Yet, since the data available to us has the previously described characteristics of a random sample, we do not expect good results from using time series analysis. The missing data would have to be interpolated and the model would be trained with partially defective data. We therefore do not consider Exponential Smoothing, Stochastic Time Series, ARMA,

ARIMA, and ARMAX models any further. As Itera-

tively Reweighted Least-Squares and Adaptive Load

Forecasting are just alternative methods to determine

the parameters for, e. g., Multiple Regression or the

ARMA model, these are neglected here, too.

Knowledge-based Expert Systems as well as

Fuzzy Logic have successfully been applied to energy

demand forecasting (Alfares and Nazeeruddin, 2002).

However, both systems heavily depend on knowledge

of domain experts which is not available to us at the

time of writing.

Using neural networks to forecast seems consid-

erably more promising, since the approach is toler-

ant with respect to vagueness, missing data and non-

linearity. In (Tsai et al., 2009), neural networks have

already been applied to forecast passenger load for


trains, yet only considering a very limited set of rather

coarse external factors. Furthermore, in (Mo and Su,

2009) a forecast for passenger demand for buses using

neural networks has been drafted incorporating time,

weekday and weather. In this paper, we include additional external factors into the model to evaluate the enhancements with respect to prognosis quality.

As mentioned before, Multiple Linear Regression performs best in case the results linearly depend on

the inputs. Even though this is not to be expected for

all factors considered, we will include this approach

to compare it to the more complex ones.

Non-linear dependencies can be modelled using

Support Vector Regression. In contrast to the neural

networks, this approach minimizes the upper bound

of the error instead of its mean. This can lead to better

results in many cases (Jang, 1993).

Even though hybrid systems are gaining more and

more attention in research (cf. (Alfares and Nazeerud-

din, 2002)), the application of these more complex

systems goes beyond the scope of this paper.

In the following, we will thus compare Neural

Networks, Support Vector Regression, and Multiple

Linear Regression.

4 APPROACH

Our forecast is based on models trained using historic

data compiled from several sources. Since the mea-

surement data we have at our disposal are for the city

of Aachen (Germany), the points considered as pos-

sible inﬂuences with respect to passenger demand are

speciﬁc to Aachen. The following aspects are consid-

ered as factors in the model:

• line and bus stop,

• number of passengers in the bus,

• delay,

• time,

• weekday,

• public holidays,

• school holidays,

• semester breaks of the RWTH Aachen University,

• weather,

• cultural events (CHIO [http://www.chioaachen.de], Christmas market, fairs, carnival, Weinsommer [http://www.weinsommer.de/aachen], SeptemberSpecial [http://www.aachenseptemberspecial.de]), and

• home games of the local soccer club (Alemannia

Aachen).

The local transportation company ASEAG (Aach-

ener Straßenbahn und Energieversorgungs-AG) pro-

vided us with measurement data acquired in 2015 by

infrared sensors mounted above the doors of some

of their buses to determine the number of people

in the bus. This data also contains information

about the bus line and current stop, the delay of the

bus as well as current time and date. Times and

dates for public/school holidays, semester breaks and

the aforementioned cultural events and soccer games

were added manually. Using data from the German Weather Service (DWD), we augmented the input data with information about the weather. Consolidating the data from the different sources into a homogeneous form completed the data preprocessing step.

In the data transformation and modelling steps, the following points were taken into consideration: Bus line and stop are used to partition the data. The rest of the data has to be represented as real numbers. The number of people in the bus and the delay are already given as natural numbers and the time is modelled as the number of minutes since midnight. For the weekday, we consider two different representations: It can be modelled as a single dummy variable that is 1 if the data corresponds to a weekday and 0 if it belongs to a weekend. Alternatively, every weekday can be considered on its own via seven dummy variables for the seven different weekdays (Monday to Sunday), exactly one of which is 1 at any time. Similarly, the public holidays can be modelled using a dummy variable. Yet, since we expect that demand before and after public holidays differs from the usual demand, we also consider modelling them using two variables holding the number of days since the last and until the next public holiday. School holidays, semester breaks, and cultural events last for several days or even weeks. As we expect increased demand at the start and end of these periods (and their complements), in addition to the aforementioned dummy variable approach, we consider the following alternative modelling: Using school holidays as an example, we create four variables representing the number of days since the start of the holidays, days left of the current holidays, days until the next holidays, and days since the end of the last holidays. During holidays, the first two of them are non-negative and the others are zero, and vice versa outside of holidays. The weather is modelled via temperature in degrees Celsius, relative humidity, and precipitation, measured directly as real numbers. Additionally, a variable holding the number of minutes until kick-off for soccer home games is introduced, with values becoming negative after kick-off and being zero on days without home games.

(Weather data source: http://www.dwd.de/DE/leistungen/klimadatendeutschland/klarchivtagmonat.html)
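The encodings described above can be illustrated with a short Python sketch. It is not the authors' implementation (the paper's evaluation uses R); all function names, dictionary keys, and the example dates are hypothetical, and encoding a missing previous or next period as 0 is an assumption of this sketch.

```python
from datetime import date, datetime


def minutes_since_midnight(ts):
    """Time of day as the number of minutes since midnight."""
    return ts.hour * 60 + ts.minute


def weekday_dummies(ts):
    """Seven dummy variables (Monday to Sunday); exactly one of them is 1."""
    return [1 if ts.weekday() == i else 0 for i in range(7)]


def period_features(day, periods):
    """Four-variable encoding of multi-day periods such as school holidays.

    `periods` is a list of (first_day, last_day) date pairs, both inclusive.
    Assumption of this sketch: a missing previous/next period is encoded as 0.
    """
    for start, end in periods:
        if start <= day <= end:
            # Inside a period: only the first two variables carry information.
            return {"since_start": (day - start).days,
                    "days_left": (end - day).days,
                    "until_next": 0,
                    "since_last": 0}
    # Outside all periods: only the last two variables carry information.
    future_starts = [start for start, _ in periods if start > day]
    past_ends = [end for _, end in periods if end < day]
    return {"since_start": 0,
            "days_left": 0,
            "until_next": (min(future_starts) - day).days if future_starts else 0,
            "since_last": (day - max(past_ends)).days if past_ends else 0}


def minutes_until_kickoff(ts, kickoffs):
    """Minutes until kick-off on match days (negative afterwards), else 0."""
    for kickoff in kickoffs:
        if kickoff.date() == ts.date():
            return int((kickoff - ts).total_seconds() // 60)
    return 0


# Hypothetical example dates, not the actual 2015 calendar for Aachen.
holidays = [(date(2015, 4, 1), date(2015, 4, 10)),
            (date(2015, 7, 1), date(2015, 7, 31))]
reading_time = datetime(2015, 4, 15, 7, 30)
features = {"minutes": minutes_since_midnight(reading_time),
            "weekday": weekday_dummies(reading_time),
            **period_features(reading_time.date(), holidays)}
```

Keeping each factor in its own function makes it easy to switch between the simple dummy variable and the more elaborate encoding per factor.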

Hereby, we introduced several factors with two

different modelling strategies each (weekday, public

holidays, school holidays, semester breaks, cultural

events) resulting in different ways to model our input

data.

5 EVALUATION

We evaluated our approach using the statistics soft-

ware R (R Core Team, 2016). Various packages pro-

viding implementations for lots of statistical models

are available for R. Part of this collection is the mul-

tiple linear regression, which is implemented as lm

(linear models) in the stats package.

For the support vector regression, multiple imple-

mentations exist (Hornik et al., 2006). Because of its

additional function tune, the package e1071 (Meyer

et al., 2015) was chosen providing the function svm,

which also internally handles data scaling. Of the

four available kernels, the linear and radial kernels

were chosen for evaluation. Using predict, the trained

model can be used to forecast.

For neural networks, again, multiple implementa-

tions exist, out of which neuralnet (Fritsch and Guen-

ther, 2016) was chosen, since it is tailored to regres-

sion and can handle more than one hidden layer.

The aforementioned local public transport opera-

tor provided us with measurements for two bus lines:

For bus line 3A, there are 60682 measurement read-

ings, corresponding to about 2100 readings per stop.

33131 measurement readings were available for bus

line 3B, yielding about 1100 readings per stop. As al-

ready stated, not all vehicles were equipped with the

measurement devices.

After preprocessing the data and consolidating it

in a data warehouse it turned out that for public holi-

days, disproportionally few measurements were avail-

able even when considering that the bus frequency is

usually lower on holidays. Therefore, public holidays

were not considered as a separate factor.

This leaves us with sixteen ways to model our factors. We numbered them consecutively from 1 to 16 such that the number encodes how the factors are represented:

$$n = 1 + 2^0 w + 2^1 h + 2^2 b + 2^3 c$$

Here, w, h, b, and c correspond to weekday, school

holidays, semester breaks, and cultural events, respec-

tively. In case a factor is modelled in the more elabo-

rate way, its variable is set to 1, otherwise (when it is

modelled via a dummy variable) it is 0. Hence, representation 1 is the simplest and representation 16 the most complex one.
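The numbering scheme can be sketched in Python as a plain binary encoding of the four flags. The assignment of the weights 1, 2, 4, and 8 to w, h, b, and c in that order is an assumption of this sketch; the text only fixes that representation 1 has all flags at 0 and representation 16 has all flags at 1. Function names and dictionary keys are hypothetical.

```python
def representation_number(w, h, b, c):
    """Map four 0/1 flags to a representation number between 1 and 16.

    Assumed weights: w -> 1, h -> 2, b -> 4, c -> 8.
    """
    for flag in (w, h, b, c):
        if flag not in (0, 1):
            raise ValueError("flags must be 0 or 1")
    return 1 + 1 * w + 2 * h + 4 * b + 8 * c


def decode_representation(n):
    """Inverse mapping: recover the four flags from a number in 1..16."""
    if not 1 <= n <= 16:
        raise ValueError("representation number must be in 1..16")
    bits = n - 1
    return {"weekday": bits & 1,
            "school_holidays": (bits >> 1) & 1,
            "semester_breaks": (bits >> 2) & 1,
            "cultural_events": (bits >> 3) & 1}
```

Under any assignment of the four weights, the sixteen flag combinations map to sixteen distinct numbers, so the mapping is invertible.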

We evaluated five models using the approaches selected, with the first method being the multiple linear regression (MLR). Furthermore, we use the ε-SVR as described in (Schölkopf et al., 2000) with the default value ε = 0.1 and the linear (SVR-L) as well as the radial kernel (SVR-R). Since the parameter C (and additionally γ for the radial kernel) significantly influences the results, we evaluate the models for different parameter values. We choose $C \in \mathbb{N}_{\le 10}$ and $\gamma \in \{0.2, 0.4, 0.6, 0.8, 1\}$ and plot the value for the best parameter (pair). In addition, we consider two neural networks: a rather simple one (nnet1) with one hidden layer containing two neurons and a second one (nnet2) with two hidden layers containing five neurons in the first and three in the second layer. RPROP was used to train the networks with a tolerance threshold of 0.01 and at most 10 million iterations.
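Two ingredients of this setup can be made concrete with a small Python sketch: the ε-insensitive loss that ε-SVR is built on, and an exhaustive search over the parameter grid. The sketch assumes that the range for C means the integers 1 to 10, and `score` is a hypothetical callable standing in for whatever error measure is computed for a parameter pair; neither detail is taken from the paper, and no actual SVR is trained here.

```python
from itertools import product

EPSILON = 0.1  # default value of epsilon used in the evaluation


def epsilon_insensitive_loss(y_true, y_pred, epsilon=EPSILON):
    """Deviations of up to epsilon cost nothing; larger ones cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Assumed reading of the parameter ranges: C in {1, ..., 10}.
C_VALUES = range(1, 11)
GAMMA_VALUES = (0.2, 0.4, 0.6, 0.8, 1.0)


def best_parameters(score, c_values=C_VALUES, gamma_values=GAMMA_VALUES):
    """Return the (C, gamma) pair with the lowest score.

    `score` is a hypothetical function (C, gamma) -> error, e.g. a
    cross-validated mean absolute error of a model trained with that pair.
    """
    return min(product(c_values, gamma_values), key=lambda pair: score(*pair))
```

With these ranges the radial kernel requires 50 model fits per stop and representation, whereas the linear kernel only varies C and requires 10.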

5.1 Forecasting Boarding and Alighting Passenger Count per Stop

To compare the models and the different representa-

tions, for all stops of the bus lines 3A and 3B we

used a 10-fold cross-validation (cf. (Arlot and Celisse,

2010)) to determine the mean absolute error (MAE)

as well as the maximum and median of the absolute

error. The ﬁrst and last stop were left out, since we

assume empty buses at the start and end of trips.
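As an illustration of this evaluation protocol, the following Python sketch runs a 10-fold cross-validation and reports the mean, median, and maximum absolute error. It is a simplified stand-in, not the authors' R code: `fit` is a hypothetical callable that returns a prediction function, `fit_mean` is a trivial baseline included only to make the example runnable, and pooling the absolute errors of all folds before aggregating is an assumption of this sketch.

```python
import random
from statistics import mean, median


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_errors(xs, ys, fit, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, collect absolute errors."""
    abs_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(train_x, train_y)
        abs_errors.extend(abs(predict(xs[i]) - ys[i]) for i in held_out)
    return {"mae": mean(abs_errors),
            "median": median(abs_errors),
            "max": max(abs_errors)}


def fit_mean(train_x, train_y):
    """Trivial baseline: always predict the mean of the training targets."""
    average = mean(train_y)
    return lambda x: average
```

Any of the five models could be plugged in through `fit`, as long as it returns a function mapping one feature vector to a predicted passenger count.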

To allow for more general conclusions, the mean

of the MAE over all stops was determined for ev-

ery factor representation. Additionally, we trained the

models for the simplest factor representation using a

reduced set of data (5%, 10%, 25%, and 50% of the

measurement data). To evaluate the inﬂuence of the

amount of data available, we determined the MAE

for all stops of the bus line 3A using a 10-fold cross-

validation based on the reduced data sets.

For the various stops of a line, different pairs of

models and factor representations produce the lowest

MAEs. This is illustrated in Fig. 1 showing the MAE

of the forecast of alighting passengers at two exam-

ple stops of bus line 3B. For stop 2102, the best re-

sult is produced by the SVR using a linear kernel and

representation 7 while at stop 2133, the simple neural

network in combination with representation 9 leads to

the best results.

When considering the average MAE over all

stops, Fig. 2 illustrates that there are no major dif-

ferences between the various ways to represent the


Figure 1: Mean absolute error in the forecast of the number of alighting passengers at three example stops of bus line 3B for

all 16 factor representations.

external factors. Only for the larger neural network (nnet2), some of the errors are so large that they would degrade the readability of the chart if fully plotted. Therefore, the average error values for representation 9 of the boarding passenger count for line 3A (≈12.49) and for representation 13 of the boarding passenger count for line 3B (≈7.96) are only hinted at. In every case, choosing a more complex representation improves the MAE by at most 0.03 when compared to the simplest representation.

For the more detailed comparison of the models,

we restrict ourselves to factor representation 1 tak-

ing into account the marginal differences between the

possible representations. Figure 3 illustrates the MAE

in the forecast of the alighting passenger count for

four exemplary stops of line 3B. As one can see, dif-

ferent models produce the smallest errors at the stops.

For the stops 2100 and 3171, the neural networks

seem favourable, but the support vector regression

performs better for the stops 3178 and 3508. How-

ever, the differences between the models in the MAE

values are rather small again.

Thus, we consider the overall values for all stops

once more. Table 1 shows the average values over all

stops of the mean, median, and maximum errors over

all trips for the simplest representation. For both bus

lines, the best mean and median error values are pro-

duced by the support vector regression using a radial

kernel. Here, most of the stops favoured the param-

eters C = 1 and γ = 0.2. Multiple linear regression

and the large neural network (nnet2) lead to the worst

results with respect to mean and median. When con-

sidering the average of the maximum error, the small

neural network (nnet1) performs best in three out of

four situations.

To evaluate the inﬂuence of the number of mea-

surement readings available, we randomly reduced

the available readings for bus line 3A to 5, 10, 25, and

50 percent of our data. Figure 4 shows the average

absolute errors in the forecast for the different mod-

els and data fractions. In most cases, smaller training

sets lead to worse results. This is especially true for

the neural networks, which degrade heavily for small

data sets. The least inﬂuence can be seen for the sup-

port vector regression.

Figure 2: Average over the MAE of all stops with respect to the forecast of boarding and alighting passengers for the bus lines 3A and 3B for all factor representations.

Table 1: Averages of the absolute errors in the forecast of boarding and alighting passengers for the bus lines 3A and 3B over all stops for representation 1.

                  boarding passengers         alighting passengers
Line   Model      mean    median   max        mean    median   max
3A     MLR        2.01    1.55     24.48      1.81    1.35     22.31
3A     SVR-L      1.91    1.32     25.50      1.73    1.20     23.92
3A     SVR-R      1.86    1.30     25.37      1.66    1.16     25.73
3A     nnet1      1.92    1.46     24.16      1.67    1.22     21.55
3A     nnet2      2.07    1.44     153.68     1.69    1.20     44.91
3B     MLR        2.10    1.61     22.13      1.62    1.23     16.18
3B     SVR-L      1.99    1.39     22.95      1.54    1.07     17.18
3B     SVR-R      1.93    1.35     22.58      1.54    1.07     19.00
3B     nnet1      2.09    1.53     21.96      1.57    1.11     16.47
3B     nnet2      2.24    1.51     63.93      1.62    1.11     25.43

5.2 Forecasting the Number of Passengers in a Bus

In the following, we combine the information about boarding and alighting passengers to determine the number of passengers in a bus over a trip. To evaluate this approach, we examine 20 trips of bus line 3A and 19 trips of bus line 3B. The training set used for a forecast consists of all historic information for the corresponding stop and bus line minus the one to be determined. For the origin stop of a trip, the number of passengers in the bus when arriving at the stop (possibly from previous trips of the vehicle; the bus is empty most of the time) is taken from the measurement readings. The forecast number of boarding passengers is added, the forecast number of alighting passengers is subtracted, and the resulting value serves as the number of passengers in the bus when arriving at the follow-up stop. For the remaining stops of the trip, the forecast value resulting from the previous stop is used. In case the stop is a final stop where no passenger ever entered according to historic data or is an origin stop where no passenger ever left the bus, this is incorporated in the forecast. Taking the results from the previous section into account, the parameters for the support vector regression were set to C = 1 and γ = 0.2; all other parameters and models are unchanged.
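The chaining procedure described above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' code: the function name, the `passengers` key, and the representation of a stop as a dictionary of factor values are hypothetical, and the two models are passed in as arbitrary callables.

```python
def chain_passenger_counts(initial_count, boarding_model, alighting_model, stops):
    """Forecast the passenger count after each stop of a trip.

    initial_count   -- measured count when arriving at the origin stop
    boarding_model  -- callable: features -> forecast boarding passengers
    alighting_model -- callable: features -> forecast alighting passengers
    stops           -- list of dicts holding the other factors per stop
    """
    counts = []
    current = initial_count
    for stop in stops:
        # The current (measured or previously forecast) count is itself an input.
        features = dict(stop, passengers=current)
        boarding = boarding_model(features)
        alighting = alighting_model(features)
        current = current + boarding - alighting
        counts.append(current)
    return counts


# Toy models for demonstration only: five people board at every stop and
# half of the passengers (rounded down) alight.
toy_counts = chain_passenger_counts(
    0,
    lambda f: 5,
    lambda f: f["passengers"] // 2,
    [{"stop": "A"}, {"stop": "B"}, {"stop": "C"}],
)
```

Only the origin stop uses a measured count; every later stop consumes the forecast from the stop before it, which is why the behaviour of the error along a trip matters.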

Figure 3: Mean absolute error in the forecast of the number

of alighting passengers at four example stops of bus line 3B

for the simplest factor representation.

To compare the models, two trips of bus line 3A

were forecast and plotted together with the exact val-

ues in Figure 5. The external factors were represented

in the simplest way. For trip 41899, all models tend

to slightly underestimate the number of passengers

in the bus while for trip 54627, the opposite is the

case. Note that at the end of trip 41899, the bus is

not empty and the forecast is therefore not overridden

to zero in either case. Overall, the forecast does not diverge from the exact values, so the error does not grow over time. Thus, our approach seems viable also over longer time spans.

When factoring in the different representations,

the results for the two trips differed again (see Fig-

ure 6). The combination of representation and model

achieving best results depended on the trip. Com-

pared to the support vector regression using a ra-

dial kernel and the simplest representation which was

favoured in the previous stage, the MAE could be im-

proved by using another combination by about 4.44

and 3.66, respectively.

In Figure 7, all considered trips of both bus lines

are evaluated for all models and representations. The

median, 25%- and 75%-quartiles are plotted. For bus

line 3B, the best median absolute error was achieved

using SVR-R and representation 8 (an improvement

of about 14% compared to representation 1). When

looking at bus line 3A, the large neural network us-

ing representation 10 performed best gaining approx-

imately 27% accuracy over SVR-R in representation

Since the capacity of a bus seems large compared

to the error values considered here, we also evaluated

the number of predictions that are within a tolerance

of up to n ∈ [1, 10] passengers. Representation 10 led

to the best results in nearly all situations including

n = 10. Therefore, Figure 8 only covers those values.
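The tolerance measure used here is straightforward to compute; the following Python sketch shows one way to do it. The function name and the example numbers are hypothetical and serve only to illustrate the calculation.

```python
def share_within_tolerance(actual, forecast, tolerance):
    """Fraction of forecasts whose absolute error is at most `tolerance`."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, f in zip(actual, forecast) if abs(a - f) <= tolerance)
    return hits / len(actual)


# Made-up passenger counts for four stops of one trip.
actual_counts = [10, 20, 30, 40]
forecast_counts = [12, 20, 41, 35]
shares = {n: share_within_tolerance(actual_counts, forecast_counts, n)
          for n in range(1, 11)}
```

Evaluating the share for every n from 1 to 10 yields a curve per model that can be compared directly against the capacity of a bus.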

Table 2: Influence of integrating external factors on absolute error in forecasting the number of people in the bus.

Line   Model      mean    median
3A     nnet2      5.89    4.16
3A     simple     7.60    5.82
3A     minimal    10.49   8.00
3B     nnet2      6.74    5.05
3B     simple     6.60    4.83
3B     minimal    10.23   7.69

While the smaller neural network performed best for bus line 3B for larger tolerance values, it was dominated by SVR models for small tolerance values. For bus line 3A, the larger neural network outperformed all other models for all values of n.
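The tolerance-based evaluation can be illustrated with a short Python sketch. It is not taken from the paper: the function name and sample values are hypothetical, and it assumes that "within a tolerance of n" means an absolute error of at most n passengers.

```python
from typing import Dict, Sequence


def share_within_tolerance(
    actual: Sequence[float],
    forecast: Sequence[float],
    max_tolerance: int = 10,
) -> Dict[int, float]:
    """Map each tolerance n in 1..max_tolerance to the share of forecasts
    whose absolute error is at most n passengers."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return {
        n: sum(1 for e in errors if e <= n) / len(errors)
        for n in range(1, max_tolerance + 1)
    }


# Made-up counts: the absolute errors are 1, 5, 0 and 12 passengers.
shares = share_within_tolerance([10, 20, 30, 40], [11, 25, 30, 52])
print(shares[1], shares[5], shares[10])  # 0.5 0.75 0.75
```

Plotting these shares against n gives a curve of the kind described for Figure 8.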

Figure 4: Average over the MAE of all stops with respect to the forecast of boarding and alighting passengers for bus line 3A in the simplest factor representation using reduced training data.

Figure 5: Exact and forecast number of passengers over the time of two example trips of line 3A (simplest representation).

Figure 6: MAE in the forecast of passengers in the bus over two example trips of line 3A for all representations.

We also evaluated an approach that uses, for each stop, the pair of representation and model that minimizes the overall MAE for this stop. Yet, this approach yielded results similar to using a single model and representation for all stops regarding the average absolute error over the considered trips. Additionally, we evaluated the influence of integrating external factors. For this purpose, two further models were trained that only include time and weekday in their simple representation. The ‘simple’ model used radial SVR with the aforementioned parameter values and the ‘minimal’ model used MLR. Table 2 contains the mean and median error values of the two models for both bus lines and (for comparison) the values for the larger neural network using representation 10 (nnet2). Here, the error values for bus line 3B were slightly worse for nnet2 than for the ‘simple’ approach. For line 3A (where about twice as many measurement readings were available), the more sophisticated models performed considerably better than the simpler ones: nnet2 reached a median absolute error of 4.16, compared to 5.82 for ‘simple’ and 8.00 for ‘minimal’.
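The per-stop selection of a representation and model pair mentioned in this paragraph could look roughly like the sketch below. It is a hedged Python illustration: the data layout, the stop names and the MAE values are hypothetical and not taken from the paper.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (representation, model)


def best_pair_per_stop(mae: Dict[str, Dict[Pair, float]]) -> Dict[str, Pair]:
    """For every stop, pick the (representation, model) pair with the
    lowest MAE; ties go to the pair listed first."""
    return {
        stop: min(pairs, key=lambda pair: pairs[pair])
        for stop, pairs in mae.items()
    }


# Hypothetical validation MAE values for two stops and two candidate pairs.
mae_by_stop = {
    "stop_a": {("rep1", "svr_radial"): 2.4, ("rep10", "nnet2"): 1.9},
    "stop_b": {("rep1", "svr_radial"): 3.1, ("rep10", "nnet2"): 3.6},
}
print(best_pair_per_stop(mae_by_stop))
```

Selecting per stop on the same data used for evaluation would bias the comparison, so a separate validation set is assumed in this sketch.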

6 CONCLUSION

VEHITS 2017 - 3rd International Conference on Vehicle Technology and Intelligent Transport Systems

Figure 7: Quartiles (25%, 50%, 75%) of the absolute error values in the forecast of passengers in the bus for both bus lines.

Figure 8: Average percentage of forecasts correct within increasing tolerance values for representation 10.

In this paper, we evaluated multiple forecasting models to determine the number of passengers in a bus over a trip. Several external factors (such as weather and public holidays) were considered and different ways to model them were presented. Using measurement data for two bus lines, we evaluated the performance of the models and the influence of the external factors and the way they were represented. We started by forecasting the number of boarding and alighting passengers at a bus stop. Combined with the number of passengers in the bus before the stop, these numbers give us the passenger count in the bus after the stop. Using this approach along trips does not lead to accumulated errors and is thus feasible. In conclusion, the neural network with two hidden layers using representation 10 seems to be a good fit for the available data set for bus line 3A. Representation 10 models school holidays and semester breaks via dummy variables, but uses the more elaborate version for the factors weekday and cultural events. With more data available (also for bus line 3B), neural networks promise to perform even better (see Figure 4). Considering the external factors, the different representations have an impact on the accuracy of the predictions, especially for the larger neural network. Integrating the external factors into the models seems to be beneficial, especially when more data is available to overcome possible overfitting. When allowing for a tolerance of 10 passengers for the forecast, the neural networks again achieve good results and outperform the other models, reaching adequate results in over 80% of the cases (with passenger counts of up to 139 in the data, a tolerance of 10 passengers seems sufficiently small).
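To make the dummy-variable modelling of school holidays and semester breaks concrete, the following Python sketch encodes a calendar day as two 0/1 features. It is only an illustration under stated assumptions: the function names, feature names and date ranges are hypothetical placeholders, not the calendar or the encoding used in the study.

```python
from datetime import date
from typing import Dict, List, Tuple

Period = Tuple[date, date]  # inclusive start and end day


def in_any_period(day: date, periods: List[Period]) -> int:
    """Return 1 if the day lies in one of the periods, otherwise 0."""
    return int(any(start <= day <= end for start, end in periods))


def holiday_dummies(
    day: date,
    school_holidays: List[Period],
    semester_breaks: List[Period],
) -> Dict[str, int]:
    """Encode a day as two dummy variables for use as model inputs."""
    return {
        "school_holiday": in_any_period(day, school_holidays),
        "semester_break": in_any_period(day, semester_breaks),
    }


# Placeholder periods, chosen only for illustration.
school = [(date(2015, 7, 1), date(2015, 7, 31))]
semester = [(date(2015, 8, 1), date(2015, 9, 30))]
print(holiday_dummies(date(2015, 7, 15), school, semester))
# {'school_holiday': 1, 'semester_break': 0}
```

A dummy variable only states whether a day falls into a period; it carries no information about, for example, how far into the period the day lies.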

While we only considered two different neural networks, other variations in the number of layers and neurons are possible. Additionally, hybrid methods often lead to very precise forecasts. Thus, investigating well-fitting candidates constitutes the next step in our future research. Selecting suitable training data could also be improved, possibly by employing classification methods. Furthermore, other representations for external factors could be studied, e.g., categorising the weather instead of taking raw inputs. On the other hand, it might be worthwhile to differentiate between the cultural events considered.

As we only had data for two bus lines going into opposite directions, we treated them separately. Considering the whole bus network at once may enable the model to learn about the interdependencies between different bus lines. Additionally, the scope can be widened by integrating trains or other modes of transportation.

ACKNOWLEDGMENTS

This work was partially funded by the German Federal Ministry of Economic Affairs and Energy (BMWi) for the project Mobility Broker (01ME12136) as well as for the project Digitalisierte Mobilität – Die Offene Mobilitätsplattform (DiMo-OMP).
