Predicting Flight Departure Delay at Porto Airport: A Preliminary Study

Hugo Alonso (1,2) and António Loureiro (3)

(1) Universidade de Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
(2) Universidade Lusófona do Porto, Rua Augusto Rosa, 24, 4000-098 Porto, Portugal
(3) Aeroporto do Porto, 4470-558 Maia, Portugal

Keywords:

Flight Delay Prediction, Ordinal Classification, Unimodal Model, Neural Networks, Trees.

Abstract:

Managing an airport is very complex. Decisions are often based on common sense and influence several variables, such as flight delay. This paper considers the problem of predicting flight departure delay at Porto Airport. As far as we know, this is the first study on the subject. The problem is treated as an ordinal classification task and a suitable approach, based on the so-called unimodal model, is used to predict the delay. The unimodal model is implemented using neural networks and, for comparison purposes, also using trees.

1 INTRODUCTION

The decisions taken in the management of an airport are often based on common sense and influence several variables, such as flight delay. Reducing this delay decreases costs and increases the quality of the service provided to the passengers. It is thus important to find which variables influence flight delay and to use them to predict it. Several studies, such as (Rebollo and Balakrishnan, 2014; Wong and Tsai, 2012; Tu et al., 2008), have addressed this challenge. In particular, these studies agree that there is a close relation between arrival delay and departure delay. Some of them treat flight delay prediction as a regression problem, predicting the delay to the minute, and others as a classification problem, predicting a time interval in which the delay will fall.

The problem considered here is to predict flight departure delay at Porto Airport. As far as we know, this is the first study on the subject. Given information about a flight that will depart from this airport, such as its arrival delay, we are interested in predicting in which of the following intervals the departure delay will fall: ]−∞, 0], ]0, 15], ]15, 30], ]30, 60] and ]60, +∞[ minutes. Since these intervals can be viewed as naturally ordered classes, the prediction problem can be treated as an ordinal classification task. Hence, to predict the departure delay class, we apply a suitable classification method, based on the so-called unimodal model (Pinto da Costa et al., 2008), that takes into account the order relation between the classes. The main idea behind this method is that the random variable class associated with a given query should follow a unimodal distribution. To illustrate this idea, suppose that, for a certain flight with a given set of characteristics, the most likely outcome is a departure delay in the interval ]30, 60] minutes. Given the order relation between the departure delay intervals, ]15, 30] and ]60, +∞[ minutes are the intervals closest to ]30, 60] minutes, and, therefore, the second most likely interval should be one of these two. This makes more sense than having, for instance, the interval ]−∞, 0] minutes as the second most likely. More generally, the probabilities should decrease monotonically to the left and to the right of the interval or class where the maximum probability is attained, i.e., the distribution should be unimodal. The studies we know of in which flight delay prediction is treated as a classification problem either consider two classes or, when they consider more than two classes, ignore the order relation between them.

The unimodal model can be implemented using any appropriate machine learning paradigm. Here, we implement it using neural networks (Haykin, 2009). Moreover, we measure the importance of the predictor variables by applying a sensitivity analysis proposed in (Kewley et al., 2000) and select the most significant ones for departure delay prediction. For comparison purposes, we also implement the unimodal model using trees (Hastie et al., 2009).

The remainder of this paper is organized as follows. Section 2 presents the data used in this study. The unimodal model and the way we apply it to predict flight departure delay are described in Section 3. The results of our computer experiments are shown in Section 4 and the conclusions and future work are given in Section 5.

Alonso, H. and Loureiro, A.
Predicting Flight Departure Delay at Porto Airport: A Preliminary Study.
In Proceedings of the 7th International Joint Conference on Computational Intelligence (IJCCI 2015) - Volume 3: NCTA, pages 93-98
ISBN: 978-989-758-157-1

2 DATA

We had access to a large dataset of 26189 regular commercial passenger flights performed during 2012. First, we randomly chose 2619 flights, i.e., about 10% of all cases, to form a smaller dataset on which we could carry out our computer experiments in an acceptable amount of time. Then, we partitioned the smaller set into training and test subsets. The former was used to fit models and was assigned 2/3 of the data, i.e., 1746 cases, while the latter was used to test the selected models and was assigned the remaining 1/3 of the data, i.e., 873 cases.
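The sampling and partitioning steps above can be sketched as follows. This is a minimal illustration assuming flights are identified by integer indices and that Python's standard `random` module is an acceptable source of randomness; the text does not say how the random selection was made, so the helper `sample_and_split`, its seed and its rounding rule are hypothetical choices that happen to reproduce the reported subset sizes.

```python
import random


def sample_and_split(n_flights, sample_fraction=0.10, train_fraction=2 / 3, seed=0):
    """Randomly subsample flight indices, then split them into training and test sets.

    Hypothetical helper: the seed and the rounding are illustrative assumptions,
    chosen so that the subset sizes match those reported in the text.
    """
    rng = random.Random(seed)
    n_sample = round(n_flights * sample_fraction)      # 26189 * 0.10 -> 2619
    sample = rng.sample(range(n_flights), n_sample)    # drawn without replacement, in random order
    n_train = round(n_sample * train_fraction)         # 2619 * 2/3 -> 1746
    return sample[:n_train], sample[n_train:]          # training and test indices


train_idx, test_idx = sample_and_split(26189)
print(len(train_idx), len(test_idx))  # 1746 873
```

Because `rng.sample` already returns the indices in random order, slicing it is enough to obtain a random partition.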

The target variable in our dataset is the departure delay interval or class. For reasons that will become clear later on, we coded the classes using 0 to represent ]−∞, 0], 1 to represent ]0, 15], 2 to represent ]15, 30], 3 to represent ]30, 60] and 4 to represent ]60, +∞[ minutes. The class distribution is unbalanced, as shown in Figure 1 for the training data. In fact, classes 0 and 1 are much more frequent than the others and together they represent almost 80% of the cases. These two classes together correspond to the time interval ]−∞, 15] minutes; when the flight delay falls in this interval, it is said at the airport that there is no commercial delay.

Figure 1: Relative frequency distribution of the departure delay class in the training set (horizontal axis: departure delay class, 0 to 4; vertical axis: relative frequency).
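The interval coding of the target can be expressed as a small lookup, sketched below under the assumption that delays are given in minutes, with non-positive values for flights leaving on time or early. The names `delay_class` and `no_commercial_delay` are hypothetical helpers, not part of the authors' code.

```python
# Right-closed upper bounds (in minutes) of the intervals coded 0 to 3;
# delays above the last bound fall in class 4, i.e., ]60, +inf[.
UPPER_BOUNDS = (0, 15, 30, 60)


def delay_class(delay_minutes):
    """Map a departure delay in minutes to the class codes 0..4 used in the text.

    ]-inf, 0] -> 0, ]0, 15] -> 1, ]15, 30] -> 2, ]30, 60] -> 3, ]60, +inf[ -> 4.
    """
    for code, upper in enumerate(UPPER_BOUNDS):
        if delay_minutes <= upper:
            return code
    return len(UPPER_BOUNDS)


def no_commercial_delay(delay_minutes):
    """True when the delay falls in ]-inf, 15] minutes, i.e., in class 0 or 1."""
    return delay_class(delay_minutes) <= 1


print([delay_class(d) for d in (-5, 0, 1, 15, 16, 30, 45, 60, 61)])
# [0, 0, 1, 1, 2, 2, 3, 3, 4]
```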

The predictor variables in our dataset are the following:

• arrival delay (in minutes);
• origin and destination of the flight;
• predicted weekday, hour, day and month of the flight;
• meteorological conditions;
• airline;
• aircraft type;
• aircraft parking stand, ground operation time (in minutes) and take-off runway.

We chose these predictor variables based on a literature review regarding departure delay prediction at other airports and on the experience of the second author (an operations manager at Porto Airport since 2001 and an operations worker at this airport since 1987). Several predictor variables took non-numerical values and we had to transform them so that we could use neural networks. We proceeded as explained next. The origin and destination of the flight are airports, which we identified by their coordinates, latitude and longitude (in degrees), in the World Geodetic System WGS 84, used by the Global Positioning System (El-Rabbany, 2006). For the predicted weekday of the flight, we took 1 to represent Sunday, 2 to represent Monday, and so on, until 7 to represent Saturday. The variable meteorological conditions is binary and we used 0 for normal visibility operations and 1 for low visibility operations. The airline was represented by its IATA accounting code; see http://www.iata.org. Finally, the aircraft type was identified by the aircraft length (in meters). We did this because we found that different types have different lengths and that each type, with three exceptions, has a unique length. In each of the three exceptions, we took the average of the corresponding lengths, which are close to each other, to identify the type.
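A minimal sketch of the weekday, visibility and aircraft-length encodings follows, using only Python's standard `datetime` and `statistics` modules. The helper names and the aircraft types and lengths in the example are hypothetical, not values from the Porto Airport dataset; the airport coordinates and the IATA accounting codes come from external lookup tables and are not reproduced here.

```python
import datetime
from statistics import mean


def weekday_code(date):
    """Weekday coding used in the text: 1 = Sunday, 2 = Monday, ..., 7 = Saturday.

    date.isoweekday() returns 1 = Monday, ..., 7 = Sunday, hence the shift.
    """
    return date.isoweekday() % 7 + 1


def visibility_code(low_visibility):
    """Meteorological conditions: 0 = normal visibility, 1 = low visibility operations."""
    return 1 if low_visibility else 0


def aircraft_length_codes(lengths_by_type):
    """Represent each aircraft type by its length in metres.

    lengths_by_type maps a type to the list of lengths recorded for it; a type
    with several (close) lengths is represented by their average.
    """
    return {ac_type: mean(lengths) for ac_type, lengths in lengths_by_type.items()}


# Hypothetical example data, not taken from the Porto Airport dataset.
print(weekday_code(datetime.date(2012, 1, 1)))  # 1 (1 January 2012 was a Sunday)
print(aircraft_length_codes({"TYPE_A": [37.6], "TYPE_B": [33.6, 33.8]}))
```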

Figure 2 shows a scatter plot of the arrival delay and ground operation time training data grouped by the departure delay class. These two predictor variables are the most significant for departure delay prediction in our implementations of the unimodal model described next.

NCTA 2015 - 7th International Conference on Neural Computation Theory and Applications

Figure 2: Scatter plot of the arrival delay and ground operation time training data grouped by the departure delay class (axes: arrival delay and ground operation time, both in minutes; legend: Class 0 to Class 4).

3 THE UNIMODAL MODEL AND ITS APPLICATION TO FLIGHT DEPARTURE DELAY PREDICTION

The unimodal model is a machine learning paradigm intended for supervised classification problems where the classes are ordered. It was introduced in (Pinto da Costa et al., 2008) and, for instance, recently applied in (Fernández-Navarro et al., 2015). The main idea behind this model is that the random variable class associated with a given query should follow a unimodal distribution, so that the order relation between the classes is respected. In this context, the output of a classifier that estimates the a posteriori class probabilities is required to be unimodal, i.e., to have only one local maximum. There are different ways to impose unimodality and in (Pinto da Costa et al., 2008) the authors suggested two approaches. In the parametric approach, a unimodal discrete distribution, like the binomial or the Poisson distribution, is assumed and its parameters are estimated by the classifier. In the non-parametric approach, no distribution is assumed and the classifier is trained so that its output becomes unimodal. In all practical experiments conducted by the authors, the parametric approach led to better results, in particular when the binomial distribution was considered. The superior performance achieved with this distribution was also justified in theoretical terms. For these reasons, our focus here is on the binomial model. Furthermore, since the classifier chosen by us is a neural network (Haykin, 2009), we refer hereafter to a binomial network. Its description applied to our problem is given next.

As mentioned before, given information about a flight that will depart from Porto Airport, we are interested in predicting in which of the following intervals the departure delay will fall: ]−∞, 0], ]0, 15], ]15, 30], ]30, 60] and ]60, +∞[ minutes. Representing the information given about the flight by $\mathbf{x}$ and the $K = 5$ departure delay classes ]−∞, 0], ..., ]60, +∞[ minutes by $C_1, \dots, C_K$, respectively, Bayes decision theory (Hastie et al., 2009) suggests classifying the flight departure delay in the class $C_k$ that maximises the a posteriori probability $P(C_k \mid \mathbf{x})$. To that end, the a posteriori probabilities $P(C_1 \mid \mathbf{x}), \dots, P(C_K \mid \mathbf{x})$ need to be estimated. In the binomial network, these probabilities are calculated from the binomial distribution $B(K-1, p)$. As this distribution takes values in the set $\{0, 1, \dots, K-1\}$, we take value 0 to represent class $C_1$, 1 to represent $C_2$, and so on, until $K-1$ to represent $C_K$. This explains the coding of the classes presented in the previous section. Now, since $K$ is known, the only unknown parameter is the probability of success $p$. Hence, we consider a network architecture as in Figure 3 and train it to adjust all connection weights from layer 1 to layer 3. Note that the connections from layer 3 to layer 4 have a fixed weight equal to 1 and serve only to forward the value of $p$ to the output layer of the network, where the probabilities from the binomial distribution are calculated. For a given query $\mathbf{x}$, the output of layer 3 will be a single numerical value in $[0, 1]$, denoted by $p_{\mathbf{x}}$. Then, the probabilities in layer 4 are calculated from the binomial distribution:

$$P(C_k \mid \mathbf{x}) = B_{k-1}(K-1, p_{\mathbf{x}}), \quad k = 1, \dots, K, \tag{1}$$

where

$$B_{k-1}(K-1, p_{\mathbf{x}}) = \frac{(K-1)!}{(k-1)!\,(K-k)!}\, p_{\mathbf{x}}^{k-1} (1-p_{\mathbf{x}})^{K-k}. \tag{2}$$
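The four layers of the binomial network and equations (1) and (2) can be sketched in plain Python as below. The text does not specify the activation functions, so tanh neurons in layer 2 and a logistic neuron in layer 3 (which keeps $p_{\mathbf{x}}$ in $[0, 1]$) are assumptions, the helper names are hypothetical, and the weights in the example are arbitrary numbers rather than fitted values.

```python
import math


def binomial_class_probabilities(p, K=5):
    """Layer 4: P(C_k | x) = B_{k-1}(K-1, p) for k = 1, ..., K (equations (1) and (2))."""
    return [math.comb(K - 1, k - 1) * p ** (k - 1) * (1 - p) ** (K - k)
            for k in range(1, K + 1)]


def binomial_network_forward(x, hidden_w, hidden_b, out_w, out_b, K=5):
    """Forward pass x -> p_x -> class probabilities through the four layers.

    Assumed (not stated in the text): tanh neurons in layer 2 and a logistic
    neuron in layer 3, so that p_x lies in [0, 1].
    """
    # Layer 1 -> layer 2: weighted sums of the inputs followed by tanh
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    # Layer 2 -> layer 3: a single logistic output, the estimate p_x
    z = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    p_x = 1.0 / (1.0 + math.exp(-z))
    # Layer 3 -> layer 4: fixed weight 1, so p_x is forwarded unchanged
    return p_x, binomial_class_probabilities(p_x, K)


# Two inputs and two hidden neurons; the weights are arbitrary, not trained values.
p_x, probs = binomial_network_forward(
    x=[0.3, -0.1],
    hidden_w=[[1.2, -0.4], [0.5, 0.9]],
    hidden_b=[0.1, -0.2],
    out_w=[0.8, -0.6],
    out_b=0.05,
)
print(round(p_x, 3), [round(q, 3) for q in probs])
```

With $p_{\mathbf{x}} = 0.5$ the probabilities are 0.0625, 0.25, 0.375, 0.25 and 0.0625, a symmetric unimodal distribution peaking at class code 2; for values of $p_{\mathbf{x}}$ away from the interval boundaries, the most probable code coincides with $\lfloor K p_{\mathbf{x}} \rfloor$.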

When $p_{\mathbf{x}}$ is in $[0, 1/K[$, the highest a posteriori probability is $P(C_1 \mid \mathbf{x})$, and, therefore, the predicted flight departure delay class is $C_1$. More generally, when $p_{\mathbf{x}}$ is in $[(i-1)/K, i/K[$, for some $i$ in $\{1, \dots, K\}$ (the last interval also containing $p_{\mathbf{x}} = 1$), the highest a posteriori probability is $P(C_i \mid \mathbf{x})$, and, therefore, the predicted flight departure delay class is $C_i$.

Hence, in order to train the network on a training set $T = \{(\mathbf{x}^{(n)}, C^{(n)})\}_{n=1}^{N} \subset X \times \{C_k\}_{k=1}^{K}$, where $X$ is the feature space, we replace each class $C_k$ by the value of $p$ corresponding to the midpoint of $[(k-1)/K, k/K[$, i.e., $p_k = (k - 0.5)/K$, and apply a suitable optimization algorithm, like the Marquardt method (Rao, 2009), to find connection weights that minimise the mean squared error

$$\frac{1}{N}\sum_{n=1}^{N}\left(p^{(n)}_{\text{target}} - p^{(n)}_{\text{network}}(\mathbf{w})\right)^2, \tag{3}$$

where $p^{(n)}_{\text{target}}$ is the value of $p$ replacing $C^{(n)}$ and $p^{(n)}_{\text{network}}(\mathbf{w})$ is the output of layer 3 given the query $\mathbf{x}^{(n)}$ when the network has weights $\mathbf{w}$. In the following, we describe how we apply a sensitivity analysis proposed in (Kewley et al., 2000) to measure the importance of the predictor variables in a trained network.
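The decision rule, the choice of training targets and the error (3) can be sketched as follows, using the 0-based class codes of Section 2. This is an illustration under stated assumptions: the helper names are hypothetical, the labels and outputs in the example are made up, and clipping $p_{\mathbf{x}} = 1$ to the last class is an added safeguard. The optimiser itself (the Marquardt method in the text) is not reproduced.

```python
import math


def predicted_code(p_x, K=5):
    """Class code 0..K-1 predicted from p_x: floor(K * p_x).

    p_x = 1 would give K, so the result is clipped to K - 1 (the mode of B(K-1, 1)).
    """
    return min(math.floor(K * p_x), K - 1)


def target_p(code, K=5):
    """Training target: midpoint of the interval of p that predicts this class.

    With the 1-based index k = code + 1 this is p_k = (k - 0.5) / K.
    """
    return (code + 0.5) / K


def mse(targets, outputs):
    """Mean squared error (3) between target values of p and layer-3 outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


codes = [0, 1, 1, 3, 4]                   # hypothetical training labels
targets = [target_p(c) for c in codes]    # [0.1, 0.3, 0.3, 0.7, 0.9]
outputs = [0.12, 0.25, 0.41, 0.66, 0.95]  # hypothetical network outputs p_x
print(mse(targets, outputs), [predicted_code(p) for p in outputs])
```

Placing each target at the midpoint of its interval means that a small deviation of the network output in either direction still yields the correct class.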

Figure 3: Binomial network for flight departure delay prediction.

The binomial network, once trained, can be used to predict a departure delay class $\hat{C}$ for a particular flight, based on some information $\mathbf{x}$ given about that flight. Now, recall that each class is represented by a value in the set $\{0, 1, \dots, K-1\}$ and note that the value corresponding to $\hat{C}$ is $\hat{y} = \lfloor K p_{\mathbf{x}} \rfloor$, where $p_{\mathbf{x}}$ is the output of layer 3 given $\mathbf{x}$. In this context, let $\hat{y}_j^{(1)}, \dots, \hat{y}_j^{(N)}$ denote the values we get by varying the $j$-th predictor variable $x_j$ through its values in the training set, $x_j^{(1)}, \dots, x_j^{(N)}$, while holding all other predictors at their modes (for those that are nominal variables) or averages (for the remaining ones). Then, the variance

$$\sigma_j^2 = \frac{1}{N-1}\sum_{n=1}^{N}\left(\hat{y}_j^{(n)} - \bar{\hat{y}}_j\right)^2, \tag{4}$$

with

$$\bar{\hat{y}}_j = \frac{1}{N}\sum_{n=1}^{N}\hat{y}_j^{(n)}, \tag{5}$$

should be high if $x_j$ is relevant. Thus, we can measure the relative importance of the $j$-th predictor variable to the binomial network by

$$I_j = \frac{\sigma_j^2}{\sum_{\ell=1}^{J}\sigma_\ell^2} \times 100\%, \tag{6}$$

where $J$ is the total number of predictors.
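The sensitivity analysis of equations (4) to (6) can be sketched as below. `predict_p` stands for any trained model returning the layer-3 output $p_{\mathbf{x}}$; the toy model and data in the example are hypothetical, the set of nominal predictors must be supplied by the user, and the guard for a zero total variance is an addition not discussed in the text.

```python
import math
from collections import Counter
from statistics import mean, variance


def reference_point(X, nominal):
    """Mode of each nominal predictor and average of each remaining predictor."""
    ref = []
    for j, column in enumerate(zip(*X)):
        if j in nominal:
            ref.append(Counter(column).most_common(1)[0][0])
        else:
            ref.append(mean(column))
    return ref


def relative_importance(predict_p, X, nominal=frozenset(), K=5):
    """Relative importance (in %) of each predictor, following (4) to (6).

    predict_p: trained model mapping a feature vector to the layer-3 output p_x.
    X: training feature vectors (a list of N >= 2 rows).
    nominal: indices of the predictors treated as nominal.
    """
    ref = reference_point(X, nominal)
    variances = []
    for j in range(len(ref)):
        y_hat = []
        for row in X:
            query = list(ref)
            query[j] = row[j]  # vary only the j-th predictor
            y_hat.append(min(math.floor(K * predict_p(query)), K - 1))
        variances.append(variance(y_hat))  # sample variance, denominator N - 1, eq. (4)
    total = sum(variances)
    if total == 0:
        return [0.0] * len(variances)  # the predicted class never changes
    return [100.0 * v / total for v in variances]


# Hypothetical toy model: p_x depends on the first predictor only.
def toy_predict_p(x):
    return min(max(x[0] / 100.0, 0.0), 1.0)


X_train = [[10, 3], [30, 1], [50, 2], [70, 3], [90, 1]]
print(relative_importance(toy_predict_p, X_train))  # [100.0, 0.0]
```

In the example the output depends only on the first predictor, so it receives all of the importance and the second predictor none.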

Before presenting the results of our computer experiments in the next section, we explain how we select the predictor variables in the binomial network and the number of neurons in layer 2. At the beginning of the first step, all available predictor variables are considered. The number of neurons in layer 2 is chosen so as to minimise the estimate of the prediction error obtained by applying 10-fold cross-validation to the training set (Hastie et al., 2009). The error measure is the same as the one used for training (see (3)). Then, a network with the variables considered and the number of neurons selected is trained on the entire training set. Finally, the importance of the variables to the trained network is calculated using (6). At the beginning of the next step, only those predictor variables with an importance greater than the minimum observed in the previous step are considered. Everything else is done in the same way as in the previous step. We repeat this procedure until there is only one predictor variable to consider. In the end, we select the binomial network trained on the entire training set whose predictor variables and number of neurons in layer 2 are associated with the lowest estimate of the prediction error among all minimum estimates obtained in the various steps.
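The selection procedure above amounts to a backward elimination loop, sketched below. The two callables are hypothetical placeholders for the cross-validation and training routines, which the text describes but does not specify; with the strict inequality, several predictors tied at the minimum importance are removed together.

```python
def backward_selection(variables, fit_and_score, importance_of):
    """Predictor elimination loop described in the text (a sketch).

    fit_and_score(vars) -> (cv_error, n_hidden): lowest 10-fold CV error over the
        candidate numbers of layer-2 neurons, and the number achieving it.
    importance_of(vars, n_hidden) -> {var: importance in %} for a network with
        these predictors trained on the entire training set.
    Returns (cv_error, vars, n_hidden) for the step with the lowest CV error.
    """
    current = list(variables)
    best = None
    while current:
        cv_error, n_hidden = fit_and_score(current)
        if best is None or cv_error < best[0]:
            best = (cv_error, list(current), n_hidden)
        if len(current) == 1:
            break
        importance = importance_of(current, n_hidden)
        lowest = min(importance.values())
        # Keep only predictors more important than the minimum of this step;
        # if all importances tie, nothing survives and the loop stops.
        current = [v for v in current if importance[v] > lowest]
    return best


# Hypothetical stand-ins for cross-validation and training.
CV_ERRORS = {("a", "b", "c"): 0.30, ("a", "b"): 0.25, ("a",): 0.28}
IMPORTANCE = {"a": 50.0, "b": 40.0, "c": 10.0}

result = backward_selection(
    ["a", "b", "c"],
    fit_and_score=lambda vs: (CV_ERRORS[tuple(vs)], 2),
    importance_of=lambda vs, n_hidden: {v: IMPORTANCE[v] for v in vs},
)
print(result)  # (0.25, ['a', 'b'], 2)
```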

4 RESULTS

We applied the procedure described in the previous section to find the best binomial network for departure delay prediction at Porto Airport. The network we found through our computer experiments has two predictor variables and two neurons in layer 2. The two predictor variables are the ground operation time and the arrival delay. The first one has an importance of 50.35% and the second one of 49.65%. Hence, these two variables are roughly equally important for the binomial network to predict departure delay. We applied this network to the test data and the resulting confusion matrix is shown in Table 1.

Table 1: Confusion matrix for the binomial network applied to the test data (rows: true class; columns: predicted class).

                 Predicted class
True class     0     1     2     3     4
0              0   239     0     0     0
1              0   421     8     1     0
2              0    76    27     8     0
3              0    22     9    37     0
4              0     8     1     1    15

For comparison purposes, we also implemented the binomial model using trees (Hastie et al., 2009). The best pruned tree we found, i.e., the one with the best estimate of the prediction error obtained by applying 10-fold cross-validation to the training set (Hastie et al., 2009), has four predictor variables and thirteen terminal nodes. Two of the four predictor variables, the ground operation time and the arrival delay, coincide with the two predictor variables in the network. The other two are the predicted hour of the flight and the latitude of the origin of the flight. The importance of these variables is 27.95%, 69.10%, 1.49% and 1.46%, respectively. Hence, just as in the binomial network, the ground operation time and the arrival delay are the most important variables for the binomial tree to predict departure delay. However, unlike the network, the tree gives much more importance to the arrival delay than to the ground operation time. The other two variables have little importance. We applied this tree to the test data and the resulting confusion matrix is shown in Table 2.

Table 2: Confusion matrix for the binomial tree applied to the test data (rows: true class; columns: predicted class).

                 Predicted class
True class     0     1     2     3     4
0             55   182     0     2     0
1             33   384    12     0     1
2              3    82    20     5     1
3              0    25     6    30     7
4              1     7     0     2    15

The binomial network and the binomial tree are ordinal data classifiers. In order to analyse and compare their results on the test set, we needed a suitable measure to assess their performance. The misclassification error rate is appropriate when every misclassification is considered equally costly, but this is not the case here, and, therefore, this measure is not appropriate for us. For instance, assume for a certain flight that the true departure delay class is ]60, +∞[ minutes (class 4). Then, it is worse to predict ]−∞, 0] minutes (class 0) than ]30, 60] minutes (class 3), since in the first case the predicted class is farther from the true class. The mean squared error and the mean absolute deviation are better than the misclassification error rate, because they take values that increase with the distances between the numbers representing the true classes and the numbers representing the predicted classes, so misclassifications are not taken to be equally costly. Nevertheless, they are still not completely appropriate, given that the performance assessment they provide is influenced by the numbers chosen to represent the classes. In (Pinto da Costa et al., 2008; Pinto da Costa et al., 2014), the authors analysed several possibilities and proposed a coefficient called $r_{\text{int}}$ that is not sensitive to the values chosen to represent the classes, only to the order relation between such values, which is the same as the order relation between the classes. The coefficient $r_{\text{int}}$ measures the association between the two ordinal variables true class and predicted class. It can be computed from the confusion matrix, as explained in (Pinto da Costa et al., 2014), and takes values in [−1, 1]: 1 when the two variables are identical and −1 when they are completely opposite. This is the measure we used to assess the performance of the binomial network and tree. The results are shown next.

In the case of the binomial network, we calculated $r_{\text{int}}$ from Table 1 and obtained $r_{\text{int}} = 0.70$. This indicates a strong association between the true departure delay class and the departure delay class predicted by the network, which is therefore a good result. In the case of the binomial tree, we calculated $r_{\text{int}}$ from Table 2 and obtained $r_{\text{int}} = 0.66$. Thus, the best performance on the test set was achieved by the network. Note that the network obtained a better result using only half the predictor variables used by the tree to predict the departure delay.
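To illustrate the argument about the mean squared error and the mean absolute deviation on the reported results, the sketch below computes both measures from the confusion matrices in Tables 1 and 2, first with the class codes 0 to 4 and then with a hypothetical alternative coding (0, 1, 2, 3, 6) that preserves the class order. It does not implement $r_{\text{int}}$, whose computation is explained in (Pinto da Costa et al., 2014); the function name `coded_errors` is illustrative.

```python
# Confusion matrices from Tables 1 and 2 (rows: true class, columns: predicted class).
NETWORK = [
    [0, 239, 0, 0, 0],
    [0, 421, 8, 1, 0],
    [0, 76, 27, 8, 0],
    [0, 22, 9, 37, 0],
    [0, 8, 1, 1, 15],
]
TREE = [
    [55, 182, 0, 2, 0],
    [33, 384, 12, 0, 1],
    [3, 82, 20, 5, 1],
    [0, 25, 6, 30, 7],
    [1, 7, 0, 2, 15],
]


def coded_errors(confusion, codes=(0, 1, 2, 3, 4)):
    """Mean squared error and mean absolute deviation between class codes.

    codes[i] is the number used to represent class i; changing the codes changes
    both measures even when the order of the classes is unchanged.
    """
    n = sq = ab = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            d = codes[i] - codes[j]
            n += count
            sq += count * d * d
            ab += count * abs(d)
    return sq / n, ab / n


for name, cm in (("network", NETWORK), ("tree", TREE)):
    print(name, coded_errors(cm), coded_errors(cm, codes=(0, 1, 2, 3, 6)))
```

With the codes 0 to 4 the network gives 509/873 and 413/873 and the tree 551/873 and 421/873; under the alternative coding all four values change although the class order is the same, which is the dependence on the chosen codes that motivates $r_{\text{int}}$.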

5 CONCLUSIONS AND FUTURE WORK

This paper considered the problem of predicting flight departure delay at Porto Airport and presented preliminary prediction results. The problem was treated as an ordinal classification task and a suitable approach, based on the so-called unimodal model, was used to predict the delay. We implemented the unimodal model using neural networks and trees and found in our experiments that the arrival delay and the ground operation time are the most significant variables for departure delay prediction. The neural network implementation was simpler, using two predictor variables instead of four, and led to better results on the test set. Interestingly, both implementations had difficulty in distinguishing flights whose departure delay falls in ]−∞, 0] minutes from flights whose departure delay falls in ]0, 15] minutes. In the future, we plan to study this issue. Furthermore, we plan to implement the unimodal model using support vector machines (Cristianini and Shawe-Taylor, 2000) and to compare the unimodal approach with other approaches to ordinal classification, such as the one proposed in (Frank and Hall, 2001).

ACKNOWLEDGEMENTS

This work was supported by Portuguese funds through the CIDMA - Center for Research and Development in Mathematics and Applications, and the Portuguese Foundation for Science and Technology (“FCT - Fundação para a Ciência e a Tecnologia”), within project UID/MAT/04106/2013.

REFERENCES

Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, United Kingdom, 1st edition.

El-Rabbany, A. (2006). Introduction to GPS: The Global Positioning System. Artech House, Norwood, 2nd edition.

Fernández-Navarro, F., Riccardi, A., and Carloni, S. (2015). Ordinal regression by a generalized force-based model. IEEE Transactions on Cybernetics, 45(4):844–857.

Frank, E. and Hall, M. (2001). A simple approach to ordinal classification. In Proceedings of the 12th European Conference on Machine Learning (ECML 2001), volume 1, pages 145–156.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2nd edition.

Haykin, S. (2009). Neural Networks and Learning Machines. Prentice Hall, New Jersey, 3rd edition.

Kewley, R., Embrechts, M., and Breneman, C. (2000). Data strip mining for the virtual design of pharmaceuticals with neural networks. IEEE Transactions on Neural Networks, 11(3):668–679.

Pinto da Costa, J. F., Alonso, H., and Cardoso, J. S. (2008). The unimodal model for the classification of ordinal data. Neural Networks, 21:78–91.

Pinto da Costa, J. F., Alonso, H., and Cardoso, J. S. (2014). Corrigendum to “The unimodal model for the classification of ordinal data” [Neural Netw. 21 (2008) 78-79]. Neural Networks, 59:73–75.

Rao, S. S. (2009). Engineering Optimization: Theory and Practice. John Wiley & Sons, Inc., New Jersey, 4th edition.

Rebollo, J. J. and Balakrishnan, H. (2014). Characterization and prediction of air traffic delays. Transportation Research Part C: Emerging Technologies, 44:231–241.

Tu, Y., Ball, M. O., and Jank, W. S. (2008). Estimating flight departure delay distributions - A statistical approach with long-term trend and short-term pattern. Journal of the American Statistical Association, 103(481):112–125.

Wong, J.-T. and Tsai, S.-C. (2012). A survival model for flight delay propagation. Journal of Air Transport Management, 23:5–11.
