Figure 2: Scatter plot of the arrival delay and ground operation
time training data grouped by the departure delay class.
the classes is respected. In this context, the a posteriori
class probabilities estimated by a classifier are required
to be unimodal, i.e., to
have only one local maximum. There are different
ways to impose unimodality and in (Pinto da Costa
et al., 2008) the authors suggested two approaches. In
the parametric approach, a unimodal discrete distribution,
such as the binomial or the Poisson, is assumed and
its parameters are estimated by the classifier. In the
non-parametric approach, no distribution is assumed
and the classifier is trained so that its output becomes
unimodal. In all practical experiments conducted by
the authors, the parametric approach led to better re-
sults, in particular when the binomial distribution was
considered. The superior performance achieved with
this distribution was also justified in theoretical terms.
For these reasons, our focus here is on the binomial
model. Furthermore, since the classifier chosen by us
is a neural network (Haykin, 2009), we refer hereafter
to a binomial network. Its description applied to our
problem is given next.
As mentioned before, given information about a
flight that will depart from Porto Airport, we are
interested in predicting in which of the following in-
tervals the departure delay will fall: ] − ∞,0], ]0, 15],
]15,30], ]30,60] and ]60,+∞[ minutes. Representing
the information given about the flight by x and the
K = 5 departure delay classes ]−∞,0], ..., ]60,+∞[
minutes by $C_1, \ldots, C_K$, respectively, Bayes decision
theory (Hastie et al., 2009) suggests classifying the
flight departure delay in the class maximising the a
posteriori probability $P(C_k|x)$. To that end, the a
posteriori probabilities $P(C_1|x), \ldots, P(C_K|x)$ need to
be estimated. In the binomial network, these probabilities
are calculated from the binomial distribution
$B(K-1, p)$. As this distribution takes values in the set
$\{0, 1, \ldots, K-1\}$, we take value 0 to represent class
$C_1$, 1 to represent $C_2$, and so on, until $K-1$ for $C_K$. This explains
the coding of the classes presented in the previous
section. Now, since K is known, the only unknown
parameter is the probability of success p. Hence, we
consider a network architecture as in Figure 3 and
train it to adjust all connection weights from layer 1
to layer 3. Note that the connections from layer 3 to
layer 4 have a fixed weight equal to 1 and serve only
to forward the value of p to the output layer of the network,
where the probabilities from the binomial distribution
are calculated. For a given query x, the output
of layer 3 will be a single numerical value in [0,1],
denoted by $p_x$. Then, the probabilities in layer 4 are
calculated from the binomial distribution:
$$P(C_k|x) = B_{k-1}(K-1, p_x), \quad k = 1, \ldots, K, \tag{1}$$
where
$$B_{k-1}(K-1, p_x) = \frac{(K-1)!\, p_x^{k-1} (1-p_x)^{K-k}}{(k-1)!\,(K-k)!}. \tag{2}$$
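As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the K class probabilities from a single value of p. The function name `binomial_class_probabilities` and the value 0.3 used for the layer-3 output are assumptions made for this example only; they do not come from the trained network described here.

```python
from math import comb


def binomial_class_probabilities(p_x, num_classes=5):
    """Return [P(C_1|x), ..., P(C_K|x)] as in Eqs. (1)-(2)."""
    n = num_classes - 1  # number of Bernoulli trials, K - 1
    return [
        comb(n, j) * p_x**j * (1.0 - p_x) ** (n - j)  # j = k - 1
        for j in range(num_classes)
    ]


# Hypothetical layer-3 output for some query x (not a value from the paper).
probs = binomial_class_probabilities(0.3)
print([round(q, 4) for q in probs])
```

For this assumed value of p, the probabilities sum to one and the largest of them falls on the second class, so the output is unimodal as required.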
When $p_x$ is in $[0, \frac{1}{K}[$, the highest a posteriori
probability is $P(C_1|x)$, and, therefore, the predicted flight
departure delay class is $C_1$. More generally, when
$p_x$ is in $[\frac{i-1}{K}, \frac{i}{K}[$, for some i in $\{1, \ldots, K\}$, the
highest a posteriori probability is $P(C_i|x)$, and, therefore,
the predicted flight departure delay class is $C_i$.
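The correspondence between the interval containing p and the most probable class can be checked numerically. The sketch below compares the class with the highest binomial probability against the class given by the interval rule on a grid of p values chosen to avoid the interval boundaries, where two classes tie. The function names are illustrative, and the clamp applied when p equals 1 is an added assumption, since that value lies outside the half-open intervals above.

```python
from math import comb, floor


def class_by_argmax(p_x, K=5):
    """Index k (1-based) of the class with the highest binomial probability."""
    probs = [comb(K - 1, j) * p_x**j * (1.0 - p_x) ** (K - 1 - j) for j in range(K)]
    return probs.index(max(probs)) + 1


def class_by_interval(p_x, K=5):
    """Index i (1-based) such that p_x lies in [(i-1)/K, i/K[."""
    # Assumption: p_x == 1 is clamped to the last class.
    return min(floor(K * p_x), K - 1) + 1


# Grid 0.005, 0.015, ..., 0.995: never hits a boundary i/K for K = 5.
grid = [(2 * i + 1) / 200 for i in range(100)]
agree = all(class_by_argmax(p) == class_by_interval(p) for p in grid)
print(agree)
```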
Hence, in order to train the network on a training set
$T = \{(x_n, C_{x_n})\}_{n=1}^{N} \subset X \times \{C_k\}_{k=1}^{K}$, where X is the
feature space, we replace $C_k$ by the value of p corresponding
to the midpoint of $[\frac{k-1}{K}, \frac{k}{K}[$, i.e., $p_k = \frac{k-0.5}{K}$,
and apply a suitable optimization algorithm, like the
Marquardt method (Rao, 2009), to find connection
weights that minimise the mean squared error
$$\frac{1}{N}\sum_{n=1}^{N}\left(p^{\mathrm{target}}_{x_n} - p^{\mathrm{network}}_{x_n}(w)\right)^2, \tag{3}$$
where $p^{\mathrm{target}}_{x_n}$ is the value of p replacing $C_{x_n}$ and
$p^{\mathrm{network}}_{x_n}(w)$ is the output of layer 3 for the query
$x_n$ when the network has weights w. In the following,
we describe how we apply to our case a sensitivity
analysis proposed in (Kewley et al., 2000) to
measure the importance of the predictor variables in a
trained network.
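The target encoding and the error in Eq. (3) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`delay_class`, `target_p`, `mean_squared_error`), the example delays, and the network outputs are all hypothetical, and no network is actually trained here.

```python
from bisect import bisect_left

# Upper bounds (minutes) of the right-closed intervals
# ]-inf,0], ]0,15], ]15,30], ]30,60]; the last class is ]60,+inf[.
UPPER_BOUNDS = [0, 15, 30, 60]
K = len(UPPER_BOUNDS) + 1


def delay_class(delay_minutes):
    """Return k in 1..K for the interval containing the delay."""
    return bisect_left(UPPER_BOUNDS, delay_minutes) + 1


def target_p(k, num_classes=K):
    """Midpoint of [(k-1)/K, k/K[, used as the regression target for C_k."""
    return (k - 0.5) / num_classes


def mean_squared_error(targets, outputs):
    """Eq. (3): average squared gap between targets and layer-3 outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


delays = [-3, 12, 75]                 # made-up delays in minutes
targets = [target_p(delay_class(d)) for d in delays]
outputs = [0.15, 0.35, 0.80]          # made-up layer-3 outputs, not results
mse = mean_squared_error(targets, outputs)
print(targets, mse)
```

Using `bisect_left` keeps each interval closed on the right, so a delay of exactly 15 minutes is assigned to ]0,15] rather than ]15,30].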
The binomial network, once trained, can be used
to predict a departure delay class $\hat{C}_x$ for a particular
flight, based on some information x given about
that flight. Now, recall that each class is represented
by a value in the set $\{0, 1, \ldots, K-1\}$ and note that
the value corresponding to $\hat{C}_x$ is $\hat{y}_x = \lfloor K p_x \rfloor$, where