4 EXPERIMENTATION AND STATISTICAL STUDY

The main goal of the experiments is to study the behavior of the LCoR algorithm by comparing it with 5 other methods found in the literature, using 3 different quality measures.
4.1 Experimental Methodology
As in (Parras-Gutierrez et al., 2012), the experimentation has been carried out using 20 databases, most of them taken from the INE (National Statistics Institute, http://www.ine.es/). The data represent observations from different activities and differ in nature, size, and characteristics. The databases have been labeled as: Airline, WmFrancfort, WmLondon, WmMadrid, WmMilan, WmNewYork, WmTokyo, Deceases, SpaMovSpec, Exchange, Gasoline, MortCanc, MortMade, Books, FreeHouPrize, Prisoners, TurIn, TurOut, TUrban, and HouseFin. The number of samples in every database ranges between 43 (for MortCanc) and 618 (for Gasoline, a database used in the NN3 competition).
To compare the effectiveness of LCoR, 5 additional methods have been used, all of them found within the field of time series forecasting: Exponential smoothing (ETS), Croston, Theta, Random Walk (RW), and ARIMA (Hyndman and Khandakar, 2008).
In order to test and compare the generalization capabilities of every method, the databases have been split into training and test sets. The training set is given the first 75% of the data, while the test set is composed of the remaining 25% of the samples.
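This chronological split can be sketched as follows (a minimal illustration, not the authors' code; only the 75%/25% boundary comes from the text):

```python
def chronological_split(series, train_fraction=0.75):
    """Split a time series into train/test sets, preserving order.

    The first `train_fraction` of the observations form the training
    set; the remaining samples form the test set. No shuffling is done,
    since temporal order matters in forecasting.
    """
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Toy example with 8 observations:
train, test = chronological_split(list(range(8)))
# train -> [0, 1, 2, 3, 4, 5], test -> [6, 7]
```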
An open question when dealing with time series is which measure to use to assess the accuracy of the obtained predictions. The Mean Absolute Percentage Error (MAPE) (Bowerman et al., 2004) was used intensively until many other measures, such as the Geometric Mean Relative Absolute Error, the Median Relative Absolute Error, the Symmetric Median and Median Absolute Percentage Error (MdAPE), or the Symmetric Mean Absolute Percentage Error, were proposed (Makridakis and Hibon, 2000). However, these measures share a disadvantage: they are not generally applicable and can be infinite, undefined, or produce misleading results, as Hyndman and Koehler explained in their work (Hyndman and Koehler, 2006). Thus, they proposed the Mean Absolute Scaled Error (MASE), which is less sensitive to outliers, less variable on small samples, and more easily interpreted.
In this work, the measures used are MAPE (i.e., mean(p_t)), MASE (defined as mean(q_t)), and MdAPE (as median(p_t)), taking into account that Y_t is the observation at time t = 1, ..., n; F_t is the forecast of Y_t; e_t is the forecast error (i.e., e_t = Y_t − F_t); p_t = 100 e_t / Y_t is the percentage error; and q_t is determined as:

    q_t = e_t / ( (1/(n−1)) Σ_{i=2}^{n} |Y_i − Y_{i−1}| )
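Under these definitions, the three measures can be computed as in the following sketch (a plain-Python illustration with function names of our own; absolute values of p_t and q_t are taken, following the usual convention for these measures):

```python
from statistics import mean, median

def forecast_errors(y, f):
    """Per-period errors e_t = Y_t - F_t and percentage errors p_t = 100*e_t/Y_t."""
    e = [yt - ft for yt, ft in zip(y, f)]
    p = [100 * et / yt for et, yt in zip(e, y)]
    return e, p

def mape(y, f):
    """Mean Absolute Percentage Error: mean of |p_t|."""
    _, p = forecast_errors(y, f)
    return mean(abs(pt) for pt in p)

def mdape(y, f):
    """Median Absolute Percentage Error: median of |p_t|."""
    _, p = forecast_errors(y, f)
    return median(abs(pt) for pt in p)

def mase(y, f):
    """Mean Absolute Scaled Error: mean of |q_t|, where q_t scales e_t
    by the mean absolute first difference (1/(n-1)) * sum |Y_i - Y_{i-1}|."""
    e, _ = forecast_errors(y, f)
    n = len(y)
    scale = sum(abs(y[i] - y[i - 1]) for i in range(1, n)) / (n - 1)
    return mean(abs(et) / scale for et in e)
```

For instance, with observations y = [10, 20, 30, 40] and forecasts f = [12, 18, 33, 39], the scaling denominator is 10, giving MASE = 0.2.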
Due to its stochastic nature, the results yielded by LCoR have been calculated as the average errors over 30 executions with every time series. For each execution, the following parameters are used in the LCoR algorithm: lags population size = 50, lags population generations = 5, lags chromosome size = 10%, RBFNs population size = 50, RBFNs population generations = 10, validation rate = 0.25, maximum number of neurons of the first generation = 0.05, tournament size = 3, replacement rate = 0.5, crossover rate = 0.8, mutation rate = 0.2, and total number of generations = 20.
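The 30-run averaging protocol can be sketched as follows; `run_lcor` is a hypothetical placeholder for one stochastic execution (not the authors' implementation), and only the parameter values and the 30-run average come from the text:

```python
import random
from statistics import mean

# Parameter settings reported in the text (dictionary keys are our own labels).
LCOR_PARAMS = {
    "lags_population_size": 50,
    "lags_population_generations": 5,
    "lags_chromosome_size": 0.10,          # 10%
    "rbfns_population_size": 50,
    "rbfns_population_generations": 10,
    "validation_rate": 0.25,
    "max_neurons_first_generation": 0.05,
    "tournament_size": 3,
    "replacement_rate": 0.5,
    "crossover_rate": 0.8,
    "mutation_rate": 0.2,
    "total_generations": 20,
}

def run_lcor(series, params, seed):
    """Hypothetical stand-in for one stochastic LCoR execution;
    returns an error value that varies with the random seed."""
    rng = random.Random(seed)
    return rng.uniform(0.0, 1.0)  # placeholder for the real error measure

def averaged_error(series, params, n_runs=30):
    """Average the error measure over n_runs independent executions."""
    return mean(run_lcor(series, params, seed) for seed in range(n_runs))
```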
Tables 1, 2, and 3 show the results of LCoR and of the methods used for comparison (ETS, Croston, Theta, RW, and ARIMA) for the measures MAPE, MASE, and MdAPE, respectively (best results are emphasized with the character *). As mentioned before, every result reported in the tables represents the average of 30 executions for each time series. With respect to MAPE, the LCoR algorithm obtains the best results in 15 of the 20 time series used, as can be seen in Table 1. Regarding MASE, LCoR stands out, yielding the best results for 5 time series; ETS, Croston, and Theta each do so for 3 time series; RW only for 2; and ARIMA for 4 time series, as can be observed in Table 2. Concerning MdAPE, LCoR achieves better results than the other methods in 12 of the 20 time series, as Table 3 shows. Thus, the LCoR algorithm is able to achieve a more accurate forecast in most of the time series for any of the quality measures considered.
4.2 Analysis of the Results
To analyze in more detail the results and check
whether the observed differences are signiﬁcant, two
main steps are performed: ﬁrstly, identifying whether
exist differences in general between the methods used
in the comparison; and secondly, determining if the
best method is signiﬁcant better than the rest of the
methods. To do this, ﬁrst of all it has to be decided if
is possible to use parametric o nonparametric statisti
cal techniques. An adequate use of parametric statis
tical techniques reaching three necessary conditions:
independency, normality and homoscedasticity (She
skin, 2004).
The LCoR Coevolutionary Algorithm: A Comparative Analysis in Medium-term Time Series Forecasting Problems