4 EXPERIMENTATION AND
STATISTICAL STUDY
The main goal of the experiments is to study the behavior of the L-Co-R algorithm, comparing it with 5 other methods found in the literature and using 3 different quality measures.
4.1 Experimental Methodology
As in (Parras-Gutierrez et al., 2012), the experimentation has been carried out using 20 databases, most of them taken from the INE¹. The data represent observations from different activities and differ in nature, size, and characteristics. The databases have been labeled as: Airline, WmFrancfort, WmLondon, WmMadrid, WmMilan, WmNewYork, WmTokyo, Deceases, SpaMovSpec, Exchange, Gasoline, MortCanc, MortMade, Books, FreeHouPrize, Prisoners, TurIn, TurOut, TUrban, and HouseFin. The number of samples in every database is between 43 (for MortCanc) and 618 (for Gasoline, a database used in the NN3 competition).
To compare the effectiveness of L-Co-R, 5 ad-
ditional methods have been used, all of them found
within the field of time series forecasting: Exponen-
tial smoothing method (ETS), Croston, Theta, Ran-
dom Walk (RW), and ARIMA (Hyndman and Khan-
dakar, 2008).
In order to test and compare the generalization capabilities of every method, the databases have been split into training and test sets. Training sets contain the first 75% of the data, while test sets are composed of the remaining 25% of the samples.
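The chronological split described above can be sketched as follows (a minimal illustration; the function name and the placeholder series are not from the paper):

```python
def split_series(series, train_fraction=0.75):
    """Chronological split: the first 75% of observations form the training
    set and the remaining 25% form the test set (no shuffling)."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Placeholder series of 100 observations (not one of the 20 databases).
series = list(range(100))
train, test = split_series(series)
print(len(train), len(test))  # prints "75 25"
```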
An open question when dealing with time series is which measure to use in order to calculate the accuracy of the obtained predictions. Mean Absolute Percentage Error (MAPE) (Bowerman et al., 2004) was used intensively until many other measures, such as Geometric Mean Relative Absolute Error, Median Relative Absolute Error, Symmetric Median and Median Absolute Percentage Error (MdAPE), or Symmetric Mean Absolute Percentage Error, were proposed (Makridakis and Hibon, 2000). However, these measures share a disadvantage: they are not generally applicable and can be infinite or undefined, or can produce misleading results, as Hyndman and Koehler explained in their work (Hyndman and Koehler, 2006). Thus, they proposed the Mean Absolute Scaled Error (MASE), which is less sensitive to outliers, less variable on small samples, and more easily interpreted.
¹ National Statistics Institute (http://www.ine.es/)
In this work, the measures used are MAPE (i.e., mean(|p_t|)), MASE (defined as mean(|q_t|)), and MdAPE (i.e., median(|p_t|)), taking into account that Y_t is the observation at time t = 1, ..., n; F_t is the forecast of Y_t; e_t is the forecast error (i.e., e_t = Y_t − F_t); p_t = 100 e_t / Y_t is the percentage error; and q_t is determined as:

q_t = e_t / ( (1 / (n − 1)) Σ_{i=2}^{n} |Y_i − Y_{i−1}| )
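The three measures can be computed directly from these definitions; the following stdlib-only sketch mirrors the formulas above (function names are illustrative, and MASE is implemented exactly as the q_t formula given, scaling e_t by the mean absolute one-step difference of the same series):

```python
import statistics

def mape(y, f):
    """Mean Absolute Percentage Error: mean(|p_t|), with p_t = 100*e_t/Y_t."""
    return statistics.mean(abs(100.0 * (yt - ft) / yt) for yt, ft in zip(y, f))

def mdape(y, f):
    """Median Absolute Percentage Error: median(|p_t|)."""
    return statistics.median(abs(100.0 * (yt - ft) / yt) for yt, ft in zip(y, f))

def mase(y, f):
    """Mean Absolute Scaled Error: mean(|q_t|), where e_t is scaled by
    (1/(n-1)) * sum_{i=2..n} |Y_i - Y_{i-1}|."""
    n = len(y)
    scale = sum(abs(y[i] - y[i - 1]) for i in range(1, n)) / (n - 1)
    return statistics.mean(abs((yt - ft) / scale) for yt, ft in zip(y, f))
```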
Due to its stochastic nature, the results yielded by L-Co-R have been calculated as the average errors over 30 executions for every time series.
execution, the following parameters are used in the
L-Co-R algorithm: lags population size=50, lags pop-
ulation generations=5, lags chromosome size=10%,
RBFNs population size=50, RBFNs population gen-
erations=10, validation rate=0.25, maximum num-
ber of neurons of first generation=0.05, tournament
size=3, replacement rate=0.5, crossover rate=0.8, mu-
tation rate=0.2, and total number of generations=20.
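For reference, the parameter values listed above can be collected into a single configuration mapping (the key names are illustrative, not taken from the authors' implementation):

```python
# L-Co-R parameter settings as reported in the text; percentage-style
# values are stored as fractions.
L_CO_R_PARAMS = {
    "lags_population_size": 50,
    "lags_population_generations": 5,
    "lags_chromosome_size": 0.10,        # 10%
    "rbfn_population_size": 50,
    "rbfn_population_generations": 10,
    "validation_rate": 0.25,
    "max_neurons_first_generation": 0.05,
    "tournament_size": 3,
    "replacement_rate": 0.5,
    "crossover_rate": 0.8,
    "mutation_rate": 0.2,
    "total_generations": 20,
}
print(L_CO_R_PARAMS["crossover_rate"])  # prints "0.8"
```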
Tables 1, 2, and 3 show the results of L-Co-R and of the methods used for comparison (ETS, Croston, Theta, RW, and ARIMA) for the measures MAPE, MASE, and MdAPE, respectively (best results are emphasized with the character *). As mentioned before, every result indicated in the tables represents the average of 30 executions for each time series. With respect to MAPE, the L-Co-R algorithm obtains the best results in 15 of the 20 time series used, as can be seen in Table 1. Regarding MASE, L-Co-R stands out, yielding the best results for 5 time series; ETS, Croston, and Theta for 3 time series each; RW for only 2; and ARIMA for 4 time series, as can be observed in Table 2. Concerning MdAPE, L-Co-R achieves better results than the other methods in 12 of the 20 time series, as Table 3 shows. Thus, the L-Co-R algorithm is able to achieve a more accurate forecast in most of the time series for any of the quality measures considered.
4.2 Analysis of the Results
To analyze the results in more detail and check whether the observed differences are significant, two main steps are performed: firstly, identifying whether differences exist in general between the methods used in the comparison; and secondly, determining whether the best method is significantly better than the rest of the methods. To do this, it first has to be decided whether it is possible to use parametric or non-parametric statistical techniques. An adequate use of parametric statistical techniques requires meeting three necessary conditions: independence, normality, and homoscedasticity (Sheskin, 2004).
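As a rough illustration of the homoscedasticity condition, the following stdlib-only sketch compares sample variances across hypothetical per-method error samples. This is only a screening heuristic, not the formal procedure: a rigorous analysis would apply tests such as Shapiro-Wilk for normality and Levene for homoscedasticity, or fall back to non-parametric techniques when the conditions fail.

```python
import statistics

def variance_ratio(samples):
    """Ratio of the largest to the smallest sample variance across groups;
    values far above 1 suggest heteroscedasticity (heuristic only)."""
    variances = [statistics.pvariance(s) for s in samples]
    return max(variances) / min(variances)

errors_a = [1.0, 2.0, 3.0]   # hypothetical errors of one method
errors_b = [2.0, 4.0, 6.0]   # hypothetical errors of another method
print(variance_ratio([errors_a, errors_b]))  # prints "4.0"
```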
The L-Co-R Co-evolutionary Algorithm - A Comparative Analysis in Medium-term Time-series Forecasting Problems