and the parameter d represents the order of non-seasonal integration necessary to obtain a stationary time
series. The parameters p, q and d are commonly used
for referring to non-seasonal models in a concise way.
For more complex models, in which similar patterns can be observed at regular time intervals, it is more realistic to take seasonality into consideration. A data set comprised of food sales in large-scale stores is typically affected by a weekly seasonality, which reflects the customers’ habit of buying food especially at the weekend. The seasonal component is defined
by the parameters P, D and Q, where P defines the order of the seasonal autoregressive component SAR, Q defines the order of the seasonal moving average component SMA, and D is the order of seasonal differences. Finally, s defines the series’ seasonality. A seasonal ARIMA model is synthetically described as ARIMA(p, d, q) × (P, D, Q)_s. The most
critical disadvantage of classical ARIMA models with
seasonality is that the effect of exogenous variables on
data is not taken into account. In the following sec-
tions we show how to cope with this issue, and to this
end we present two alternative forecasting models.
According to (Box et al., 2008), given a data set,
the best forecasting model can be identified according
to the following framework:
• model identification,
• model estimation,
• diagnostic check.
In the literature, model identification is implemented through either an incremental approach or an exhaustive one. In the first approach, the values of the parameters defining the ARIMA model are iteratively incremented, and statistical significance tests are performed to decide when to halt. The simplest instance of this approach concerns the parameter d: one starts by setting d = 0 and testing the stationarity of the time series with statistical tests; based on the result, either d is incremented or the process halts. For a complete application of the incremental approach see, for instance, (Andrews et al., 2013). On
the contrary, in the exhaustive approach each parameter ranges over a predefined interval; see for instance (Höglund and Östermark, 1991). The main advantage of this approach is that a larger set of combinations, i.e., forecasting models, is compared, and the resulting model is more accurate. However, the computational effort required may be larger, since for every ARIMA model a set of statistical tests and analyses has to be performed. In the proposed forecasting
models, we implemented the incremental approach
for parameters p, d, q, P, D and Q and we set sea-
sonality s = 7.
For each tuple (p, d, q) × (P, D, Q)_s the maximum
likelihood principle is adopted for model parame-
ters’ estimation. Finally, the diagnostic check of the
forecasting model is implemented by means of two
kinds of performance indicators, in-sample and out-
of-sample, that are used to determine the best model.
In the following section we analyze the diagnostic
check phase in more detail and provide a complete
description of the performance indicators used within
our forecasting models.
4 PERFORMANCE INDICATORS
In this section we describe a set of statistical indica-
tors used to assess the forecasting quality of the mod-
els. These indicators can be divided into two groups,
in-sample and out-of-sample indicators, according to
the set of data used for computing them. For the sake of clarity, we describe the two groups separately, as their meaning, as well as their use, differs within the proposed forecasting models.
4.1 In-sample Indicators
This subset includes indicators that are computed on
the training set as defined in Section 2. These are
mostly used as lack-of-fit measures, based on the information entropy and parsimony of the models. Thus, in-sample analysis has the objective of measuring the match between real data and the simulated data obtained from the mathematical model under analysis.
We computed two different indicators: the Ljung-Box test and the Hannan-Quinn Information Criterion (HQC) (Box et al., 2008), (Burnham and Anderson, 2002). The Ljung-Box test is a portmanteau test in which the null hypothesis is that the first m autocorrelations of the residuals r_h are zero, i.e., that the residuals behave like a white noise process. The statistical test applied in this study is

Q(m) = n(n + 2) Σ_{h=1}^{m} r_h² / (n − h),   (9)

which follows a χ²(m − K) distribution with m − K
degrees of freedom, where K is the number of param-
eters estimated within the model and n is the number of observations in the training set. The Hannan-Quinn
Information Criterion (HQC) is a well-known criterion used to quantify, in information-theoretic terms, the information lost in the fitting process. Under the assumption that the residuals are independent and identically distributed,

HQC = n log(SSR/n) + 2K log log(n),   (10)
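Both in-sample indicators can be computed directly from a model's residuals. The sketch below implements Eqs. (9) and (10) with NumPy/SciPy; here SSR is taken to be the sum of squared residuals, and K must be supplied by the caller:

```python
import numpy as np
from scipy import stats

def ljung_box(residuals, m, K):
    """Q(m) from Eq. (9) and its p-value under chi^2(m - K)."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    rc = r - r.mean()
    denom = rc @ rc
    # Sample autocorrelations r_h of the residuals, h = 1..m.
    acf = np.array([(rc[:-h] @ rc[h:]) / denom for h in range(1, m + 1)])
    q = n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, m + 1)))
    return q, stats.chi2.sf(q, df=m - K)

def hqc(residuals, K):
    """Hannan-Quinn criterion from Eq. (10); SSR is the sum of
    squared residuals and K the number of estimated parameters."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    return n * np.log(np.sum(r ** 2) / n) + 2 * K * np.log(np.log(n))
```

For residuals that truly behave like white noise, the Ljung-Box p-value should be large, so the null hypothesis of zero autocorrelation is not rejected; strongly autocorrelated residuals drive Q(m) up and the p-value toward zero.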
Sales Forecasting Models in the Fresh Food Supply Chain