[Figure: the input p feeds a recurrent tanh layer, whose output a1(k) feeds a linear output layer a2(k). The layer equations are a1(k) = tanh(IW1,1 p + LW1,1 a1(k − 1) + b1) and a2(k) = LW2,1 a1(k) + b2 = y, where a1(k − 1) is the fed-back hidden state.]
Figure 3: Elman network generic structure.
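As a sketch, the two layer equations of the Elman network in Figure 3 can be written out directly in NumPy; the dimensions and random weights below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def elman_step(p, a1_prev, IW11, LW11, b1, LW21, b2):
    """One time step of the Elman network of Figure 3:
    a1(k) = tanh(IW11 @ p + LW11 @ a1(k-1) + b1)  (recurrent tanh layer)
    a2(k) = LW21 @ a1(k) + b2 = y                 (linear output layer)
    """
    a1 = np.tanh(IW11 @ p + LW11 @ a1_prev + b1)
    a2 = LW21 @ a1 + b2
    return a1, a2

# Illustrative sizes: 1 input, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
IW11 = rng.standard_normal((3, 1))   # input weights
LW11 = rng.standard_normal((3, 3))   # recurrent (layer) weights
b1 = rng.standard_normal(3)
LW21 = rng.standard_normal((1, 3))   # output weights
b2 = rng.standard_normal(1)

a1 = np.zeros(3)                     # a1(0): initial recurrent state
for x in [0.5, -0.2, 0.8]:           # a small input sequence
    a1, y = elman_step(np.array([x]), a1, IW11, LW11, b1, LW21, b2)
```

The hidden state a1 is carried from one call to the next, which is what distinguishes the Elman network from a purely feed-forward one.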
proper correction), if desired, any new observation of
the monitored time series.
Data Monitoring comprises two main phases. In the first (off-line) phase, a sufficient number of past observations of the series are used to adjust a pre-processing algorithm, which will be described in the next subsection. These pre-processed observations are then used to train a neural forecasting model, using both feed-forward and recurrent network architectures as described in the previous section. The trained neural model is considered adjusted when the forecasting error for the test set (composed of the last observations of the available data) lies within a previously specified value for the individual time series under analysis. In this phase, the error figure used is the MAPE (Mean Absolute Percentage Error).
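The MAPE used as the error figure can be sketched as follows; the function name and the toy values are illustrative, not from the paper.

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent:
    100 * mean(|(actual - forecast) / actual|)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Errors of 10%, 5% and 0% average to a MAPE of about 5%.
error = mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

Note that the MAPE is undefined when an actual observation is zero, which is one reason it suits strictly positive monitored series.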
The second phase works on-line. Each monitored series already has an adjusted model ready to receive a new observation and to test its validity. This validation test is detailed in Subsection 4.2.
4.1 Semi-automatic Pre-processing
Algorithm
The pre-processing phase is of great importance, and much of the monitoring success depends on it.
First of all, one must detect whether the time series trend is stochastic or deterministic. We perform this verification by applying a combination of the Augmented Dickey-Fuller (ADF) (Dickey and Fuller, 1979) and Phillips-Perron (Phillips, 1987) tests. The resulting test detects whether or not there are unit roots in the generating process of the time series. If they exist, the trend is stochastic and one must take the first difference of the series n times in order to make it stationary, where n is the number of unit roots detected (that is, the order of integration of the process).
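Assuming the unit-root count n has already been obtained (in practice an ADF test is available, e.g., as statsmodels.tsa.stattools.adfuller), the repeated first-differencing step can be sketched in NumPy; the function name and sample values are illustrative assumptions.

```python
import numpy as np

def difference(series, n):
    """Take the first difference of the series n times, where n is the
    number of unit roots detected (the order of integration)."""
    x = np.asarray(series, dtype=float)
    for _ in range(n):
        x = x[1:] - x[:-1]   # first difference: x(t) - x(t-1)
    return x

# A series with a stochastic (random-walk) trend becomes stationary
# after one difference, which recovers the underlying increments:
walk = np.cumsum([1.0, 2.0, -0.5, 0.7, 1.2])
d1 = difference(walk, 1)
```

The same result is given by `np.diff(walk, n=1)`; each differencing pass shortens the series by one observation.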
If the test detects no unit roots, one may conclude that the trend is deterministic. In this case, we remove the trend by polynomial fitting. In both the stochastic and the deterministic case, we also test for the presence of seasonal cycles (after trend removal), removing them by spectral (Fourier) analysis or by a convenient periodic differencing¹ (Chatfield, 1984).
[Figure: flow diagram — select a series; perform the unit root test; if the number "n" of unit roots is greater than zero, set the "Stochastic Trend" flag and take the first difference of the series "n" times; otherwise, set the "Deterministic Trend" flag and remove the series trend by polynomial fitting; then verify the presence of any cycles and remove them by spectral (FFT) analysis or periodic differencing; the series is now pre-processed and ready to be fed into the neural system.]
Figure 4: Semi-automatic pre-processing algorithm to remove trends and possible cycles.
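The deterministic-trend branch can be sketched with NumPy's polynomial fitting; the function name and the polynomial degree are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def remove_polynomial_trend(series, degree=1):
    """Fit a polynomial of the given degree to the series (deterministic
    trend) and return the detrended residuals and the fitted trend."""
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    coeffs = np.polyfit(t, y, degree)   # least-squares polynomial fit
    trend = np.polyval(coeffs, t)
    return y - trend, trend

# A purely linear trend is removed almost exactly by a degree-1 fit:
residual, trend = remove_polynomial_trend([3.0, 5.0, 7.0, 9.0, 11.0],
                                          degree=1)
```

The residual series is what would then be tested for seasonal cycles and fed to the neural model.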
This pre-processing phase is called semi-automatic because it cannot exclude the intervention of a human specialist, who inserts his or her knowledge into the system, mainly for cycle removal. Figure 4 shows these steps in a flow diagram.
4.2 Validation Test
The on-line phase of the monitoring system consists of receiving new observations and testing their validity according to the previously developed model. When a new observation x(T) arrives, we already have an estimated (predicted) value x̂(T) for it, which was determined by the adjusted neural model.
Here, human intervention may also be necessary. For example, new observations can be excluded by direct action of a manager who feels that they are biased by some temporary condition that he or she is aware of (and that the system may not be able to track in time). On the other hand, the system is designed to automatically screen all current observations to identify those that appear unusual, that is, outliers. Each outlier could be called to the attention of an appropriate management person, who would then decide whether or not to include the observation in the forecasting process (Montgomery et al., 1990). In fact, this outlier may have a reasonable origin, or may simply be the result of error.
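The automatic screening described above can be sketched as a simple threshold on the magnitude of the forecast error; the function name and the threshold value are illustrative assumptions, and in practice the threshold would be calibrated per series.

```python
def screen_observation(x_new, x_pred, threshold):
    """Flag the new observation x(T) as a potential outlier when the
    forecast error e(T) = x(T) - x_pred(T) is too large in magnitude.
    A flagged value is referred to a manager, not rejected outright."""
    error = x_new - x_pred
    is_outlier = abs(error) > threshold
    return error, is_outlier

# A new observation far from its prediction is flagged for review:
e, flag = screen_observation(x_new=130.0, x_pred=100.0, threshold=20.0)
```

Here e = 30.0 exceeds the threshold, so the observation would be called to the attention of an appropriate management person.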
Outliers can be identified by analyzing the forecast error e(T) = x(T) − x̂(T). If this error is large, it may be concluded that the observation x(T) came
¹ If a series is known to have a weekly seasonality (as in the case of electricity consumption), it is more convenient to remove this cycle by applying the (1 − B^7) operator, where B is the delay operator, that is, Bx(t) = x(t − 1).
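The (1 − B^7) operator of the footnote amounts to subtracting the observation from one week earlier; this can be sketched in NumPy, with the function name and sample data as illustrative assumptions.

```python
import numpy as np

def seasonal_difference(series, period=7):
    """Apply the (1 - B**period) operator: y(t) = x(t) - x(t - period),
    where B is the delay operator with Bx(t) = x(t - 1)."""
    x = np.asarray(series, dtype=float)
    return x[period:] - x[:-period]

# A purely weekly pattern is removed completely by weekly differencing:
week = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = np.array(week * 3)                 # three identical weeks
y = seasonal_difference(x, period=7)   # all zeros
```

The differenced series loses its first `period` observations, which must be accounted for when preparing training data.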
NEURAL NETWORKS FOR DATA QUALITY MONITORING OF TIME SERIES