model is useful in situations where the database is a set of time series whose seasonal periods recur at the same time frequency (hourly, daily, monthly or yearly). ANNs, in turn, use methods that simulate the problem-solving ability of the human brain in information systems (Kraft et al., 2003). Both algorithms have good applicability and assertiveness in forecasting systems, and combining the two methods can help predict both seasonal situations (using SARIMA) and atypical ones (using ANN).
To set the SARIMA parameters, the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) were used to define the best order and seasonal order for each group, according to the characteristics of its time series. The ANN was implemented as a sequential model in the TensorFlow package, with 1000 neurons in the first layer and 100 in the second, trained for a total of 2000 epochs. Its inputs are dates that alter the seasonal component of the series, such as holidays, recesses and events in the city, and it produces a quantitative output. Both algorithms were implemented in Python, and the ANN settings were defined through empirical tests.
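As a rough illustration of how the ACF exposes the seasonal structure used when choosing SARIMA orders, the sketch below computes the sample autocorrelation of a daily series in plain Python and flags the lags that exceed the usual approximate 95% white-noise bound of 1.96/√n. The function names and the synthetic weekly series are illustrative assumptions, not the authors' code; in practice a library routine (for example, statsmodels' `acf` and `pacf`) would typically be used, with the PACF inspected alongside the ACF.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation r_k of a series for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose |r_k| exceeds the approximate 95% white-noise bound 1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    r = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Hypothetical daily counts: busy weekdays, quieter weekends, 10 weeks long.
weekly_pattern = [10, 10, 10, 10, 10, 3, 3]
series = weekly_pattern * 10

r = acf(series, 14)
print(round(r[7], 2))  # 0.9: strong correlation one week apart
print(significant_lags(series, 14))
```

A spike at lag 7 (and again at lag 14) is the kind of evidence that points to a weekly seasonal period; the non-seasonal and seasonal AR/MA orders would then be read from the decay patterns of the ACF and PACF.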
The databases were subjected to a seasonal decomposition step. From the calculations performed at this stage, a seasonal period of 7 days was determined for the databases, and this period was used in the SARIMA model. Observation of the databases also revealed changes in the incidence of occurrences on holidays, optional holidays and dates on which special events took place (such as football games, concerts and other events); all of these dates are referred to as special events. This information, along with the day, month, year and day of the week, was used as input to the ANN.
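The sketch below illustrates both preparation steps under stated assumptions: a classical additive decomposition (a centered moving average for the trend, then per-position averages of the detrended values) that recovers a 7-day seasonal component, and the encoding of one date into the inputs listed above (day, month, year, day of the week and a special-event flag). The paper does not specify which decomposition method or feature encoding was used, so both functions, their names and the example special-date set are hypothetical.

```python
from datetime import date


def seasonal_component(series, period=7):
    """Classical additive decomposition: seasonal indices for an odd period.

    Index 0 corresponds to the position of the first observation in the cycle.
    Requires at least 2 * period - 1 observations so every position is covered.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(series)
    # Trend: centered moving average over one full period.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period
    # Average the detrended values at each position of the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    # Center the indices so that they sum to zero over one cycle.
    offset = sum(raw) / period
    return [v - offset for v in raw]


def ann_features(day, special_dates):
    """Hypothetical input vector: day, month, year, weekday (Mon=0), special flag."""
    return [day.day, day.month, day.year, day.weekday(),
            1 if day in special_dates else 0]


# Synthetic daily series: linear trend plus a weekly pattern.
weekly = [2, 2, 2, 2, 2, -5, -5]
daily = [8 + 0.5 * t + weekly[t % 7] for t in range(70)]
print([round(v, 6) for v in seasonal_component(daily)])
# [2.0, 2.0, 2.0, 2.0, 2.0, -5.0, -5.0]

# Illustrative special date: 15 Nov 2016, a national holiday in Brazil.
special = {date(2016, 11, 15)}
print(ann_features(date(2016, 11, 15), special))  # [15, 11, 2016, 1, 1]
print(ann_features(date(2016, 11, 16), special))  # [16, 11, 2016, 2, 0]
```

Because the trend is a full-period moving average, the weekly pattern cancels inside it and reappears in the seasonal indices, which is what makes a 7-day period visible.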
Step 5 - Results Interpretation and Evaluation
Based on the results of the SARIMA and ANN methods, a model combining them was proposed. The ANN predictions were used on dates that deviate from the regular behavior of the series (special events), and the SARIMA predictions were used for occurrences on normal days. This approach improved the assertiveness of the proposed method because it accounts for dates on which the seasonal pattern does not hold.
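A minimal sketch of the combination rule described above, assuming the SARIMA and ANN forecasts are already available as lists aligned day by day: the ANN value is taken on special-event dates and the SARIMA value on all other days. The function name and the example numbers are hypothetical.

```python
from datetime import date, timedelta


def combine_forecasts(dates, sarima_pred, ann_pred, special_dates):
    """SARIMA+ANN union: ANN on special-event dates, SARIMA on normal days."""
    if not (len(dates) == len(sarima_pred) == len(ann_pred)):
        raise ValueError("forecasts must be aligned with the dates")
    return [
        ann if day in special_dates else sarima
        for day, sarima, ann in zip(dates, sarima_pred, ann_pred)
    ]


# Hypothetical three-day window around a holiday (14-16 Nov 2016).
start = date(2016, 11, 14)
days = [start + timedelta(days=i) for i in range(3)]
print(combine_forecasts(days, [12.0, 11.5, 12.3], [9.0, 4.2, 10.1],
                        {date(2016, 11, 15)}))  # [12.0, 4.2, 12.3]
```

Selecting per date, rather than averaging the two forecasts, keeps each model in the regime where it is expected to be stronger.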
To perform the tests and validate the results, predictions were made for November 2016 and December 2016 in all predefined subgroups, and these predictions were compared with the actual values in the data.
Interpreting and analyzing the results is fundamental to the knowledge extraction process. Two evaluation metrics were used for this purpose: the assertiveness of the algorithm and the Root Mean Square Deviation (RMSD). Assertiveness is a percentage expressing how close the prediction is to the actual value, as given in Equation 1:
$$\delta = \left(1 - \left|1 - \frac{P_i}{O_i}\right|\right) \times 100 \tag{1}$$
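where $\delta$ represents assertiveness, $P_i$ represents the predicted value and $O_i$ represents the actual value. This formula is derived from Equation 2, the plain ratio between prediction and observation, and treats over- and under-predictions symmetrically so that a perfect prediction scores 100%; the result stays within the range of 0% to 100% as long as $0 \le P_i \le 2O_i$.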
$$\delta = \frac{P_i}{O_i} \times 100 \tag{2}$$
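A direct transcription of Equations 1 and 2 into Python makes the difference between them concrete. The function names are illustrative, and averaging daily values of δ over a month to obtain one figure per cluster is an assumption about how the reported averages were aggregated.

```python
def ratio_score(predicted, observed):
    """Equation 2: plain ratio, exceeds 100% when the model over-predicts."""
    return predicted / observed * 100


def assertiveness(predicted, observed):
    """Equation 1: 100% for a perfect prediction, lower as P_i deviates from O_i."""
    return (1 - abs(1 - predicted / observed)) * 100


def mean_assertiveness(predicted, observed):
    """Average daily assertiveness over a period (assumed aggregation)."""
    scores = [assertiveness(p, o) for p, o in zip(predicted, observed)]
    return sum(scores) / len(scores)


print(round(ratio_score(11, 10), 2))     # 110.0
print(round(assertiveness(11, 10), 2))   # 90.0
print(round(assertiveness(9, 10), 2))    # 90.0
print(round(assertiveness(30, 10), 2))   # -100.0: leaves 0-100% once P_i > 2*O_i
```

Both formulas divide by $O_i$, so a day with zero observed occurrences raises `ZeroDivisionError` here; such days would need special handling.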
According to Willmott (1982), RMSD is one of the best overall measures of model performance, and its error value is expressed in the same units as the analyzed variable. RMSD is given by Equation 3:
$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \tag{3}$$
where $P_i$ is the predicted value, $O_i$ is the actual observed value and $n$ is the number of values analyzed. The closer the RMSD is to 0, the smaller the prediction error of the algorithm.
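Equation 3 translates into a few lines of Python; the sketch below is a plain transcription for illustration, with a hypothetical function name. Because the squared errors are averaged before the square root is taken, the result is in the units of the analyzed variable, and large individual errors weigh more heavily than they would in a mean absolute error.

```python
import math


def rmsd(predicted, observed):
    """Equation 3: root mean square deviation between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical three days: squared errors 4, 0 and 1.
print(round(rmsd([12, 9, 15], [10, 9, 14]), 3))  # 1.291, i.e. sqrt(5/3)
```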
Table 5 shows the assertiveness and RMSD of the SARIMA model, the ANN and the union of the two models for each database (each representing a region of the city of São Paulo), based on the tests performed for November. The results were satisfactory: the highest assertiveness was 86.83% (C15) and the best RMSD was 0.81 (C12), while the average assertiveness over the 15 clusters was 83.12% and the average RMSD was 1.75.
To confirm the results of the November tests, the same tests were carried out for December. Table 6 shows the assertiveness and RMSD of the SARIMA model, the ANN and the union of the two models for each database of São Paulo, based on the tests performed for December. The results were again satisfactory: the highest assertiveness was 85.41% (C13) and the best RMSD was 1.25 (C12), while the average assertiveness over the 15 clusters was 76.68% and the average RMSD was 2.16.
Combining the SARIMA model with the ANN (SARIMA+ANN) produced better results, in both assertiveness and RMSD, than using SARIMA alone or the ANN alone in almost all cases, with only two exceptions.
ICEIS 2020 - 22nd International Conference on Enterprise Information Systems