(and in the present case there are no sequences for
which the nnd is exactly 0). In most cases (usually
around 99% of the sequences), the numerator of
Eq. 8 is zero. Table 1 shows the results of applying
the topologically approximated nnd profile to other
time series.
5 CONCLUSIONS AND FUTURE WORK
When analysing time series, an nnd profile can be
useful for characterizing the properties of the
sequences, for example for finding possible anomalies
in the form of discords, or recurrent sequences.
Unfortunately, a full nnd profile requires a number of
calculations that scales quadratically with the number
of points. In the present article we propose a
procedure that can greatly speed up the process. The
idea is to exploit different kinds of neighborhoods of
each sequence in order to constrain the calculations to
search spaces where the probability of finding the
exact neighbor is very high. The three topologies are
the one introduced by SAX, the time topology and the
Euclidean topology. The reduced search space thus
obtained allows one to skip most of the calculations
while retaining good accuracy for the nnd of each
sequence.
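To make the quadratic cost concrete, the exact profile can be sketched as an
exhaustive search over all pairs of sequences. The sketch below is only
illustrative: the function name `brute_force_nnd_profile`, the use of the raw
(non-normalized) Euclidean distance and the exclusion of pairs that overlap in
time are assumptions made here, not details taken from the procedure of this
article.

```python
import math


def brute_force_nnd_profile(series, s):
    """Exact nnd profile by exhaustive search: O(n^2) distance calls.

    Assumptions (not taken from the text): raw Euclidean distance, and
    sequences that overlap in time (|i - j| < s) are never compared.
    """
    n = len(series) - s + 1   # number of sequences of length s
    nnd = [math.inf] * n      # nearest neighbor distance of each sequence
    nn = [-1] * n             # index of the nearest neighbor found
    for i in range(n):
        for j in range(i + s, n):  # each non-overlapping pair is visited once
            d = math.dist(series[i:i + s], series[j:j + s])
            if d < nnd[i]:
                nnd[i], nn[i] = d, j
            if d < nnd[j]:
                nnd[j], nn[j] = d, i
    return nnd, nn


# Toy series (hypothetical data) with one spike
series = [0.0, 1.0, 0.0, 1.0, 0.0, 5.0, 0.0, 1.0, 0.0, 1.0]
nnd, nn = brute_force_nnd_profile(series, 2)
discord = max(range(len(nnd)), key=nnd.__getitem__)  # sequence with the highest nnd
```

On this toy input the sequences touching the spike get the highest nnd, which
is the sense in which the profile exposes discords. The double loop is the part
that any reduced search space aims to avoid.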
This is a heuristic selection procedure, so it is
not possible to provide exact bounds in terms of
computational complexity. However, the experimental
results we obtained on real time series are
encouraging: the speed-ups with respect to a brute
force algorithm are between 1 and 2 orders of
magnitude. There is also a clear trend indicating that
the ratio between the brute force computational time
and the computational time obtained with the time
topology increases with the size of the time series
under observation; for this reason our approach becomes
particularly appealing for large time series.
In terms of accuracy, the time-approximated nnd
profiles are very close to the exact ones. In the cases
under observation, the exact nnd has been found for
more than 98% of the sequences, while for those
sequences for which only an approximate nnd has been
found, the values are close to the correct ones (within
≈ 10%). It should be emphasized that, due to the nature
of the time-approximated nnd profile (which essentially
extends the search space of the HOT SAX algorithm), the
results automatically include the highest nnds
(corresponding to the discords of the time series).
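Accuracy figures of this kind can be obtained by comparing an approximate
profile with the exact one. The following is a minimal sketch, assuming both
profiles are available as lists of strictly positive nnds; the function name
`profile_accuracy` and the numbers in the usage lines are hypothetical and are
not results from this article.

```python
def profile_accuracy(exact, approx, tol=1e-12):
    """Return (fraction of exact nnds, mean relative error of the others).

    Assumes every exact nnd is strictly positive, so the relative
    error is well defined.
    """
    hits = 0
    errors = []
    for e, a in zip(exact, approx):
        if abs(a - e) <= tol:
            hits += 1
        else:
            # An approximate nnd is the minimum over a subset of the
            # candidates, so it can only overestimate the exact value.
            errors.append((a - e) / e)
    fraction_exact = hits / len(exact)
    mean_error = sum(errors) / len(errors) if errors else 0.0
    return fraction_exact, mean_error


# Hypothetical profiles: three exact values and one overestimate
exact = [1.0, 2.0, 4.0, 5.0]
approx = [1.0, 2.0, 4.4, 5.0]
fraction_exact, mean_error = profile_accuracy(exact, approx)
```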
An interesting result implied by exploiting the
time topology is Eq. 7, which expresses an upper
bound for the nnd of a sequence once the nnd of a
time neighbor of that sequence is known. In practice
this implies that, once an approximate nnd profile has
been obtained, it is very easy (linear in the size of
the time series) to check whether some of the
approximate nnds are particularly distant from their
correct value.
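A linear-time check of this kind can be sketched as follows. This is not Eq. 7
itself but an analogous bound under an assumption made here: with the raw
(non-normalized) Euclidean distance, shifting both sequences of a pair by one
point removes one squared difference and adds another, so the distance between
the shifted pair, which bounds the nnd of the next sequence from above, follows
in constant time. The function name `flag_loose_nnds` and the input profile are
hypothetical.

```python
import math


def flag_loose_nnds(series, s, nnd, nn, tol=1e-9):
    """Indices whose approximate nnd exceeds a bound from the previous sequence.

    Assumption: raw Euclidean distance on sequences of length s, where
    nn[i] is the neighbor index associated with nnd[i] (or -1 if unknown).
    """
    n = len(series) - s + 1
    flagged = []
    for i in range(n - 1):
        j = nn[i]
        if j < 0 or j + 1 >= n:
            continue  # no neighbor known, or the shifted neighbor falls off the end
        dropped = (series[i] - series[j]) ** 2          # point leaving the window
        added = (series[i + s] - series[j + s]) ** 2    # point entering the window
        # Distance between sequences i + 1 and j + 1, obtained in O(1)
        bound = math.sqrt(max(nnd[i] ** 2 - dropped + added, 0.0))
        if nnd[i + 1] > bound + tol:
            flagged.append(i + 1)  # a closer sequence is known to exist
    return flagged


# Periodic toy series: every exact nnd is 0 for s = 2
series = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
# Hypothetical approximate profile in which sequence 1 missed its true neighbor
nnd = [0.0, math.sqrt(2.0), 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
nn = [3, 6, 5, 0, 1, 2, 3, 4]
flagged = flag_loose_nnds(series, 2, nnd, nn)
```

The shifted pair keeps the same separation in time as the original pair, so it
never introduces an overlap between the two sequences. Any flagged index can
then be refined by evaluating the distance to the shifted neighbor directly.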
In summary, this approach significantly reduces the
amount of calculation needed to obtain the nnd profile
of a time series, at a reasonable loss of precision. In
the present literature the state of the art is
represented by the algorithms of the Matrix Profile
series (Yeh et al., 2016). However, they scale
quadratically with the length of the time series, and
they also provide information which might be difficult
to utilize: they return the distances from all the
sequences, while often only the nearest neighbors play
an important role in determining the properties of a
sequence.
Future work includes the application of the MASS
algorithm (Mueen et al., 2017), which is at the basis
of (Yeh et al., 2016) and exploits the Fast Fourier
Transform to greatly speed up the calculation of the
distances between sequences.
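The role of the Fast Fourier Transform can be illustrated with a sliding dot
product: the dot products between one query sequence and every window of the
series form a convolution, which an FFT evaluates in O(n log n) instead of
O(n s). The sketch below is not the MASS algorithm, which works with
z-normalized distances; it assumes the raw Euclidean distance and uses a small
hand-written radix-2 FFT so that it has no dependencies. All function names are
hypothetical.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (unnormalized)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def sliding_dot_products(query, series):
    """Dot product of `query` with every window of `series` via one convolution."""
    m, n = len(query), len(series)
    size = 1
    while size < n + m:  # room for the full linear convolution
        size *= 2
    fa = fft([complex(v) for v in series] + [0j] * (size - n))
    fb = fft([complex(v) for v in reversed(query)] + [0j] * (size - m))
    conv = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # Entry k = i + m - 1 of the convolution is the dot product for window i;
    # the division by `size` completes the inverse transform.
    return [conv[k].real / size for k in range(m - 1, n)]


def distance_profile(query, series):
    """Raw Euclidean distance from `query` to every window of `series`."""
    m = len(query)
    qq = sum(v * v for v in query)
    window_sq = sum(v * v for v in series[:m])  # running sum of squares
    profile = []
    for i, dot in enumerate(sliding_dot_products(query, series)):
        if i > 0:
            window_sq += series[i + m - 1] ** 2 - series[i - 1] ** 2
        profile.append(math.sqrt(max(qq + window_sq - 2.0 * dot, 0.0)))
    return profile


series = [0.0, 1.0, 0.0, 1.0, 0.0, 5.0, 0.0, 1.0]
query = [0.0, 5.0]
profile = distance_profile(query, series)
```

Because of floating-point round-off in the transform, distances that should be
exactly zero come out as very small positive numbers, so comparisons against
the direct computation need a tolerance.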
With minor modifications, the procedure provided by
this work can be used to obtain approximations of the
second, third, ..., k-th neighbors of a sequence. We
are in the process of implementing and testing these
extensions in order to obtain an even more complete
profile of the main properties of each sequence.
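As a reference for what such an extension would approximate, the exact
distances to the k nearest neighbors of one sequence can be computed by brute
force. This sketch is not the approximate procedure of this work: the function
name `k_nearest_distances`, the raw Euclidean distance and the exclusion of
sequences that overlap in time are assumptions made for illustration.

```python
import heapq
import math


def k_nearest_distances(series, s, i, k):
    """Distances from sequence i to its k nearest non-overlapping sequences."""
    n = len(series) - s + 1
    target = series[i:i + s]
    distances = (
        math.dist(target, series[j:j + s])
        for j in range(n)
        if abs(i - j) >= s  # skip sequences overlapping in time with i
    )
    # Sorted ascending: position 0 is the nnd, position k - 1 the k-th neighbor
    return heapq.nsmallest(k, distances)


series = [0.0, 1.0, 0.0, 1.0, 0.0, 5.0, 0.0, 1.0, 0.0, 1.0]
nearest = k_nearest_distances(series, 2, 4, 5)
```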
We are also implementing the time topology to
speed up the calculation of discords.
REFERENCES
Avogadro, P., Palonca, L., and Dominoni, M. A. Online
anomaly search in time series: significant online
discords. Under review.
Bu, Y., Leung, T.-W., Fu, A. W.-C., Keogh, E., Pei, J., and
Meshkin, S. WAT: Finding Top-K Discords in Time
Series Database, pages 449–454.
Chandola, V., Banerjee, A., and Kumar, V. (2009).
Anomaly detection: A survey. ACM Comput. Surv.,
41(3):15:1–15:58.
Chiu, B., Keogh, E., and Lonardi, S. (2003).
Probabilistic discovery of time series motifs. In Proceedings
of the Ninth ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, KDD ’03,
pages 493–498, New York, NY, USA. ACM.
Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff,
J. M., Ivanov, P. C., Mark, R. G., Mietus, J. E.,
Moody, G. B., Peng, C.-K., and Stanley, H. E.
(2000). PhysioBank, PhysioToolkit, and
PhysioNet: Components of a new research resource
for complex physiologic signals. Circulation,
101(23):e215–e220. Circulation Electronic Pages:
Topological Approach for Finding Nearest Neighbor Sequence in Time Series