to avoid getting biased results.
Table 1 shows the classification error (the smaller
the better) in a 1-NN classification task of the
adaptive sampling method applied to DFT (cf.
Section 1), and also applied with DTW and the
Euclidean distance (ED). The experiments are conducted
for different compression ratios ρ, where ρ = 0%
indicates that no adaptive sampling is performed (the
method is turned off).
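The evaluation protocol above can be sketched as leave-one-out 1-NN classification with a pluggable distance function. The function names and toy data below are illustrative, not the authors' code:

```python
import numpy as np

def one_nn_error(series, labels, dist):
    """Leave-one-out 1-NN classification error (smaller is better)."""
    n = len(series)
    errors = 0
    for i in range(n):
        # find the nearest neighbour of series[i] among all other series
        best_j = min((j for j in range(n) if j != i),
                     key=lambda j: dist(series[i], series[j]))
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / n

euclidean = lambda a, b: float(np.linalg.norm(a - b))

# toy example: two well-separated classes of 20-point "series"
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 20)), rng.normal(1, 0.1, (5, 20))])
y = [0] * 5 + [1] * 5
print(one_nn_error(X, y, euclidean))  # → 0.0 on this easy toy data
```

Swapping `euclidean` for a DTW or truncated-DFT distance reproduces the three columns of the comparison.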
The results show that DTW is well suited to the
classification task in question. Adaptive sampling
gave acceptable results even for compression ratios
between 25% and 50%. For the ECG dataset, the results
remained acceptable even at a very high
compression ratio (ρ = 90%).
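For concreteness, here is a minimal, unconstrained DTW sketch; real experiments typically use an optimised variant with a warping window:

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic-programming DTW distance between 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three admissible alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

# DTW aligns time-shifted patterns that Euclidean distance penalises:
a = np.array([0, 0, 1, 2, 1, 0, 0], dtype=float)
b = np.array([0, 1, 2, 1, 0, 0, 0], dtype=float)  # same bump, shifted
print(dtw(a, b))               # → 0.0: DTW finds a perfect alignment
print(float(np.linalg.norm(a - b)) > 0)  # → True: ED sees a difference
```

This elasticity to local time shifts is a plausible reason DTW tolerates adaptive sampling well: resampled series remain alignable.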
ED also handled adaptive sampling well, as the
classification error was in general acceptable for a
compression ratio of 50%.
When adaptive sampling was applied to DFT, the
results were better than those of the original method
for all datasets and for all compression ratios.
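The DFT representation compared here can be sketched with the usual first-coefficients truncation; the coefficient count `k = 8` below is an illustrative choice, not a value from the paper:

```python
import numpy as np

def dft_reduce(x, k=8):
    """Keep the first k coefficients of the real-input DFT of x."""
    return np.fft.rfft(x)[:k]

def dft_dist(a, b, k=8):
    """Euclidean distance in the truncated-DFT space. Dropping
    coefficients can only shrink the distance, which is what makes
    DFT-based pruning safe (no false dismissals)."""
    return float(np.linalg.norm(dft_reduce(a, k) - dft_reduce(b, k)))

t = np.linspace(0.0, 1.0, 128, endpoint=False)
x = np.sin(2 * np.pi * 3 * t)
y = np.sin(2 * np.pi * 3 * t + 0.1)
full = float(np.linalg.norm(np.fft.rfft(x) - np.fft.rfft(y)))
print(dft_dist(x, y) <= full)  # → True: truncation never inflates distance
```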
An interesting phenomenon that we noticed is that
in many cases, applying adaptive sampling for a
compression ratio of 5% gave better results than the
raw data themselves. We believe the reason for this is
that compression has a positive smoothing effect on the
data.
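The adaptive sampling method itself is defined in Section 1 and not restated in this section; purely as illustration, one generic form of the idea, greedily dropping the points that are cheapest to reconstruct by linear interpolation until the target compression ratio is reached, can be sketched as:

```python
import numpy as np

def adaptive_sample(x, ratio):
    """Greedily drop the fraction `ratio` of points whose removal adds
    the least linear-interpolation error; returns the kept indices.
    A generic sketch, not the paper's exact algorithm."""
    keep = list(range(len(x)))
    for _ in range(int(round(ratio * len(x)))):
        # cost of dropping an interior point = its deviation from the
        # straight line through its current neighbours
        best_k, best_cost = None, np.inf
        for k in range(1, len(keep) - 1):
            i, j, l = keep[k - 1], keep[k], keep[k + 1]
            interp = x[i] + (x[l] - x[i]) * (j - i) / (l - i)
            cost = abs(x[j] - interp)
            if cost < best_cost:
                best_k, best_cost = k, cost
        if best_k is None:
            break
        del keep[best_k]
    return keep

# piecewise-linear signal: flat, ramp, flat (30 points)
x = np.concatenate([np.zeros(10), np.linspace(0, 1, 10), np.ones(10)])
kept = adaptive_sample(x, 0.5)   # rho = 50%
print(len(kept))                 # → 15 of 30 points retained
```

On a signal like this, the dropped points lie on the flat and linear segments, which is also where a mild smoothing effect on noisy data would come from.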
We also conducted k-means clustering
experiments on the same datasets and for the same
compression ratios. Table 2 shows the k-means
clustering quality (the larger the better) for the datasets
we tested. As we can see from Table 2, the k-means
clustering results are similar to those of 1-NN
classification. They show that DTW is the method best
suited to the k-means clustering task, and
again adaptive sampling yielded acceptable
results even for compression ratios between 25% and
50% for almost all the datasets tested. The results,
however, degraded in most cases for high
compression ratios.
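This excerpt does not restate which quality measure Table 2 reports; cluster purity against ground-truth labels is one common "larger is better" choice, sketched here purely for illustration:

```python
from collections import Counter

def purity(assignments, labels):
    """Fraction of points falling in the majority class of their cluster."""
    clusters = {}
    for c, y in zip(assignments, labels):
        clusters.setdefault(c, []).append(y)
    majority = sum(Counter(ys).most_common(1)[0][1]
                   for ys in clusters.values())
    return majority / len(labels)

# cluster 0 holds {'a','a','b'}, cluster 1 holds {'b','b','b'}
print(purity([0, 0, 0, 1, 1, 1], ['a', 'a', 'b', 'b', 'b', 'b']))  # → 0.8333...
```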
ED also handled adaptive sampling well, as the
quality of k-means clustering was still acceptable
even for a compression ratio of 50%.
As was the case with classification, applying
adaptive sampling to DFT improved on the original
method for all datasets and for all compression ratios.
The smoothing effect observed in the
classification experiments at a compression
ratio of 5% also appeared in the k-means clustering
experiments.
4 CONCLUSIONS
In this paper, we conducted extensive experiments on
the adaptive sampling method for time series in
1-NN classification and k-means clustering tasks. These
experiments were conducted on a variety of time
series datasets, using the Euclidean distance,
dynamic time warping, and the discrete Fourier
transform (DFT). Our experiments show that the
performance of the adaptive sampling method remains
acceptable in the two aforementioned time series
data mining tasks even at high compression ratios,
and in some cases even at very high ones
(e.g. ρ = 90%).
In the future, we intend to study the impact of
adaptive sampling on other time series data mining
tasks and also to compare it with other time series
dimensionality reduction techniques.