Data Driven Structural Similarity - A Distance Measure for Adaptive Linear Approximations of Time Series

Victor Ionescu, Rodica Potolea, Mihaela Dinsoreanu

Abstract

Much effort has been invested in recent years in the problem of detecting similarity in time series. Most work focuses on identifying exact matches through point-by-point comparisons, although in many real-world problems recurring patterns match each other only approximately. We introduce a new approach for identifying patterns in time series that evaluates similarity by comparing the overall structure of candidate sequences rather than their local shapes, and we propose a new distance measure, ABC (Area Between Curves), to achieve this goal. The approach is based on a data-driven linear approximation method that is intuitive, offers a high compression ratio, and adapts to the overall shape of the sequence. The similarity of candidate sequences is quantified by applying the distance measure directly to the linear approximations of the time series. Our evaluations on multiple data sets show that the proposed technique outperforms similarity search based on the commonly used Euclidean distance in the majority of cases. The most significant improvements are obtained on domains and data sets where matching sequences are indeed determined primarily by the similarity of their higher-level structures.
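
To make the idea of an area-based distance on linear approximations concrete, the following minimal Python sketch computes the area enclosed between two piecewise-linear curves. It is an illustration only: the abstract does not give the exact definition of ABC, so the function names (`interpolate`, `area_between_curves`), the breakpoint representation as `(x, y)` pairs, and the restriction to the overlapping x-range are all assumptions rather than the authors' formulation.

```python
from bisect import bisect_right


def interpolate(points, x):
    """Evaluate a piecewise-linear curve at x.

    `points` is a list of (x, y) breakpoints with strictly increasing x
    and at least two entries (an assumed representation).
    """
    xs = [p[0] for p in points]
    # Index of the segment containing x, clamped to the valid range.
    i = min(max(bisect_right(xs, x) - 1, 0), len(points) - 2)
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def area_between_curves(a, b):
    """Integrate |a(x) - b(x)| over the x-range shared by both curves."""
    lo = max(a[0][0], b[0][0])
    hi = min(a[-1][0], b[-1][0])
    # Between consecutive breakpoints of either curve, the difference
    # a(x) - b(x) is linear, so each interval can be integrated exactly.
    grid = sorted({x for x, _ in list(a) + list(b) if lo <= x <= hi})
    diffs = [interpolate(a, x) - interpolate(b, x) for x in grid]

    area = 0.0
    for x0, x1, d0, d1 in zip(grid, grid[1:], diffs, diffs[1:]):
        width = x1 - x0
        if d0 * d1 >= 0:
            # No sign change: the region is a trapezoid.
            area += 0.5 * width * (abs(d0) + abs(d1))
        else:
            # The curves cross inside the interval: two triangles.
            area += 0.5 * width * (d0 * d0 + d1 * d1) / (abs(d0) + abs(d1))
    return area
```

For example, `area_between_curves([(0, 0), (2, 2)], [(0, 2), (2, 0)])` integrates two crossing lines, while two identical approximations yield an area of zero. Because only segment breakpoints are visited, the cost depends on the number of segments rather than on the length of the raw series, which is one reason a distance of this kind pairs naturally with a high-compression linear approximation.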



Paper Citation


in Harvard Style

Ionescu V., Potolea R. and Dinsoreanu M. (2015). Data Driven Structural Similarity - A Distance Measure for Adaptive Linear Approximations of Time Series. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 67-74. DOI: 10.5220/0005597400670074


in Bibtex Style

@conference{kdir15,
author={Victor Ionescu and Rodica Potolea and Mihaela Dinsoreanu},
title={Data Driven Structural Similarity - A Distance Measure for Adaptive Linear Approximations of Time Series},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={67-74},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005597400670074},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Data Driven Structural Similarity - A Distance Measure for Adaptive Linear Approximations of Time Series
SN - 978-989-758-158-8
AU - Ionescu V.
AU - Potolea R.
AU - Dinsoreanu M.
PY - 2015
SP - 67
EP - 74
DO - 10.5220/0005597400670074