The Longest Common Subsequence Distance using a Complexity Factor

Octavian Lucian Hasna, Rodica Potolea

Abstract

In this paper we study the classic longest common subsequence (LCSS) problem and use the length of the LCSS as a similarity measure between two time series. We propose an original algorithm that computes the approximate length of the LCSS using a discretization step, a complexity-invariant factor, and a dynamic threshold for skipping the computation.
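
The ingredients named above can be illustrated with a minimal Python sketch. It is not the authors' algorithm, since the abstract does not give its details. Everything here is an assumption made for illustration: the equal-width binning in `discretize`, the complexity estimate (root of the summed squared successive differences), the fallback factor of 1.0 for constant series, and the symbol-count bound used to skip the dynamic program when the distance cannot fall below `threshold`. All function and parameter names are hypothetical, and both series are assumed to be non-empty.

```python
from collections import Counter
from math import inf, sqrt


def discretize(series, bins=8):
    """Map each value to an integer bin index (equal-width bins, assumed scheme)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / bins
    return [min(int((x - lo) / width), bins - 1) for x in series]


def lcss_length(a, b):
    """Exact LCSS length by dynamic programming, keeping only two rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def complexity(series):
    """Assumed complexity estimate: root of summed squared successive differences."""
    return sqrt(sum((y - x) ** 2 for x, y in zip(series, series[1:])))


def complexity_factor(s, t):
    """Ratio of the larger to the smaller complexity (1.0 if either is zero)."""
    cs, ct = complexity(s), complexity(t)
    if min(cs, ct) == 0:
        return 1.0  # assumption: treat constant series as neutral
    return max(cs, ct) / min(cs, ct)


def lcss_distance(s, t, bins=8, threshold=inf):
    """LCSS-based distance scaled by the complexity factor.

    Returns inf without running the dynamic program when a cheap lower
    bound on the distance already reaches `threshold`.
    """
    a, b = discretize(s, bins), discretize(t, bins)
    shortest = min(len(a), len(b))
    factor = complexity_factor(s, t)

    # The LCSS cannot be longer than the number of symbols the two
    # sequences share, counted per symbol.
    count_a, count_b = Counter(a), Counter(b)
    upper = sum(min(count_a[k], count_b[k]) for k in count_a)
    if (1 - upper / shortest) * factor >= threshold:
        return inf

    return (1 - lcss_length(a, b) / shortest) * factor
```

In a nearest-neighbour search, `threshold` would typically be the best distance found so far, so that it tightens as the search proceeds and more candidates are skipped.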



Paper Citation


in Harvard Style

Hasna O. and Potolea R. (2016). The Longest Common Subsequence Distance using a Complexity Factor. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016) ISBN 978-989-758-203-5, pages 336-343. DOI: 10.5220/0006067603360343


in Bibtex Style

@conference{kdir16,
author={Octavian Lucian Hasna and Rodica Potolea},
title={The Longest Common Subsequence Distance using a Complexity Factor},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={336-343},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006067603360343},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - The Longest Common Subsequence Distance using a Complexity Factor
SN - 978-989-758-203-5
AU - Hasna O.
AU - Potolea R.
PY - 2016
SP - 336
EP - 343
DO - 10.5220/0006067603360343