TOWARDS A FASTER SYMBOLIC AGGREGATE APPROXIMATION METHOD

Muhammad Marwan Muhammad Fuad, Pierre-François Marteau

2010

Abstract

The similarity search problem is one of the main problems in time series data mining. Traditionally, this problem was tackled by sequentially comparing the given query against all the time series in the database and returning every time series that lies within a predetermined threshold of that query. But the large size and high dimensionality of the time series databases in use today make this sequential scan inefficient. Many representation techniques aim to reduce the dimensionality of time series so that the search can be carried out faster in a lower-dimensional space. The symbolic aggregate approximation (SAX) is one of the most competitive of these methods in the literature. In this paper we present a new method that improves the performance of SAX by adding a second exclusion condition, which increases its exclusion power. The method uses two representations of each time series: one is the SAX representation, and the other is based on an optimal approximation of the time series. Distances are computed and stored offline, then used online to exclude a large part of the search space by means of the two exclusion conditions. Our experiments show that the new method is faster than SAX.
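
For orientation, the exclusion idea that the abstract builds on can be sketched with standard SAX alone, following Lin et al. (2003): a candidate is discarded whenever the SAX lower-bounding distance (MINDIST) to the query already exceeds the threshold. This is only an illustrative sketch, not the method of this paper. The abstract does not specify the second exclusion condition or the optimal approximation, so neither is shown here. All function names (`sax_word`, `mindist`, `range_search`) and the parameters `w` (word length) and `a` (alphabet size) are assumptions made for the example, and the series length is assumed to be a multiple of `w`.

```python
import bisect
import math
from statistics import NormalDist, fmean, pstdev


def znorm(series):
    """Z-normalize a series (zero mean, unit standard deviation)."""
    mu, sd = fmean(series), pstdev(series)
    return [0.0] * len(series) if sd == 0 else [(x - mu) / sd for x in series]


def paa(series, w):
    """Piecewise aggregate approximation: mean of each of w equal segments."""
    seg = len(series) // w  # assumption: len(series) is a multiple of w
    return [fmean(series[i * seg:(i + 1) * seg]) for i in range(w)]


def breakpoints(a):
    """Cut points that split N(0, 1) into a equiprobable regions."""
    return [NormalDist().inv_cdf(i / a) for i in range(1, a)]


def sax_word(series, w, a):
    """Map a raw series to w integer symbols in range(a)."""
    bps = breakpoints(a)
    return [bisect.bisect_right(bps, v) for v in paa(znorm(series), w)]


def mindist(word_q, word_s, n, a):
    """SAX lower bound on the Euclidean distance of two z-normalized series."""
    bps = breakpoints(a)
    total = 0.0
    for r, c in zip(word_q, word_s):
        if abs(r - c) > 1:  # equal or adjacent symbols contribute zero
            total += (bps[max(r, c) - 1] - bps[min(r, c)]) ** 2
    return math.sqrt(n / len(word_q)) * math.sqrt(total)


def euclidean(x, y):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def range_search(query, database, eps, w, a):
    """Return indices of series within eps of the query, pruning with MINDIST."""
    n = len(query)
    q_norm, q_word = znorm(query), sax_word(query, w, a)
    hits = []
    for idx, s in enumerate(database):
        # In practice the SAX words would be computed and stored offline.
        if mindist(q_word, sax_word(s, w, a), n, a) > eps:
            continue  # excluded without computing the full distance
        if euclidean(q_norm, znorm(s)) <= eps:
            hits.append(idx)
    return hits
```

Because MINDIST never exceeds the true Euclidean distance between the z-normalized series, the exclusion step cannot discard a genuine match; it only saves full distance computations. A second exclusion condition, as proposed in the paper, would add a further test before the full distance is computed.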


Paper Citation


in Harvard Style

Muhammad Fuad M. and Marteau P. (2010). TOWARDS A FASTER SYMBOLIC AGGREGATE APPROXIMATION METHOD. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8425-22-5, pages 305-310. DOI: 10.5220/0003006703050310


in Bibtex Style

@conference{icsoft10,
author={Muhammad Marwan Muhammad Fuad and Pierre-François Marteau},
title={TOWARDS A FASTER SYMBOLIC AGGREGATE APPROXIMATION METHOD},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2010},
pages={305-310},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003006703050310},
isbn={978-989-8425-22-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - TOWARDS A FASTER SYMBOLIC AGGREGATE APPROXIMATION METHOD
SN - 978-989-8425-22-5
AU - Muhammad Fuad M.
AU - Marteau P.
PY - 2010
SP - 305
EP - 310
DO - 10.5220/0003006703050310