HyperSAX: Fast Approximate Search of Multidimensional Data

Jens Emil Gydesen, Henrik Haxholm, Niels Sonnich Poulsen, Sebastian Wahl, Bo Thiesson

Abstract

The increasing amount and size of data make indexing and searching more difficult. This is especially challenging for multidimensional data such as images and videos. In this paper we introduce a new indexable symbolic data representation that allows us to efficiently index and retrieve from large amounts of multidimensional data. We use an approximate lower-bounding distance measure to compute the distance between multidimensional arrays, which allows us to perform fast similarity searches. We present two search methods, exact and approximate, which can quickly retrieve data using our representation. Our approach is general and works for many types of multidimensional data, including different types of image representations. Even for millions of multidimensional arrays, the approximate search finds a result within a few milliseconds and in many cases returns a result similar to the best match.
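The abstract does not spell out how the representation or its distance measure is built. The sketch below only illustrates the general idea under our own assumptions: it carries the SAX/iSAX recipe (z-normalization, piecewise aggregate approximation, Gaussian breakpoints, a MINDIST-style lower bound) over to 2-D arrays by averaging rectangular blocks. All names (`paa2d`, `symbolize`, `mindist`, `ToyIndex`) are hypothetical, the flat dictionary of buckets stands in for a real tree index, and the authors' actual HyperSAX representation and search algorithms may differ.

```python
import bisect
import math
from statistics import NormalDist


def znormalize(arr):
    """Z-normalize a 2-D array (list of lists) to zero mean and unit variance."""
    flat = [v for row in arr for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    if std == 0:
        return [[0.0] * len(row) for row in arr]
    return [[(v - mean) / std for v in row] for row in arr]


def paa2d(arr, bh, bw):
    """Replace each non-overlapping bh x bw block by its mean (a 2-D PAA)."""
    rows, cols = len(arr), len(arr[0])
    if rows % bh or cols % bw:
        raise ValueError("block size must divide the array shape")
    return [
        [sum(arr[r][c] for r in range(i, i + bh) for c in range(j, j + bw)) / (bh * bw)
         for j in range(0, cols, bw)]
        for i in range(0, rows, bh)
    ]


def breakpoints(alphabet):
    """Cut points splitting N(0, 1) into `alphabet` equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet) for k in range(1, alphabet)]


def symbolize(arr, bh, bw, bps):
    """Symbolic word: one integer symbol in [0, len(bps)] per block mean."""
    return tuple(bisect.bisect_right(bps, v)
                 for row in paa2d(znormalize(arr), bh, bw) for v in row)


def mindist(w1, w2, bps, block_size):
    """SAX-style lower bound on the Euclidean distance of the z-normalized arrays."""
    total = 0.0
    for r, c in zip(w1, w2):
        lo, hi = min(r, c), max(r, c)
        if hi - lo > 1:  # adjacent symbols can be arbitrarily close
            total += (bps[hi - 1] - bps[lo]) ** 2
    return math.sqrt(block_size * total)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for rx, ry in zip(x, y) for a, b in zip(rx, ry)))


class ToyIndex:
    """Hypothetical flat index: z-normalized arrays bucketed by symbolic word."""

    def __init__(self, data, bh, bw, alphabet):
        self.bh, self.bw = bh, bw
        self.bps = breakpoints(alphabet)
        self.data = [znormalize(x) for x in data]
        self.words = [symbolize(x, bh, bw, self.bps) for x in data]
        self.buckets = {}
        for i, w in enumerate(self.words):
            self.buckets.setdefault(w, []).append(i)

    def approximate(self, query):
        """Best match inside the query's own bucket only (None if it is empty)."""
        q, w = znormalize(query), symbolize(query, self.bh, self.bw, self.bps)
        ids = self.buckets.get(w, [])
        return min(ids, key=lambda i: euclidean(q, self.data[i]), default=None)

    def exact(self, query):
        """True nearest neighbour, visiting candidates in lower-bound order."""
        q, w = znormalize(query), symbolize(query, self.bh, self.bw, self.bps)
        block = self.bh * self.bw
        bounds = [mindist(w, wi, self.bps, block) for wi in self.words]
        best, best_d = None, math.inf
        for i in sorted(range(len(bounds)), key=bounds.__getitem__):
            if bounds[i] >= best_d:
                break  # every remaining bound is at least as large, so nothing can win
            d = euclidean(q, self.data[i])
            if d < best_d:
                best, best_d = i, d
        return best
```

With 4 x 4 arrays, 2 x 2 blocks, and an alphabet of 4, each array maps to a 4-symbol word. `exact` computes true distances only until the smallest remaining lower bound reaches the best distance found so far, while `approximate` inspects only the query's own bucket, which is cheaper but can miss the true nearest neighbour.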


Paper Citation


in Harvard Style

Gydesen J., Haxholm H., Poulsen N., Wahl S. and Thiesson B. (2015). HyperSAX: Fast Approximate Search of Multidimensional Data. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 190-198. DOI: 10.5220/0005185201900198


in BibTeX Style

@conference{icpram15,
author={Jens Emil Gydesen and Henrik Haxholm and Niels Sonnich Poulsen and Sebastian Wahl and Bo Thiesson},
title={HyperSAX: Fast Approximate Search of Multidimensional Data},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={190-198},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005185201900198},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - HyperSAX: Fast Approximate Search of Multidimensional Data
SN - 978-989-758-076-5
AU - Gydesen J.
AU - Haxholm H.
AU - Poulsen N.
AU - Wahl S.
AU - Thiesson B.
PY - 2015
SP - 190
EP - 198
DO - 10.5220/0005185201900198