RANDOM SAMPLING ALGORITHMS FOR LANDMARK WINDOWS OVER DATA STREAMS

Zhang Longbo, Li Zhanhuai, Yu Min, Wang Yong, Jiang Yun

2006

Abstract

In many applications, including sensor networks, telecommunications data management, network monitoring, and financial applications, data arrives in a stream. Interest in algorithms over data streams has grown in recent years. This paper introduces the problem of sampling from landmark windows of recent data items in data streams and presents a random sampling algorithm for this problem. The presented algorithm, called the SMS Algorithm, is a stratified multistage sampling algorithm for landmark windows. It applies different sampling fractions to different strata of the landmark window, and it works even when the number of data items in the landmark window varies dramatically over time. Theoretical analysis and experiments show that the algorithm is effective and efficient for continuous data stream processing.
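
The abstract does not spell out the steps of the SMS Algorithm, so the following is only a minimal sketch of the general idea of stratified sampling over a landmark window, not the authors' method. It rests on several assumptions that are not taken from the paper: strata are fixed-size runs of consecutive items, each open stratum is sampled with classic reservoir sampling, and every closed stratum has its sample halved whenever a newer stratum closes, so older strata end up with smaller sampling fractions. The class name `StratifiedLandmarkSampler` and all parameter names are hypothetical.

```python
import random


class StratifiedLandmarkSampler:
    """Illustrative sketch: one uniform sample per stratum, older strata keep fewer items."""

    def __init__(self, stratum_size, newest_capacity, seed=None):
        self.stratum_size = stratum_size        # items per stratum (assumed fixed)
        self.newest_capacity = newest_capacity  # reservoir size of the open stratum
        self.rng = random.Random(seed)
        self.closed = []    # samples of closed strata, oldest first
        self.current = []   # reservoir of the open stratum
        self.seen = 0       # items seen so far in the open stratum

    def add(self, item):
        # Reservoir sampling (Algorithm R) inside the open stratum.
        self.seen += 1
        if len(self.current) < self.newest_capacity:
            self.current.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.newest_capacity:
                self.current[j] = item
        if self.seen == self.stratum_size:
            self._close_stratum()

    def _close_stratum(self):
        # Halve every older sample (assumed decay rule), then archive the open stratum.
        self.closed = [self._shrink(s) for s in self.closed]
        self.closed.append(self.current)
        self.current, self.seen = [], 0

    def _shrink(self, sample):
        # A uniform subsample of a uniform sample is still uniform within its stratum.
        target = max(1, len(sample) // 2)
        return self.rng.sample(sample, target)

    def reset_landmark(self):
        # Start a new landmark window: discard everything collected so far.
        self.closed, self.current, self.seen = [], [], 0

    def sample(self):
        # Per-stratum samples, oldest stratum first, open stratum last.
        return [list(s) for s in self.closed] + [list(self.current)]


if __name__ == "__main__":
    sampler = StratifiedLandmarkSampler(stratum_size=100, newest_capacity=8, seed=0)
    for value in range(350):
        sampler.add(value)
    for index, stratum in enumerate(sampler.sample()):
        print(index, len(stratum), sorted(stratum))
```

Under these assumptions the per-stratum sample sizes depend only on how many strata have closed, not on the random draws, so memory stays bounded per stratum even as the landmark window grows. The halving rule is one arbitrary choice of decay; any schedule of per-stratum sampling fractions could be substituted.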

Paper Citation


in Harvard Style

Longbo Z., Zhanhuai L., Min Y., Yong W. and Yun J. (2006). RANDOM SAMPLING ALGORITHMS FOR LANDMARK WINDOWS OVER DATA STREAMS. In Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-41-2, pages 103-107. DOI: 10.5220/0002440501030107


in Bibtex Style

@conference{iceis06,
author={Zhang Longbo and Li Zhanhuai and Yu Min and Wang Yong and Jiang Yun},
title={RANDOM SAMPLING ALGORITHMS FOR LANDMARK WINDOWS OVER DATA STREAMS},
booktitle={Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2006},
pages={103-107},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002440501030107},
isbn={978-972-8865-41-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - RANDOM SAMPLING ALGORITHMS FOR LANDMARK WINDOWS OVER DATA STREAMS
SN - 978-972-8865-41-2
AU - Longbo Z.
AU - Zhanhuai L.
AU - Min Y.
AU - Yong W.
AU - Yun J.
PY - 2006
SP - 103
EP - 107
DO - 10.5220/0002440501030107