ANALYTICAL AND EXPERIMENTAL EVALUATION OF STREAM-BASED JOIN

Henry Kostowski, Kajal T. Claypool

2005

Abstract

Continuous queries over data streams have gained popularity as the breadth of possible applications, ranging from network monitoring to online pattern discovery, has increased. Joining streams is a fundamental problem that must be solved to enable complex queries over multiple streams. However, because streams can represent potentially infinite data, it is infeasible to evaluate a join in full, as traditional databases do. Joins in a stream environment are thus evaluated not over entire streams, but over specific windows defined on the streams. In this paper, we present windowed implementations of the traditional nested loops and hash join algorithms, and we evaluate their performance analytically and experimentally for different parameters. We find that, in general, a hash join provides better performance. We also investigate invalidation strategies for removing stale data from the window buffers, and propose an optimal strategy that balances processing time against buffer size.
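
To make the idea of a windowed join concrete, the following is a minimal Python sketch of a symmetric hash join over time-based sliding windows. It is an illustration only, not the implementation evaluated in the paper: the class name `WindowedHashJoin`, the `insert` method, the tuple layout, and the eager invalidation on every arrival are all assumptions made here, and timestamps are assumed to be non-decreasing across both streams.

```python
from collections import defaultdict, deque


class WindowedHashJoin:
    """Illustrative symmetric hash join over two streams (sides 0 and 1).

    Hypothetical sketch: a tuple stays in its window while
    now - timestamp < window.
    """

    def __init__(self, window):
        self.window = window
        # Per stream: an arrival-ordered queue and a hash index on the join key.
        self.queues = (deque(), deque())
        self.index = (defaultdict(deque), defaultdict(deque))

    def _invalidate(self, side, now):
        """Remove stale tuples from one stream's window buffer."""
        queue, index = self.queues[side], self.index[side]
        while queue and queue[0][0] <= now - self.window:
            _, key, _ = queue.popleft()
            bucket = index[key]
            # Buckets are arrival-ordered, so the oldest entry for this key
            # is exactly the tuple that just left the queue.
            bucket.popleft()
            if not bucket:
                del index[key]

    def insert(self, side, ts, key, value):
        """Add a tuple to stream `side` and return the join results it produces."""
        other = 1 - side
        # Eager invalidation (an assumption): purge both buffers on each arrival.
        self._invalidate(0, ts)
        self._invalidate(1, ts)
        # Probe the opposite window by hash lookup rather than a full scan.
        matches = self.index[other].get(key, ())
        if side == 0:
            results = [(value, v) for _, v in matches]
        else:
            results = [(v, value) for _, v in matches]
        self.queues[side].append((ts, key, value))
        self.index[side][key].append((ts, value))
        return results
```

Each call to `insert` probes only the hash bucket for the arriving key, whereas a windowed nested loops join would scan the whole opposite window. Purging on every arrival keeps the buffers small at the cost of extra work per tuple; a lazier policy would trade larger buffers for less processing, which is the kind of balance the invalidation strategies address.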



Paper Citation


in Harvard Style

Kostowski H. and Claypool K. T. (2005). ANALYTICAL AND EXPERIMENTAL EVALUATION OF STREAM-BASED JOIN. In Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 972-8865-19-8, pages 154-161. DOI: 10.5220/0002526701540161


in Bibtex Style

@conference{iceis05,
author={Henry Kostowski and Kajal T. Claypool},
title={ANALYTICAL AND EXPERIMENTAL EVALUATION OF STREAM-BASED JOIN},
booktitle={Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2005},
pages={154-161},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002526701540161},
isbn={972-8865-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - ANALYTICAL AND EXPERIMENTAL EVALUATION OF STREAM-BASED JOIN
SN - 972-8865-19-8
AU - Kostowski H.
AU - Claypool K. T.
PY - 2005
SP - 154
EP - 161
DO - 10.5220/0002526701540161