PIPELINED PARALLELISM IN MULTI-JOIN QUERIES ON HETEROGENEOUS SHARED NOTHING ARCHITECTURES

Mohamad Al Hajj Hassan, Mostafa Bamha

Abstract

Pipelined parallelism has been widely studied and successfully implemented in several join algorithms on shared nothing machines, under ideal conditions of load balancing between processors and in the absence of data skew. The aim of pipelining is to allow flexible resource allocation while avoiding unnecessary disk input/output for intermediate join results when processing multi-join queries. The main drawback of pipelining in existing algorithms is that communication and load balancing remain limited to static approaches, generated during the query optimization phase and based on hashing, to redistribute data over the network. Such approaches therefore cannot solve the problems of data skew and load imbalance between processors on heterogeneous multi-processor architectures, where the load of each processor may vary dynamically and unpredictably. In this paper, we present a new parallel join algorithm that solves the problem of data skew while guaranteeing perfect balancing properties on heterogeneous multi-processor shared nothing architectures. The performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model.
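
To illustrate why static hash-based redistribution is sensitive to data skew, the following minimal sketch assigns tuples to processors by hashing their join attribute value and reports the resulting per-processor load. It is not the algorithm presented in the paper; the function name `hash_partition`, the relation sizes, the choice of four processors, and the modulo hash are all assumptions made for illustration only.

```python
from collections import Counter


def hash_partition(keys, num_processors):
    """Statically assign each join attribute value to a processor by hashing.

    Hypothetical helper: returns the number of tuples each processor receives.
    """
    load = Counter()
    for key in keys:
        # Every tuple with the same join value lands on the same processor.
        load[key % num_processors] += 1
    return [load[p] for p in range(num_processors)]


# Uniform relation (assumed data): 1000 distinct join values, one tuple each.
uniform = list(range(1000))
# Skewed relation (assumed data): same size, but 700 tuples share join value 7.
skewed = [7] * 700 + list(range(300))

uniform_load = hash_partition(uniform, 4)
skewed_load = hash_partition(skewed, 4)

print("uniform:", uniform_load)  # [250, 250, 250, 250]
print("skewed: ", skewed_load)   # [75, 75, 75, 775]
```

In the skewed case one processor receives 775 of the 1000 tuples, because the hash function cannot split a frequent join value across processors. Since the assignment is fixed before execution, it also cannot react to processors whose available capacity changes at run time.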


Paper Citation


in Harvard Style

Al Hajj Hassan M. and Bamha M. (2008). PIPELINED PARALLELISM IN MULTI-JOIN QUERIES ON HETEROGENEOUS SHARED NOTHING ARCHITECTURES. In Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8111-51-7, pages 127-134. DOI: 10.5220/0001889901270134


in Bibtex Style

@conference{icsoft08,
author={Mohamad Al Hajj Hassan and Mostafa Bamha},
title={PIPELINED PARALLELISM IN MULTI-JOIN QUERIES ON HETEROGENEOUS SHARED NOTHING ARCHITECTURES},
booktitle={Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2008},
pages={127-134},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001889901270134},
isbn={978-989-8111-51-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - PIPELINED PARALLELISM IN MULTI-JOIN QUERIES ON HETEROGENEOUS SHARED NOTHING ARCHITECTURES
SN - 978-989-8111-51-7
AU - Al Hajj Hassan M.
AU - Bamha M.
PY - 2008
SP - 127
EP - 134
DO - 10.5220/0001889901270134