Authors:
Mohamad Al Hajj Hassan and Mostafa Bamha
Affiliation:
LIFO, University of Orléans, France
Keyword(s):
PDBMS (Parallel Database Management Systems), Intra-transaction parallelism, Parallel joins, Multi-joins, Data skew, Dynamic load balancing.
Related Ontology Subjects/Areas/Topics:
Distributed and Mobile Software Systems; Energy and Economy; Load Balancing in Smart Grids; Parallel and High Performance Computing; Smart Grids; Software Engineering
Abstract:
Pipelined parallelism has been widely studied and successfully implemented on shared-nothing machines in several join algorithms, under ideal conditions of load balancing between processors and in the absence of data skew. The aim of pipelining is to allow flexible resource allocation while avoiding unnecessary disk input/output for intermediate join results when processing multi-join queries. The main drawback of pipelining in existing algorithms is that communication and load balancing remain limited to static approaches (generated during the query optimization phase) that use hashing to redistribute data over the network. They therefore cannot solve the problems of data skew and load imbalance between processors on heterogeneous multi-processor architectures, where the load of each processor may vary in a dynamic and unpredictable way. In this paper, we present a new parallel join algorithm that solves the problem of data skew while guaranteeing perfect balancing properties on heterogeneous multi-processor shared-nothing architectures. The performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model.
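The limitation of static hash-based redistribution under data skew can be illustrated with a minimal sketch. This is not the algorithm presented in the paper. It only assumes integer join-attribute values and a simple modulo hash, and the names `hash_partition` and `imbalance` are hypothetical helpers introduced for illustration.

```python
from collections import Counter


def hash_partition(keys, num_processors):
    """Statically assign each join-attribute value to a processor by hashing.

    Assumption: keys are integers and the hash is a plain modulo.
    Returns the number of tuples received by each processor.
    """
    loads = Counter()
    for key in keys:
        loads[key % num_processors] += 1
    return [loads[p] for p in range(num_processors)]


def imbalance(loads):
    """Ratio of the heaviest processor's load to the average load."""
    return max(loads) / (sum(loads) / len(loads))


# Uniform case: 1000 distinct values, one tuple per value.
uniform_keys = list(range(1000))

# Skewed case: half of the 1000 tuples share the single value 7.
skewed_keys = [7] * 500 + list(range(500))

uniform_loads = hash_partition(uniform_keys, 4)
skewed_loads = hash_partition(skewed_keys, 4)

print(uniform_loads, imbalance(uniform_loads))
print(skewed_loads, imbalance(skewed_loads))
```

Because every tuple carrying the same attribute value hashes to the same processor, the frequent value in the skewed input overloads one processor regardless of how the hash function is chosen. A static plan fixed at query optimization time cannot correct this at run time.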