(E. Alba, 2002; J.R. González, 2004) include skeletons
for divide and conquer. The MaLLBa implementation
of the divide and conquer skeleton presented
in (E. Alba, 2002) is based on a farm (master-slave)
strategy, which is inapplicable for streams (Poldner
and Kuchen, 2008a). The distributed approach discussed
in (J.R. González, 2004) offers the same user
interface as the MaLLBa skeleton and can be integrated
into the MaLLBa framework. Unfortunately, the runtimes
of the considered example applications are not
presented, and the design is not discussed in the
context of streams. In (Cole, 1997), Cole suggests
offering divide and combine as independent skeletons,
but this approach has not been implemented in eSkel.
The eSkel Butterfly-Skeleton (Cole, 2004) is based on
group partitioning and supports divide and conquer
algorithms in which all activity occurs in the divide
phase. In contrast to our approach, the number of
processors used for the Butterfly skeleton starts from
a power of two. This is due to the group partitioning
strategy. Note that algorithms like Strassen or Karat-
suba produce a number of subproblems which is not a
power of two. The skeTo library (K. Matsuzaki, 2006)
only provides data parallel skeletons and is based on
the theory of Constructive Algorithmics. Restricted
data parallel approaches are discussed in (Bischof,
2005; Gorlatch, 1997). In (Gorlatch, 1997), a pro-
cessor topology called N-graph is presented, which
is used for a parallel implementation of a divide and
conquer skeleton in a functional context. Herrmann
presents different general and special forms of divide
and conquer skeletons in the context of the purely
functional programming language HDC, which is a subset
of Haskell (Herrmann, 2000). A mixed data and
task parallel approach can be found in (Y. Bai, 2007).
However, we are not aware of any implementation of
a divide and conquer skeleton which combines stream
processing and internal task parallelism.
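To make the remark on Karatsuba concrete, the following is a minimal sequential sketch of a divide and conquer skeleton parameterized by four user functions, instantiated with Karatsuba multiplication. It is not the Muesli, MaLLBa, or eSkel interface; the names `divide_and_conquer`, `is_simple`, `solve`, `divide`, `combine`, and the cutoff of 16 are assumptions chosen for illustration. The point it shows is that the divide operator returns three subproblems, which is not a power of two.

```python
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")  # problem type
S = TypeVar("S")  # solution type


def divide_and_conquer(
    problem: P,
    is_simple: Callable[[P], bool],
    solve: Callable[[P], S],
    divide: Callable[[P], List[P]],
    combine: Callable[[P, List[S]], S],
) -> S:
    """Sequential reference version of a divide and conquer skeleton (illustrative)."""
    if is_simple(problem):
        return solve(problem)
    subsolutions = [
        divide_and_conquer(sub, is_simple, solve, divide, combine)
        for sub in divide(problem)
    ]
    return combine(problem, subsolutions)


Pair = Tuple[int, int]  # a multiplication problem (x, y) with x, y >= 0


def split_point(problem: Pair) -> int:
    """Bit position at which both operands are split."""
    x, y = problem
    return max(x.bit_length(), y.bit_length()) // 2


def karatsuba_divide(problem: Pair) -> List[Pair]:
    """Split x*y into three smaller multiplications."""
    x, y = problem
    m = split_point(problem)
    mask = (1 << m) - 1
    x1, x0 = x >> m, x & mask
    y1, y0 = y >> m, y & mask
    return [(x1, y1), (x0, y0), (x1 + x0, y1 + y0)]


def karatsuba_combine(problem: Pair, subs: List[int]) -> int:
    """Recombine the three partial products into x*y."""
    m = split_point(problem)
    z2, z0, z1 = subs
    return (z2 << (2 * m)) + ((z1 - z2 - z0) << m) + z0


def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers via the skeleton above."""
    return divide_and_conquer(
        (x, y),
        is_simple=lambda p: p[0] < 16 or p[1] < 16,  # assumed cutoff
        solve=lambda p: p[0] * p[1],
        divide=karatsuba_divide,
        combine=karatsuba_combine,
    )
```

A parallel implementation would distribute the subproblems returned by `divide` among solvers instead of recursing locally; with three subproblems per divide step, a strategy that repeatedly halves processor groups does not map onto them evenly.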
5 CONCLUSIONS
We have analyzed alternative topologies for process-
ing streams of divide and conquer problems. After in-
troducing the design of the fully distributed DCSkele-
ton, which was used in a previous version of the
skeleton library Muesli, we have considered a distributed
farm of sequentially working DCSkeletons
as well as a fully distributed DCSkeleton in the context
of streams. We suggest combining skeletal internal
task parallelism with stream parallelism to achieve
both better memory utilization and a reduction of idle
times of the engaged solvers. Moreover, we present a
new divide and conquer skeleton optimized for stream
processing. By applying the skeleton with
application-specific parameters, it can be configured to be
a hybrid of a pure stream processing farm and the
DCSkeleton, and it can range between both extremes.
In comparison to the DCSkeleton, the new StreamDC
skeleton benefits from overlapping the startup and end
phases of solving single problems by solving several
problems in parallel. The advantage is that only a
few problems must be prepared for load distribution,
which reduces divide and combine operator calls and
increases the sequential part of the computation. As
we have shown, the new StreamDC skeleton is clearly
superior to the DCSkeleton. In comparison to a farm
of sequentially working DCSkeletons, it offers better
scalability, which is advantageous in particular when
only a few divide and conquer problems have to be
solved. Moreover, the complete sharing of the dis-
tributed memory is a great advantage compared to a
farm, in which the solvers only have access to their
own local memory. Thus, the new StreamDC is able
to solve problems that cannot be solved by a sequential
DCSkeleton used in farms due to the lack
of memory. In future work, we intend to investigate
alternative stream-based implementation schemes of
skeletons for branch and bound and other search
algorithms.
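The trade-off between the two extremes mentioned above can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the StreamDC implementation: the parameter `dist_depth` is a hypothetical stand-in for an application-specific parameter, sorting by merging is an arbitrary example problem, and no real parallelism is involved. The sketch counts how often the divide operator is called for a stream of problems when only the top `dist_depth` levels of each problem are split (and could thus be distributed), while everything below is solved sequentially.

```python
import heapq
from typing import List, Tuple


def solve_stream(stream: List[List[int]], dist_depth: int) -> Tuple[List[List[int]], int]:
    """Sort every list in the stream.

    Returns the sorted lists and the total number of divide-operator calls.
    dist_depth = 0 mimics a pure farm (each problem solved sequentially);
    a large dist_depth mimics dividing every problem down to trivial size.
    """
    divide_calls = 0

    def solve(problem: List[int], depth: int) -> List[int]:
        nonlocal divide_calls
        if depth >= dist_depth or len(problem) <= 1:
            return sorted(problem)  # sequential part of the computation
        divide_calls += 1  # one divide (and later one combine) call
        mid = len(problem) // 2
        left = solve(problem[:mid], depth + 1)
        right = solve(problem[mid:], depth + 1)
        return list(heapq.merge(left, right))  # combine step

    results = [solve(problem, 0) for problem in stream]
    return results, divide_calls
```

For a stream of three lists with eight elements each, this sketch performs 0 divide calls with `dist_depth = 0`, 9 with `dist_depth = 2`, and 21 when every problem is divided down to single elements. Lower values increase the sequential part per problem, at the price of fewer subproblems being available for load distribution.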
REFERENCES
A. Benoit, M. Cole, J. H. S. G. (2005). Flexible skeletal
programming with eSkel. In Proc. Euro-Par 2005. LNCS
3648, 761–770, Springer Verlag, 2005.
Aldinucci, M. and Danelutto, M. (1999). Stream paral-
lel skeleton optimization. In Proceedings of the 11th
IASTED International Conference on Parallel and
Distributed Computing and Systems, MIT, Boston,
USA. IASTED/ACTA press.
Baumgartner, G. (2002). A high-level approach to synthesis
of high-performance codes for quantum chemistry.
Bischof, H. (2005). Systematic development of parallel
programs using skeletons. PhD thesis. Shaker.
Cole, M. (1989). Algorithmic Skeletons: Structured Man-
agement of Parallel Computation. MIT Press.
Cole, M. (1997). On dividing and conquering independently.
In Proceedings of Euro-Par’97, LNCS 1300,
pages 634–637. Springer.
Cole, M. (2004). Bringing skeletons out of the closet:
A pragmatic manifesto for skeletal parallel program-
ming. In Parallel Computing 30(3), 389–406.
E. Alba, F. Almeida, e. a. (2002). MaLLBa: A library of
skeletons for combinatorial search. In Proc. Euro-Par
2002. LNCS 2400, 927–932, Springer Verlag, 2002.
G. H. Botorog, H. K. (1996). Efficient parallel program-
ming with algorithmic skeletons. In Proc. Euro-
Par’96. LNCS 1123, 718–731, Springer Verlag, 1996.
ICSOFT 2008 - International Conference on Software and Data Technologies