(n − m)!
In the end, the total number of execution paths for a
given application must account for both the fragment
distribution configurations (eq. 8) and the partial
permutations of mappers (eq. 9):
\[ P(n, m) \times \frac{n!}{(n - m)!} \]
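The count can be checked numerically with a short sketch. It rests on two assumptions that are not spelled out in this section: that P(n, m) denotes the number of partitions of the integer n into exactly m parts, and that the overall figure is obtained by summing the product over m = 1, ..., n. The recursive generator below is a simple stand-in chosen for readability, not the constant amortized time algorithm of Zoghbi and Stojmenovic; the function names are hypothetical.

```python
from math import perm  # perm(n, m) = n! / (n - m)!, available since Python 3.8


def partitions_into_parts(n, m, max_part=None):
    """Yield each partition of n into exactly m positive parts (non-increasing order)."""
    if max_part is None:
        max_part = n
    if m == 0:
        # A valid partition is complete only if nothing is left to distribute.
        if n == 0:
            yield ()
        return
    # The first part must leave at least 1 for each of the remaining m - 1 parts.
    for first in range(min(max_part, n - (m - 1)), 0, -1):
        for rest in partitions_into_parts(n - first, m - 1, first):
            yield (first,) + rest


def count_paths(n):
    """Sum P(n, m) * n! / (n - m)! over m = 1..n (the summation is an assumption)."""
    total = 0
    for m in range(1, n + 1):
        p_nm = sum(1 for _ in partitions_into_parts(n, m))  # assumed meaning of P(n, m)
        total += p_nm * perm(n, m)
    return total


if __name__ == "__main__":
    for n in (7, 8):
        print(n, count_paths(n))
```

Under these assumptions the sketch gives 18,613 for n = 7 and 151,432 for n = 8, which is consistent with the orders of magnitude reported in the text.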
For example, for n = 7 the number of generated paths
is around 18,000; for n = 8, more than 150,000
configurations were obtained. Treating the generation
of execution paths as an integer partitioning problem
allowed us to apply well-known algorithms that run in
constant amortized time and guarantee acceptable
running times even on off-the-shelf PCs (Zoghbi and
Stojmenovic, 1994). For each configuration generated
by the algorithm, a corresponding graph is built. On
each node of the graph, the parameters (computing
capacity, link capacity, β) are then
assigned. Finally the graph’s execution time is com-
The increasing rate at which data grow has stimulated,
over the years, the search for new strategies to
overcome the limits shown by the legacy tools used so
far to analyze data. MapReduce, and in particular its
open-source implementation Hadoop, has attracted the
interest of both private and academic research as the
programming model that best fits the need to cope with
big data. In this paper we address the specific need
to handle big data that are, by their nature,
distributed over many geographically distant sites.
Plain Hadoop has proved to be inefficient in that
context. We propose a strategy inspired by
hierarchical approaches previously presented in the
literature. The strategy leverages the partition
number and combinatorial theory to partition big data
into fragments and efficiently distribute the workload
among datacenters. Unlike previous works, it exploits
fresh context information such as the available
computing capacity and the inter-site link capacity.
