$$D_{n,m} = \frac{n!}{(n-m)!} \qquad (9)$$
Finally, the total number of execution paths for a given
application must account for both the fragment distribution
configurations (eq. 8) and the partial permutations of
mappers (eq. 9):
$$N_{exepath} = \sum_{m=1}^{n} P(n,m) \times \frac{n!}{(n-m)!} \qquad (10)$$
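As an illustration, eq. 10 can be evaluated directly. The sketch below is not the authors' code: it assumes that P(n, m) in eq. 8 denotes the number of partitions of the integer n into exactly m positive parts, and the function names are hypothetical.

```python
from functools import lru_cache
from math import perm  # perm(n, m) == n! / (n - m)!, available from Python 3.8


@lru_cache(maxsize=None)
def partitions_into_parts(n: int, m: int) -> int:
    """Number of partitions of n into exactly m positive parts (assumed P(n, m))."""
    if n == 0 and m == 0:
        return 1
    if n <= 0 or m <= 0 or m > n:
        return 0
    # Either the smallest part is 1 (drop it), or every part is >= 2
    # (subtract 1 from each of the m parts).
    return partitions_into_parts(n - 1, m - 1) + partitions_into_parts(n - m, m)


def n_exepath(n: int) -> int:
    """Evaluate eq. 10: sum over m of P(n, m) times the partial permutations D_{n,m}."""
    return sum(partitions_into_parts(n, m) * perm(n, m) for m in range(1, n + 1))


if __name__ == "__main__":
    for n in (7, 8):
        print(n, n_exepath(n))
```

Under that assumption, `n_exepath(7)` evaluates to 18613 and `n_exepath(8)` to 151432, in line with the orders of magnitude reported in the text.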
For example, for n = 7 the number of generated paths is
around 18,000; for n = 8, more than 150,000 configurations
were obtained. Treating the generation of execution paths as
an integer partitioning problem allowed us to apply
well-known algorithms that run in constant amortized time
and therefore guarantee acceptable running times even on
off-the-shelf PCs (Zoghbi and Stojmenovic, 1994). For each
configuration generated by the algorithm, a corresponding
graph is built. The parameters (computing capacity, link
capacity, β) are then assigned to the nodes of each graph.
Finally, the execution time of the graph is computed.
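The enumeration step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain recursive generator rather than the constant-amortized-time algorithm of Zoghbi and Stojmenovic, it follows eq. 10 literally by pairing each partition with every ordered choice of m nodes, and it stops before the graph construction and execution-time computation. The function names and the node labels 0..n-1 are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterator, Optional, Sequence, Tuple


def partitions(n: int, max_part: Optional[int] = None) -> Iterator[Tuple[int, ...]]:
    """Yield every partition of n as a non-increasing tuple of positive parts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def configurations(n: int, nodes: Optional[Sequence[int]] = None) -> Iterator[Dict[int, int]]:
    """Yield one {node: fragment size} mapping per candidate configuration.

    Each partition with m parts is paired with every ordered selection of
    m nodes out of n, as in eq. 10. Assignments that coincide because a
    partition has equal parts are not merged.
    """
    node_list = list(range(n)) if nodes is None else list(nodes)
    for parts in partitions(n):
        for chosen in permutations(node_list, len(parts)):
            yield dict(zip(chosen, parts))


if __name__ == "__main__":
    for config in configurations(3):
        print(config)
```

Each yielded mapping would then be the input for building one graph and computing its execution time. Counting the mappings for a given n reproduces the value of eq. 10.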
5 CONCLUSION
The increasing rate at which data grow has stimulated, over
the years, the search for new strategies to overcome the
limits shown by the legacy tools used so far to analyze
data. MapReduce, and in particular its open-source
implementation Hadoop, has attracted the interest of both
private and academic research as the programming model that
best fits the need to cope with big data. In this paper we
address the specific need to handle big data that are, by
their nature, distributed over many geographically distant
sites. Plain Hadoop has been shown to be inefficient in that
context. We propose a strategy inspired by hierarchical
approaches previously presented in the literature. The
strategy leverages partition numbers and combinatorial
theory to partition big data into fragments and to
distribute the workload efficiently among data centers.
Unlike previous works, it exploits fresh context information
such as the available computing capacity and the inter-site
link capacity.
REFERENCES
Andrews, G. E. (1976). The Theory of Partitions, volume 2 of Encyclopedia of Mathematics and its Applications.
Dean, J. and Ghemawat, S. (2004). MapReduce: simplified data processing on large clusters. In OSDI04: Proceeding of the 6th Conference on Symposium on operating systems design and implementation. USENIX Association.
Facebook (2012). Under the Hood: Scheduling MapReduce jobs more efficiently with Corona. https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona.
Heintz, B., Chandra, A., Sitaraman, R., and Weissman, J. (2014). End-to-end Optimization for Geo-Distributed MapReduce. IEEE Transactions on Cloud Computing, PP(99):1–1.
Jayalath, C., Stephen, J., and Eugster, P. (2014). From the Cloud to the Atmosphere: Running MapReduce across Data Centers. IEEE Transactions on Computers, 63(1):74–87.
Kim, S., Won, J., Han, H., Eom, H., and Yeom, H. Y. (2011). Improving Hadoop Performance in Intercloud Environments. SIGMETRICS Perform. Eval. Rev., 39(3):107–109.
Luo, Y., Guo, Z., Sun, Y., Plale, B., Qiu, J., and Li, W. W. (2011). A Hierarchical Framework for Cross-domain MapReduce Execution. In Proceedings of the Second International Workshop on Emerging Computational Methods for the Life Sciences, ECMLS ’11, pages 15–22.
Mattess, M., Calheiros, R. N., and Buyya, R. (2013). Scaling MapReduce Applications Across Hybrid Clouds to Meet Soft Deadlines. In Proceedings of the 2013 IEEE 27th International Conference on Advanced Information Networking and Applications, AINA ’13, pages 629–636.
Open Networking Foundation (2012). Software-Defined Networking: The New Norm for Networks. White paper, Open Networking Foundation.
The Apache Software Foundation (2011). The Apache Hadoop project. http://hadoop.apache.org/.
Yang, H., Dasdan, A., Hsiao, R., and Parker, D. S. (2007). Map-reduce-merge: Simplified relational data processing on large clusters. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD ’07, pages 1029–1040.
Zhang, Q., Liu, L., Lee, K., Zhou, Y., Singh, A., Mandagere, N., Gopisetty, S., and Alatorre, G. (2014). Improving Hadoop Service Provisioning in a Geographically Distributed Cloud. In Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on, pages 432–439.
Zikopoulos, P. and Eaton, C. (2011). Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw Hill.
Zoghbi, A. and Stojmenovic, I. (1994). Fast algorithms for generating integer partitions. International Journal of Computer Mathematics, 80:319–332.