Context-aware MapReduce for Geo-distributed Big Data

Marco Cavallo, Giuseppe Di Modica, Carmelo Polito, Orazio Tomarchio

2015

Abstract

MapReduce is an effective distributed programming model used in cloud computing for large-scale data analysis applications. Hadoop, the best-known and most widely used open-source implementation of the MapReduce model, assumes that every node in a cluster has the same computing capacity and that data are local to tasks. However, in many real big data applications, where data may be located in many datacenters distributed across the planet, these assumptions no longer hold, and Hadoop performance suffers. This paper addresses the problem by proposing a hierarchical MapReduce programming model in which a top-level scheduling system is aware of the heterogeneity of the underlying computing contexts. The main idea of the approach is to improve job processing time by partitioning and redistributing the workload among geo-distributed workers, guided by monitoring of the bottom-level computing and networking context.



Paper Citation


in Harvard Style

Cavallo M., Di Modica G., Polito C. and Tomarchio O. (2015). Context-aware MapReduce for Geo-distributed Big Data. In Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-104-5, pages 414-421. DOI: 10.5220/0005497704140421


in Bibtex Style

@conference{closer15,
author={Marco Cavallo and Giuseppe Di Modica and Carmelo Polito and Orazio Tomarchio},
title={Context-aware MapReduce for Geo-distributed Big Data},
booktitle={Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2015},
pages={414-421},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005497704140421},
isbn={978-989-758-104-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - Context-aware MapReduce for Geo-distributed Big Data
SN - 978-989-758-104-5
AU - Cavallo M.
AU - Di Modica G.
AU - Polito C.
AU - Tomarchio O.
PY - 2015
SP - 414
EP - 421
DO - 10.5220/0005497704140421