Key based Reducer Placement for Data Analytics across Data Centers Considering Bi-level Resource Provision in Cloud Computing

Jiangtao Zhang, Lingmin Zhang, Hejiao Huang, Zeo L. Jiang, Xuan Wang

2016

Abstract

Because data sources such as astronomy and sales are inherently distributed, or because of legal restrictions, it is not always practical to store world-wide data in a single data center (DC). Hadoop is a commonly accepted framework for big data analytics, but it can only handle data within one DC. The distribution of data therefore necessitates the study of Hadoop across DCs. In this setting, mappers can be placed in the local DCs, but where to place the reducers is a major challenge, since each reducer needs to process almost all map output across all involved DCs. To reduce cost, a key based scheme is proposed that respects the locality principle of traditional Hadoop as much as possible while deploying reducers at lower cost. Considering resource provision at both the data center level and the server level, the problem is formalized as a bi-level program and solved by a tailored two level group genetic algorithm (TLGGA). Extensive simulations demonstrate the effectiveness of TLGGA: it outperforms the baseline and the state-of-the-art mechanisms by 49% and 40%, respectively.
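
To make the idea of key based placement concrete, the following is a minimal Python sketch, not the TLGGA described in the abstract. It assumes a hypothetical table `map_output[key][dc]` giving the bytes of map output per key and DC, and it uses a simple greedy rule (place each key's reducer in the DC that already holds most of that key's data). The function names, the DC names, and the sizes are illustrative assumptions; server-level capacity and pricing, which the bi-level formulation covers, are ignored here.

```python
from typing import Dict

# Hypothetical input: map_output[key][dc] = bytes of map output for `key` in `dc`.
MapOutput = Dict[str, Dict[str, int]]


def place_reducers(map_output: MapOutput) -> Dict[str, str]:
    """Greedy rule (an assumption, not the paper's algorithm): put each key's
    reducer in the DC that holds the largest share of that key's map output."""
    placement = {}
    for key, per_dc in map_output.items():
        # Iterating over sorted DC names makes tie-breaking deterministic.
        placement[key] = max(sorted(per_dc), key=lambda dc: per_dc[dc])
    return placement


def cross_dc_traffic(map_output: MapOutput, placement: Dict[str, str]) -> int:
    """Bytes that must cross DC boundaries to reach the chosen reducers."""
    return sum(
        size
        for key, per_dc in map_output.items()
        for dc, size in per_dc.items()
        if dc != placement[key]
    )


# Illustrative data only.
map_output = {
    "k1": {"dc_a": 80, "dc_b": 10, "dc_c": 10},
    "k2": {"dc_a": 5, "dc_b": 60, "dc_c": 35},
}
placement = place_reducers(map_output)
print(placement)
print(cross_dc_traffic(map_output, placement))
```

With this toy input, placing both reducers in `dc_a` would move 115 bytes across DCs, while the per-key choice moves 60, which is the kind of saving a key based scheme aims for.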



Paper Citation


in Harvard Style

Zhang J., Zhang L., Huang H., Jiang Z. and Wang X. (2016). Key based Reducer Placement for Data Analytics across Data Centers Considering Bi-level Resource Provision in Cloud Computing. In Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD, ISBN 978-989-758-183-0, pages 243-254. DOI: 10.5220/0005894202430254


in Bibtex Style

@conference{iotbd16,
author={Jiangtao Zhang and Lingmin Zhang and Hejiao Huang and Zeo L. Jiang and Xuan Wang},
title={Key based Reducer Placement for Data Analytics across Data Centers Considering Bi-level Resource Provision in Cloud Computing},
booktitle={Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD},
year={2016},
pages={243-254},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005894202430254},
isbn={978-989-758-183-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD
TI - Key based Reducer Placement for Data Analytics across Data Centers Considering Bi-level Resource Provision in Cloud Computing
SN - 978-989-758-183-0
AU - Zhang J.
AU - Zhang L.
AU - Huang H.
AU - Jiang Z.
AU - Wang X.
PY - 2016
SP - 243
EP - 254
DO - 10.5220/0005894202430254