Characterising the Power Consumption of Hadoop Clouds - A Social Media Analysis Case Study

Javier Conejero, Omer Rana, Peter Burnap, Jeffrey Morgan, Carmen Carrion, Blanca Caminero

2013

Abstract

Energy efficiency is often identified as one of the key reasons for migrating to Cloud environments. It is frequently argued that a data centre hosting a Cloud environment is likely to achieve greater energy efficiency, at reduced cost, than a local deployment. With rising energy prices, a large share of the operational cost of a Cloud environment is also estimated to be attributable to energy. In this work, we measure and investigate the energy consumption of a number of virtual machines running Hadoop on an OpenNebula Cloud. Our workload is based on sentiment analysis of Twitter messages. Our objective is to understand the trade-off between energy efficiency and performance for such a workload. From our results, we generalise and speculate on how such an analysis could be used as a basis for establishing a Service Level Agreement with a Cloud provider, especially where there is likely to be a high level of variability (both in performance and in energy use) over multiple runs of the same application at different times.
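The abstract does not state how energy is derived from measurements or how run-to-run variability might inform an SLA. What follows is a minimal sketch under stated assumptions, not the authors' method. It assumes power is sampled in watts at known timestamps in seconds, integrates those samples with the trapezoidal rule to obtain energy in joules, and summarises repeated runs by their mean and coefficient of variation. The `sla_bound` helper, which places a threshold at the mean plus `k` sample standard deviations, is a hypothetical rule chosen only for illustration. All function names are hypothetical.

```python
from statistics import mean, stdev


def energy_joules(timestamps_s, power_w):
    """Hypothetical helper: integrate sampled power (W) over time (s) with
    the trapezoidal rule, returning energy in joules."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Area of the trapezoid between consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def run_summary(values):
    """Hypothetical helper: mean and coefficient of variation (sample
    standard deviation divided by the mean) across repeated runs."""
    m = mean(values)
    cv = stdev(values) / m if len(values) > 1 else 0.0
    return m, cv


def sla_bound(values, k=2.0):
    """Hypothetical SLA threshold: mean plus k sample standard deviations.
    This rule is an illustrative assumption, not taken from the paper."""
    m = mean(values)
    s = stdev(values) if len(values) > 1 else 0.0
    return m + k * s


if __name__ == "__main__":
    # Invented numbers for illustration only.
    print(energy_joules([0, 5, 10], [100, 100, 100]))
    runs = [1000.0, 1100.0, 900.0]
    print(run_summary(runs), sla_bound(runs))
```

For example, a constant 100 W sampled at 0, 5 and 10 s integrates to 1000 J, and three hypothetical runs at 1000 J, 1100 J and 900 J give a mean of 1000 J, a coefficient of variation of 0.1 and, with `k=2`, a bound of 1200 J. These figures are invented and are not results from the paper. A mean-plus-spread bound is used here only because it makes the effect of variability on an agreed threshold easy to see.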


Paper Citation


in Harvard Style

Conejero J., Rana O., Burnap P., Morgan J., Carrion C. and Caminero B. (2013). Characterising the Power Consumption of Hadoop Clouds - A Social Media Analysis Case Study. In Proceedings of the 3rd International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-8565-52-5, pages 233-243. DOI: 10.5220/0004373502330243


in Bibtex Style

@conference{closer13,
author={Javier Conejero and Omer Rana and Peter Burnap and Jeffrey Morgan and Carmen Carrion and Blanca Caminero},
title={Characterising the Power Consumption of Hadoop Clouds - A Social Media Analysis Case Study},
booktitle={Proceedings of the 3rd International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2013},
pages={233-243},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004373502330243},
isbn={978-989-8565-52-5},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 3rd International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI  - Characterising the Power Consumption of Hadoop Clouds - A Social Media Analysis Case Study
SN  - 978-989-8565-52-5
AU  - Conejero J.
AU  - Rana O.
AU  - Burnap P.
AU  - Morgan J.
AU  - Carrion C.
AU  - Caminero B.
PY  - 2013
SP  - 233
EP  - 243
DO  - 10.5220/0004373502330243