A Lightweight Tool for Anomaly Detection in Cloud Data Centres

Sakil Barbhuiya, Zafeirios Papazachos, Peter Kilpatrick, Dimitrios S. Nikolopoulos

2015

Abstract

Cloud data centres are critical business infrastructures and the fastest growing service providers. Detecting anomalies in Cloud data centre operation is vital. Given the vast complexity of the data centre system software stack, applications and workloads, anomaly detection is a challenging endeavour. Current tools for detecting anomalies often use machine learning techniques, application instance behaviours or system metrics distribution, which are complex to implement in Cloud computing environments as they require training, access to application-level data and complex processing. This paper presents LADT, a lightweight anomaly detection tool for Cloud data centres that uses rigorous correlation of system metrics, implemented by an efficient correlation algorithm without the need for training or complex infrastructure setup. LADT is based on the hypothesis that, in an anomaly-free system, metrics from data centre host nodes and virtual machines (VMs) are strongly correlated. An anomaly is detected whenever the correlation drops below a threshold value. We demonstrate and evaluate LADT in a Cloud environment, where we show that the hosting node I/O operations per second (IOPS) are strongly correlated with the aggregated virtual machine IOPS, but that this correlation vanishes when an application stresses the disk, indicating a node-level anomaly.
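
The detection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the LADT implementation: the abstract does not name the correlation algorithm, so the Pearson coefficient, the non-overlapping window of 10 samples, the threshold of 0.5, and the function names `pearson` and `detect_anomalies` are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series.

    Assumption: the abstract only says "an efficient correlation algorithm";
    Pearson is used here purely as an illustrative choice.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation; treat it as
        # uncorrelated (an assumption of this sketch).
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def detect_anomalies(host_iops, vm_iops_per_vm, window=10, threshold=0.5):
    """Flag windows where host IOPS and aggregated VM IOPS decorrelate.

    host_iops:       list of host-node IOPS samples
    vm_iops_per_vm:  one list of IOPS samples per VM, aligned with host_iops
    window, threshold: hypothetical values, not taken from the paper
    Returns a list of (window_start_index, correlation) pairs.
    """
    # Aggregate the VM metrics sample by sample.
    aggregated = [sum(sample) for sample in zip(*vm_iops_per_vm)]
    flagged = []
    # Walk the series in non-overlapping windows; a trailing partial
    # window is ignored.
    for start in range(0, len(host_iops) - window + 1, window):
        r = pearson(host_iops[start:start + window],
                    aggregated[start:start + window])
        if r < threshold:
            flagged.append((start, r))
    return flagged


if __name__ == "__main__":
    # Synthetic data: two VMs, 30 samples each.
    vm1 = [10, 12, 15, 11, 18, 20, 14, 13, 17, 19] * 3
    vm2 = [5, 7, 6, 9, 8, 4, 6, 7, 5, 8] * 3
    # Host IOPS normally tracks the VM total plus a constant overhead.
    host = [a + b + 2 for a, b in zip(vm1, vm2)]
    # In the last window, disk stress on the host adds unrelated I/O.
    stress = [200, 50, 10, 180, 5, 120, 220, 30, 160, 15]
    for i, extra in enumerate(stress):
        host[20 + i] += extra
    for start, r in detect_anomalies(host, [vm1, vm2]):
        print(f"anomaly in window starting at sample {start}: r = {r:.2f}")
```

With this synthetic data only the third window is flagged, because the host-side stress I/O has no counterpart in the VM metrics. A production tool would also need to align timestamps across hosts and VMs, which this sketch leaves out.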


Paper Citation


in Harvard Style

Barbhuiya S., Papazachos Z., Kilpatrick P. and Nikolopoulos D. S. (2015). A Lightweight Tool for Anomaly Detection in Cloud Data Centres. In Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-104-5, pages 343-351. DOI: 10.5220/0005453403430351


in Bibtex Style

@conference{closer15,
author={Sakil Barbhuiya and Zafeirios Papazachos and Peter Kilpatrick and Dimitrios S. Nikolopoulos},
title={A Lightweight Tool for Anomaly Detection in Cloud Data Centres},
booktitle={Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2015},
pages={343-351},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005453403430351},
isbn={978-989-758-104-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - A Lightweight Tool for Anomaly Detection in Cloud Data Centres
SN - 978-989-758-104-5
AU - Barbhuiya S.
AU - Papazachos Z.
AU - Kilpatrick P.
AU - Nikolopoulos D. S.
PY - 2015
SP - 343
EP - 351
DO - 10.5220/0005453403430351