Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing

Rudolf Hoffmann, Christoph Reich

2024

Abstract

The availability of cloud computing infrastructures relies on many components, such as software, hardware, the cloud management system (CMS), security, environmental factors, and human operation. When a failure occurs, root cause analysis (RCA) is often complex. This paper explores the integration of Machine Learning (ML) with Fault Tree Analysis (FTA) to enhance explainable failure detection in cloud computing systems. We introduce a framework that employs ML for Fault Tree (FT) selection and generation, and for predicting Basic Events (BEs), to enhance the explainability of failure analysis. Our experimental validation focuses on predicting BEs and using these predictions to calculate the Top Event (TE) probability. The results demonstrate improved diagnostic accuracy and reliability, highlighting the potential of combining ML predictions with traditional FTA to identify root causes of failures in cloud computing environments and to make failure diagnosis more explainable.
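The step from predicted BE probabilities to a TE probability can be illustrated with a minimal sketch. It is not the authors' implementation: the tree layout, the event names, and the probability values below are hypothetical, and the calculation assumes statistically independent basic events combined only through AND and OR gates.

```python
from math import prod


def gate_probability(node, be_probs):
    """Return the failure probability of a fault-tree node.

    A node is either a basic-event name (str) or a tuple (gate, children).
    Assumes all basic events are statistically independent.
    """
    if isinstance(node, str):
        return be_probs[node]
    gate, children = node
    child_probs = [gate_probability(child, be_probs) for child in children]
    if gate == "AND":
        # All children must fail.
        return prod(child_probs)
    if gate == "OR":
        # At least one child fails.
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical fault tree: the top event occurs if the disk fails,
# or if both the primary and the backup power supply fail.
fault_tree = (
    "OR",
    ["disk_failure", ("AND", ["primary_power_loss", "backup_power_loss"])],
)

# Hypothetical BE probabilities, standing in for ML model outputs.
be_probs = {
    "disk_failure": 0.1,
    "primary_power_loss": 0.5,
    "backup_power_loss": 0.2,
}

top_event_probability = gate_probability(fault_tree, be_probs)
print(f"Top Event probability: {top_event_probability:.2f}")
```

With these illustrative values the AND gate yields 0.5 × 0.2 = 0.1, and the OR gate yields 1 − (0.9 × 0.9) = 0.19. If the same basic event appears in more than one branch, the independence assumption no longer holds and this bottom-up calculation would need to be replaced, for example by a minimal cut set evaluation.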


Paper Citation


in Harvard Style

Hoffmann R. and Reich C. (2024). Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing. In Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER; ISBN 978-989-758-701-6, SciTePress, pages 295-302. DOI: 10.5220/0012727600003711


in Bibtex Style

@conference{closer24,
author={Rudolf Hoffmann and Christoph Reich},
title={Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing},
booktitle={Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2024},
pages={295-302},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012727600003711},
isbn={978-989-758-701-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing
SN - 978-989-758-701-6
AU - Hoffmann R.
AU - Reich C.
PY - 2024
SP - 295
EP - 302
DO - 10.5220/0012727600003711
PB - SciTePress