Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing
Rudolf Hoffmann, Christoph Reich
2024
Abstract
The availability of cloud computing infrastructures relies on many components, such as software, hardware, the cloud management system (CMS), security, environmental factors, and human operation. When something goes wrong, root cause analysis (RCA) is often complex. This paper explores the integration of Machine Learning (ML) with Fault Tree Analysis (FTA) to enhance explainable failure detection in cloud computing systems. We introduce a framework that employs ML for Fault Tree (FT) selection and generation, and for predicting Basic Events (BEs), to enhance the explainability of failure analysis. Our experimental validation focuses on predicting BEs and using these predictions to calculate the Top Event (TE) probability. The results demonstrate improved diagnostic accuracy and reliability, highlighting the potential of combining ML predictions with traditional FTA to identify root causes of failures in cloud computing environments and to make failure diagnosis more explainable.
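As an illustration of how predicted BE probabilities can be propagated to a TE probability, the following is a minimal sketch and not the authors' implementation. It assumes statistically independent basic events and a tree built only from AND and OR gates. The tree layout, the event names (`disk_failure`, `primary_power_loss`, `backup_power_loss`), and the probability values are hypothetical placeholders standing in for the output of an ML predictor.

```python
from math import prod


def and_gate(probs):
    """AND gate: all inputs must occur (independence assumed)."""
    return prod(probs)


def or_gate(probs):
    """OR gate: at least one input occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)


def evaluate(node, be_probs):
    """Recursively compute the probability of a fault tree node.

    A node is either a basic-event name (str) or a tuple
    (gate_type, [child_nodes]) with gate_type in {"AND", "OR"}.
    """
    if isinstance(node, str):
        return be_probs[node]
    gate, children = node
    child_probs = [evaluate(child, be_probs) for child in children]
    if gate == "AND":
        return and_gate(child_probs)
    if gate == "OR":
        return or_gate(child_probs)
    raise ValueError(f"Unsupported gate type: {gate}")


# Hypothetical fault tree: the top event occurs if the disk fails,
# OR if both the primary and the backup power supply are lost.
tree = ("OR", [
    "disk_failure",
    ("AND", ["primary_power_loss", "backup_power_loss"]),
])

# Hypothetical BE probabilities, e.g. as produced by an ML classifier.
be_probs = {
    "disk_failure": 0.1,
    "primary_power_loss": 0.2,
    "backup_power_loss": 0.5,
}

te_probability = evaluate(tree, be_probs)
print(f"Top event probability: {te_probability:.2f}")  # approximately 0.19
```

Because each intermediate gate probability is available during the recursion, the contribution of every branch to the TE can be inspected, which is the sense in which a fault tree keeps the diagnosis traceable to individual basic events.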
Paper Citation
in Harvard Style
Hoffmann R. and Reich C. (2024). Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing. In Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER; ISBN 978-989-758-701-6, SciTePress, pages 295-302. DOI: 10.5220/0012727600003711
in Bibtex Style
@conference{closer24,
author={Rudolf Hoffmann and Christoph Reich},
title={Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing},
booktitle={Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2024},
pages={295-302},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012727600003711},
isbn={978-989-758-701-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - Machine Learning Models with Fault Tree Analysis for Explainable Failure Detection in Cloud Computing
SN - 978-989-758-701-6
AU - Hoffmann R.
AU - Reich C.
PY - 2024
SP - 295
EP - 302
DO - 10.5220/0012727600003711
PB - SciTePress