themselves remains opaque. The “black box” nature of DL models used for predicting these events limits our ability to fully understand and interpret the occurrence of the BEs.
In future work, we will explore the as-yet unvalidated approaches of using ML to select FTs, to generate FTs, or to do both. We will investigate methodologies for employing ML algorithms to automate the selection of appropriate FTs based on observed symptoms or failure modes. This will involve developing algorithms that navigate through multiple failure scenarios to identify the most suitable FTs for RCA. Furthermore, we will investigate how ML can be used to automate the generation of FTs from observational or historical data. This involves developing algorithms that construct FTs which accurately represent the complex failure mechanisms within cloud computing systems while remaining interpretable and relevant for effective fault diagnosis. By pursuing these paths, we aim to improve fault diagnosis by fully leveraging the integration of ML with FTs. Additionally, we will explore implementing our approach in real-world settings to evaluate its applicability and robustness across various cloud computing environments. Through these efforts, we seek to unlock advanced capabilities for more precise analysis and understanding of system failures.
6 CONCLUSION
Our investigation into integrating ML with FTA presents a significant advancement in fault detection methodologies for cloud computing systems. By concentrating on the prediction of BEs and the subsequent calculation of TE probability, we not only enhance the precision of fault diagnosis but also increase the system’s interpretability and transparency. Although our experimental validation focused on this particular approach, we discussed the theoretical framework and potential benefits of using ML for selecting and generating FTs. Future work will explore these unvalidated approaches to further refine and expand our understanding of integrating ML with FTA, aiming to develop more robust and intuitive fault diagnosis tools for complex computing environments.
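To make the step from predicted BE probabilities to a TE probability concrete, the following is a minimal sketch using the textbook gate rules: an AND gate multiplies its input probabilities, and an OR gate takes the complement of the probability that no input occurs. It assumes that all BEs are statistically independent and that each BE appears only once in the tree; trees with repeated events would instead require minimal cut sets or a similar exact method. The class names, the function `top_event_probability`, the example event names, and the probability values are all hypothetical placeholders standing in for the outputs of a trained ML model, not the models or data used in this work.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    """Leaf of the FT; `probability` stands in for an ML model's prediction."""
    name: str
    probability: float


@dataclass
class Gate:
    """Logic gate combining child events; only AND and OR are handled here."""
    kind: str  # "AND" or "OR"
    children: List["Node"] = field(default_factory=list)


Node = Union[BasicEvent, Gate]


def top_event_probability(node: Node) -> float:
    """Propagate BE probabilities bottom-up to the event at `node`.

    Assumes independent BEs, each appearing only once in the tree.
    """
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [top_event_probability(child) for child in node.children]
    if node.kind == "AND":
        # All inputs must occur: product of their probabilities.
        return prod(child_probs)
    if node.kind == "OR":
        # At least one input occurs: 1 - P(no input occurs).
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unsupported gate type: {node.kind}")


# Hypothetical tree: the TE occurs if the disk fails, or if both
# the primary and the backup network links fail.
tree = Gate("OR", [
    BasicEvent("disk_failure", 0.02),
    Gate("AND", [
        BasicEvent("primary_link_down", 0.10),
        BasicEvent("backup_link_down", 0.05),
    ]),
])

print(round(top_event_probability(tree), 4))  # 0.0249
```

Here the AND gate yields 0.10 × 0.05 = 0.005, and the OR gate yields 1 − (1 − 0.02)(1 − 0.005) = 0.0249. Keeping the gate logic separate from the BE predictors means a retrained model only changes the leaf probabilities, while the tree structure, and with it the explanation of how the TE arises, stays inspectable.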
FUNDING
This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant DFG-GZ: RE 2881/6-1 and by the French Agence Nationale de la Recherche (ANR) under grant ANR-22-CE92-0007.
CLOSER 2024 - 14th International Conference on Cloud Computing and Services Science