Detecting Outliers in CI/CD Pipeline Logs Using Latent Dirichlet Allocation

Daniel Atzberger, Tim Cech, Willy Scheibel, Rico Richter, Jürgen Döllner

2023

Abstract

Continuous Integration and Continuous Delivery are best practices used in the context of DevOps. Automated pipelines build and test small software changes so that potential risks can be detected early. These pipelines continuously generate log events that are collected in semi-structured log files. In practice, such log files can accumulate 100,000 events or more. However, the relevant sections of these log files must be tagged manually by the user. This paper presents an online learning approach for detecting relevant log events using Latent Dirichlet Allocation. After grouping a fixed number of log events into a document, our approach prunes the vocabulary to eliminate words without semantic meaning. Applying Latent Dirichlet Allocation then turns the sequence of documents into a discrete sequence, which allows outliers within the sequence to be detected. By integrating the latent variables of the model, our approach provides an explanation of its predictions. Our experiments show that both the number and the choice of detected anomalies are sensitive to the choice of hyperparameters.
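The two preprocessing steps named in the abstract, grouping a fixed number of log events into a document and pruning the vocabulary, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the sample log lines, the tokenization rule (purely alphabetic words of two or more letters) and the document-frequency thresholds are all assumptions made for illustration; the abstract does not specify them. The Latent Dirichlet Allocation and outlier detection steps are omitted.

```python
import re
from collections import Counter

# Assumption: only purely alphabetic words of two or more letters carry meaning;
# timestamps, numbers and mixed alphanumeric tokens are dropped.
TOKEN = re.compile(r"\b[A-Za-z]{2,}\b")


def group_events(lines, events_per_doc):
    """Split log lines into consecutive documents of a fixed number of events."""
    return [lines[i:i + events_per_doc] for i in range(0, len(lines), events_per_doc)]


def tokenize(doc):
    """Return the lowercased alphabetic tokens of all lines in one document."""
    return [t.lower() for line in doc for t in TOKEN.findall(line)]


def prune_vocabulary(token_docs, min_df=1, max_df_ratio=0.9):
    """Keep tokens whose document frequency lies in [min_df, max_df_ratio * n_docs].

    Assumption: tokens that occur in (almost) every document are treated as
    uninformative. The thresholds are illustrative only.
    """
    df = Counter()
    for tokens in token_docs:
        df.update(set(tokens))  # count each token once per document
    n_docs = len(token_docs)
    return {t for t, c in df.items() if min_df <= c <= max_df_ratio * n_docs}


def bag_of_words(token_docs, vocab):
    """Count the retained tokens per document."""
    return [Counter(t for t in tokens if t in vocab) for tokens in token_docs]


# Hypothetical log excerpt, invented for this example.
log = [
    "2023-01-01 12:00:01 INFO step compile started",
    "2023-01-01 12:00:02 INFO step compile finished",
    "2023-01-01 12:00:03 INFO step test started",
    "2023-01-01 12:00:04 ERROR step test failed assertion",
    "2023-01-01 12:00:05 INFO step deploy started",
    "2023-01-01 12:00:06 INFO step deploy finished",
]

docs = group_events(log, events_per_doc=2)
token_docs = [tokenize(d) for d in docs]
vocab = prune_vocabulary(token_docs)
bags = bag_of_words(token_docs, vocab)
```

In this toy example the tokens "info", "step" and "started" occur in every document and are therefore pruned, while "error" and "failed" survive. The resulting per-document word counts are the kind of input a topic model such as Latent Dirichlet Allocation would consume.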

Paper Citation


in Harvard Style

Atzberger D., Cech T., Scheibel W., Richter R. and Döllner J. (2023). Detecting Outliers in CI/CD Pipeline Logs Using Latent Dirichlet Allocation. In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE, ISBN 978-989-758-647-7, SciTePress, pages 461-468. DOI: 10.5220/0011858500003464


in Bibtex Style

@conference{enase23,
author={Daniel Atzberger and Tim Cech and Willy Scheibel and Rico Richter and Jürgen Döllner},
title={Detecting Outliers in CI/CD Pipeline Logs Using Latent Dirichlet Allocation},
booktitle={Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2023},
pages={461-468},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011858500003464},
isbn={978-989-758-647-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI - Detecting Outliers in CI/CD Pipeline Logs Using Latent Dirichlet Allocation
SN - 978-989-758-647-7
AU - Atzberger D.
AU - Cech T.
AU - Scheibel W.
AU - Richter R.
AU - Döllner J.
PY - 2023
SP - 461
EP - 468
DO - 10.5220/0011858500003464
PB - SciTePress