Author:
Siavash Ghiasvand
Affiliation:
Technische Universität Dresden, Germany
Keyword(s):
Pattern Detection, Time Series Analysis, Anomaly Detection, Neural Networks, Noise Mitigation.
Related Ontology Subjects/Areas/Topics:
Feature Selection and Extraction; Pattern Recognition; Theory and Methods
Abstract:
The rapidly growing complexity of HPC systems, driven by demand for higher computing performance, increases the probability of failures. Early detection of failures significantly reduces the damage they cause by impeding their propagation through the system. Various anomaly detection mechanisms have been proposed to detect failures in their early stages. An insufficient number of failure samples, together with privacy concerns, severely limits the functionality of available anomaly detection approaches. Advances in machine learning techniques have significantly increased the accuracy of unsupervised anomaly detection methods, addressing the challenge of insufficient failure samples. However, available approaches are either domain specific, inaccurate, or require comprehensive knowledge of the underlying system. Furthermore, processing certain monitoring data, such as system logs, raises serious privacy concerns. In addition, noise in monitoring data severely impacts the correctness of data analysis. This work proposes an unsupervised and privacy-aware approach for detecting abnormal behaviors in general HPC systems. Preliminary results indicate the high potential of autoencoders for automatic detection of abnormal behaviors in HPC systems by analyzing anonymized system logs using fast-trainable, noise-resistant models.