Authors:
Marcelo Bacher; Irad Ben-Gal and Erez Shmueli
Affiliation:
Tel-Aviv University, Israel
Keyword(s):
Subspace Analysis, Rokhlin, Ensemble, Anomaly Detection.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Business Analytics; Computational Intelligence; Data Analytics; Data Engineering; Data Reduction and Quality Assessment; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Pre-Processing and Post-Processing for Data Mining; Soft Computing; Symbolic Systems
Abstract:
Identifying anomalies in multi-dimensional datasets is an important task in many real-world applications. A
special case arises when anomalies are occluded in a small set of attributes (i.e., subspaces) of the data and
not necessarily over the entire data space. In this paper, we propose a new subspace analysis approach named
Agglomerative Attribute Grouping (AAG) that aims to address this challenge by searching for subspaces that
comprise highly correlative attributes. Such correlations among attributes represent a systematic interaction
among the attributes that can better reflect the behavior of normal observations and hence can be used to improve
the identification of future abnormal data samples. AAG relies on a novel multi-attribute metric derived
from information theory measures of partitions to evaluate the "information distance" between groups of data
attributes. The empirical evaluation demonstrates that AAG outperforms state-of-the-art subspace analysis
methods, when they are used in anomaly detection ensembles, both in cases where anomalies are occluded
in relatively small subsets of the available attributes and in cases where anomalies represent a new class (i.e.,
novelties). Finally, and in contrast to existing methods, AAG does not require any tuning of parameters.
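
To make the notion of an "information distance" between attributes more concrete, the following is a minimal sketch of the classical two-partition Rokhlin distance, d(X, Y) = H(X|Y) + H(Y|X) = 2H(X, Y) - H(X) - H(Y), computed from empirical frequencies of discrete attributes. It is only an illustration of the underlying idea: the abstract does not specify AAG's multi-attribute metric, so this is not the authors' measure, and the function names `entropy` and `rokhlin_distance` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the partition induced by discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def rokhlin_distance(x, y):
    """Classical Rokhlin distance between two discrete attributes.

    Uses the identity H(X|Y) + H(Y|X) = 2*H(X,Y) - H(X) - H(Y).
    Illustrative only; this is NOT the multi-attribute metric of the paper.
    """
    if len(x) != len(y):
        raise ValueError("attributes must have the same number of observations")
    joint = entropy(list(zip(x, y)))  # entropy of the joint partition
    return 2 * joint - entropy(x) - entropy(y)


# Two attributes that induce the same partition vs. two independent ones.
a = [0, 0, 1, 1]
b = ["u", "u", "v", "v"]   # same grouping as `a`, different labels
c = [0, 1, 0, 1]           # independent of `a` on this sample

print(rokhlin_distance(a, b))
print(rokhlin_distance(a, c))
```

A distance near zero means the two attributes carry essentially the same information, so under this reading, attributes with small mutual distances would be natural candidates for grouping into one subspace.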