FINDING DISTANCE-BASED OUTLIERS IN SUBSPACES THROUGH BOTH POSITIVE AND NEGATIVE EXAMPLES

Fabio Fassetti, Fabrizio Angiulli

2010

Abstract

This work introduces an example-based outlier detection method that exploits both positive examples (that is, outliers) and negative examples (that is, inliers) to guide the search for anomalies in an unlabelled data set. The key idea is to find the subspace in which the positive examples most strongly exhibit their outlierness while, at the same time, the negative examples most strongly exhibit their inlierness. The degree to which an example is an outlier is measured by means of well-known unsupervised outlier scores evaluated on the collection of unlabelled data. A subspace discovery algorithm is designed to search for the most discriminating subspace. Experimental results show that the method is able to detect a near-optimal solution and that it is promising from the point of view of the knowledge mined.
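
The key idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say which outlier score or which search strategy the paper uses, so the sketch assumes a k-nearest-neighbour distance score and an exhaustive search over all subsets of dimensions, which is feasible only for a handful of dimensions. The names `knn_score`, `discrimination` and `best_subspace`, the parameter `k`, and the division by the square root of the subspace size are all assumptions made for illustration.

```python
import itertools
import math


def knn_score(point, data, dims, k):
    """Mean distance from `point` to its k nearest unlabelled points,
    measured only on the dimensions in `dims` (assumes len(data) >= k)."""
    dists = sorted(
        math.sqrt(sum((point[d] - other[d]) ** 2 for d in dims))
        for other in data
    )
    # Dividing by sqrt(|dims|) keeps scores from subspaces of different
    # sizes roughly comparable; this normalisation is an assumption.
    return sum(dists[:k]) / k / math.sqrt(len(dims))


def discrimination(dims, data, positives, negatives, k):
    """Mean score of the positive examples minus mean score of the negatives."""
    pos = sum(knn_score(p, data, dims, k) for p in positives) / len(positives)
    neg = sum(knn_score(n, data, dims, k) for n in negatives) / len(negatives)
    return pos - neg


def best_subspace(data, positives, negatives, k=3):
    """Exhaustively try every non-empty subset of dimensions and return
    the one with the highest discrimination value."""
    n_dims = len(data[0])
    candidates = (
        dims
        for size in range(1, n_dims + 1)
        for dims in itertools.combinations(range(n_dims), size)
    )
    return max(
        candidates,
        key=lambda dims: discrimination(dims, data, positives, negatives, k),
    )


# Toy unlabelled data: dimension 0 is tightly clustered in [0, 0.4],
# dimensions 1 and 2 are spread over 0..19.
data = [((i % 5) * 0.1, float(i), float((7 * i) % 20)) for i in range(20)]
positives = [(5.0, 10.0, 10.0)]  # unusual only in dimension 0
negatives = [(0.2, 3.0, 8.0)]    # ordinary in every dimension
print(best_subspace(data, positives, negatives, k=3))
```

On this toy data the search selects the subspace made of dimension 0 alone, the only dimension in which the positive example lies far from the unlabelled points while the negative example lies among them.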



Paper Citation


in Harvard Style

Fassetti F. and Angiulli F. (2010). FINDING DISTANCE-BASED OUTLIERS IN SUBSPACES THROUGH BOTH POSITIVE AND NEGATIVE EXAMPLES. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-674-021-4, pages 5-10. DOI: 10.5220/0002699600050010


in Bibtex Style

@conference{icaart10,
author={Fabio Fassetti and Fabrizio Angiulli},
title={FINDING DISTANCE-BASED OUTLIERS IN SUBSPACES THROUGH BOTH POSITIVE AND NEGATIVE EXAMPLES},
booktitle={Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2010},
pages={5-10},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002699600050010},
isbn={978-989-674-021-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - FINDING DISTANCE-BASED OUTLIERS IN SUBSPACES THROUGH BOTH POSITIVE AND NEGATIVE EXAMPLES
SN - 978-989-674-021-4
AU - Fassetti F.
AU - Angiulli F.
PY - 2010
SP - 5
EP - 10
DO - 10.5220/0002699600050010