and Toivonen, H., editors, Principles of Data Mining and Knowledge Discovery, volume 2431 of Lecture Notes in Computer Science, pages 43–78. Springer Berlin / Heidelberg.
Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data. Wiley Series in Probability & Statistics. Wiley.
Basharat, A., Gritai, A., and Shah, M. (2008). Learning object motion patterns for anomaly detection and improved object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pages 1–8. IEEE Computer Society Press.
Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 93–104, Dallas, Texas, USA. ACM Press.
Chandola, V., Banerjee, A., and Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3):1–58.
Gebhardt, J., Goldstein, M., Shafait, F., and Dengel, A. (2013). Document authentication using printing technique features and unsupervised anomaly detection. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR 2013), pages 479–483. IEEE Computer Society Press.
Goldstein, M. (2014). Anomaly Detection in Large Datasets. PhD thesis, University of Kaiserslautern, Germany.
Goldstein, M. and Dengel, A. (2012). Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm. In Wölfl, S., editor, KI-2012: Poster and Demo Track, pages 59–63. Online.
Grubbs, F. E. (1969). Procedures for Detecting Outlying Observations in Samples. Technometrics, 11(1):1–21.
Guyon, I., Matic, N., and Vapnik, V. (1996). Discovering informative patterns and data cleaning. Advances in Knowledge Discovery and Data Mining, pages 181–203.
Hawkins, S., He, H., Williams, G. J., and Baxter, R. A. (2000). Outlier detection using replicator neural networks. In Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2000), pages 170–180, London, UK. Springer-Verlag.
He, Z., Xu, X., and Deng, S. (2003). Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9-10):1641–1650.
Jin, W., Tung, A., Han, J., and Wang, W. (2006). Ranking outliers using symmetric neighborhood relationship. In Ng, W.-K., Kitsuregawa, M., Li, J., and Chang, K., editors, Advances in Knowledge Discovery and Data Mining, volume 3918 of Lecture Notes in Computer Science, pages 577–593. Springer Berlin / Heidelberg.
Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A. (2009). LoOP: Local outlier probabilities. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM ’09), pages 1649–1652, New York, NY, USA. ACM Press.
Lin, J., Keogh, E., Fu, A., and Herle, H. V. (2005). Approximations to magic: Finding unusual medical time series. In 18th IEEE Symposium on Computer-Based Medical Systems (CBMS), pages 23–24. IEEE Computer Society Press.
Lindsay, B. (1995). Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics. Institute of Mathematical Statistics, Penn. State University.
Mehrotra, K., Mohan, C. K., and Ranka, S. (1997). Elements of Artificial Neural Networks. MIT Press, Cambridge, MA, USA.
Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., and Euler, T. (2006). YALE: Rapid prototyping for complex data mining tasks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), pages 935–940, New York, NY, USA. ACM Press.
Portnoy, L., Eskin, E., and Stolfo, S. (2001). Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), pages 5–8.
Ramaswamy, S., Rastogi, R., and Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD ’00), pages 427–438, New York, NY, USA. ACM Press.
Schölkopf, B. and Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.
Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., and Platt, J. C. (1999). Support vector method for novelty detection. In Advances in Neural Information Processing Systems 12 (NIPS), pages 582–588. The MIT Press.
Sharma, P. K., Haleem, H., and Ahmad, T. (2015). Improving classification by outlier detection and removal. In Emerging ICT for Bridging the Future - Proceedings of the 49th Annual Convention of the Computer Society of India CSI Volume 2, volume 338 of Advances in Intelligent Systems and Computing, pages 621–628. Springer International Publishing.
Smith, M. and Martinez, T. (2011). Improving classification accuracy by identifying and removing instances that should be misclassified. In The 2011 International Joint Conference on Neural Networks (IJCNN), pages 2690–2697.
Tang, J., Chen, Z., Fu, A., and Cheung, D. (2002). Enhancing effectiveness of outlier detections for low density patterns. In Chen, M.-S., Yu, P., and Liu, B., editors, Advances in Knowledge Discovery and Data Mining, volume 2336 of Lecture Notes in Computer Science, pages 535–548. Springer Berlin / Heidelberg.
Turlach, B. A. (1993). Bandwidth selection in kernel density estimation: A review.
A Comparative Study on Outlier Removal from a Large-scale Dataset using Unsupervised Anomaly Detection