The Effect of Noise and Outliers on Fuzzy Clustering of High Dimensional Data

Ludmila Himmelspach, Stefan Conrad

2016

Abstract

Clustering high dimensional data remains a challenging problem for fuzzy clustering algorithms because the distances between pairs of data items become increasingly similar as the number of dimensions grows. Noise and outliers pose an additional problem because they can distort the computation of cluster centers. In this work, we analyze the effect of different kinds of noise and outliers on fuzzy clustering algorithms that can handle high dimensional data: FCM with attribute weighting, the multivariate fuzzy c-means (MFCM), and the possibilistic multivariate fuzzy c-means (PMFCM). Additionally, we propose a new version of PMFCM that improves its ability to handle noise and outliers in high dimensional data. The experimental results on different high dimensional data sets show that the possibilistic versions of MFCM produce accurate cluster centers independently of the kind of noise and outliers.
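The distance concentration effect mentioned in the abstract can be illustrated with a small sketch. This is not the authors' experimental setup: the function name `relative_contrast`, the uniform sampling from the unit hypercube, and the point counts are assumptions chosen only for illustration. The sketch measures how much the farthest point differs from the nearest one, relative to the nearest distance, for a single query point.

```python
import math
import random


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points points drawn uniformly from [0, 1]^n_dims.

    Hypothetical helper for illustration; uniform data is an assumption.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast is expected to shrink as the dimensionality grows.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(200, dims), 3))
```

When the contrast is small, all points are nearly equidistant from any candidate center, so distance-based membership degrees carry little information. That is the general difficulty the abstract refers to.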



Paper Citation


in Harvard Style

Himmelspach L. and Conrad S. (2016). The Effect of Noise and Outliers on Fuzzy Clustering of High Dimensional Data. In Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 2: FCTA, (IJCCI 2016) ISBN 978-989-758-201-1, pages 101-108. DOI: 10.5220/0006070601010108


in Bibtex Style

@conference{fcta16,
author={Ludmila Himmelspach and Stefan Conrad},
title={The Effect of Noise and Outliers on Fuzzy Clustering of High Dimensional Data},
booktitle={Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 2: FCTA, (IJCCI 2016)},
year={2016},
pages={101-108},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006070601010108},
isbn={978-989-758-201-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 2: FCTA, (IJCCI 2016)
TI - The Effect of Noise and Outliers on Fuzzy Clustering of High Dimensional Data
SN - 978-989-758-201-1
AU - Himmelspach L.
AU - Conrad S.
PY - 2016
SP - 101
EP - 108
DO - 10.5220/0006070601010108