Overlapping Clustering with Outliers Detection

Amira Rezgui, Chiheb-Eddine Ben N'Cir, Nadia Essoussi

Abstract

Detecting overlapping groups is an important challenge in clustering, offering relevant solutions for many application domains. Recently, the Parametrized R-OKM method was defined as an extension of OKM to control overlapping boundaries between clusters. However, the performance of both OKM and Parametrized R-OKM is considerably reduced when data contain outliers. The presence of outliers affects the resulting clusters and yields clusters that do not fit the true structure of the data. To improve the existing methods, we propose a robust method able to detect relevant overlapping clusters while identifying outliers. Experiments performed on artificial and real multi-labeled data sets showed the effectiveness of the proposed method in producing relevant non-disjoint groups.



Paper Citation


in Harvard Style

Rezgui A., Ben N'Cir C. and Essoussi N. (2014). Overlapping Clustering with Outliers Detection. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 279-286. DOI: 10.5220/0004830002790286


in Bibtex Style

@conference{icpram14,
author={Amira Rezgui and Chiheb-Eddine Ben N'Cir and Nadia Essoussi},
title={Overlapping Clustering with Outliers Detection},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={279-286},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004830002790286},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Overlapping Clustering with Outliers Detection
SN - 978-989-758-018-5
AU - Rezgui A.
AU - Ben N'Cir C.
AU - Essoussi N.
PY - 2014
SP - 279
EP - 286
DO - 10.5220/0004830002790286