On the Extension of k-Means for Overlapping Clustering - Average or Sum of Clusters’ Representatives?

Chiheb-Eddine Ben N'Cir, Nadia Essoussi

2013

Abstract

Clustering is an unsupervised learning technique that aims to fit structures to unlabeled data sets. Identifying non-disjoint groups is an important issue in clustering. This issue arises naturally because many real-life applications need to assign each observation to one or several clusters. To deal with this problem, recently proposed methods are based on theoretical, rather than heuristic, models and introduce overlaps into their optimized criteria. To model overlaps between clusters, some of these methods use the average of the clusters’ prototypes, while others use the sum of the clusters’ prototypes. The choice of SUM or AVERAGE can have a significant impact on the theoretical validity of the method and affects the induced patterns. In this paper, we therefore study the patterns induced by these two approaches by comparing Overlapping k-means (OKM) and Alternating Least Squares (ALS), two methods that generalize k-means to overlapping clustering and are based on the AVERAGE and SUM approaches, respectively.
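
The distinction between the AVERAGE and SUM approaches can be illustrated with a minimal Python sketch. It assumes that an observation assigned to several clusters is represented either by the mean of their prototypes (AVERAGE, as in OKM) or by their sum (SUM, as in ALS), and that both criteria are sums of squared errors. The abstract does not spell out the objective functions, so the helper names (`combine`, `squared_error`, `objective`) and the toy data are hypothetical and serve illustration only.

```python
from typing import List, Sequence

Vector = List[float]


def combine(prototypes: Sequence[Vector], assigned: Sequence[int], mode: str) -> Vector:
    """Combine the prototypes of the assigned clusters by 'average' or 'sum'."""
    if not assigned:
        raise ValueError("an observation must belong to at least one cluster")
    dim = len(prototypes[0])
    total = [sum(prototypes[c][d] for c in assigned) for d in range(dim)]
    if mode == "sum":
        return total
    if mode == "average":
        return [t / len(assigned) for t in total]
    raise ValueError("mode must be 'average' or 'sum'")


def squared_error(x: Vector, image: Vector) -> float:
    """Squared Euclidean distance between an observation and its representative."""
    return sum((a - b) ** 2 for a, b in zip(x, image))


def objective(data, prototypes, memberships, mode: str) -> float:
    """Assumed least-squares criterion: total error over all observations."""
    return sum(
        squared_error(x, combine(prototypes, m, mode))
        for x, m in zip(data, memberships)
    )


# Toy example (made-up numbers): one observation assigned to both clusters.
prototypes = [[0.0, 0.0], [4.0, 2.0]]
x = [2.0, 1.0]

avg_image = combine(prototypes, [0, 1], "average")  # [2.0, 1.0]
sum_image = combine(prototypes, [0, 1], "sum")      # [4.0, 2.0]

print(squared_error(x, avg_image))  # 0.0
print(squared_error(x, sum_image))  # 5.0
```

With the same prototypes and the same assignment, the two modes place the observation's representative at different points, so the prototypes that minimize each criterion differ as well. Under AVERAGE, an observation in an overlap is expected to lie between the prototypes; under SUM, it is expected to lie at their vector sum.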


Paper Citation


in Harvard Style

Ben N'Cir C. and Essoussi N. (2013). On the Extension of k-Means for Overlapping Clustering - Average or Sum of Clusters’ Representatives? In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013) ISBN 978-989-8565-75-4, pages 208-213. DOI: 10.5220/0004626502080213


in Bibtex Style

@conference{kdir13,
author={Chiheb-Eddine Ben N'Cir and Nadia Essoussi},
title={On the Extension of k-Means for Overlapping Clustering - Average or Sum of Clusters’ Representatives?},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)},
year={2013},
pages={208-213},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004626502080213},
isbn={978-989-8565-75-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: KDIR, (IC3K 2013)
TI - On the Extension of k-Means for Overlapping Clustering - Average or Sum of Clusters’ Representatives?
SN - 978-989-8565-75-4
AU - Ben N'Cir C.
AU - Essoussi N.
PY - 2013
SP - 208
EP - 213
DO - 10.5220/0004626502080213