Combining Data Clusterings with Instance Level Constraints

João M. M. Duarte, Ana L. N. Fred, F. Jorge Duarte

Abstract

Recent work has focused on incorporating a priori knowledge into the data clustering process in the form of pairwise constraints, aiming to improve clustering quality and to find clustering solutions suited to specific tasks or interests. In this work, we integrate must-link and cannot-link constraints into the cluster ensemble framework. Two algorithms for combining multiple data partitions with instance level constraints are proposed. The first is a modification of Evidence Accumulation Clustering. The second uses a genetic algorithm to maximize both the similarity between the cluster ensemble and the target consensus partition and the satisfaction of the constraints. Experimental results show that the proposed constrained clustering combination methods outperform unconstrained Evidence Accumulation Clustering.
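
The abstract does not give the details of either algorithm, so the following is only a minimal sketch of two generic building blocks it refers to: a co-association matrix of the kind used in evidence accumulation (how often each pair of objects shares a cluster across the ensemble), and a score for how many must-link and cannot-link constraints a candidate partition satisfies. The function names, the toy ensemble, and the constraint pairs are all hypothetical and are not taken from the paper.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def constraint_satisfaction(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that a partition satisfies."""
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total if total else 1.0


# Hypothetical toy ensemble: three partitions of five objects (assumption).
ensemble = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
# Hypothetical instance level constraints, given as pairs of object indices.
must_link = [(0, 1)]
cannot_link = [(1, 4), (2, 3)]

C = co_association(ensemble)
candidate = [0, 0, 1, 1, 1]
score = constraint_satisfaction(candidate, must_link, cannot_link)
```

A constrained combination method would need to use both quantities together, for example by searching for a consensus partition that agrees with the co-association evidence while keeping the constraint score high. How the paper actually combines them is not stated in the abstract.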

Paper Citation


in Harvard Style

Duarte J., Fred A. and Duarte F. (2009). Combining Data Clusterings with Instance Level Constraints. In Proceedings of the 9th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2009) ISBN 978-989-8111-89-0, pages 49-60. DOI: 10.5220/0002260300490060


in Bibtex Style

@conference{pris09,
author={João M. M. Duarte and Ana L. N. Fred and F. Jorge Duarte},
title={Combining Data Clusterings with Instance Level Constraints},
booktitle={Proceedings of the 9th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2009)},
year={2009},
pages={49-60},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002260300490060},
isbn={978-989-8111-89-0},
}

