Evidence Accumulation Clustering using Pairwise Constraints

João M. M. Duarte, Ana L. N. Fred, F. Jorge F. Duarte

Abstract

Recent work on constrained data clustering has shown that incorporating pairwise constraints, such as must-link and cannot-link constraints, increases the accuracy of single-run data clustering methods. It has also been shown that the quality of a consensus partition, obtained by combining multiple data partitions, is usually superior to that of the partitions produced by single-run clustering algorithms. In this paper we test the effectiveness of adding pairwise constraints to the Evidence Accumulation Clustering framework. For this purpose, a new soft-constrained hierarchical clustering algorithm is proposed and used to extract the consensus partition from the co-association matrix. We also study whether there are advantages in selecting the must-link and cannot-link constraints from certain subsets of the data instead of selecting them at random from the entire data set. Experimental results on 7 synthetic and 7 real data sets show that the use of soft constraints improves the performance of Evidence Accumulation Clustering.
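
As a rough illustration of the two ingredients named in the abstract, the sketch below builds a co-association matrix from a small ensemble of partitions and counts how many pairwise constraints a given partition violates. It is a minimal sketch and not the authors' implementation: the function names `co_association` and `constraint_violations`, the toy partitions, and the use of NumPy are all assumptions made for illustration, and the soft-constrained hierarchical algorithm proposed in the paper is not reproduced here.

```python
import numpy as np


def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster.

    `partitions` is a sequence of label vectors, one per clustering run,
    all of the same length (hypothetical input format).
    """
    labels = np.asarray(partitions)  # shape: (n_partitions, n_objects)
    # same[p, i, j] is True when objects i and j share a cluster in run p.
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)


def constraint_violations(labels, must_link, cannot_link):
    """Count must-link pairs that are split and cannot-link pairs that are merged."""
    labels = np.asarray(labels)
    split = sum(labels[i] != labels[j] for i, j in must_link)
    merged = sum(labels[i] == labels[j] for i, j in cannot_link)
    return int(split + merged)


# Toy ensemble: three partitions of four objects (illustrative data only).
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
C = co_association(partitions)
violations = constraint_violations(
    partitions[0], must_link=[(0, 1)], cannot_link=[(2, 3)]
)
```

Here `C[i, j]` close to 1 means objects `i` and `j` were grouped together in most runs. A consensus partition is then extracted from `C`, for instance by hierarchical clustering, and a soft-constrained variant would additionally take the violated constraints into account rather than forbidding them outright.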



Paper Citation


in Harvard Style

Duarte, J. M. M., Fred, A. L. N. and Duarte, F. J. F. (2012). Evidence Accumulation Clustering using Pairwise Constraints. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 293-299. DOI: 10.5220/0004171902930299


in Bibtex Style

@conference{kdir12,
author={João M. M. Duarte and Ana L. N. Fred and F. Jorge F. Duarte},
title={Evidence Accumulation Clustering using Pairwise Constraints},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={293-299},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004171902930299},
isbn={978-989-8565-29-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI - Evidence Accumulation Clustering using Pairwise Constraints
SN - 978-989-8565-29-7
AU - Duarte, J. M. M.
AU - Fred, A. L. N.
AU - Duarte, F. J. F.
PY - 2012
SP - 293
EP - 299
DO - 10.5220/0004171902930299