Authors:
João M. M. Duarte (1); Ana L. N. Fred (2) and F. Jorge F. Duarte (3)
Affiliations:
(1) Polytechnic of Porto (ISEP/IPP) and Instituto Superior Técnico, Portugal
(2) Instituto Superior Técnico, Portugal
(3) Polytechnic of Porto (ISEP/IPP), Portugal
Keyword(s):
Constrained Data Clustering, Clustering Combination, Unsupervised Learning.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Structured Data Analysis and Statistical Methods; Symbolic Systems
Abstract:
Recent work on constrained data clustering has shown that incorporating pairwise constraints, such as must-link and cannot-link constraints, increases the accuracy of single-run data clustering methods. It has also been shown that the quality of a consensus partition, obtained by combining multiple data partitions, is usually superior to that of the partitions produced by single-run clustering algorithms. In this paper we test the effectiveness of adding pairwise constraints to the Evidence Accumulation Clustering framework. For this purpose, a new soft-constrained hierarchical clustering algorithm is proposed and used to extract the consensus partition from the co-association matrix. We also study whether there are advantages in selecting the must-link and cannot-link constraints on certain subsets of the data instead of selecting them at random over the entire data set. Experimental results on 7 synthetic and 7 real data sets have shown that the use of soft constraints improves the performance of Evidence Accumulation Clustering.
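
To make the terms in the abstract concrete, here is a minimal sketch of the unconstrained Evidence Accumulation idea: several partitions are combined into a co-association matrix, and a consensus partition is extracted from it. It is not the soft-constrained hierarchical algorithm proposed in the paper, which the abstract does not specify. The function names (`co_association`, `single_link_cut`, `violations`), the toy partitions, and the 0.5 threshold are all assumptions made for illustration. The extraction step is a plain single-link cut at a fixed similarity level, and constraints are only used afterwards to count how many a partition violates.

```python
from itertools import combinations

def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]

def single_link_cut(C, threshold):
    """Merge every pair whose co-association is >= threshold (union-find)."""
    n = len(C)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if C[i][j] >= threshold:
            parent[find(i)] = find(j)
    # Relabel components 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

def violations(labels, must_link, cannot_link):
    """Count must-link pairs that are split and cannot-link pairs that are joined."""
    split = sum(labels[i] != labels[j] for i, j in must_link)
    joined = sum(labels[i] == labels[j] for i, j in cannot_link)
    return split + joined

# Hypothetical toy ensemble: three partitions of six points.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
C = co_association(partitions)
consensus = single_link_cut(C, threshold=0.5)

# Hypothetical pairwise constraints, given as index pairs.
must_link = [(0, 2)]
cannot_link = [(2, 3)]
print(consensus, violations(consensus, must_link, cannot_link))
```

In this toy case the second partition violates both constraints, while the consensus partition violates neither. A soft-constrained method would let such violations influence the merging itself rather than only being counted after the fact.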