fication to the average-link clustering algorithm. The
soft-constrained average-link algorithm was applied
in the EAC framework to produce the consensus par-
tition using the co-association matrix as input and out-
performed the hard-constrained clustering algorithms
used for comparison.
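As an illustration of the consensus step, the following minimal sketch (ours, not the paper's code) builds a co-association matrix from repeated k-means runs on toy one-dimensional data and extracts the consensus partition with a plain, unconstrained average-link step; the soft-constrained variant used in the paper is omitted here, and all function names are illustrative.

```python
import random

def kmeans_1d(data, k, iters=20, rng=None):
    """Plain k-means on 1-D points; returns one cluster label per point."""
    rng = rng or random
    centers = rng.sample(data, k)  # initialize centers at k distinct data points
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in data]
        for c in range(k):
            members = [x for x, l in zip(data, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def co_association(data, n_partitions=20, k=3, seed=0):
    """C[i][j] = fraction of ensemble partitions placing i and j together."""
    rng = random.Random(seed)
    n = len(data)
    C = [[0.0] * n for _ in range(n)]
    for _ in range(n_partitions):
        labels = kmeans_1d(data, k, rng=rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0 / n_partitions
    return C

def average_link(C, n_clusters):
    """Agglomerate using C as a similarity matrix: repeatedly merge the pair
    of clusters with the highest average co-association."""
    clusters = [[i] for i in range(len(C))]
    def avg_sim(a, b):
        return sum(C[i][j] for i in a for j in b) / (len(a) * len(b))
    while len(clusters) > n_clusters:
        a, b = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: avg_sim(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] += clusters.pop(b)
    return clusters

# Three well-separated groups of three points each.
data = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0, 9.1, 9.2]
C = co_association(data, n_partitions=20, k=3)
consensus = average_link(C, n_clusters=3)
```

Using the co-association matrix as the similarity input, rather than the original features, is what lets the average-link step combine evidence from the whole ensemble.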
The experimental results have shown that con-
strained clustering algorithms usually produce better
consensus partitions than the traditional clustering al-
gorithms, and that acquiring constraints from a subset
of data containing the patterns with the lowest degree
of confidence improves clustering quality.
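As a purely illustrative sketch of this idea (not the paper's exact confidence criterion), one can score each pattern by its average co-association with the other members of its consensus cluster and select the lowest-scoring patterns as candidates for pairwise-constraint queries:

```python
def pattern_confidence(C, labels):
    """Confidence of pattern i = mean co-association with the other members
    of its consensus cluster; low values mark ambiguous patterns."""
    conf = []
    for i in range(len(C)):
        peers = [j for j in range(len(C)) if labels[j] == labels[i] and j != i]
        conf.append(sum(C[i][j] for j in peers) / len(peers) if peers else 0.0)
    return conf

def least_confident(C, labels, m):
    """Indices of the m patterns with the lowest confidence scores."""
    conf = pattern_confidence(C, labels)
    return sorted(range(len(conf)), key=lambda i: conf[i])[:m]

# Toy co-association matrix: patterns 0-2 form one tight cluster, 3-5 a
# second one, and pattern 5 is only weakly associated with its own cluster.
C = [
    [1.0, 0.9, 0.9, 0.1, 0.1, 0.3],
    [0.9, 1.0, 0.9, 0.1, 0.1, 0.3],
    [0.9, 0.9, 1.0, 0.1, 0.1, 0.3],
    [0.1, 0.1, 0.1, 1.0, 0.9, 0.5],
    [0.1, 0.1, 0.1, 0.9, 1.0, 0.5],
    [0.3, 0.3, 0.3, 0.5, 0.5, 1.0],
]
labels = [0, 0, 0, 1, 1, 1]
queries = least_confident(C, labels, 1)  # pattern 5 is the least confident
```

Spending the constraint budget on such ambiguous patterns, rather than on patterns the ensemble already agrees on, is what makes the acquired constraints informative.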
Future work includes the development of an “intelligent” algorithm for acquiring clustering constraints
using the insights gained in this paper, the study of the
effect of the softness parameter, and the establishment
of criteria for its selection.
ACKNOWLEDGEMENTS
This work is supported by FEDER Funds through
the “Programa Operacional Factores de Competitivi-
dade - COMPETE” program and by National Funds
through FCT under the projects FCOMP-01-0124-
FEDER-PEst-OE/EEI/UI0760/2011 and PTDC/EIA-CCO/103230/2008, and grant SFRH/BD/43785/2008.
REFERENCES
Basu, S. (2005). Semi-supervised clustering: probabilistic models, algorithms and experiments. PhD thesis, Austin, TX, USA. Supervisor: Raymond J. Mooney.
Basu, S., Davidson, I., and Wagstaff, K. (2008). Con-
strained Clustering: Advances in Algorithms, Theory,
and Applications. Chapman & Hall/CRC.
Davidson, I. and Ravi, S. (2005). Clustering with constraints: Feasibility issues and the k-means algorithm. In 2005 SIAM International Conference on Data Mining (SDM’05), pages 138–149, Newport Beach, CA.
Domeniconi, C. and Al-Razgan, M. (2009). Weighted clus-
ter ensembles: Methods and analysis. ACM Trans.
Knowl. Discov. Data, 2:17:1–17:40.
Duarte, J. M. M., Fred, A. L. N., and Duarte, F. J. F. (2009).
Combining data clusterings with instance level con-
straints. In Fred, A. L. N., editor, Proceedings of the
9th International Workshop on Pattern Recognition in
Information Systems, pages 49–60. INSTICC PRESS.
Dudoit, S. and Fridlyand, J. (2003). Bagging to Improve the
Accuracy of a Clustering Procedure. Bioinformatics,
19(9):1090–1099.
Fern, X. Z. and Brodley, C. E. (2003). Random projection for high dimensional data clustering: A cluster ensemble approach. In Proceedings of the 20th International Conference on Machine Learning (ICML ’03), pages 186–193.
Fern, X. Z. and Brodley, C. E. (2004). Solving cluster en-
semble problems by bipartite graph partitioning. In
Proceedings of the twenty-first international confer-
ence on Machine learning, ICML ’04, pages 36–, New
York, NY, USA. ACM.
Fred, A. and Jain, A. (2005). Combining multiple cluster-
ing using evidence accumulation. IEEE Trans Pattern
Analysis and Machine Intelligence, 27(6):835–850.
Fred, A. L. N. (2001). Finding consistent clusters in data
partitions. In Proceedings of the Second International
Workshop on Multiple Classifier Systems, MCS ’01,
pages 309–318, London, UK. Springer-Verlag.
Ge, R., Ester, M., Jin, W., and Davidson, I. (2007).
Constraint-driven clustering. In KDD ’07: Proceed-
ings of the 13th ACM SIGKDD international confer-
ence on Knowledge discovery and data mining, pages
320–329, New York, NY, USA. ACM.
Klein, D., Kamvar, S. D., and Manning, C. D. (2002). From
instance-level constraints to space-level constraints:
Making the most of prior knowledge in data cluster-
ing. In ICML ’02: Proceedings of the Nineteenth In-
ternational Conference on Machine Learning, pages
307–314, San Francisco, CA, USA. Morgan Kauf-
mann Publishers Inc.
MacQueen, J. B. (1967). Some methods for classification
and analysis of multivariate observations. In Cam, L.
M. L. and Neyman, J., editors, Proc. of the fifth Berke-
ley Symposium on Mathematical Statistics and Prob-
ability, volume 1, pages 281–297. University of Cali-
fornia Press.
Sneath, P. and Sokal, R. (1973). Numerical taxonomy. Free-
man, London, UK.
Sokal, R. R. and Michener, C. D. (1958). A statistical
method for evaluating systematic relationships. Uni-
versity of Kansas Scientific Bulletin, 28:1409–1438.
Strehl, A. and Ghosh, J. (2003). Cluster ensembles — a
knowledge reuse framework for combining multiple
partitions. J. Mach. Learn. Res., 3:583–617.
Topchy, A., Jain, A. K., and Punch, W. (2003). Combining multiple weak clusterings. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM ’03), pages 331–338.
Topchy, A., Minaei-Bidgoli, B., Jain, A. K., and Punch,
W. F. (2004). Adaptive clustering ensembles. In ICPR
’04: Proceedings of the Pattern Recognition, 17th In-
ternational Conference on (ICPR’04) Volume 1, pages
272–275, Washington, DC, USA. IEEE Computer So-
ciety.
Tung, A. K. H., Hou, J., and Han, J. (2000). COE: Clustering with obstacles entities, a preliminary study. In PAKDD ’00: Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications, pages 165–168, London, UK. Springer-Verlag.
Wagstaff, K. L. (2002). Intelligent clustering with instance-level constraints. PhD thesis, Ithaca, NY, USA. Chair: Claire Cardie.
Wang, X. and Davidson, I. (2010). Flexible constrained
spectral clustering. In Proceedings of the 16th ACM
SIGKDD international conference on Knowledge dis-
covery and data mining, KDD ’10, pages 563–572,
New York, NY, USA. ACM.
Evidence Accumulation Clustering using Pairwise Constraints