ings, partially removing associations each time using a range of threshold values. The WAT approach in combination with NegMM instead determines a single threshold and generates the final consensus directly. Our empirical results, for the majority of datasets over different generation mechanisms and varied consensus sizes, suggest that the WAT approach succeeds in removing the weak associations and attains clusterings of similar quality in a much-reduced runtime compared to the original NegMM method. This was particularly evident in our experiments with 1000 and with 100 base clusterings. Further studies are needed to determine why the WAT approach hurt the clustering performance of NegMM more in the case of 10 less diverse base clusterings. Moreover, the WAT approach surprisingly improved the quality of some NegMM clusterings, for example, when WAT(GMM) was applied to the consensus of 100 and of 10 K-Means base clusterings for the Seeds dataset.
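To make the contrast concrete, the following sketch prunes a co-association matrix with a single GMM-derived threshold before one final clustering step, rather than re-clustering over a range of thresholds. It is only an illustration under stated assumptions: the helper names, the midpoint-of-means cutoff, and the agglomerative final step are our own simplifications built on scikit-learn (Pedregosa et al., 2011), not the paper's NegMM or WAT implementation.

# A minimal sketch of single-threshold pruning of a co-association matrix,
# in the spirit of WAT(GMM); an illustration, not the authors' code.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture

def coassociation_matrix(labelings):
    """Fraction of base clusterings that place each pair of points together."""
    labelings = np.asarray(labelings)      # shape: (n_clusterings, n_points)
    m, n = labelings.shape
    coassoc = np.zeros((n, n))
    for labels in labelings:
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    return coassoc / m

def wat_gmm_threshold(coassoc):
    """Fit a 2-component GMM to the pairwise association strengths and
    return a cutoff between the weak and strong components (assumed
    heuristic: the midpoint of the two component means)."""
    iu = np.triu_indices_from(coassoc, k=1)
    values = coassoc[iu].reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(values)
    return float(gmm.means_.ravel().mean())

# Usage: 100 K-Means base clusterings of random 2-D data, pruned once.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))
base = [KMeans(n_clusters=3, n_init=1, random_state=i).fit_predict(X)
        for i in range(100)]
C = coassociation_matrix(base)
C[C < wat_gmm_threshold(C)] = 0.0          # remove weak associations once
consensus = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(1.0 - C)                     # distance = 1 - association

Because the threshold is computed once from the distribution of association strengths, the expensive consensus step runs a single time, which is the source of the runtime savings reported above.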
ACKNOWLEDGEMENTS
Funding for this research was provided by ISM Canada and the Natural Sciences and Engineering Research Council of Canada.
REFERENCES
Aggarwal, C. C. and Reddy, C. K. (2014). Data Clustering: Algorithms and Applications. CRC Press, Boca Raton, Florida, USA.

Baller, T., Hamilton, H., and Zilles, S. (2018). A meta approach to removing weak links during consensus clustering. Unpublished manuscript.

Bezdek, J. C. and Pal, N. R. (1998). Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 28(3):301–315.

Chalamalla, A. K. (2010). A survey on consensus clustering techniques. Technical report, Department of Computer Science, University of Waterloo.

Comaniciu, D. and Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619.

Dua, D. and Graff, C. (2017). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences.

Fränti, P. and Sieranoja, S. (2018). K-means properties on six clustering benchmark datasets. Applied Intelligence, 48(12):4743–4759.

Fred, A. L. and Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6):835–850.

Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1):193–218.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85):2825–2830.

Reynolds, D. A. (2015). Gaussian mixture models. In Li, S. Z. and Jain, A. K., editors, Encyclopedia of Biometrics, pages 827–832. Springer, Boston, MA, USA.

Roberts, S. J., Husmeier, D., Rezek, I., and Penny, W. (1998). Bayesian approaches to Gaussian mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1133–1142.

Sculley, D. (2010). Web-scale K-means clustering. In Proceedings of the 19th International Conference on World Wide Web (WWW 2010), pages 1177–1178.

Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905.

Strehl, A. and Ghosh, J. (2003). Cluster ensembles: A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617.

Vega-Pons, S. and Ruiz-Shulcloper, J. (2011). A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(3):337–372.

Vinh, N., Epps, J., and Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11:2837–2854.

Wu, X., Ma, T., Cao, J., Tian, Y., and Alabdulkarim, A. (2018). A comparative study of clustering ensemble algorithms. Computers & Electrical Engineering, 68:603–615.

Yi, J., Yang, T., Jin, R., Jain, A. K., and Mahdavi, M. (2012). Robust ensemble clustering by matrix completion. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM 2012), pages 1176–1181.

Zhang, T., Ramakrishnan, R., and Livny, M. (1997). BIRCH: A new data clustering algorithm and its applications. Data Mining and Knowledge Discovery, 1(2):141–182.

Zhong, C., Hu, L., Yue, X., Luo, T., Qiang, F., and Haiyong, X. (2019). Ensemble clustering based on evidence extracted from the co-association matrix. Pattern Recognition, 92:93–106.