Table 3: Evaluation measures on "Dermatology" dataset (K = 6) with different numbers of constraints.

% known labels | Method     | % ML  | % CL  | % Total | MNCut | Rand Index
0              | SL         |   /   |   /   |    /    | 0.245 | 0.805
               | SL-L       |   /   |   /   |    /    | 0.013 | 0.827
               | FCSC       |   /   |   /   |    /    | 0.011 | 0.814
               | FCSC-θ     |   /   |   /   |    /    | 0.011 | 0.814
               | FCSC-θSP   |   /   |   /   |    /    | 0.013 | 0.827
               | FCSC-θ²SP  |   /   |   /   |    /    | 0.013 | 0.827
               | SSSC       |   /   |   /   |    /    | 0.013 | 0.827
2              | SL         | 100.0 |  87.1 |  93.5   | 0.251 | 0.808
               | SL-L       | 100.0 |  94.1 |  97.1   | 0.059 | 0.850
               | FCSC       |   /   |   /   |    /    |   /   |   /
               | FCSC-θ     |  48.8 |  70.7 |  59.7   | 0.085 | 0.800
               | FCSC-θSP   |  37.4 |  80.7 |  59.1   | 0.109 | 0.869
               | FCSC-θ²SP  | 100.0 | 100.0 | 100.0   | 0.036 | 0.894
               | SSSC       | 100.0 |  99.7 |  99.9   | 0.013 | 0.880
5              | SL         | 100.0 |  84.1 |  92.1   | 0.273 | 0.806
               | SL-L       | 100.0 |  95.7 |  97.9   | 0.038 | 0.900
               | FCSC       |   /   |   /   |    /    |   /   |   /
               | FCSC-θ     |  65.0 |  77.6 |  71.3   | 0.102 | 0.799
               | FCSC-θSP   |  62.8 |  92.3 |  77.6   | 0.139 | 0.890
               | FCSC-θ²SP  |  96.7 |  95.0 |  95.9   | 0.040 | 0.909
               | SSSC       | 100.0 |  98.4 |  99.2   | 0.018 | 0.914
100            | SL         | 100.0 | 100.0 | 100.0   | 0.063 | 1.000
               | SL-L       | 100.0 | 100.0 | 100.0   | 0.063 | 1.000
               | FCSC       |   /   |   /   |    /    |   /   |   /
               | FCSC-θ     |  75.8 |  70.3 |  73.0   | 0.334 | 0.714
               | FCSC-θSP   |  72.5 |  87.7 |  80.1   | 0.095 | 0.847
               | FCSC-θ²SP  |  87.6 |  93.9 |  90.8   | 0.037 | 0.927
               | SSSC       | 100.0 | 100.0 | 100.0   | 0.045 | 1.000
lower than other methods).
For example, with a small percentage of known labels (5%), the total proportion of satisfied constraints (ML and CL) is higher for SSSC (99.2%) than for the other methods, and its MNCut value is small (0.018). Moreover, this value is consistent with the one obtained by basic spectral clustering (corresponding to 0% of known labels and equal to 0.013) and is smaller than those of SL, SL-L and the four FCSC methods. SSSC also achieves the best Rand index (0.914): its final result is therefore closer to the optimal clustering than those of the other methods.
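The Rand index used throughout Table 3 is the fraction of point pairs on which two clusterings agree. As a minimal sketch (not the paper's code), it can be computed as follows:

```python
import numpy as np

def rand_index(labels_a, labels_b):
    """Rand index between two clusterings: the fraction of point
    pairs that both clusterings treat the same way (both put the
    pair in one cluster, or both split it)."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    n = len(labels_a)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            if same_a == same_b:   # pair handled identically
                agree += 1
    return agree / (n * (n - 1) / 2)

# Identical clusterings (up to label permutation) score 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

This is why a Rand index of 1.000 in the table means the clustering matches the ground-truth partition exactly, regardless of how the cluster labels are numbered.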
For a lower percentage (2%), the SSSC method does not satisfy all constraints exactly (99.9%), contrary to FCSC-θ²SP, but its MNCut is the lowest (0.013 versus 0.036).
5 CONCLUSIONS
In this paper, we proposed a new efficient K-way spectral clustering algorithm that uses Cannot-Link and Must-Link constraints as semi-supervised information. As in its unsupervised version, the clustering problem is cast as an optimization problem, consisting in minimizing an objective function proportional to the Multiple Normalized Cut measure. This measure is here balanced by a weighted penalty term assessing the non-satisfaction of the given constraints.
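An objective of this general shape can be sketched as below; the weight `lam` and the exact penalty form (a weighted count of violated pairs) are illustrative assumptions, not necessarily the paper's precise formulation:

```python
import numpy as np

def mncut(W, labels, k):
    """Multiple Normalized Cut of a k-way partition of the graph
    with affinity matrix W: sum over clusters of cut(C, V\\C) / vol(C)."""
    labels = np.asarray(labels)
    d = W.sum(axis=1)                  # node degrees
    total = 0.0
    for c in range(k):
        mask = labels == c
        vol = d[mask].sum()            # volume of cluster c
        if vol == 0:
            continue
        cut = W[mask][:, ~mask].sum()  # weight of edges leaving c
        total += cut / vol
    return total

def penalized_objective(W, labels, k, ml, cl, lam):
    """MNCut plus lam times the number of violated constraints:
    a Must-Link (ml) pair split apart, or a Cannot-Link (cl)
    pair placed in the same cluster."""
    labels = np.asarray(labels)
    violations = sum(labels[i] != labels[j] for i, j in ml)
    violations += sum(labels[i] == labels[j] for i, j in cl)
    return mncut(W, labels, k) + lam * violations
```

Under this sketch, a partition that both cuts few high-affinity edges and respects the pairwise constraints minimizes the objective, which is the trade-off the weighted penalty term controls.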
Comparisons with similar methods have been carried out on synthetic samples and some UCI benchmarks. Different variants of the compared methods have been proposed in order to make the methods more comparable and to draw fair conclusions. In all cases, the results show that the best-performing methods, ours and the modified versions of Wang's algorithm, are able to rapidly adjust the initial clustering to a more suitable one that satisfies the given constraints, even with quite small numbers of constraints. Our method belongs to this leading group, its clusterings often achieving the lowest MNCut values and the highest constraint satisfaction rates in both the two-class and multi-class cases. These experiments highlighted the importance of two steps in this kind of semi-supervised spectral clustering method: first, the usual projection step of basic spectral clustering appears crucial; second, considerable effort must be devoted to tuning the constraint weight.
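For reference, the projection step singled out above can be sketched generically in the style of Ng et al. (2002), which the unconstrained methods build on; the implementation details here are an illustrative assumption, not the paper's code:

```python
import numpy as np

def spectral_embedding(W, k):
    """Projection step of basic spectral clustering: embed the n
    points into the k leading eigenvectors of the normalized
    affinity D^{-1/2} W D^{-1/2}, then normalize each row to unit
    length before running k-means in the embedded space."""
    d = W.sum(axis=1)                          # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)         # ascending eigenvalues
    U = vecs[:, -k:]                           # k largest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

# On a graph with two well-separated blocks, the embedded rows of
# one block collapse onto a single direction, orthogonal to the
# direction of the other block, which makes the final clustering easy.
```

This is the step whose quality, as noted above, largely conditions the behavior of the semi-supervised variants built on top of it.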
REFERENCES

Han, J. and Kamber, M. (2006). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.

Kamvar, S., Klein, D., and Manning, C. (2003). Spectral learning. In IJCAI, International Joint Conference on Artificial Intelligence, pages 561–566.

von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, pages 395–416.

Meila, M. and Shi, J. (2000). Learning segmentation by random walks. In NIPS 12, Neural Information Processing Systems, pages 873–879.

Ng, A., Jordan, M., and Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In NIPS 14, Neural Information Processing Systems, pages 849–856.

Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. PAMI, Transactions on Pattern Analysis and Machine Intelligence, pages 888–905.

Wagstaff, K. and Cardie, C. (2000). Clustering with instance-level constraints. In ICML, International Conference on Machine Learning, pages 1103–1110.

Wang, X. and Davidson, I. (2010). Flexible constrained spectral clustering. In KDD, International Conference on Knowledge Discovery and Data Mining, pages 563–572.

Weiss, Y. (1999). Segmentation using eigenvectors: a unifying view. In IEEE International Conference on Computer Vision, pages 975–982.

Zhang, D., Zhou, Z., and Chen, S. (2007). Semi-supervised dimensionality reduction. In SIAM, 7th International Conference on Data Mining, pages 629–634.
SEMI-SUPERVISED K-WAY SPECTRAL CLUSTERING USING PAIRWISE CONSTRAINTS