we remark that overlapping movies are more easily detected in feature space, where they lie on the surface separating action and crime movies.
To estimate the number of clusters, we built a Gram matrix using an RBF kernel with σ = 2. Figure 4 shows the most significant eigenvalues of the Gram matrix: between 3 and 4 eigenvalues are significant. Knowing that the EachMovie subset is an overlapping subset, the suitable choice is three clusters.
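The eigenvalue-based estimate described above can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic three-group data, the helper names `rbf_gram` and `count_significant_eigenvalues`, and the relative threshold of 0.1 used to call an eigenvalue "significant" are all assumptions made for the example. Only the RBF kernel and σ = 2 come from the text.

```python
import numpy as np

def rbf_gram(X, sigma=2.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def count_significant_eigenvalues(K, threshold=0.1):
    """Count eigenvalues larger than `threshold` times the largest one.

    The threshold is a hypothetical cut-off chosen for this sketch.
    """
    eigvals = np.linalg.eigvalsh(K)[::-1]  # symmetric matrix, sorted descending
    return int(np.sum(eigvals > threshold * eigvals[0])), eigvals

# Synthetic stand-in data: three compact, well-separated groups in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

k_est, eigvals = count_significant_eigenvalues(rbf_gram(X, sigma=2.0))
print("estimated number of clusters:", k_est)
print("leading eigenvalues:", np.round(eigvals[:5], 3))
```

On real data the gap between significant and negligible eigenvalues is rarely this sharp, which is why the text reports a range (3 to 4) and settles the choice using prior knowledge about the data.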
Table 3: Comparison between OKM and KOKM on the EachMovie dataset.

Method                            Precision   Recall   F-measure
OKM based on Euclidean distance   0.582       0.827    0.687
KOKM with RBF kernel              0.594       0.827    0.692
KOKM with Polynomial kernel       0.628       0.851    0.722
Then, using OKM with Euclidean distance and KOKM with an RBF kernel and with a Polynomial kernel, and fixing the number of clusters to k = 3, we ran each algorithm twenty times (with similar initializations). Table 3 shows the results obtained: the KOKM algorithm with the Polynomial kernel gives the best results. These results confirm those first obtained on the Iris and Ionosphere datasets: KOKM improves overlapping clustering quality, and the Polynomial kernel gives the best results on all tested datasets.
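For readers who want to reproduce this kind of comparison, the sketch below computes precision, recall and F-measure for an overlapping clustering. It assumes pair-based counting (two objects are "linked" when they share at least one cluster or class), which is a common convention for overlapping clustering; this excerpt does not state which counting scheme was used, so the function `pairwise_scores` and the toy labels are purely illustrative.

```python
from itertools import combinations

def pairwise_scores(predicted, reference):
    """Pair-based precision, recall and F-measure.

    `predicted` and `reference` are lists holding one set of cluster
    (or class) labels per object; an object may carry several labels.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(reference)), 2):
        linked_pred = bool(predicted[i] & predicted[j])  # share a cluster
        linked_ref = bool(reference[i] & reference[j])   # share a class
        if linked_pred and linked_ref:
            tp += 1
        elif linked_pred:
            fp += 1
        elif linked_ref:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure

# Toy example: four objects, the second one belongs to two reference classes.
reference = [{0}, {0, 1}, {1}, {2}]
predicted = [{0}, {0}, {0, 1}, {1}]
precision, recall, f_measure = pairwise_scores(predicted, reference)
print(precision, recall, round(f_measure, 3))
```

When scores are averaged over several runs, as in the twenty runs reported here, the three measures would be computed per run and then averaged.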
5 CONCLUSIONS
We have proposed in this paper the kernel overlapping k-means (KOKM) clustering algorithm. This algorithm maps data from input space to a higher-dimensional feature space through the use of a Mercer kernel function and optimizes an objective function that looks for optimal clusters in feature space. The main advantages of this algorithm are its ability to identify clusters that are not linearly separable in input space and its ability to separate clusters with complex boundaries. Moreover, we propose an estimation of the number of clusters using the Gram matrix. This estimation is based on the assumption that we must add more clusters when the overlap between clusters becomes larger. Empirical results show that KOKM, using either the Polynomial or the RBF kernel, outperforms OKM in terms of precision, recall and F-measure, for overlapping clusters as well as for non-overlapping clusters.
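The mapping to feature space never has to be computed explicitly: for any Mercer kernel K, the squared distance between the images of two points is K(x, x) − 2K(x, y) + K(y, y). The sketch below illustrates this identity for the two kernel families used in the experiments. It is only an illustration of the kernel trick, not the KOKM objective itself; the polynomial degree and offset are assumed values, since this excerpt does not give them.

```python
import numpy as np

def rbf_kernel(x, y, sigma=2.0):
    """RBF kernel; sigma = 2 matches the value quoted in the experiments."""
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel; degree and coef0 are assumptions for this sketch."""
    return float((np.dot(x, y) + coef0) ** degree)

def feature_space_sq_distance(x, y, kernel):
    """||phi(x) - phi(y)||^2 computed from kernel evaluations only."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
d_rbf = feature_space_sq_distance(x, y, rbf_kernel)
d_poly = feature_space_sq_distance(x, y, polynomial_kernel)
print("RBF feature-space squared distance:", round(d_rbf, 4))
print("Polynomial feature-space squared distance:", d_poly)
```

Because only kernel evaluations are needed, the same distance applies to any data type for which a Mercer kernel can be defined.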
As future work, we plan to improve this kernel overlapping k-means algorithm by proposing another version of KOKM in which both prototypes and object images are computed in feature space. In this way, kernel overlapping clustering could be applied to structured data, such as trees, strings, histograms and graphs.
KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval