ciated to different clusters, the degree of membership
to each cluster being determined according to the dis-
tance function. This algorithm is known to yield bet-
ter results than the k-means algorithm in most cases.
The FCM-GK algorithm (Gustafson and Kessel, 1979)
uses an adaptive distance and can thus fit different
cluster sizes and shapes more effectively.
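As a minimal sketch of the fuzzy-membership idea (using the standard FCM membership update with a plain Euclidean distance, rather than GK's adaptive, cluster-specific metric; the function and argument names are illustrative, not from the paper):

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Fuzzy c-means membership degrees u[i, k] of sample i in cluster k.

    Standard FCM update:
        u[i, k] = 1 / sum_j (d(x_i, c_k) / d(x_i, c_j)) ** (2 / (m - 1)),
    with Euclidean distance d and fuzzifier m > 1.
    """
    # Distances from every sample to every centre, shape (n_samples, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # guard against a sample sitting exactly on a centre
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)  # each row sums to one
```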
In the probabilistic case, one makes use of the
Bayesian paradigm, which in most cases requires a
parametric modelling of the class-conditional probability
density functions (pdfs). Such a parametric modelling is
often difficult to obtain because of the non-trivial
cluster shapes that can occur, for instance in
multispectral and hyperspectral image processing. This
is the case for mixture modelling methods based on a
statistical approach. Each cluster is modelled by a
multivariate distribution f with parameters θ_k, and the
dataset is described by a linear combination of those
conditional distributions. A maximization of the
likelihood is often used to find the best parameters for
each cluster, and this maximization is often performed
with the iterative EM algorithm (Dempster et al., 1977).
However, the SEM algorithm, a stochastic version of the
EM algorithm, avoids some drawbacks of EM such as its
slow convergence (Celeux and Diebolt, 1987). Using one
of these parameter estimation methods, a classification
can be obtained, for instance, by assigning to each
individual the class label with the highest posterior
probability.
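As a minimal sketch of this decision rule (assuming Gaussian components already fitted by EM or SEM; the parameter names weights, means, covs are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_labels(X, weights, means, covs):
    """Assign each sample the label of the most probable mixture component.

    posterior[i, k] is proportional to p_k * f(x_i | theta_k); the MAP
    rule picks, for each sample, the component maximizing this posterior.
    """
    # Unnormalized posteriors, shape (n_samples, K).
    post = np.column_stack([
        w * multivariate_normal.pdf(X, mean=mu, cov=cov)
        for w, mu, cov in zip(weights, means, covs)
    ])
    post /= post.sum(axis=1, keepdims=True)  # normalize each row
    return post.argmax(axis=1), post
```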
In order to avoid the use of parametric (e.g. Gaus-
sian) conditional distributions, a recent approach us-
ing a Fourier-based description of those distributions
has been proposed (Zribi and Ghorbel, 2003). This
approach guarantees that the conditional distributions
are smooth enough to correctly model the variability
of each cluster without any parametric modelling
assumption, even though "negative" probabilities may
artificially occur in the course of the iterations.
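A one-dimensional sketch of such a series estimator (a truncated Fourier expansion on [0, 1]; this is an illustration of the general technique, not necessarily the exact estimator of Zribi and Ghorbel) shows both the smoothness and the possible local negativity:

```python
import numpy as np

def fourier_density(samples, x, n_terms=10):
    """Truncated Fourier-series density estimate on [0, 1].

    f_hat(x) = 1 + sum_j a_j cos(2*pi*j*x) + b_j sin(2*pi*j*x),
    where the coefficients are empirical means over the sample.
    Truncation keeps the estimate smooth but can make it dip below
    zero, matching the "negative probability" caveat above.
    """
    f = np.ones_like(x)
    for j in range(1, n_terms + 1):
        a_j = 2.0 * np.mean(np.cos(2 * np.pi * j * samples))
        b_j = 2.0 * np.mean(np.sin(2 * np.pi * j * samples))
        f += a_j * np.cos(2 * np.pi * j * x) + b_j * np.sin(2 * np.pi * j * x)
    return f  # may be locally negative; clipping/renormalizing is one remedy
```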
Another approach to clustering is density-based
clustering. Its principle is to estimate the conditional
densities using the data samples. The high density
areas are characteristic of a cluster whereas the low
density areas correspond to the boundaries. A density
threshold and a volume are necessary to compute the
local densities, and then the number of clusters fol-
lows automatically. However, density-based clustering
methods often have difficulty handling high-dimensional
data because of oddly shaped cluster densities. In
(Tran et al., 2006), a new algorithm named KNNClust,
which addresses this problem, is presented.
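As a sketch of the density estimate underlying such methods (a standard k-nearest-neighbour estimator, not KNNClust itself):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma

def knn_density(X, k=10):
    """k-nearest-neighbour density estimate at each sample point.

    f_hat(x) = k / (n * V_d * r_k(x)**d), where r_k(x) is the distance
    from x to its k-th neighbour and V_d is the unit-ball volume in R^d.
    """
    n, d = X.shape
    tree = cKDTree(X)
    # k + 1 neighbours because each point is its own nearest neighbour.
    r_k = tree.query(X, k=k + 1)[0][:, -1]
    v_d = np.pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit ball
    return k / (n * v_d * r_k ** d)
```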
We present in this paper a new clustering algorithm
based on the SEM algorithm, called the Non Parametric
SEM (NPSEM) algorithm. It is a non-parametric,
unsupervised clustering algorithm which is able to
estimate the number of clusters during the clustering
process. The originality of the work lies in the
extension of the SEM algorithm to the estimation of
non-parametric conditional distributions, and in the
weighting of the posterior probabilities by a coherence
function based on the conditional entropy of each
cluster; this regularizes the estimation and stabilizes
the result of the decision step.
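A purely hypothetical sketch of this weighting idea follows; the exact coherence function used by NPSEM is not reproduced here, and the exponential form below is only an assumed stand-in for illustration:

```python
import numpy as np

def entropy_weighted_posteriors(post, eps=1e-12):
    """Hypothetical sketch: down-weight clusters with incoherent posteriors.

    `post` holds posterior probabilities, shape (n_samples, K). Each
    cluster receives a coherence weight that decreases with the entropy
    of its normalized posterior column; NPSEM's actual coherence
    function may differ from this assumed form.
    """
    col = post / (post.sum(axis=0, keepdims=True) + eps)
    h = -(col * np.log(col + eps)).sum(axis=0)       # per-cluster entropy
    coherence = np.exp(-h / np.log(post.shape[0]))   # assumed decreasing map
    weighted = post * coherence                      # weight the posteriors
    return weighted / weighted.sum(axis=1, keepdims=True)
```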
The second section is devoted to the presentation of
our algorithm and to its links with, and inspirations
from, the SEM and k-means algorithms. In the third
section we present results on different datasets,
together with comparisons with other state-of-the-art
algorithms. Finally, a conclusion is given in the
fourth section.
2 PROPOSED CLUSTERING
METHOD
In this section we present the NPSEM clustering
method and show its similarities with the k-means and
SEM algorithms.
The SEM algorithm, like the EM algorithm from which
it derives, aims to iteratively maximize the likelihood
of a parametric model when this model depends on
incomplete data. In the case of a mixture density, the
goal of the EM and SEM algorithms is to estimate the
parameters of a mixture of K distributions:
f(X) = \sum_{k=1}^{K} f(X|\theta_k) \, p_k, \qquad (1)
where {f(X|θ_k)}, k = 1, ..., K, are the conditional
distributions with parameters θ_k, and the p_k are the
prior probabilities of the clusters. Although this
algorithm is basically dedicated to parameter
estimation, it can also be used for classification, in
particular via the Classification EM (CEM) algorithm
(Celeux and Govaert, 1992; Masson and Pieczynski,
1993).
The difference between the EM and SEM algorithms
comes from the introduction, in the latter, of a
stochastic step that produces at each iteration a
current partition of the data (a pseudo-sample) by
random sampling according to the posterior distribution
computed from the current parameter estimates.
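As a minimal sketch of this stochastic step (assuming the posterior matrix has already been computed, e.g. as a NumPy array with rows summing to one):

```python
import numpy as np

def stochastic_step(post, rng=None):
    """SEM S-step: draw one label per sample from its posterior row.

    `post` has shape (n_samples, K). The sampled partition
    (pseudo-sample) is then used to re-estimate the cluster
    parameters in the following maximization step.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Inverse-CDF sampling: compare a uniform draw to the row-wise CDF.
    cdf = np.cumsum(post, axis=1)
    u = rng.random((post.shape[0], 1))
    return (u > cdf).sum(axis=1)  # label = number of CDF entries below u
```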
The CEM algorithm was recognized as a generalization
of the k-means algorithm (Same et al., 2005). The SEM
algorithm is also close to it, particularly on two
points: (i) the maximization step is mostly very
similar, consisting of parameter estimation for the
clusters formed; (ii) the