UNSUPERVISED NON PARAMETRIC DATA CLUSTERING BY
MEANS OF BAYESIAN INFERENCE AND INFORMATION THEORY
Gilles Bougenière, Claude Cariou, Kacem Chehdi
TSI2M Laboratory, University of Rennes 1 / ENSSAT, 6 rue de Kerampont, 22300 Lannion, France
Alan Gay
Institute of Grassland and Environmental Research (IGER), Plas Gogerddan, Aberystwyth, Ceredigion, SY23 3EB, UK
Keywords:
Clustering, Classification, Bayesian method, Maximum A Posteriori, Information theory, Remote sensing,
Multispectral images.
Abstract:
In this communication, we propose a novel approach to the unsupervised and non parametric clustering of n-D data within a Bayesian framework. The iterative approach developed is derived from the Classification Expectation-Maximization (CEM) algorithm, in which the parametric modelling of the mixture density is replaced by a non parametric modelling using local kernels, and the posterior probabilities account for the coherence of the current clusters through the measure of class-conditional entropies. Applications of this method to synthetic and real data, including multispectral images, are presented. The classification results are compared with those of other recent unsupervised approaches, and we show that our method reaches a more reliable estimation of the number of clusters while providing slightly better correct classification rates on average.
1 INTRODUCTION
Merging objects having similar characteristics is a very important problem in various contrasting research fields such as medicine, genetics, chemistry, computer vision, etc. Despite several decades of research in this area, the task remains difficult because of the continual improvement of technology and the increasing size of the data to be analyzed. Without any prior information, the grouping of objects has to be done in an unsupervised way. This processing is called clustering, in contrast to classification, which is the grouping of samples in a supervised way. The different groups are then called clusters, and they are formed of the closest individuals according to a similarity measure. In the particular case of the clustering of multispectral images, the individuals are the pixels, which are grouped according to their spectral characteristics. To help the clustering of image pixels, one can also use spatial information and the fact that two neighboring pixels are more likely to belong to the same cluster (Cariou et al., 2005).
Clustering methods can be distinguished by the similarity function used to perform the clustering (Tran et al., 2005). These similarity functions fall into two categories: deterministic similarity functions and probabilistic similarity functions.
In the deterministic case, a distance function is often used. This is the case of the well-known k-means algorithm (MacQueen, 1967), which associates each object with the cluster whose representative object (typically the centroid of the objects in that cluster) is the closest according to the distance function used. At each iteration, the centroids are recomputed. This algorithm is very simple and has been improved continually since its initial development (Huang and Ng, 2005; Laszlo and Mukherjee, 2006). For instance, a modified version which can automatically associate a weight with each feature during the clustering process has been developed, leading to more accurate results (Huang and Ng, 2005). Genetic algorithms have also been proposed as a reliable approach to determining cluster centers (Laszlo and Mukherjee, 2006).
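As an illustration of the baseline loop described above, here is a minimal sketch of the classical k-means alternation (nearest-centroid assignment, then centroid re-estimation); the function name and the random initialization strategy are our own illustrative choices, not those of the cited variants.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal k-means: alternate nearest-centroid assignment and
    centroid re-estimation until the labels stabilize."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with K distinct samples drawn at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                      # partition unchanged: converged
        labels = new_labels
        for k in range(K):
            if np.any(labels == k):    # keep empty clusters unchanged
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids
```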
The k-means algorithm provides a hard partitioning of the individuals, which entails a lack of precision, particularly in the case of overlapping clusters. The fuzzy c-means (FCM) algorithm (Dunn, 1973; Bezdek, 1981) is the fuzzy equivalent of the k-means algorithm. Each object is potentially
associated with several clusters, the degree of membership to each cluster being determined according to the distance function. This algorithm is known to yield better results than the k-means algorithm in most cases. The FCM-GK algorithm (Gustafson and Kessel, 1979) uses an adaptive distance and can thus more efficiently fit different cluster sizes and shapes.
In the probabilistic case, one makes use of the
Bayesian paradigm, which in most cases requires a
parametric modelling of class-conditional probability
density functions (pdf). A parametric modelling of
class-conditional pdfs is often difficult to obtain be-
cause of some non trivial cluster shapes which can
occur as in multispectral and hyperspectral image pro-
cessing. This is the case for mixture modelling methods based on a statistical approach. Each cluster is modelled by a multivariate distribution f with parameters θ_k, and the dataset is described by a linear combination of those conditional distributions. A maximization of the likelihood is often used to find the
best parameters of each cluster. This maximization is
often performed by using the iterative EM algorithm
(Dempster et al., 1977). However the SEM algorithm,
which is a stochastic version of the EM algorithm, can
avoid some drawbacks of the EM algorithm such as its
slow convergence (Celeux and Diebolt, 1987). Using one of these parameter estimation methods, a classification can be obtained, for instance, by associating with each individual the class label having the highest posterior probability.
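As a concrete illustration of this MAP rule (not of the method proposed here), the following is a minimal sketch assuming Gaussian conditionals whose parameters have already been estimated by EM or SEM; the function name and argument layout are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_classify(X, priors, means, covs):
    """MAP rule for a fitted Gaussian mixture: give each individual
    the class label with the highest posterior probability."""
    K = len(priors)
    # Joint scores p_k * f(x | theta_k), one column per class.
    joint = np.column_stack([
        priors[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
        for k in range(K)])
    post = joint / joint.sum(axis=1, keepdims=True)  # Bayes posterior
    return post.argmax(axis=1)
```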
In order to avoid the use of parametric (e.g. Gaus-
sian) conditional distributions, a recent approach us-
ing a Fourier-based description of those distributions
has been proposed (Zribi and Ghorbel, 2003). This
approach guarantees that the conditional distributions
are smooth enough to correctly model the variability of each cluster without any parametric modelling assumption, although "negative" probabilities may artificially occur in the course of the iterations.
Another approach to clustering is density-based
clustering. Its principle is to estimate the conditional
densities using the data samples. The high density
areas are characteristic of a cluster whereas the low
density areas correspond to the boundaries. A density
threshold and a volume are necessary to compute the
local densities, and then the number of clusters fol-
lows automatically. However, density-based clustering methods often have difficulty handling high-dimensional data because of the very odd-shaped cluster densities. In (Tran et al., 2006), a new algorithm named KNNClust, which deals with this problem, is presented.
We present in this paper a new clustering algorithm based on the SEM algorithm, called the Non Parametric SEM (NPSEM) algorithm. It is a non parametric and unsupervised clustering algorithm which has the ability to estimate the number of clusters during the clustering process. The originality of this work lies in the extension of the SEM algorithm to the estimation of non parametric conditional distributions, and in the weighting of the posterior probabilities by a coherence function based on the conditional entropy of each cluster. This allows the estimation to be regularized and the decision step to be stabilized.
The second section is devoted to the presentation of our algorithm and its links to the SEM and k-means algorithms. In the third section we present results on different datasets, together with comparisons against other state-of-the-art algorithms. Finally, a conclusion is given in the fourth section.
2 PROPOSED CLUSTERING
METHOD
In this section we present the NPSEM clustering
method and show its similarities with the k-means and
SEM algorithms.
The SEM algorithm, like the EM algorithm from which it derives, aims to maximize, in an iterative way, the likelihood of a parametric model when this model depends on incomplete data. In the case of a mixture density, the goal of the EM and SEM algorithms is to estimate the parameters of a mixture of K distributions:

f(X) = \sum_{k=1}^{K} f(X \mid \theta_k)\, p_k, \qquad (1)
where \{f(X \mid \theta_k)\}, k = 1, \ldots, K, are the conditional distributions with parameters \theta_k, and p_k are the cluster prior probabilities. Although this algorithm is ba-
sically dedicated to parameter estimation, its use in
classification is also possible, in particular via the
Classification EM algorithm (CEM) (Celeux and Go-
vaert, 1992; Masson and Pieczynski, 1993). The difference between the EM and SEM algorithms comes from the introduction into the latter of a stochastic step which produces a current partition of the data (a pseudo-sample) at each iteration, using random sampling according to the posterior distribution computed from the current parameter estimates. The CEM algorithm has been recognized as a generalization of the k-means algorithm (Same et al., 2005). The SEM algorithm is also close to it, particularly on two points: (i) the maximization step is mostly very similar, consisting of parameter estimation for the clusters formed; (ii) the
construction of a posterior pseudo-sample is carried
out by updating the estimated parameters. However,
the major difference between the two approaches lies in the purely deterministic nature of the k-means and CEM algorithms: at each iteration, the label of an in-
dividual is given according to a decision criterion of
minimal distance to the current cluster representative
in the case of the k-means, or according to the MAP
criterion for the CEM. This deterministic aspect has a
major disadvantage, namely the convergence to a lo-
cal likelihood maximum, whereas the SEM algorithm
makes it possible to avoid this problem. In order to achieve a compromise between the SEM and CEM approaches, we first propose to re-examine the E (Estimation) step of the SEM algorithm, by computing the posterior membership pseudo-probabilities of the individuals x_m, 1 ≤ m ≤ M, for each cluster k in the following way:
p_\alpha(C = k \mid X = x_m) = \frac{\left[ p_k\, f(X = x_m \mid \theta_k) \right]^{\alpha}}{\sum_{k'=1}^{K} \left[ p_{k'}\, f(X = x_m \mid \theta_{k'}) \right]^{\alpha}} \qquad (2)
where C is the (random) cluster label of an individual, and α ∈ [1, +∞[ is a parameter controlling the degree of determinism in the construction of the pseudo-sample: α = 1 corresponds to the SEM (stochastic) algorithm, while α → +∞ corresponds to the CEM (deterministic) algorithm.
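This stochastic labelling step can be sketched as follows; we assume the unnormalized scores p_k f(x_m|θ_k) are held in an (M, K) array, a purely illustrative setup.

```python
import numpy as np

def draw_labels(scores, alpha, rng):
    """Draw one cluster label per individual from the tempered
    posterior of Eq. (2): alpha = 1 gives the SEM draw, and a large
    alpha approaches the deterministic CEM/MAP assignment."""
    w = scores ** alpha
    w /= w.sum(axis=1, keepdims=True)     # row-normalize, as in Eq. (2)
    cum = np.cumsum(w, axis=1)            # per-row cumulative distribution
    u = rng.random((len(w), 1))
    labels = (u > cum).sum(axis=1)        # inverse-CDF sampling
    return np.minimum(labels, w.shape[1] - 1)

# Example use: rng = np.random.default_rng(0); labels = draw_labels(s, 1.2, rng)
```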
In the above form, the algorithm only allows the use of parameterized conditional distributions (for example, normal distributions), which can sometimes be insufficient to handle complex-shaped clusters, as for instance in multispectral imagery. Consequently, we have addressed this constraint by replacing, at each iteration, the parameterized conditional distributions in (2) with non parametric conditional distributions f(X|C), estimated from the pseudo-sample using a Gaussian isotropic kernel g_γ(x) with aperture γ. This aperture can be fixed automatically with respect to the dimensionality of the data, provided the data is centered and reduced. The joint distribution, estimated by
f(X = x_m, C = k) = \frac{\sum_{l=1}^{M} g_\gamma(x_l - x_m)\, \mathbf{1}_{C(l)=k}}{\sum_{m=1}^{M} \sum_{l=1}^{M} g_\gamma(x_l - x_m)}, \qquad (3)

1 \le k \le K, \quad 1 \le m \le M,
where C(m) denotes the cluster label assigned to the individual with index m at the current iteration, makes it possible to estimate the prior probabilities p_k and the conditional distributions f(X = x_m | C = k); a sketch of this estimation step is given below.
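The following minimal sketch implements this step, under the reading that the numerator of Eq. (3) sums the kernel contributions of the individuals currently labelled k; the array names, the default aperture and the numerical guard are illustrative assumptions.

```python
import numpy as np

def joint_estimate(X, labels, K, gamma=0.2):
    """Kernel estimate of f(x_m, C=k) as in Eq. (3), with a Gaussian
    isotropic kernel of aperture gamma on centered/reduced data.
    labels is an integer array of current cluster assignments."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * gamma ** 2))   # g_gamma(x_l - x_m), (M, M)
    Z = G.sum()                            # global normalization of Eq. (3)
    joint = np.stack([G[:, labels == k].sum(axis=1) / Z
                      for k in range(K)], axis=1)
    priors = joint.sum(axis=0)             # p_k, by marginalizing over m
    cond = joint / np.maximum(priors, 1e-300)   # f(x_m | C=k)
    return joint, priors, cond
```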
Those conditional distributions cannot be used directly in the above algorithm, because the mixture distribution is no longer identifiable. We therefore propose a further modification of the posterior distribution computation, introducing a regularizing heuristic as follows:
p_\alpha(C = k \mid X = x_m) = \frac{\left[ p_k\, f(X = x_m \mid C = k)\, e^{-H(X|k)} \right]^{\alpha}}{\sum_{k'=1}^{K} \left[ p_{k'}\, f(X = x_m \mid C = k')\, e^{-H(X|k')} \right]^{\alpha}}, \qquad (4)
where H(X|k) measures the conditional entropy of the current k-th cluster. Its effect on the posterior probabilities is as follows: a low-entropy conditional distribution will favor the membership of an individual x_m in the corresponding cluster if this individual strongly contributes to the coherence of this cluster. This heuristic thus tends to agglomerate the individuals into coherent, low-entropy clusters (and conditional distributions). Finally, the clustering itself is carried out using the MAP criterion, i.e. one chooses for each individual m the cluster k which maximizes Equation (4), as sketched below.
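A sketch of this regularized posterior and of the MAP decision follows; the conditional entropy H(X|k) is computed here from the current kernel estimates f(x_m|C=k), which is one plausible reading of the text, and the sign convention e^{-H(X|k)} matches the stated effect of favoring low-entropy clusters.

```python
import numpy as np

def regularized_posterior(priors, cond, alpha=1.2):
    """Posterior pseudo-probabilities of Eq. (4): each cluster's score
    is weighted by exp(-H(X|k)), favoring coherent, low-entropy
    clusters."""
    p = np.maximum(cond, 1e-300)           # avoid log(0)
    H = -(p * np.log(p)).sum(axis=0)       # conditional entropy per cluster
    scores = (priors[None, :] * cond * np.exp(-H)[None, :]) ** alpha
    return scores / scores.sum(axis=1, keepdims=True)

# MAP decision: each individual takes the maximizer of Eq. (4).
# labels = regularized_posterior(priors, cond).argmax(axis=1)
```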
An important feature of the proposed algorithm is that it allows the estimation of the number of clusters. Indeed, starting from an upper bound on the number of clusters, the algorithm reduces the number of clusters whenever a cluster proportion falls below a previously specified threshold of representativeness. In this case, the individuals belonging to the cluster which disappears are redistributed into the remaining clusters, as sketched below.
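A sketch of this reduction mechanism, assuming the posterior matrix of Eq. (4) is available; the 5% threshold and the MAP-based redistribution over the surviving posterior columns are illustrative assumptions, as the text only states that the threshold is specified beforehand.

```python
import numpy as np

def prune_clusters(labels, post, min_prop=0.05):
    """Remove clusters whose proportion falls below min_prop; their
    members are redistributed to the surviving clusters by the MAP
    rule on the remaining posterior columns."""
    K = post.shape[1]
    props = np.array([(labels == k).mean() for k in range(K)])
    surviving = np.flatnonzero(props >= min_prop)
    removed = ~np.isin(labels, surviving)
    new_labels = labels.copy()
    # Reassign only the members of the clusters that disappear.
    new_labels[removed] = surviving[post[removed][:, surviving].argmax(axis=1)]
    return new_labels, surviving
```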
3 EXPERIMENTS AND RESULTS
In this section, we evaluate the efficiency of NPSEM
on the following three real datasets for which a
ground truth is available: (1) Fisher’s iris dataset (150
individuals with 4 variables partitioned in 3 classes)
(Fisher, 1936); (2) The wine dataset, composed of
178 individuals of 3 types (clusters) and 13 variables
per individual (Aeberhard et al., 1992); (3) the Morfa dataset, a portion of a CASI hyperspectral image,
composed of 747 pixels with 48 spectral radiance
measurements each (equally spaced from 405nm to
947nm). This dataset was acquired in 2006 by the
IGER (Institute of Grassland and Environmental Re-
search) in Morfa Mawr, Wales, UK, during the survey
of a barley crop field containing two different species
which are infected or not by mildew (4 classes). In addition, we have used a synthetic dataset to assess our algorithm on non Gaussian data. This 2D dataset is composed of two classes, as shown by the ground truth in Figure 5-a.
Comparisons have been carried out with some
other partitioning algorithms from the state of the art:
k-means, FCM, FCM-GK, EM-GM (a clustering algorithm based upon a Gaussian mixture fitted with the EM algorithm), and KNNClust, a non parametric, unsupervised version of the KNN (k nearest neighbors) algorithm which can also estimate the number of clusters. KNNClust also
shares with the NPSEM the property that it is not de-
terministic, i.e. it can provide a different clustering
result at each run. The ground truth is available for
each dataset which makes it possible to compute the
correct classification rate obtained by the different al-
gorithms in our experiments.
For the wine and Morfa datasets, the clustering has been performed after a data reduction step consisting in keeping the first three components of a principal component analysis (PCA), as sketched below.
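A minimal sketch of this reduction step (illustrative, and not necessarily the exact preprocessing used here):

```python
import numpy as np

def pca_reduce(X, n_components=3):
    """Center/reduce the data, then keep its first principal components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # The right singular vectors of the standardized data are the PCA axes.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:n_components].T
```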
A correct classification rate is computed, as well as the κ index, which is a classification rate weighted to compensate for the effect of chance on the clustering results. The κ index is computed as follows:

\kappa = (P_o - P_e) / (1 - P_e) \qquad (5)

where P_o is the correct classification rate and

P_e = \frac{1}{M^2} \sum_{k=1}^{K} n_k^{clust}\, n_k^{truth},

with n_k^{clust} the number of individuals associated with cluster k during the clustering process, and n_k^{truth} the number of samples in cluster k according to the ground truth. K is the number of clusters and M the number of individuals to cluster.
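A sketch of this computation, assuming the cluster labels have already been matched to the ground-truth classes (e.g. by majority vote); the function name is illustrative.

```python
import numpy as np

def kappa_index(pred, truth, K):
    """Kappa index of Eq. (5): correct classification rate corrected
    for the agreement expected by chance."""
    M = len(truth)
    Po = np.mean(pred == truth)                    # observed rate P_o
    n_clust = np.array([(pred == k).sum() for k in range(K)])
    n_truth = np.array([(truth == k).sum() for k in range(K)])
    Pe = (n_clust * n_truth).sum() / M**2          # chance rate P_e
    return (Po - Pe) / (1.0 - Pe)
```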
Table 1: Correct classification rates (in %) obtained by the
clustering algorithms on the different datasets and their av-
erage.
Synth Wine Iris Morfa avg.
k-means 62.3 88.9 78.0 64.3 73.4
EM-GM 61 92.7 94 72.6 80.1
FCM 57 97.1 88.0 73.8 79.0
FCM-GK 57.4 95.5 91.3 75.9 80.0
KNNClust 61.4 95.5 83.3 73.0 78.3
NPSEM 99.4 95.4 83.0 73.9 87.9
Table 2: Kappa index of agreement (in %).
Synth Wine Iris Morfa avg.
k-means 24.6 83.6 66.0 52.4 56.6
EM-GM 21 88.9 91.3 63.5 66.2
FCM 13.9 95.8 82.0 65.1 64.2
FCM-GK 14.7 93.3 87.0 67.9 65.7
KNNClust 22.3 91.3 75.0 63.9 63.1
NPSEM 98.8 93.1 76.9 65.2 83.5
For the KNNClust algorithm, different values of the number of nearest neighbors have been tried. Also, in all experiments with the NPSEM algorithm, the upper bound for the number of clusters was fixed to K̄ = 5, the Gaussian kernel aperture in Eq. (3) to γ = 0.2, and the pseudo-probabilities reinforcement coefficient in Eq. (4) to α = 1.2.

Figure 1: Ground truth of the Morfa hyperspectral image (48 bands, 4 classes) and clustering results by the FCM-GK, KNNClust and NPSEM algorithms. Panels: (a) ground truth, (b) FCM-GK, (c) KNNClust, (d) NPSEM.
Table 3: Rate of correct estimation (in %) of the number of
clusters for fully unsupervised methods.
Wine Iris Morfa average
KNNClust 95 45 65 68.3
NPSEM 100 80 80 86.7
For each algorithm, only the best results have been kept. To compute the correct classification rates for the KNNClust and NPSEM algorithms, which can both estimate the number of clusters, we have taken into account only the runs in which the correct number of clusters was found. The correct classification rates are
shown in Table 1, and the κ index results are given
in Table 2. Table 3 shows the behavior of those two
algorithms regarding the estimation of the number of
clusters. For the other algorithms, the correct number
of clusters K was given a priori.
The overall correct classification and kappa rates show better results for the NPSEM algorithm. More precisely, on the real datasets the results of the FCM-GK, KNNClust and NPSEM algorithms are equivalent, with a slight advantage to FCM-GK. But on our synthetic dataset, all the algorithms failed to recover the two clusters, except our NPSEM algorithm.
Moreover, as shown in Table 3, the NPSEM gives more reliable estimates of the number of clusters than the KNNClust. This reliability is confirmed on the iris dataset, where the correct number of clusters was obtained in 80% of the experiments for the NPSEM against only 45% for the KNNClust, both reaching nearly the same correct classification rate when the correct number of clusters was found. Moreover, the overall κ index is slightly better for the NPSEM method than for the KNNClust method, which reveals a better agreement of the classification with the ground truth.
Figure 1 shows typical classification results given by the FCM-GK, NPSEM and KNNClust on the Morfa dataset. In this example, the correct classification rate is 75.9% for the FCM-GK, 73.9% for the NPSEM and 73% for the KNNClust. Figure 2 depicts the same clustering results in the feature space, through a projection onto the first two principal axes resulting from the PCA of the Morfa dataset. Figure 3 shows the ground truth and clustering results obtained for the wine dataset, Figure 4 those obtained for the iris dataset, and Figure 5 those obtained for our synthetic dataset.
4 CONCLUSION
In this communication, we have described the behaviour of a new clustering algorithm, the Non-Parametric Stochastic Expectation-Maximization (NPSEM) algorithm. This algorithm, inspired by the SEM algorithm and based on the use of a kernel function and an entropy-based weighting, has the advantage of dealing with non parametric conditional pdfs. This enables the algorithm to fit different cluster shapes, a feature which is very important in multispectral image clustering, where cluster shapes may differ widely. Our algorithm can also estimate the number of clusters during the clustering process; the only parameter needed is an upper bound on the number of clusters.
We have tested this algorithm on four different datasets (three real and one synthetic), and we have compared the results with five other clustering algorithms. Four of them are classical algorithms (k-means, EM-GM, FCM, FCM-GK) which are well known for their efficiency and/or simplicity. Their main drawback is that they are not fully unsupervised, in the sense that the number of clusters must be given. The fifth is the KNNClust algorithm, which can also estimate the number of clusters automatically. This method also requires one parameter, the number of neighbors, which most of the time is no easier to determine than the number of clusters.
The results of our first experiments are promising: NPSEM has proven to be more efficient in terms of estimation of the number of clusters, while giving on average better classification rates than other comparable approaches on datasets with clusters of different shapes.
In further work, we plan to consider the case of multispectral and hyperspectral image segmentation by adding spatial information to the spectral information for each pixel. By doing so, we hope to improve the clustering results, whilst keeping the advantage of a reliable estimation of the number of clusters.
ACKNOWLEDGEMENTS
This work is supported by the European Union and
co-financed by the ERDF and the Regional Council of
Brittany through the Interreg3B project number 190
PIMHAI.
REFERENCES
Aeberhard, S., Coomans, D., and deVel, O. (1992). The
classification performance of RDA. Technical report,
92-01, Dept. of Computer Science and Dept. of Math-
ematics and Statistics, James Cook University, North
Queensland, Australia.
Bezdek, J. (1981). Pattern Recognition With Fuzzy Objec-
tive Function Algorithms. Plenum Press, New York.
Cariou, C., Chehdi, K., and Nagle, A. (2005). Gravita-
tional transform for data clustering - application to
multicomponent image classification. In Proc. IEEE
ICASSP 2005, volume 2, pages 105–108, Philadel-
phia, USA.
Celeux, G. and Diebolt, J. (1987). A probabilistic teacher
algorithm for iterative maximum likelihood estima-
tion. In Classification and Related Methods of
Data Analysis, pages 617–623. Amsterdam: Elsevier,
North-Holland.
Celeux, G. and Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis, 14(3):315–332.
Dempster, A., Laird, N., and Rubin, D. (1977). Maximum
Likelihood from Incomplete Data via the EM Algo-
rithm. Journal of the Royal Statistical Society. Series
B (Methodological), 39(1):1–38.
Dunn, J. (1973). A fuzzy relative of the ISODATA process
and its use in detecting compact well-separated clus-
ters. Journal of Cybernetics, 3(3):32–57.
Fisher, R. (1936). The use of multiple measurements in
taxonomic problems. Annals of Eugenics, 7(2):179–
188.
Gustafson, D. and Kessel, W. (1979). Fuzzy clustering with
a covariance matrix. IEEE Conference on Decision
and Control, pages 761–766.
Huang, J. and Ng, M. (2005). Automated variable weight-
ing in k-means type clustering. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
27(5):657–668.
Laszlo, M. and Mukherjee, S. (2006). A genetic algorithm
using hyper-quadtrees for low-dimensional k-means
clustering. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 28(4):533–543.
MacQueen, J. (1967). Some methods for classification and
analysis of multivariate observations. Proceedings of
the Fifth Berkeley Symposium on Mathematical Statis-
tics and Probability, 1:281–297.
Masson, P. and Pieczynski, W. (1993). SEM algorithm and
unsupervised statistical segmentation of satellite im-
ages. IEEE Transactions on Geoscience and Remote
Sensing, 31(3):618–633.
Same, A., Govaert, G., and Ambroise, C. (2005). A mix-
ture model-based on-line CEM algorithm. In Advances
in Intelligent Data Analysis, 6th International Sym-
posium on Data Analysis, IDA 2005, 8-10 Oct. 2005,
Madrid, Spain.
Tran, T., Wehrens, R., and Buydens, L. (2005). Clustering multispectral images: a tutorial. Chemometrics and Intelligent Laboratory Systems, 77(1–2).
Tran, T., Wehrens, R., and Buydens, L. (2006). KNN-kernel
density-based clustering for high-dimensional multi-
variate data. Computational Statistics and Data Anal-
ysis, 51:513–525.
Zribi, M. and Ghorbel, F. (2003). An unsupervised and non-parametric Bayesian classifier. Pattern Recognition Letters, 24(1):97–112.
Figure 2: Ground truth and clustering results on the Morfa dataset after selection of the first three principal components; the data is projected onto the first two principal axes. Panels: (a) ground truth, (b) FCM-GK, (c) KNNClust, (d) NPSEM.
Figure 3: Ground truth and clustering results on the wine dataset after selection of the first three principal components; the data is projected onto the first two principal axes. Panels: (a) ground truth, (b) FCM-GK, (c) KNNClust, (d) NPSEM.
Figure 4: Ground truth and clustering results on the iris dataset; the data is projected onto the first two principal axes. Panels: (a) ground truth, (b) FCM-GK, (c) KNNClust, (d) NPSEM.
Figure 5: Ground truth and clustering results on a 2D synthetic dataset. Panels: (a) ground truth, (b) FCM-GK, (c) KNNClust, (d) NPSEM.