use the dataset itself, except XB and FS, which involve
the dataset structure (Halkidi et al., 2001). In the
next section, we compare the performance of the
above-mentioned indices in determining the true number
of clusters.
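The distinction between indices that need only the
membership matrix and indices that also need the data can
be made concrete with a minimal Python sketch. The function
names are hypothetical, and the formulas are the commonly
cited forms of PC, CE, XB and FS, which may differ in detail
from the definitions used in this paper. In particular, the
use of the grand mean of the data as the reference point in
FS is an assumption (some variants use the mean of the
cluster centres), and BS is omitted.

```python
import numpy as np

def partition_coefficient(U):
    # U: (c, n) fuzzy membership matrix whose columns sum to 1.
    # Uses memberships only; larger values indicate a crisper partition.
    return float(np.sum(U ** 2) / U.shape[1])

def classification_entropy(U, eps=1e-12):
    # Uses memberships only; smaller values indicate a crisper partition.
    # eps avoids log(0) for zero memberships.
    return float(-np.sum(U * np.log(U + eps)) / U.shape[1])

def _squared_distances(X, V):
    # Squared Euclidean distances between centres V (c, d) and data X (n, d).
    return ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # shape (c, n)

def xie_beni(X, V, U, m=2.0):
    # Compactness divided by separation; needs the data and the centres.
    compactness = np.sum((U ** m) * _squared_distances(X, V))
    centre_d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(centre_d2, np.inf)  # ignore distance of a centre to itself
    return float(compactness / (X.shape[0] * centre_d2.min()))

def fukuyama_sugeno(X, V, U, m=2.0):
    # Compactness minus separation; needs the data and the centres.
    # Assumption: the reference point is the grand mean of the data.
    grand_mean = X.mean(axis=0)
    separation = ((V - grand_mean) ** 2).sum(axis=1)  # shape (c,)
    return float(np.sum((U ** m) * (_squared_distances(X, V) - separation[:, None])))
```

PC and CE take only `U`, whereas XB and FS additionally take
`X` and `V`, which is what "involving the dataset structure"
amounts to in practice.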
3.2 Experimental Results
To evaluate the fuzzy partitions obtained from FCM and
FCS, the cluster validity measures introduced previously
are compared on five datasets: Yeast, Breast Cancer,
Abalone, Arrhythmia and Iris. The results of this
comparison are given in Tables 1 to 5. Each experiment
is repeated six times on each dataset, each time with a
larger number of clusters. The range of cluster numbers
is chosen for each dataset according to its real number
of clusters (reported in the first column), and the
fuzziness parameter is set to m = 1.2.
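The protocol above (run the clustering algorithm for six
increasing cluster counts with m = 1.2 and score each
partition) can be sketched as follows. This is a minimal
NumPy illustration under stated assumptions, not the
implementation used in the experiments: the function names,
the random initialisation, the stopping rule and the
synthetic three-blob data are all hypothetical, only
standard FCM is shown (FCS is not), and only PC and CE are
computed.

```python
import numpy as np

def fcm(X, c, m=1.2, n_iter=100, tol=1e-6, seed=0):
    # Standard fuzzy c-means with alternating centre/membership updates.
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # each column sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)  # weighted centres, (c, d)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

def scan_cluster_counts(X, c_values, m=1.2):
    # Returns {c: (PC, CE)} for every candidate number of clusters.
    scores = {}
    for c in c_values:
        U, _ = fcm(X, c, m=m)
        n = U.shape[1]
        pc = float(np.sum(U ** 2) / n)
        ce = float(-np.sum(U * np.log(U + 1e-12)) / n)
        scores[c] = (pc, ce)
    return scores

# Synthetic data (hypothetical): three Gaussian blobs in two dimensions.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_demo = np.vstack([ctr + 0.3 * rng.standard_normal((30, 2)) for ctr in centres])

# Six candidate cluster counts, mirroring the six runs per dataset.
scores = scan_cluster_counts(X_demo, range(2, 8))
for c, (pc, ce) in scores.items():
    print(f"c={c}  PC={pc:.3f}  CE={ce:.3f}")
```

Under the usual conventions, the candidate with the largest
PC or the smallest CE would be taken as the estimated number
of clusters.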
The Yeast dataset consists of 1484 samples with nine
feature values and ten classes. Table 1 shows the
results on the Yeast data for all validation methods
when the number of clusters ranges from 5 to 10. While
PC, CE, FS and XB fail to identify the optimal number of
Yeast clusters, the BS index correctly identifies it for
both clustering algorithms, FCM and FCS.
The Wisconsin Breast Cancer dataset consists of 286
samples, where each pattern has nineteen features, and
two clusters. Table 2 shows the results for the
Wisconsin Breast Cancer dataset: CE, FS and XB fail to
recognize the optimal number of clusters, but the BS
and PC indices correctly identify it for both clustering
algorithms, FCM and FCS.
The Abalone dataset contains 4177 samples, where each
pattern has 279 feature values, and sixteen clusters.
The results for the Abalone dataset are given in
Table 3. They show that CE fails to identify the optimal
number of clusters, while XB correctly identifies the
optimal number except when the FCM algorithm is applied.
However, PC, FS and BS correctly identify the optimal
number of clusters for both clustering algorithms, FCM
and FCS.
The Arrhythmia dataset consists of 452 samples in an
8-dimensional measurement space, with 29 classes.
Table 4 shows the results for the Arrhythmia dataset.
We notice that FS and XB fail to identify the optimal
number of clusters, while PC and CE correctly identify
it except when the FCM algorithm is applied. BS
correctly identifies the optimal number of clusters for
both clustering algorithms.
The Iris dataset contains 150 samples with four
attributes and has three classes. Table 5 gives the
results for the Iris dataset. We note that FS, XB and BS
fail to identify the optimal number of clusters. CE
correctly identifies the optimal number of clusters
except when the FCM algorithm is applied. Only PC
correctly identifies the optimal number for both
clustering algorithms, FCM and FCS.
After clustering the five datasets using FCM and FCS,
we compare the two algorithms in terms of their PC, CE,
FS, XB and BS values. The results show that PC yields
the optimal number of clusters three times, while CE
identifies the correct number of clusters three times
and only with FCM. Since FS yields the correct number of
clusters only once and XB does not yield it for any
dataset, they are the least reliable indices. BS yields
the optimal number of clusters four times. Hence, we
confirm the result of Cho and Yoo (2006) that the
Bayesian score is the most reliable clustering validity
measure. However, none of the above-mentioned indices
correctly finds the optimal number of clusters for all
datasets. Therefore, a suitable index must be selected
for each dataset.
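The tallies above come from applying each index's selection
rule to every dataset and counting how often the selected
number of clusters matches the real one. A minimal sketch of
that bookkeeping is given below. The function names are
hypothetical, the optimisation directions are the
conventional ones (maximise PC, minimise CE, FS and XB), and
BS is left out because its selection rule is not restated in
this section.

```python
# Conventional optimisation direction for each index (assumed, see lead-in).
BETTER = {"PC": max, "CE": min, "FS": min, "XB": min}

def estimated_clusters(values_by_c, index_name):
    # values_by_c maps a candidate cluster count c to the index value at c.
    choose = BETTER[index_name]
    return choose(values_by_c, key=values_by_c.get)

def hits_per_index(results, true_counts):
    # results: {dataset: {index_name: {c: value}}}
    # true_counts: {dataset: real number of clusters}
    hits = {}
    for dataset, per_index in results.items():
        for index_name, values_by_c in per_index.items():
            correct = estimated_clusters(values_by_c, index_name) == true_counts[dataset]
            hits[index_name] = hits.get(index_name, 0) + int(correct)
    return hits
```

For example, with made-up values for a single toy dataset,
`hits_per_index({"toy": {"PC": {2: 0.91, 3: 0.95, 4: 0.80},
"CE": {2: 0.10, 3: 0.12, 4: 0.30}}}, {"toy": 3})` counts one
hit for PC and none for CE, since CE is smallest at c = 2.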
4 CONCLUSIONS
In order to evaluate the fuzzy partitions of two
clustering algorithms, FCM and FCS, four conventional
validity measures and Bayesian validation are used on
five datasets. The results show the good performance of
the Bayesian score as a cluster validity index and
demonstrate that, in comparison with conventional fuzzy
indices, Bayesian validation leads to superior results.
We conclude that none of the above-mentioned indices
leads to the correct number of clusters for all of the
datasets considered. Hence, fuzzy clustering requires
further investigation: most clustering algorithms may
not provide satisfactory results, because no single
validity measure works well on different kinds of
datasets. As future research, fuzzy clustering analysis
needs to be studied from a multiple-objective
optimization perspective, where the search is performed
over a number of often conflicting objective functions.
REFERENCES
Bezdek, J. (1974). Cluster validity with fuzzy sets. Journal
of Cybernetics and Systems, 3(3):58–72.
Carvalho, F. (2006). A fuzzy clustering algorithm for
symbolic interval data based on a single adaptive
Euclidean distance. In ICONIP (3), pages 1012–1021.
Cho, S. and Yoo, S. (2006). Fuzzy bayesian validation
for cluster analysis of yeast cell-cycle data. Pattern
Recognition, 39(12):2405–2414.
CONVENTIONAL AND BAYESIAN VALIDATION FOR FUZZY CLUSTERING ANALYSIS