order CETMCM, the current pixel was considered to be in the central position; together with the other two pixels, it was either collinear or formed a right-angled triangle, with the current pixel situated at the right-angle vertex. The following combinations of orientations were taken into account for the two displacement vectors: (0°, 180°), (90°, 270°), (45°, 225°), (135°, 315°) in the case of collinear pixels; (0°, 90°), (90°, 180°), (180°, 270°), (0°, 270°), (45°, 135°), (135°, 225°), (225°, 315°), (45°, 315°) in the case of the right-angled triangle. In both cases, the displacement vectors had an absolute value of 2. We determined the CETMCM and pCETMCM matrices for all the considered direction combinations, the final Haralick feature values being computed as the arithmetic mean of the Haralick feature values of the individual matrices.
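A minimal sketch of this averaging step is given below; the callables compute_cetmcm and haralick_features are hypothetical placeholders for the matrix construction and feature extraction described above, and the integer pixel offsets chosen for the diagonal orientations are an assumption about the grid encoding.

```python
import numpy as np

# Orientation pairs (in degrees) for the two displacement vectors,
# as enumerated in the text.
COLLINEAR_PAIRS = [(0, 180), (90, 270), (45, 225), (135, 315)]
RIGHT_ANGLE_PAIRS = [(0, 90), (90, 180), (180, 270), (0, 270),
                     (45, 135), (135, 225), (225, 315), (45, 315)]

# Integer pixel offsets of absolute value 2 for each orientation;
# the diagonal convention (±2, ±2) is an assumption.
OFFSETS = {0: (2, 0), 45: (2, 2), 90: (0, 2), 135: (-2, 2),
           180: (-2, 0), 225: (-2, -2), 270: (0, -2), 315: (2, -2)}

def mean_haralick(image, compute_cetmcm, haralick_features):
    """Average the Haralick features over all direction combinations."""
    vectors = []
    for pairs in (COLLINEAR_PAIRS, RIGHT_ANGLE_PAIRS):
        for a1, a2 in pairs:
            matrix = compute_cetmcm(image, OFFSETS[a1], OFFSETS[a2])
            vectors.append(haralick_features(matrix))
    # Final feature values: arithmetic mean over the individual matrices.
    return np.mean(vectors, axis=0)
```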
2.2 The Learning Phase
Each of the clustering methods described below was applied and assessed individually, before and after relevant feature selection. Then the number of clusters in the data was decided based on the combination of the results provided by the three methods (a majority voting procedure), as sketched below. The results were validated through supervised classification.
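As a simple illustration, the vote over the cluster counts proposed by the three methods can be taken as the mode; the helper below is only a sketch of this decision step.

```python
from collections import Counter

def vote_cluster_count(estimates):
    """Majority vote over the numbers of clusters proposed by the
    three clustering methods (e.g. EM, X-means, PSO + k-means)."""
    value, _ = Counter(estimates).most_common(1)[0]
    return value

# e.g. vote_cluster_count([3, 3, 4]) -> 3
```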
2.2.1 Clustering Methods
The method of Expectation Maximization (EM) is a powerful technique that iteratively estimates the desired parameters by maximizing the log-likelihood of the model (Witten, 2005). The parameters
estimated in our work through this technique were the
number of clusters and the sample distributions
within the clusters. The X-means clustering method
was employed as well, being an improved version of
k-means clustering (Pelleg, 2000). The X-means method expects a maximum and a minimum value for the parameter k and performs the following steps: (1.) Run conventional k-means (Witten, 2005) to convergence, for a certain value of k. (2.) Decide whether new cluster centroids should appear, by splitting the old centroids into two. (3.) If k > k_max, stop and report the best model identified during the algorithm, according to the Bayesian Information Criterion (BIC) (XMeans). The BIC criterion is used both for deciding which centroids to split and for identifying the best model. For the best model, the overall algorithm performance is estimated through the distortion, computed as the average squared distance from the points to their centroids.
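A minimal sketch of this loop, using scikit-learn's KMeans for the inner runs, is given below; bic_score is a hypothetical callable standing in for the BIC computation of (Pelleg, 2000), and the sketch only ranks whole models, whereas the full algorithm also applies BIC centroid by centroid to decide which ones to split.

```python
import numpy as np
from sklearn.cluster import KMeans

def distortion(X, model):
    """Average squared distance from the points to their centroids."""
    centers = model.cluster_centers_[model.labels_]
    return np.mean(np.sum((X - centers) ** 2, axis=1))

def x_means(X, k_min, k_max, bic_score):
    """Grow k from k_min to k_max, keep the model with the best BIC."""
    best_model, best_bic = None, -np.inf
    for k in range(k_min, k_max + 1):                   # stops once k > k_max
        model = KMeans(n_clusters=k, n_init=10).fit(X)  # step (1.)
        score = bic_score(X, model)                     # step (2.), simplified
        if score > best_bic:
            best_model, best_bic = model, score
    return best_model, distortion(X, best_model)        # step (3.)
```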
The method of Particle Swarm Optimization (PSO) aims to optimize the solution of a problem by simulating the movement of a particle swarm and by determining the best position for each particle (Das, 2008). Each particle has an associated position and velocity. The velocity of a particle k is updated from one iteration to the next: it is influenced by a cognitive component, which refers to the distance from the particle's personal best position, as well as by a social component, referring to the distance from the global best position. The optimal particle positions are determined through an evaluation function, defined according to the specifics of each problem.
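The standard update of (Das, 2008) is sketched below for a single particle; the inertia weight w and the acceleration coefficients c1 and c2 shown are common defaults, not values taken from the paper.

```python
import numpy as np

def update_particle(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5,
                    rng=np.random):
    """One standard PSO step: the new velocity combines inertia (w * v),
    a cognitive pull towards the personal best p_best and a social pull
    towards the global best g_best."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    return x + v, v
```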
Considering our problem of unsupervised classification through clustering (grouping), a particle is represented by a certain cluster configuration, i.e. by the way the cluster labels are assigned to the input data, for a given number of clusters. We combined the PSO technique with the k-means clustering method: the initial configuration of the swarm resulted from applying the k-means method to the initial data. We defined the evaluation function using the specific metric for assessing unsupervised classification performance in the case of the k-means method, namely the Within Cluster Sum of Squares (WCSS). The maximum difference between the cluster proportions, as well as the number of insignificant clusters (those having a proportion below 10%), were also taken into account. Thus, the evaluation function in our case was a weighted mean, as described in (6), all the terms of which were normalized between 0 and 1.
Eval = 0.5*WCSS + 0.2*max_dif_clust_prop + 0.3*no_insignificant_clust (6)
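A sketch of this fitness computation is shown below; wcss_norm is a hypothetical constant scaling the WCSS term to [0, 1] (the paper states that all terms are normalized, without detailing how), while the other two terms are proportions and thus already in [0, 1].

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within Cluster Sum of Squares of one cluster configuration."""
    return sum(np.sum((X[labels == c] - centroids[c]) ** 2)
               for c in range(len(centroids)))

def eval_particle(X, labels, centroids, wcss_norm):
    """Weighted mean of equation (6); lower values are better."""
    k = len(centroids)
    proportions = np.bincount(labels, minlength=k) / len(labels)
    max_dif_clust_prop = proportions.max() - proportions.min()
    # Fraction of clusters holding less than 10% of the samples.
    no_insignificant_clust = np.mean(proportions < 0.10)
    return (0.5 * wcss(X, labels, centroids) / wcss_norm
            + 0.2 * max_dif_clust_prop
            + 0.3 * no_insignificant_clust)
```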
2.2.2 Relevant Textural Feature Selection
Our method for relevant feature selection aims to achieve the best class separation in the context of unsupervised classification. Thus, the overlapping area between two neighbouring clusters must be as small as possible. For each textural feature f, a relevance score was defined, as described below:
Relevance(f) = Σ_{i,j} (1 − Overlapping_reg_size_{i,j}) (7)
In (7), i and j are neighbouring clusters. The relevance
of f depends on the sizes of the overlapping regions
that exist between each pair of Gaussian distributions
of f corresponding to each pair of neighbouring
clusters. The overlapping region size was computed
as in (Mitrea D., 2015).
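The sketch below shows how such a score could be accumulated; gaussian_overlap is a generic numerical approximation of the overlap area between the two per-cluster Gaussians of f, standing in for the exact computation of (Mitrea D., 2015).

```python
import numpy as np
from scipy.stats import norm

def gaussian_overlap(m1, s1, m2, s2):
    """Overlap area of two Gaussian pdfs, integrated numerically
    over an interval covering both distributions."""
    lo = min(m1 - 4 * s1, m2 - 4 * s2)
    hi = max(m1 + 4 * s1, m2 + 4 * s2)
    x = np.linspace(lo, hi, 2000)
    return np.trapz(np.minimum(norm.pdf(x, m1, s1), norm.pdf(x, m2, s2)), x)

def relevance(f_values, labels, neighbour_pairs):
    """Relevance score of equation (7) for one textural feature.

    f_values: 1-D array of the feature over all samples;
    neighbour_pairs: (i, j) index pairs of neighbouring clusters."""
    score = 0.0
    for i, j in neighbour_pairs:
        fi, fj = f_values[labels == i], f_values[labels == j]
        score += 1.0 - gaussian_overlap(fi.mean(), fi.std(),
                                        fj.mean(), fj.std())
    return score
```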