and data type. The most common similarity measure is the distance
between points, for example the Euclidean metric for continuous
attributes. There is no universal method for assessing clustering
results. One approach is to measure the quality of a partitioning
with validity indices. The most common are the Davies-Bouldin index
(DB), Dunn’s index (Halkidi and Batistakis, 2001), the Silhouette
Index (SI) (Kaufman and Rousseeuw, 1990) and CDbw (Halkidi and
Vazirgiannis, 2002). Clustering algorithms are widely applied in
pattern recognition, image processing, statistical data analysis and
knowledge discovery. Under the definitions quoted above, in which a
granule is a set of objects, the groups identified by clustering
algorithms can be regarded as data granules. According to that
definition, a granule can contain other granules and can itself be
part of another granule. This makes it possible to use clustering
algorithms to create granulation structures of data.
This article proposes an approach to information granulation that
clusters data given in the form of hyperboxes. The hyperboxes are
created in the first step of the algorithm and are then clustered by
the SOSIG method (Stepaniuk and Kużelewska, 2008). This solution is
effective with regard to time complexity and interpretability of the
generated groups of data. The paper is organized as follows: Section 2
describes the proposed approach, and Section 3 reports the collected
data sets and the experiments performed. The last section concludes
the article.
2 GRANULAR CLUSTERING BY SOSIG
The proposed method of data granulation is composed of two phases.
The first phase prepares data objects in the form of granules
(hyperboxes), whereas the second detects groups of similar granules.
The final result of granulation is a three-level structure: the top
level is formed by clusters of granules, the second level consists of
the granules that make up each top-level cluster, and the third,
lowest level consists of point-type objects.
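
As an illustration, the three-level structure can be sketched as nested containers. This is only a minimal sketch in Python: the class names `Cluster` and `Hyperbox`, the field names and the sample values are assumptions made for this example and do not come from the SOSIG implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]  # level 3: a point-type object


@dataclass
class Hyperbox:
    """Level 2: a granule together with the point-type objects it covers."""
    lower: Point  # minimal value in every dimension
    upper: Point  # maximal value in every dimension
    points: List[Point] = field(default_factory=list)


@dataclass
class Cluster:
    """Level 1: a cluster of hyperbox granules."""
    label: int
    boxes: List[Hyperbox] = field(default_factory=list)

    def all_points(self) -> List[Point]:
        """Flatten the cluster down to its point-type objects."""
        return [p for box in self.boxes for p in box.points]


# Hypothetical sample data: two hyperboxes grouped into one cluster.
box_a = Hyperbox((0.0, 0.0), (1.0, 1.0), [(0.2, 0.3), (0.9, 1.0)])
box_b = Hyperbox((2.0, 2.0), (3.0, 2.5), [(2.5, 2.1)])
cluster = Cluster(label=0, boxes=[box_a, box_b])
print(len(cluster.all_points()))
```

Each level can be inspected on its own: the cluster, its hyperboxes, or the points inside one hyperbox.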
The method of hyperbox creation is designed to reduce the complexity
of the description of real-world systems. The improved generality of
information granules is attained by sacrificing some of the numerical
precision of point data (Bargiela and Pedrycz, 2001). The hyperboxes
(referred to as $I$) are multi-dimensional structures described by a
pair of values $a_i$ and $b_i$ for every dimension. The value $a_i$ is
the minimal and $b_i$ the maximal value of the granule in the $i$-th
dimension, so the width of the edge in the $i$-th dimension equals
$|b_i - a_i|$. Creation of hyperboxes is based on maximization of the
“information density” of granules (the algorithm is described in
detail in (Bargiela and Pedrycz, 2006)). Information density can be
expressed by Equation 1.
$$\sigma = \frac{\mathrm{card}(I)}{\phi(\mathrm{width}(I))} \qquad (1)$$
Maximization of $\sigma$ is a problem of balancing the shortest
possible dimensions against the greatest cardinality of the formed
granule $I$. In the experiments presented in the following section,
the cardinality of the granule $I$ is the number of point-type objects
belonging to the granule. A point belongs to a granule when the value
of each of its attributes lies between the minimal and maximal values
of the corresponding hyperbox attribute, bounds included. For that
reason the cardinality has to be re-calculated whenever a new, larger
granule is formed from a combination of two granules. In the
multi-dimensional case, the function of hyperbox width is the one
given in Equation 2:
$$\phi(u) = \exp\bigl(K \cdot \max_i(u_i) - \min_j(u_j)\bigr), \quad i, j = 1, \ldots, n \qquad (2)$$
where $u = (u_1, u_2, \ldots, u_n)$ and $u_i = \mathrm{width}([a_i, b_i])$
for $i = 1, \ldots, n$. The values $a_i$ and $b_i$ denote, respectively,
the minimal and maximal value in the $i$-th dimension.
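
The quantities defined in Equations 1 and 2 can be computed directly. The sketch below is illustrative only: the function names and the sample points are assumptions, and the exponent is grouped as in Equation 2 as printed, with K multiplying only the largest width. If the intended form is K · (max − min), only the body of `phi` changes.

```python
import math
from typing import Sequence, Tuple

# A hyperbox as (lower corner a, upper corner b); an assumed encoding.
Box = Tuple[Sequence[float], Sequence[float]]


def widths(box: Box) -> list:
    """Edge widths u_i = |b_i - a_i| for every dimension."""
    a, b = box
    return [abs(bi - ai) for ai, bi in zip(a, b)]


def phi(u: Sequence[float], k: float = 2.0) -> float:
    """Width function, grouped as printed: exp(K * max(u) - min(u))."""
    return math.exp(k * max(u) - min(u))


def cardinality(box: Box, points) -> int:
    """Number of points whose every attribute lies within [a_i, b_i]."""
    a, b = box
    return sum(
        all(ai <= x <= bi for x, ai, bi in zip(p, a, b)) for p in points
    )


def information_density(box: Box, points, k: float = 2.0) -> float:
    """sigma = card(I) / phi(width(I)), as in Equation 1."""
    return cardinality(box, points) / phi(widths(box), k)


# Hypothetical sample data.
points = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5), (3.0, 3.0)]
unit_box = ((0.0, 0.0), (1.0, 1.0))
sigma = information_density(unit_box, points)
print(sigma)
```

For the unit box above, three of the four sample points fall inside it and both widths equal 1, so with K = 2 the density is 3 divided by exp(2 · 1 − 1).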
The constant $K$ originally equals 2; in the experiments, however,
different values of $K$ were used, given as a parameter. The
computational complexity of this algorithm is $O(N^3)$. However, in
every step of the method the size of the data decreases by 1, which in
practice significantly reduces the overall complexity. The data
granulation algorithm is assumed to process hyperboxes as well as
point-type data. To make this possible, each new data object is
characterized by $2 \cdot n$ values, compared with $n$ values in the
original data. The first $n$ attributes describe the minimal values
and the following $n$ the maximal values for every dimension. In other
words, the dimensionality of the data is doubled initially to ensure
topological “compatibility” of point-type data and hyperboxes.
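
A rough sketch of how point-type data and hyperboxes can share one representation, and of a greedy merging loop driven by information density, is given below. It is not the algorithm of (Bargiela and Pedrycz, 2006): the pair-selection rule, the stopping parameter `target` and all function names are assumptions introduced for illustration, and the width function is grouped as in Equation 2 as printed.

```python
import math
from itertools import combinations


def to_box(point):
    """Encode a point as a degenerate hyperbox: n minima followed by n maxima."""
    return tuple(point) + tuple(point)


def merge(box1, box2):
    """Smallest hyperbox enclosing both boxes, in the same 2n-value encoding."""
    n = len(box1) // 2
    lower = tuple(min(x, y) for x, y in zip(box1[:n], box2[:n]))
    upper = tuple(max(x, y) for x, y in zip(box1[n:], box2[n:]))
    return lower + upper


def density(box, points, k=2.0):
    """sigma = card(I) / phi(width(I)); cardinality is recounted from the points."""
    n = len(box) // 2
    lower, upper = box[:n], box[n:]
    card = sum(
        all(lo <= x <= hi for x, lo, hi in zip(p, lower, upper)) for p in points
    )
    u = [hi - lo for lo, hi in zip(lower, upper)]
    return card / math.exp(k * max(u) - min(u))


def granulate(points, target, k=2.0):
    """Assumed greedy rule: merge the pair with the densest union
    until `target` boxes remain (target must be at least 1)."""
    boxes = [to_box(p) for p in points]
    while len(boxes) > target:
        i, j = max(
            combinations(range(len(boxes)), 2),
            key=lambda ij: density(merge(boxes[ij[0]], boxes[ij[1]]), points, k),
        )
        merged = merge(boxes[i], boxes[j])
        # Two boxes are replaced by one, so the data shrinks by 1 per step.
        boxes = [b for idx, b in enumerate(boxes) if idx not in (i, j)] + [merged]
    return boxes


# Hypothetical sample data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (1.1, 1.0)]
print(granulate(points, target=2))
```

Because a point is stored as a hyperbox whose minima equal its maxima, the same `merge` and `density` functions apply to points and to hyperboxes.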
2.1 Self-Organizing System for Information Granulation
The SOSIG (Self-Organizing System for Information Granulation)
algorithm is a system designed for detecting granules present in data.
The granulation is performed by clustering, and the clusters can be
identified at different levels of resolution. The prototype of the
algorithm is the method described in (Wierzchoń and Kużelewska, 2006).
In SOSIG, however, the granulation property and the ability to cope
with different attribute types were introduced. This follows
ICAART 2011 - 3rd International Conference on Agents and Artificial Intelligence