Authors:
Joel Luis Carbonera and Mara Abel
Affiliation:
Federal University of Rio Grande do Sul, Brazil
Keyword(s):
Clustering, Subspace Clustering, Categorical Data, Attribute Weighting, Data Mining.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Industrial Applications of Artificial Intelligence; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Categorical data sets are often high-dimensional. To handle this high dimensionality in the clustering process,
some works take advantage of the fact that clusters usually occur in a subspace. In soft subspace clustering
approaches, a different weight is assigned to each attribute in each cluster, measuring its
contribution to the formation of that cluster. In this paper, we adopt an approach that uses the correlation
among categorical attributes to measure their relevance in clustering tasks. We use this approach
to develop CBK-Modes (Correlation-based K-modes), a soft subspace clustering algorithm that extends
the basic k-modes by using the correlation-based approach to measure the relevance of the attributes.
We conducted experiments on five real-world datasets, comparing the performance of our algorithm with five
state-of-the-art algorithms, using three well-known evaluation metrics: accuracy, f-measure and adjusted Rand
index. The results show that CBK-Modes outperforms the other algorithms considered
in the evaluation on these metrics.
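
The general idea of soft subspace clustering over categorical data can be illustrated with a minimal sketch of a weighted k-modes step. This is not the CBK-Modes algorithm itself: the abstract does not give its weighting formula, so the per-cluster attribute weights below are hand-picked placeholders, and the function names (weighted_mismatch, assign, update_modes) are hypothetical. The sketch only shows how per-cluster weights enter the dissimilarity and how modes are recomputed.

```python
from collections import Counter


def weighted_mismatch(obj, mode, weights):
    """Weighted simple-matching dissimilarity.

    Sums the weights of the attributes on which obj and mode disagree.
    """
    return sum(w for a, b, w in zip(obj, mode, weights) if a != b)


def assign(objects, modes, weights_per_cluster):
    """Assign each object to the nearest mode, using each cluster's own weights."""
    labels = []
    for obj in objects:
        dists = [
            weighted_mismatch(obj, mode, weights)
            for mode, weights in zip(modes, weights_per_cluster)
        ]
        labels.append(dists.index(min(dists)))
    return labels


def update_modes(objects, labels, modes):
    """Recompute each mode as the most frequent value of every attribute.

    A cluster with no members keeps its previous mode.
    """
    new_modes = []
    for c, old_mode in enumerate(modes):
        members = [o for o, label in zip(objects, labels) if label == c]
        if not members:
            new_modes.append(old_mode)
            continue
        new_modes.append(
            tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))
        )
    return new_modes


# Toy data: the third attribute is noise, so it receives a small weight.
objects = [
    ("red", "small", "x"),
    ("red", "small", "y"),
    ("blue", "large", "x"),
    ("blue", "large", "y"),
]
modes = [("red", "small", "x"), ("blue", "large", "y")]
# Placeholder weights (assumption): one weight vector per cluster.
weights_per_cluster = [(0.45, 0.45, 0.10), (0.45, 0.45, 0.10)]

labels = assign(objects, modes, weights_per_cluster)
new_modes = update_modes(objects, labels, modes)
```

In a full algorithm the assignment and mode-update steps would alternate until the labels stop changing, with the weights re-estimated per cluster at each iteration; in CBK-Modes that estimate is derived from the correlation among attributes, which this sketch does not attempt to reproduce.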