Authors:
Benjamin Ertl (1); Jörg Meyer (1); Matthias Schneider (2) and Achim Streit (1)
Affiliations:
(1) Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
(2) Institute for Meteorology and Climate Research (IMK-ASF), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Keyword(s):
Data Mining, Machine Learning, Pattern Recognition, Clustering, Correlation Clustering, Constrained Clustering, DBSCAN, Spatio-temporal Data, Climate Research.
Abstract:
Full-space clustering methods suffer from the curse of dimensionality; for example, points tend to become equidistant from one another as the dimensionality increases. Subspace clustering and correlation clustering algorithms overcome these issues, but still face challenges when data points have complex relations or clusters overlap. In these cases, clustering with constraints can improve the clustering results by incorporating a priori knowledge into the clustering process. This article proposes a new clustering algorithm, CoExDBSCAN (density-based clustering with constrained expansion), which combines traditional density-based clustering with techniques from subspace, correlation and constrained clustering. The proposed algorithm uses DBSCAN to find density-connected clusters in a defined subspace of features and restricts the expansion of clusters according to a priori constraints. We provide verification and runtime analysis of the algorithm on a synthetic dataset and an experimental evaluation on a climatology dataset of satellite observations. The experimental evaluation demonstrates that our algorithm is especially suited for spatio-temporal data, where one subspace of features defines the spatial extent of the data and another defines correlations between features.
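The idea of restricting cluster expansion can be illustrated with a minimal sketch. This is not the authors' CoExDBSCAN implementation: it is a plain DBSCAN loop in pure Python, and the function name `constrained_dbscan`, the `may_expand` callback and the tolerance-based constraint are all assumptions made for illustration. Density is evaluated only in a chosen spatial subspace, and a density-reachable point joins a cluster only if the user-supplied constraint accepts it.

```python
from math import dist


def constrained_dbscan(points, spatial_dims, eps, min_pts, may_expand):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Density is measured only in `spatial_dims`. `may_expand` is a
    hypothetical callback (members, candidate, points) -> bool that
    decides whether the cluster may grow to include the candidate.
    """
    def spatial(i):
        return [points[i][d] for d in spatial_dims]

    def neighbours(i):
        return [j for j in range(len(points))
                if dist(spatial(i), spatial(j)) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        members = [i]
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] not in (None, -1):
                continue  # already assigned to a cluster
            if not may_expand(members, j, points):
                continue  # constraint blocks expansion to this point
            labels[j] = cluster
            members.append(j)
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # only core points expand further
                queue.extend(nbrs)
    return labels


# Ten points on a spatial line (x, y); the third value is a feature that
# jumps from 0.0 to 10.0 halfway along the line.
points = [(float(i), 0.0, 0.0 if i < 5 else 10.0) for i in range(10)]


def same_level(members, j, pts, tol=1.0):
    # Illustrative constraint: stay close to the seed point's feature value.
    return abs(pts[j][2] - pts[members[0]][2]) <= tol


labels = constrained_dbscan(points, (0, 1), 1.5, 3, same_level)
unconstrained = constrained_dbscan(points, (0, 1), 1.5, 3,
                                   lambda members, j, pts: True)
print(labels)         # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(unconstrained)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Without the constraint, the spatially contiguous points form a single cluster. With it, the cluster stops growing where the feature value changes, which is the behaviour the abstract describes for spatio-temporal data. The neighbour search here is quadratic and meant only for small examples.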