subspace of features and restricts the expansion of
clusters to a priori constraints. Incorporating a priori knowledge into the clustering process can significantly improve the results and align the outcome with the objective of the data analysis, as demonstrated in Section 4. Our approach combines techniques from subspace, correlation, and constrained clustering. Specifically, we introduce two user-defined parameters to the original DBSCAN algorithm: one defines the dimensions of the subspace in which density-based clusters are discovered, and one defines the dimensions of the subspace in which constraints are applied to the cluster expansion. Further, we modify the cluster expansion step of the original DBSCAN algorithm so that it is restricted by these user-defined constraints. Our validation of the algorithm on an experimental and a real-world dataset demonstrates that it is especially suited for spatio-temporal data, where one subspace of features defines the spatial extent of the data and another captures correlations between features.
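To make the modification concrete, the following Python sketch shows one plausible reading of the constrained expansion step. The function name, the parameter names, and the maximum-deviation constraint (`constraint_tau`) are illustrative assumptions for exposition, not the paper's actual notation or constraint model: neighborhood queries are evaluated only on the density subspace, and a neighbor joins a cluster only if it additionally satisfies the constraint on the constraint subspace.

```python
import numpy as np

def constrained_subspace_dbscan(X, eps, min_pts,
                                density_dims, constraint_dims, constraint_tau):
    """Sketch of a DBSCAN variant with two subspace parameters.

    density_dims    -- feature indices used for the eps-neighborhood query
    constraint_dims -- feature indices the expansion constraint is checked on
    constraint_tau  -- hypothetical per-dimension maximum-deviation threshold
    """
    n = len(X)
    labels = np.full(n, -1)  # -1 marks unvisited / noise
    cluster_id = 0

    def neighbors(i):
        # eps-range query restricted to the density subspace
        d = np.linalg.norm(X[:, density_dims] - X[i, density_dims], axis=1)
        return np.where(d <= eps)[0]

    def satisfies_constraint(p, j):
        # Illustrative constraint: j may be reached from p only if both
        # points deviate by at most constraint_tau in the constraint subspace.
        return np.all(np.abs(X[p, constraint_dims] - X[j, constraint_dims])
                      <= constraint_tau)

    for i in range(n):
        if labels[i] != -1:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            continue                      # not a core point (for now)
        labels[i] = cluster_id
        queue = [(i, j) for j in seeds]   # (reached-from, candidate) pairs
        while queue:
            p, j = queue.pop()
            # Modified expansion step: the neighbor joins the cluster only
            # if it also satisfies the constraint w.r.t. the expanding point.
            if labels[j] != -1 or not satisfies_constraint(p, j):
                continue
            labels[j] = cluster_id
            j_nb = neighbors(j)
            if len(j_nb) >= min_pts:      # j is a core point; keep expanding
                queue.extend((j, k) for k in j_nb)
        cluster_id += 1
    return labels
```

For spatio-temporal data of the kind described above, `density_dims` would typically select the spatial coordinates and `constraint_dims` the correlated measurement features, so that spatially dense regions are only merged where the measurements agree within the threshold.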
In the future, we plan to evaluate different constraints in terms of their feasibility and the overhead they introduce relative to the improvement in clustering quality, and to propose a machine-learning-based selection of suitable constraints according to the inherent structure of the data. In addition, we plan to work on an optimized implementation of the algorithm that allows us to provide additional runtime measurements and detailed comparison studies with other algorithms in the fields of subspace, correlation, and constrained clustering.