Density-based Clustering using Automatic Density Peak Detection
Huanqian Yan, Yonggang Lu and Heng Ma
School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu 730000, China
Keywords: Clustering, Pattern Recognition, Decision Graph, Image Segmentation.
Abstract: Clustering is an important unsupervised machine learning method that has played a key role in
various fields. Density-based clustering methods are capable of dealing with clusters of different sizes and
shapes. As suggested by Rodriguez and Laio in a paper published in Science in 2014, the cluster centroids can be
identified from a 2D decision graph that plots, for every data point, its estimated density against its minimum
distance to any point of higher density. However, automatic methods for determining the cluster centroids from
the decision graph are lacking. In this work, a novel statistic-based method is designed to identify the cluster
centroids automatically from the decision graph, so the number of clusters is also determined automatically.
Experiments on several synthetic and real-world datasets show the superiority of the proposed method in
identifying centroids in datasets with various distributions and dimensionalities. Furthermore, the proposed
method is shown to be effectively applicable to image segmentation.
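To make the decision graph concrete, the following sketch computes its two coordinates for every point. It assumes the cutoff-kernel definitions of Rodriguez and Laio (2014): the local density rho_i counts the other points within a cutoff distance d_c of point i, and delta_i is the distance to the nearest point of strictly higher density (the densest point takes its largest distance instead). The function name `decision_graph`, the toy data and the value of `d_c` are hypothetical. The final ranking by gamma = rho * delta is only a stand-in heuristic for picking centroids; it is not the statistic-based method proposed in this paper.

```python
import math


def decision_graph(points, d_c):
    """Return (rho, delta) for every point.

    Assumes the cutoff-kernel definitions of Rodriguez and Laio (2014):
    rho[i] counts the other points closer than d_c, and delta[i] is the
    distance to the nearest point with strictly higher density.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: neighbours within the cutoff, excluding the point itself.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point(s) have no denser neighbour, so they take
        # their largest distance to any point instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical toy data: two "plus"-shaped blobs, 5 units apart on each axis.
blob = [(0.0, 0.0), (0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3)]
points = blob + [(x + 5.0, y + 5.0) for x, y in blob]
rho, delta = decision_graph(points, d_c=0.35)

# Stand-in selection rule (NOT the paper's statistic-based method):
# rank by gamma = rho * delta and keep the two largest.
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
print(centers)
```

The two blob centres (indices 0 and 5) are the only points that are both dense and far from any denser point, so they stand out in the upper-right corner of the decision graph.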
1 INTRODUCTION
Clustering is the process of grouping a set of data
objects into multiple groups or clusters so that
objects within a cluster have high similarity, but are
very dissimilar to objects in other clusters.
Dissimilarities or similarities are assessed based on
the attribute values describing the objects using
certain distance measures (Law, Urtasun, and Zemel,
2017). Clustering is an important technique for
exploratory data analysis, and has been studied for
many years. It has been shown to be useful in many
practical domains such as data classification and
image processing (Piotr, 2012).
Clustering is generally considered a difficult
problem because the optimal number of clusters
cannot be easily determined and clusters may have
different distributions, shapes and sizes (Lu and
Wan, 2012). It has been shown that clustering is a
nonconvex, discrete optimization problem. Due to
the existence of many local minima, there is
typically no way to find a globally minimal solution
without trying all possible partitions (Kleinberg,
2003). Although many heuristic methods have been
developed, most of them are not generic enough and
can only be used for particular clustering problems.
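The remark that a global optimum generally requires trying all possible partitions can be quantified. The number of ways to split n objects into k non-empty clusters is the Stirling number of the second kind S(n, k), and the number of partitions over all k is the Bell number B_n. The minimal sketch below counts them with the standard recurrence S(n, k) = S(n - 1, k - 1) + k * S(n - 1, k); the function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n labelled objects into k non-empty groups."""
    if n == k:
        return 1  # includes S(0, 0) = 1; otherwise each object is its own group
    if k == 0 or k > n:
        return 0
    # Object n either starts a new singleton group or joins one of k groups.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def bell(n: int) -> int:
    """Total number of partitions of n objects, summed over every possible k."""
    return sum(stirling2(n, k) for k in range(n + 1))


for n in (10, 20, 50):
    print(n, stirling2(n, 3), bell(n))
```

Even S(10, 3) is already 9330, and the counts grow faster than exponentially in n, so exhaustive search is out of reach for all but tiny datasets.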
Most clustering algorithms are based on two popular
techniques known as hierarchical and partitional
clustering. The partitional clustering algorithms
include square-error-based clustering methods,
density-based clustering methods, and distribution-based
clustering methods, among others.
Hierarchical methods can be classified as either
agglomerative or divisive, depending on how the
hierarchical decomposition of the given set of data
objects is formed (Grant and Flynn, 2016;
Charikar and Chatziafratis, 2017). Hierarchical
clustering methods do not require strict initial
conditions, but they suffer from the drawback that
a merge or split, once performed, cannot be undone
in subsequent steps.
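A naive agglomerative procedure illustrates why early decisions are irrevocable: at each step the two closest clusters are merged, and no later step revisits that choice. This sketch assumes single linkage (the distance between two clusters is that of their closest pair of members) and stops at a hypothetical target of k clusters; the function name and the toy points are illustrative, and the exhaustive pair search at every step is chosen for clarity, not speed.

```python
import math


def agglomerative(points, k):
    """Naive single-linkage agglomerative clustering down to k clusters.

    Each step merges the two closest clusters; merges are never undone.
    Returns the final clusters (lists of point indices) and the merge history.
    """
    clusters = [[i] for i in range(len(points))]  # start: one point per cluster
    history = []  # merges in the order they were made
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        history.append((clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]  # new list; history is untouched
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, history


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.5, 0.0)]
clusters, history = agglomerative(points, 2)
print(clusters)  # -> [[0, 1, 2, 3], [4]]
```

The recorded `history` lists the merges in order; the first merge of points 0 and 1 is carried into every later, coarser clustering.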
Square-error-based clustering methods include
k-means (Wagstaff et al., 2001), k-medoids
(Kaufman and Rousseeuw, 2009), and affinity
propagation (Frey and Dueck, 2007; Serdah and
Ashour, 2016). In these methods, an objective
function, typically the sum of the distances from
the data points to a set of putative cluster centers,
is optimized until the best cluster center
candidates are found (Serdah and Ashour, 2016;
Ward, 1963; Hoppner, 1999; Jain, 2010). However,
because k-means and k-medoids always assign a data
point to its nearest center, they cannot be used to
detect non-globular clusters (Jain, 2010). Affinity
propagation may fail to work properly if the initial
exemplar preference is set improperly.
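The square-error objective and the nearest-center assignment described above can be illustrated with a plain Lloyd-style k-means (not the constrained variant of Wagstaff et al., 2001). As a simplifying assumption, the first k points serve as the initial centers, so reordering the same data changes the starting point; the function name and the data are hypothetical. Running it on two orderings of the same four points shows the iteration settling into different local minima of the sum of squared errors (SSE).

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd-style k-means: alternate nearest-center assignment and mean update.

    Initial centers are simply the first k points (an illustrative choice).
    Returns (labels, centers, sse), where sse is the sum of squared distances
    from each point to its assigned center.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: every point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # keep an empty cluster's center where it is
                new_centers.append(centers[c])
                continue
            # Update step: move the center to the mean of its members.
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
        if new_centers == centers:
            break  # converged: centers, and hence assignments, are stable
        centers = new_centers
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, sse


a = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
b = [(0.0, 0.0), (10.0, 0.0), (0.0, 2.0), (10.0, 2.0)]  # same points, new order
print(kmeans(a, 2)[2], kmeans(b, 2)[2])  # -> 100.0 4.0
```

With the first ordering the iteration stops at a bottom-row/top-row split (SSE 100), while the second ordering reaches the left/right split (SSE 4); neither run can escape the configuration it converges to.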
Most square-error-based methods are greedy
Yan, H., Lu, Y. and Ma, H.
Density-based Clustering using Automatic Density Peak Detection.
DOI: 10.5220/0006572300950102
In Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2018), pages 95-102
ISBN: 978-989-758-276-9
Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
95