Authors:
Raneem Qaddoura (1); Hossam Faris (2); Ibrahim Aljarah (2); J. J. Merelo (3) and Pedro A. Castillo (3)
Affiliations:
(1) Information Technology, Philadelphia University, Amman, Jordan
(2) Department of Business Information Technology, King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
(3) ETSIIT-CITIC, University of Granada, Granada, Spain
Keyword(s):
Clustering, Cluster Analysis, Distance Measure, Nearest Point with Indexing Ratio, NPIR, Nearest Point, Indexing Ratio, Nearest Neighbor Search Technique.
Abstract:
Selecting an appropriate distance measure is a challenge for most clustering algorithms. Common distance measures include Manhattan (city-block), Euclidean, Minkowski, and Chebyshev. Nearest Point with Indexing Ratio (NPIR) is a recent clustering algorithm that tries to overcome the limitations of other algorithms by identifying clusters of arbitrary shape, non-spherical distributions of points, and shapes with different densities. It does so by iteratively applying the nearest neighbor search technique to find different clusters. The current implementation of the algorithm uses the Euclidean distance measure, which was also used for the experiments presented in the original paper on the algorithm. In this paper, the impact of the four common distance measures on the NPIR clustering algorithm is investigated. The performance of NPIR is evaluated in terms of purity and entropy on nine data sets. The comparative study demonstrates that NPIR generates better results in terms of purity and entropy with the Manhattan distance measure than with the other distance measures on the studied high-dimensional data sets.
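For reference, the four distance measures named in the abstract can all be written as cases of the Minkowski distance. The sketch below is a minimal illustration in plain Python, not the NPIR implementation; the function names and the sample points are assumptions chosen for the example.

```python
from math import inf


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length, non-empty points.

    p = 1 gives Manhattan, p = 2 gives Euclidean, and p = inf gives Chebyshev.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == inf:
        # Limit case: the largest coordinate-wise difference.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def manhattan(x, y):
    return minkowski(x, y, 1)


def euclidean(x, y):
    return minkowski(x, y, 2)


def chebyshev(x, y):
    return minkowski(x, y, inf)


# Hypothetical sample points, used only for illustration.
a = (0.0, 0.0)
b = (3.0, 4.0)

print(manhattan(a, b))     # sum of coordinate differences
print(euclidean(a, b))     # straight-line distance
print(minkowski(a, b, 3))  # order-3 Minkowski distance
print(chebyshev(a, b))     # largest coordinate difference
```

For any pair of points, the distance shrinks or stays the same as the order p grows, so Manhattan is the largest of the four and Chebyshev the smallest. This is one reason the choice of measure can change which points count as nearest neighbors.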