Authors:
Dhrgam AL Kafaf, Dae-Kyoo Kim and Lunjin Lu
Affiliation:
Oakland University, United States
Keyword(s):
Efficiency, kNN, k Nearest Neighbor.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Business Analytics; Data Engineering; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Health Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
The kNN algorithm typically relies on an exhaustive search of the training dataset, which degrades efficiency on large datasets. In this paper, we present the B-kNN algorithm, which improves the efficiency of kNN through a two-fold preprocessing scheme built on the notions of minimum and maximum points and boundary subsets. For a given training dataset, B-kNN first identifies the classes and, for each class, the minimum and maximum points (MMP) of the class. A given testing object is evaluated against the MMP of each class. If the object falls within the MMP of a class, it is predicted as belonging to that class. If not, a boundary subset (BS) is defined for each class, and the BSs are fed into kNN to determine the class of the object. As BSs are significantly smaller than their classes, the efficiency of kNN improves. We present two case studies evaluating B-kNN. The results show an average of 97% improvement in efficiency over kNN using the entire training dataset, with little sacrifice in accuracy compared to kNN.
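The two-step scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how the boundary subset (BS) is constructed, so here the BS of each class is approximated by the fraction of its points nearest the other classes (an assumption), and the MMP test is taken to be an axis-aligned bounding-box check.

```python
import numpy as np
from collections import Counter

def fit_mmp(X, y):
    """Per-class minimum and maximum points (MMP), i.e. an
    axis-aligned bounding box for each class."""
    return {c: (X[y == c].min(axis=0), X[y == c].max(axis=0))
            for c in np.unique(y)}

def boundary_subsets(X, y, frac=0.2):
    """Assumed BS construction: for each class, keep the fraction of
    points closest to the mean of the other classes. This is a
    stand-in for the paper's (unspecified) boundary subset."""
    Xb, yb = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        other_mean = X[y != c].mean(axis=0)
        dist = np.linalg.norm(Xc - other_mean, axis=1)
        k = max(1, int(frac * len(Xc)))
        keep = np.argsort(dist)[:k]
        Xb.append(Xc[keep])
        yb.extend([c] * k)
    return np.vstack(Xb), np.array(yb)

def predict(x, mmp, Xb, yb, k=3):
    # Step 1: MMP test. If x lies inside exactly one class's box,
    # predict that class without running kNN at all.
    inside = [c for c, (lo, hi) in mmp.items()
              if np.all(x >= lo) and np.all(x <= hi)]
    if len(inside) == 1:
        return inside[0]
    # Step 2: otherwise run plain kNN over the much smaller
    # boundary subsets instead of the full training set.
    dist = np.linalg.norm(Xb - x, axis=1)
    nearest = yb[np.argsort(dist)[:k]]
    return Counter(nearest).most_common(1)[0][0]
```

The efficiency gain in this sketch comes from two places: objects resolved by the MMP test skip kNN entirely, and the remaining objects search only the BSs rather than the whole training set.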