Authors:
Aarti; Jagat Challa; Hrishikesh Harsh; Utkarsh D.; Mansi Agarwal; Raghav Chaudhary; Navneet Goyal and Poonam Goyal
Affiliation:
ADAPT Lab, Dept. of Computer Science, Pilani Campus, Birla Institute of Technology & Science, Pilani, 333031, India
Keyword(s):
Data Streams, Classification, k-Nearest Neighbors Classifier, Anytime Algorithms.
Abstract:
The k-Nearest Neighbor (k-NN) classifier is widely used for classifying data streams. However, traditional k-NN-based stream classification algorithms cannot handle varying inter-arrival rates of objects in the stream. Anytime algorithms handle data streams with variable speed effectively by trading execution time against the quality of results. In this paper, we introduce a novel anytime k-NN classification method for data streams, namely ANY-k-NN. This method employs a proposed hierarchical structure, the Any-NN-forest, as its classification model. The Any-NN-forest maintains a hierarchy of micro-clusters at different levels of granularity in its trees. This enables ANY-k-NN to handle variable stream speeds effectively and to adapt its classification model incrementally using incoming labeled data. Moreover, it can manage large data streams efficiently because its model construction is less expensive. It is also capable of handling concept drift and class evolution. Additionally, this paper presents ANY-MP-k-NN, a first-of-its-kind framework for anytime k-NN classification of multi-port data streams over distributed memory architectures. ANY-MP-k-NN can efficiently manage very large and high-speed data streams and deliver highly accurate classification results. The experimental findings confirm the superior performance of the proposed methods compared to the state of the art in terms of classification accuracy.
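The anytime idea described in the abstract, answering from coarse micro-clusters when time is short and refining toward finer ones when time allows, can be illustrated with a minimal sketch. This is not the authors' Any-NN-forest or the ANY-k-NN algorithm: the `Node` class, the `anytime_knn` function, the use of a refinement-step `budget` in place of wall-clock time, and the toy data are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical micro-cluster summary: a center, a majority label, finer children."""
    center: tuple
    label: str
    children: list = field(default_factory=list)


def anytime_knn(roots, query, k, budget):
    """Vote among the k nearest micro-clusters after at most `budget` refinements.

    Each refinement replaces the closest unexpanded micro-cluster with its
    finer-grained children, so a larger budget gives a finer-grained answer.
    """
    # Min-heap of (distance, tie-breaker, node); the tie-breaker avoids comparing nodes.
    frontier = [(math.dist(n.center, query), i, n) for i, n in enumerate(roots)]
    heapq.heapify(frontier)
    next_id = len(frontier)
    leaves = []  # finest-granularity micro-clusters reached so far
    steps = 0
    while steps < budget and frontier:
        d, _, node = heapq.heappop(frontier)
        if not node.children:
            leaves.append((d, node))  # cannot be refined further
            continue
        for child in node.children:
            heapq.heappush(frontier, (math.dist(child.center, query), next_id, child))
            next_id += 1
        steps += 1
    # On interruption, answer with whatever granularity is currently available.
    candidates = leaves + [(d, n) for d, _, n in frontier]
    nearest = sorted(candidates, key=lambda t: t[0])[:k]
    votes = Counter(n.label for _, n in nearest)
    return votes.most_common(1)[0][0]


# Toy forest (made-up data): tree A is mostly class "a" but has one "b" sub-cluster.
roots = [
    Node((0.0, 0.0), "a", [Node((-2.0, 0.0), "a"),
                           Node((-2.0, 1.0), "a"),
                           Node((2.0, 0.0), "b")]),
    Node((5.0, 0.0), "b", [Node((4.0, 0.0), "b"),
                           Node((6.0, 0.0), "b")]),
]
query = (2.2, 0.0)

coarse = anytime_knn(roots, query, k=1, budget=0)    # only top-level summaries
refined = anytime_knn(roots, query, k=1, budget=1)   # one refinement step
```

With no budget the query is matched against the coarse summary of tree A; after a single refinement the nearer fine-grained micro-cluster decides the label instead, which is the time-for-quality trade the abstract refers to.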