Authors:
Moïri Gamboni; Abhijai Garg; Oleg Grishin; Seung Man Oh; Francis Sowani; Anthony Spalvieri-Kruse; Godfried T. Toussaint and Lingliang Zhang
Affiliation:
New York University Abu Dhabi, United Arab Emirates
Keyword(s):
Machine Learning, Data Mining, Support Vector Machines, SMO, Training Data Condensation, k-Nearest Neighbour Methods, Blind Random Sampling, Guided Random Sampling, Wilson Editing, Gaussian Condensing.
Related Ontology Subjects/Areas/Topics:
Classification; Instance-Based Learning; Kernel Methods; Pattern Recognition; Theory and Methods
Abstract:
Several methods for reducing the running time of support vector machines (SVMs) are compared in terms of speed-up factor and classification accuracy on seven large real-world datasets obtained from the UCI Machine Learning Repository. All the methods tested reduce the size of the training data before it is fed to the SVM. Two probabilistic methods that run in linear time with respect to the size of the training data are investigated: blind random sampling and a new method for guided random sampling (Gaussian Condensing). These methods are compared with k-Nearest Neighbour methods for reducing the size of the training set and for smoothing the decision boundary. For all the datasets tested, blind random sampling gave the best results for speeding up SVMs without significantly sacrificing classification accuracy.
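For concreteness, blind random sampling amounts to training the SVM on a uniform random subset of the training data, which takes linear time in the training-set size. The following is a minimal illustrative sketch in Python, assuming scikit-learn's SVC (whose underlying solver is SMO-based), a synthetic dataset, and an arbitrary sampling fraction; these choices are assumptions for illustration, not the paper's exact experimental setup.

# Minimal sketch of blind random sampling before SVM training.
# Dataset, sampling fraction, and kernel are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Blind random sampling: keep a uniform random subset of the training
# data (linear time in the training-set size), then train the SVM on it.
rng = np.random.default_rng(0)
fraction = 0.1  # illustrative sampling rate
idx = rng.choice(len(X_train), size=int(fraction * len(X_train)), replace=False)

svm = SVC(kernel="rbf").fit(X_train[idx], y_train[idx])
print("accuracy after condensation:", svm.score(X_test, y_test))

Because SMO-style training scales superlinearly in the number of training points, shrinking the training set this way reduces SVM training time substantially, at the possible cost of some classification accuracy.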