Authors:
Thanh-Nghi Do
and
François Poulet
Affiliation:
ESIEA Recherche, France
Keyword(s):
Data mining, Classification, Machine learning, SVM, Incremental learning, Boosting, Visualization.
Related
Ontology
Subjects/Areas/Topics:
Artificial Intelligence
;
Biomedical Engineering
;
Business Analytics
;
Data Engineering
;
Data Mining
;
Databases and Information Systems Integration
;
Datamining
;
Enterprise Information Systems
;
Health Information Systems
;
Sensor Networks
;
Signal Processing
;
Soft Computing
Abstract:
We present a new supervised classification algorithm using boosting with support vector machines (SVM) and able to deal with very large data sets. Training a SVM usually needs a quadratic programming, so that the learning task for large data sets requires large memory capacity and a long time. Proximal SVM proposed by Fung and Mangasarian is another SVM formulation very fast to train because it requires only the solution of a linear system. We have used the Sherman-Morrison-Woodbury formula to adapt the PSVM to process data sets with a very large number of attributes. We have extended this idea by applying boosting to PSVM for mining massive data sets with simultaneously very large number of datapoints and attributes. We have evaluated its performance on several large data sets. We also propose a new graphical tool for trying to interpret the results of the new algorithm by displaying the separating frontier between classes of the data set. This can help the user to deeply understand
how the new algorithm can work.
(More)