TOWARDS HIGH DIMENSIONAL DATA MINING WITH BOOSTING OF PSVM AND VISUALIZATION TOOLS

Thanh-Nghi Do, François Poulet

Abstract

We present a new supervised classification algorithm that applies boosting to support vector machines (SVMs) and can handle very large data sets. Training an SVM usually requires solving a quadratic program, so learning from large data sets demands a large amount of memory and a long training time. The proximal SVM (PSVM) proposed by Fung and Mangasarian is an alternative SVM formulation that is very fast to train because it requires only the solution of a linear system. We use the Sherman-Morrison-Woodbury formula to adapt the PSVM to data sets with a very large number of attributes. We extend this idea by applying boosting to the PSVM in order to mine massive data sets that have both a very large number of data points and a very large number of attributes. We have evaluated its performance on several large data sets. We also propose a new graphical tool that helps interpret the results of the algorithm by displaying the separating frontier between the classes of the data set. This can help the user understand more deeply how the algorithm works.
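To make the "linear system" and Sherman-Morrison-Woodbury remarks concrete, the following is a minimal sketch of a linear PSVM in the standard Fung and Mangasarian form, not the authors' implementation. It assumes labels in {-1, +1}, a trade-off constant `nu`, and the augmented matrix E = [A, -e]; the function names are hypothetical. The first routine solves an (n+1) x (n+1) system, which suits many data points and few attributes; the second uses the Sherman-Morrison-Woodbury identity to solve an m x m system instead, which suits few data points and many attributes. The boosting step is not shown.

```python
import numpy as np


def psvm_train(A, d, nu=1.0):
    """Linear PSVM: solve (I/nu + E^T E) z = E^T d, with E = [A, -e].

    A is an (m, n) data matrix and d holds m labels in {-1, +1}.
    Returns (w, gamma) for the classifier sign(x . w - gamma).
    """
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])
    z = np.linalg.solve(np.eye(n + 1) / nu + E.T @ E, E.T @ d)
    return z[:-1], z[-1]


def psvm_train_smw(A, d, nu=1.0):
    """Same solution via Sherman-Morrison-Woodbury (m x m system).

    Uses (I/nu + E^T E)^{-1} = nu * (I - E^T (I/nu + E E^T)^{-1} E),
    which is cheaper when the number of attributes n exceeds m.
    """
    m, _ = A.shape
    E = np.hstack([A, -np.ones((m, 1))])
    v = E.T @ d
    s = np.linalg.solve(np.eye(m) / nu + E @ E.T, E @ v)
    z = nu * (v - E.T @ s)
    return z[:-1], z[-1]


def psvm_predict(A, w, gamma):
    """Assign +1 or -1 according to the side of the plane x . w = gamma."""
    return np.where(A @ w - gamma >= 0, 1, -1)


if __name__ == "__main__":
    # Synthetic example (made-up data): 20 points, 200 attributes.
    rng = np.random.default_rng(0)
    d = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)
    A = rng.standard_normal((20, 200)) + 2.0 * d[:, None]
    w, gamma = psvm_train_smw(A, d)
    print("training accuracy:", np.mean(psvm_predict(A, w, gamma) == d))
```

Both routines return the same separating plane up to rounding error; which one is cheaper depends only on whether the number of data points or the number of attributes is smaller.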



Paper Citation


in Harvard Style

Do T. and Poulet F. (2004). TOWARDS HIGH DIMENSIONAL DATA MINING WITH BOOSTING OF PSVM AND VISUALIZATION TOOLS. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-00-7, pages 36-41. DOI: 10.5220/0002639500360041


in Bibtex Style

@conference{iceis04,
author={Thanh-Nghi Do and François Poulet},
title={TOWARDS HIGH DIMENSIONAL DATA MINING WITH BOOSTING OF PSVM AND VISUALIZATION TOOLS},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2004},
pages={36-41},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002639500360041},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - TOWARDS HIGH DIMENSIONAL DATA MINING WITH BOOSTING OF PSVM AND VISUALIZATION TOOLS
SN - 972-8865-00-7
AU - Do T.
AU - Poulet F.
PY - 2004
SP - 36
EP - 41
DO - 10.5220/0002639500360041