6 CONCLUSION AND FUTURE WORK
We have presented new graphical and cooperative
methods (the latter combining a graphical part with
an automatic part) for classification tasks in data
mining.
The first method is a graphical evaluation of the
quality of the SVM result by means of a histogram
displaying the data distribution according to the
distance to the separating surface. This method is
very useful for evaluating the quality of the frontier.
It has been presented for evaluating the results of SVM
algorithms, but it can be used for any other type of
frontier (such as a cut in a decision tree or a
regression line) and for any dataset size.
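
As an illustration of the histogram described above, the following
is a minimal sketch, assuming a linear frontier w.x + b = 0 and class
labels +1 / -1. The function names, the toy data and the number of
bins are hypothetical and are not taken from the tool presented in
this paper; only NumPy is used.

```python
import numpy as np

def signed_distances(X, w, b):
    """Signed Euclidean distance of each row of X to the hyperplane w.x + b = 0."""
    w = np.asarray(w, dtype=float)
    return (np.asarray(X, dtype=float) @ w + b) / np.linalg.norm(w)

def distance_histogram(X, y, w, b, bins=10):
    """Per-class counts of points by signed distance to the frontier.

    y holds the labels +1 / -1 (an assumption of this sketch).
    Returns (edges, counts_pos, counts_neg), all sharing the same bin edges.
    """
    y = np.asarray(y)
    d = signed_distances(X, w, b)
    # Common bin edges so that both classes can be drawn on one axis.
    edges = np.linspace(d.min(), d.max(), bins + 1)
    counts_pos, _ = np.histogram(d[y == 1], bins=edges)
    counts_neg, _ = np.histogram(d[y == -1], bins=edges)
    return edges, counts_pos, counts_neg

# Toy data (hypothetical): the last point has label -1 but lies on the
# positive side of the frontier, so it is misclassified.
X = np.array([[2.0, 0.0], [1.0, 1.0], [3.0, 2.0],
              [-1.0, 0.0], [-2.0, 1.0], [0.5, 0.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = np.array([1.0, 0.0]), 0.0

edges, counts_pos, counts_neg = distance_histogram(X, y, w, b, bins=5)
```

In such a histogram, negative-class counts in bins to the right of
zero (or positive-class counts to the left) correspond to
misclassified points, and empty bins around zero suggest a wide
margin.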
This tool is then linked with scatter-plot matrices to
try to explain the results of the SVM. Today, all
SVM algorithms are used as "black boxes": they give
good results (high accuracy), but it is impossible to
explain them. We use a set of two-dimensional
projections to try to explain these results. The same
linked views can also help the user in the
parameter tuning step (for example, by avoiding fine
tuning when the margin is very large, or by avoiding
tuning parameters with the wrong kernel function). In
this case the accuracy is not increased; only the time
needed to perform the classification is reduced.
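
The link between the histogram and the scatter-plot matrices can be
sketched as a selection ("brushing") step: the points falling in a
chosen range of distances are collected and then projected onto every
pair of attributes. This is only a sketch under assumptions; the
function names, the half-open interval and the toy data are
hypothetical, and no actual drawing is performed.

```python
from itertools import combinations

import numpy as np

def brush_bin(distances, low, high):
    """Indices of the points whose signed distance lies in [low, high)."""
    d = np.asarray(distances, dtype=float)
    return np.flatnonzero((d >= low) & (d < high))

def scatter_matrix_panels(X, selected):
    """2D coordinates of the selected points for every attribute pair (i, j)."""
    X = np.asarray(X, dtype=float)
    return {
        (i, j): X[selected][:, [i, j]]
        for i, j in combinations(range(X.shape[1]), 2)
    }

# Toy data (hypothetical): six points with three attributes and their
# signed distances to the frontier.
X = np.array([[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [3.0, 2.0, 5.0],
              [-1.0, 0.0, 2.0], [-2.0, 1.0, 4.0], [0.5, 0.0, 3.0]])
distances = np.array([2.0, 1.0, 3.0, -1.0, -2.0, 0.5])

# Select the points close to the frontier, then build one panel per
# pair of attributes, as a scatter-plot matrix would display them.
selected = brush_bin(distances, -1.0, 1.0)
panels = scatter_matrix_panels(X, selected)
```

Each entry of `panels` holds the coordinates to highlight in one
two-dimensional projection, which is where the user would look for the
attributes that separate (or fail to separate) the selected points.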
Finally, cooperative algorithms, using both
automatic and interactive parts, are used to deal with
very large datasets (in number of rows or columns).
This allows us to increase the accuracy and the
comprehensibility of the obtained models and to
reduce the time needed to perform the classification.
We have started to use the same kind of approach
for unsupervised classification (clustering) and
outlier detection tasks in high-dimensional datasets.
ICEIS 2005 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS