
 
6 CONCLUSION AND FUTURE WORK
We have presented new graphical and cooperative
methods for classification tasks in data mining, the
cooperative methods combining a graphical part
with an automatic part.
The first method is a graphical evaluation of the
quality of the SVM result by means of a histogram
displaying the data distribution according to the
distance to the separating surface. This histogram
shows how well the frontier separates the classes. It
has been presented to evaluate the results of SVM
algorithms, but it can be used for any other type of
frontier (such as a cut in a decision tree or a
regression line) and for any dataset size.
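As an illustration of the first method, the histogram data can be computed as in the following minimal sketch. It assumes a linear frontier w.x + b = 0; the function names, the toy points and the values of w and b are hypothetical and are not taken from our implementation.

```python
import math

def signed_distances(points, w, b):
    """Signed distance of each point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x in points]

def distance_histogram(distances, labels, n_bins=10):
    """Count points per class in equal-width bins of signed distance.

    Returns (edges, counts), where counts[label] is a list of n_bins integers.
    """
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all distances are equal
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = {label: [0] * n_bins for label in set(labels)}
    for d, label in zip(distances, labels):
        idx = min(int((d - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[label][idx] += 1
    return edges, counts

# Toy 2-D data and an assumed separating line x1 + x2 - 1 = 0 (illustrative only).
points = [(0.0, 0.0), (0.2, 0.1), (0.4, 0.5), (1.0, 1.0), (0.9, 0.8), (0.6, 0.5)]
labels = [-1, -1, -1, +1, +1, +1]
w, b = (1.0, 1.0), -1.0

dists = signed_distances(points, w, b)
edges, counts = distance_histogram(dists, labels, n_bins=4)
for label in sorted(counts):
    print(label, counts[label])
```

In such a histogram, points of class -1 falling in bins with a positive distance (or points of class +1 in bins with a negative distance) are misclassified, and bins close to zero that contain few points suggest a large margin.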
This tool is then linked with scatter-plot matrices to
try to explain the results of the SVM. Today, SVM
algorithms are used as "black boxes": they give
good results (high accuracy), but it is impossible to
explain them. We use a set of two-dimensional
projections to try to explain these results. The same
linked views can also help the user in the
parameter tuning step (for example by avoiding fine
tuning when the margin is very large, or by avoiding
parameter tuning with an unsuitable kernel function).
Here the accuracy is not increased; only the time
needed to perform the classification is reduced.
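The decision to skip fine tuning is taken visually by the user in our approach. As a rough automated stand-in, it could be sketched as follows; the function name, the threshold and the example distances are assumptions made for illustration only.

```python
def needs_fine_tuning(distances, labels, margin_threshold):
    """Return True when the smallest class-signed distance is below the threshold.

    distances: signed distances of the points to the separating surface.
    labels: class of each point, +1 or -1.
    margin_threshold: hypothetical value chosen by the user.
    """
    # A point on its own side of the frontier has a positive product d * y.
    margins = [d * y for d, y in zip(distances, labels)]
    return min(margins) < margin_threshold

# Large margin: every point is far from the frontier, so fine tuning is skipped.
wide = needs_fine_tuning([-0.7, -0.6, 0.8, 0.9], [-1, -1, +1, +1], margin_threshold=0.5)

# Small margin: one point lies very close to the frontier.
narrow = needs_fine_tuning([-0.7, -0.05, 0.8, 0.9], [-1, -1, +1, +1], margin_threshold=0.5)

print(wide, narrow)
```

A misclassified point gives a negative product and therefore always triggers tuning in this sketch, whatever the threshold.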
Finally, cooperative algorithms, using both
automatic and interactive parts, are used to deal with
datasets that are very large in rows or in columns.
This allows us to increase the accuracy and the
comprehensibility of the obtained models and to
reduce the time needed to perform the classification.
We have started to use the same kind of approach
for unsupervised classification (clustering) and
outlier detection tasks in high-dimensional datasets.
ICEIS 2005 - ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS