The paper is organised as follows. Section I provides an overview of SVM, how it works, and how it lends itself to the type of categorisation work required in the e-learning paradigm. Section II provides a brief description of the SVM Torch software. Section III describes the approach used and the pre-processing of the documents. Section IV gives the complete implementation of text categorisation using the SVM software, together with the experimental results and their analysis; the method for representing the documents and the use of SVM Torch in the classification of text documents are dealt with therein. The code for document pre-processing, the commands to be executed for training the SVM and classifying documents, and sample screenshots have also been included.
2 SECTION I
2.1 Support Vector Machines
The foundations of Support Vector Machines (SVM) were developed by Vapnik (Vapnik, V., 1995), and SVMs are gaining popularity owing to many attractive features and promising empirical performance. The formulation embodies the Structural Risk Minimisation (SRM) principle, which is superior (Gunn, S.R., et al., 1997) to the traditional Empirical Risk Minimisation (ERM) principle employed by conventional neural networks. SRM minimises an upper bound on the expected risk, whereas ERM minimises the error on the training data. It is this difference that equips SVMs with a greater ability to generalise, which is the goal in statistical learning. SVMs were developed to solve the classification problem, but recently they have been extended to the domain of regression problems (Vapnik, V., et al., 1997). The term SVM is typically used to describe classification with support vector methods, while support vector regression describes regression with support vector methods.
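To make the SRM principle above concrete: in Vapnik's standard formulation, with probability at least 1 − η the expected risk R(α) is bounded by the empirical risk plus a confidence term that depends on the VC dimension h and the number of training samples l:

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
```

ERM minimises only the first term on the right-hand side; SRM minimises the whole bound, which is the "upper bound on the expected risk" referred to above.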
Kernel selection is an important issue, and the obvious question that arises is: with so many different mappings to choose from, which is best for a particular problem? This is not a new question, but with the inclusion of many mappings within one framework it is easier to make a comparison. The upper bound on the VC dimension is a potential avenue for comparing kernels; however, it requires estimating the radius of the hypersphere enclosing the data in the non-linear feature space.

The problem of empirical data modelling is germane to many engineering applications. In empirical data modelling a process of induction is used to build a model of the system, from which it is hoped to deduce responses of the system that have yet to be observed. Ultimately, the quantity and quality of the observations govern the performance of this empirical model. By its observational nature, the data obtained are finite and sampled; this sampling is non-uniform, and owing to the high-dimensional nature of the problem the data form only a sparse distribution in the input space. Consequently, the problem is nearly always ill-posed (Poggio, T., et al., 1985) in the sense of Hadamard (Hadamard, J., 1923). Traditional neural network approaches have suffered difficulties with generalisation, producing models that can overfit the data. This is a consequence of the optimisation algorithms used for parameter selection and of the statistical measures used to select the 'best' model.

As a final caution, even if a strong theoretical method for selecting a kernel is developed, unless it can be validated using independent test sets on a large number of problems, methods such as bootstrapping and cross-validation will remain the preferred means of kernel selection.
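The cross-validation approach recommended above can be sketched as follows. To stay self-contained, this illustrative example scores candidate kernels with a simple kernel "nearest class mean" classifier rather than a full SVM (the classifier, kernels, and synthetic data are all assumptions for illustration); the point is the selection loop, which works identically for any kernel learner.

```python
# Illustrative sketch: choosing a kernel by k-fold cross-validation.
# A kernel nearest-class-mean classifier stands in for the SVM here.
import numpy as np

def linear(u, v):
    return u @ v

def poly(u, v):
    return (u @ v + 1.0) ** 2

def rbf(u, v):
    return np.exp(-np.sum((u - v) ** 2))

def predict(x, X, y, kernel):
    # Compare mean kernel similarity of x to each class.
    pos = np.mean([kernel(x, xi) for xi in X[y == 1]])
    neg = np.mean([kernel(x, xi) for xi in X[y == -1]])
    return 1 if pos > neg else -1

def cv_accuracy(X, y, kernel, folds=5):
    n = len(X)
    idx = np.arange(n)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]                 # every folds-th sample held out
        train = np.setdiff1d(idx, test)
        correct += sum(predict(X[i], X[train], y[train], kernel) == y[i]
                       for i in test)
    return correct / n

# Two synthetic Gaussian clusters, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

scores = {name: cv_accuracy(X, y, k)
          for name, k in [("linear", linear), ("poly", poly), ("rbf", rbf)]}
best = max(scores, key=scores.get)
```

Whichever kernel attains the highest held-out accuracy is retained; this is exactly the empirical selection that the theoretical VC-dimension bound would, if practical, replace.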
2.2 SVM and Document Classification
The emergence of student-centred learning models in e-learning has led to lecture presentation according to the learner's choice and state of knowledge. In this context, text categorisation has become an area of intense research. Several methods have been proposed for this purpose (Li, Y.H., et al., The Computer Journal, Vol. 41, No. 8, 1998), but we chose Support Vector Machines because they can be used efficiently for document classification, given a feature representation and a kernel that best represents the features. The process of representing a document by a feature representation is termed pre-processing. The features need to be mapped to numerals, as the SVM software works only with numerals. The output of a Support Vector Machine is essentially computed as a weighted sum of kernel function outputs. The kernel itself could be polynomial, a Gaussian RBF, or any other function satisfying Mercer's conditions; we used a polynomial kernel. Conceptually, the Lagrangian formed from the objective function (constructed from the SVM output expression) is minimised and the classification margin is maximised over the training set.
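The "weighted sum of kernel function outputs" described above can be written down directly. A minimal NumPy sketch of the decision function f(x) = sign(Σᵢ αᵢ yᵢ K(sᵢ, x) + b) with a polynomial kernel follows; the support vectors, multipliers, and labels are made up for illustration, not taken from a trained model.

```python
import numpy as np

def poly_kernel(u, v, degree=2, c=1.0):
    """Polynomial kernel K(u, v) = (u . v + c) ** degree."""
    return (np.dot(u, v) + c) ** degree

def svm_decision(x, support_vectors, alphas, labels, b=0.0):
    """SVM output: weighted sum of kernel evaluations against the
    support vectors, plus a bias term, passed through sign()."""
    s = sum(a * y * poly_kernel(sv, x)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return np.sign(s + b)

# Toy example with hypothetical support vectors, multipliers, and labels.
svs = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
alphas = [0.5, 0.5]
labels = [+1, -1]
print(svm_decision(np.array([2.0, 2.0]), svs, alphas, labels))   # -> 1.0
print(svm_decision(np.array([-2.0, -2.0]), svs, alphas, labels)) # -> -1.0
```

In a real SVM the αᵢ come from the constrained optimisation sketched in the paragraph above; here they are fixed by hand only to show the shape of the computation.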
2.3 Feature representation
In document classification, a feature is a word. A feature vector consists of the various words from a dictionary formed by analysing the documents.
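The dictionary-based representation just described can be sketched as a simple bag-of-words mapping. This is illustrative only (the documents here are toy strings, and the real pre-processing pipeline is described later in the paper); it shows how words become the numeric vectors the SVM software requires.

```python
# Minimal bag-of-words sketch: build a dictionary from the training
# documents, then map each document to a numeric feature vector whose
# i-th entry counts occurrences of the i-th dictionary word.
docs = ["svm classifies text",
        "text features form a vector",
        "svm uses a kernel"]

# The dictionary: every distinct word seen across the documents.
vocabulary = sorted({word for doc in docs for word in doc.split()})

def to_feature_vector(doc, vocabulary):
    words = doc.split()
    return [words.count(term) for term in vocabulary]

vectors = [to_feature_vector(d, vocabulary) for d in docs]
```

Each document is thereby reduced to a fixed-length numeric vector, which is exactly the form in which it can be handed to the SVM for training or classification.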
There are various alternatives and enhancements in
ICEIS 2005 - SPECIAL SESSION ON EFFICACY OF E-LEARNING SYSTEMS