TF-IDF method, because the ConfWeight method requires time-consuming statistical calculations, such as computing Student's t-distribution values and defining a confidence interval for each word.
3.4 Novel Term Relevance Estimation
(TRE)
The main idea of the method (Gasanova et al., 2013) is similar to ConfWeight, but it is far less time-consuming. The idea is that every word that appears in an article contributes some value to each class, and the class with the largest total value is defined as the winner for this article.
For each term we assign a real number, its term relevance, which depends on the term's frequency in utterances. The term weight is calculated using a modified formula of fuzzy rule relevance estimation for fuzzy classifiers (Ishibuchi et al., 1999), in which the membership function has been replaced by the word frequency in the current class. The details of the procedure are as follows:
Let $L$ be the number of classes; $n_i$ is the number of articles which belong to the $i$-th class; $N_{ji}$ is the number of occurrences of the $j$-th word in all articles from the $i$-th class; $T_{ji} = N_{ji}/n_i$ is the relative frequency of the $j$-th word occurrence in the $i$-th class.
$R_j = \max_i(T_{ji})$; $S_j = \arg\max_i(T_{ji})$ is the number of the class which we assign to the $j$-th word.
The term relevance, $C_j$, is given by
$$C_j = \frac{1}{\sum_{i=1}^{L} T_{ji}} \cdot \left( R_j - \frac{1}{L-1} \sum_{\substack{i=1 \\ i \neq S_j}}^{L} T_{ji} \right). \eqno(4)$$
$C_j$ is higher when the word occurs frequently in one class than when it appears in many classes. We use this novel term weight (TW) as an analogue of IDF in text preprocessing.
The learning phase consists of computing the $C_j$ values for each term; that is, the algorithm uses only statistical information obtained from the training set.
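The learning phase described above can be sketched in a few lines. The function name `tre_weights` and the data layout (tokenized articles with integer class labels) are illustrative assumptions, not part of the original method's implementation; the arithmetic follows Eq. (4):

```python
from collections import Counter

def tre_weights(docs, labels, num_classes):
    """Sketch of the TRE learning phase, Eq. (4).

    docs   -- list of tokenized articles (lists of words)
    labels -- class index (0 .. num_classes-1) for each article
    Returns C (word -> term relevance) and S (word -> assigned class).
    """
    n = Counter(labels)                       # n_i: articles per class
    N = {}                                    # word -> per-class occurrence counts N_ji
    for words, c in zip(docs, labels):
        for w in words:
            N.setdefault(w, [0] * num_classes)[c] += 1

    C, S = {}, {}
    for w, counts in N.items():
        T = [counts[i] / n[i] for i in range(num_classes)]  # T_ji = N_ji / n_i
        R = max(T)                            # R_j
        s = T.index(R)                        # S_j: class assigned to word j
        others = sum(T) - T[s]                # sum over i != S_j
        C[w] = (R - others / (num_classes - 1)) / sum(T)
        S[w] = s
    return C, S
```

A word concentrated in a single class gets relevance close to 1, while a word spread evenly across classes gets relevance close to 0.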
4 CLASSIFICATION
ALGORITHMS AND FEATURE
SELECTION
We have considered four different text preprocessing methods (binary representation, TF-IDF, ConfWeight, and the novel TRE method) and compared them using different classification algorithms. The methods have been implemented using RapidMiner (Shafait et al., 2010). The classification methods are:
- k-nearest neighbours algorithm with weighted vote (we have varied k from 1 to 15);
- kernel Bayes classifier with Laplace correction;
- neural network with error back propagation (standard settings in RapidMiner);
- fast large margin based on support vector machines (FLM) (standard settings in RapidMiner).
We use the macro F-score as the criterion of classification effectiveness. Precision for class $i$ is calculated as the number of correctly classified articles of class $i$ divided by the number of all articles which the algorithm assigned to this class. Recall is the number of correctly classified articles of class $i$ divided by the number of articles that actually belong to this class. Overall precision and recall are calculated as the arithmetic means of the per-class precisions and recalls (macro-averaging). The F-score is calculated as the harmonic mean of precision and recall.
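The macro F-score described above can be sketched as follows; the function name and argument layout are assumptions for illustration:

```python
def macro_f_score(y_true, y_pred, num_classes):
    """Macro F-score: per-class precision and recall are averaged
    arithmetically, then combined by the harmonic mean."""
    precisions, recalls = [], []
    for c in range(num_classes):
        assigned = sum(1 for p in y_pred if p == c)            # predicted as c
        actual = sum(1 for t in y_true if t == c)              # truly c
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == c)
        precisions.append(correct / assigned if assigned else 0.0)
        recalls.append(correct / actual if actual else 0.0)
    P = sum(precisions) / num_classes                          # macro precision
    R = sum(recalls) / num_classes                             # macro recall
    return 2 * P * R / (P + R) if P + R else 0.0
```

Macro-averaging weights every class equally regardless of its size, which is why it is preferred over micro-averaging when class distributions are skewed.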
Term weighting also provides a simple form of feature selection: we can ignore terms with low weight values (idf, Maxstr, or the novel TW). In this paper we propose a novel feature selection method based on term weighting, which can be applied only to text classification problems. First, we calculate the relative frequency of each word in each class on the training sample. Then, for each word, we choose the class with the maximum relative frequency, so that each word in the vocabulary has a corresponding class. When a new text arrives for classification, we calculate for each class the sum of the weights of the words which belong to that class. After this procedure the number of attributes equals the number of classes, so the method yields a very small number of attributes. The method can also be applied to binary preprocessing, for which standard feature selection is impossible.
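The classification step of the proposed feature selection can be sketched as below. The names `class_sum_features`, `weights`, and `word_class` are illustrative assumptions; `weights` would hold any of the term weights discussed (idf, Maxstr, or the novel TW), and `word_class` the class chosen for each vocabulary word on the training sample:

```python
def class_sum_features(words, weights, word_class, num_classes):
    """Collapse a tokenized text into num_classes attributes: for each
    class, sum the weights of the text's words assigned to that class."""
    features = [0.0] * num_classes
    for w in words:
        if w in word_class:                   # ignore out-of-vocabulary words
            features[word_class[w]] += weights[w]
    return features
```

A classifier then operates on these few per-class sums instead of the full bag-of-words vector, which drastically reduces dimensionality.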
5 RESULTS OF NUMERICAL
EXPERIMENTS
We have implemented four different text preprocessing methods (the binary method, TF-IDF, ConfWeight, and the novel TRE method). First, we measured the computational effectiveness of each text preprocessing technique: each method was run 20 times on the same computer (Intel Core i7 2.90 GHz, 8 GB RAM). Figure 1 compares the computational times of the different preprocessing methods.
We can see in Figure 1 that binary preprocessing is the fastest. TF-IDF and the novel TRE are approximately one and a half times slower than binary preprocessing and have almost the same computational efficiency. The most time-consuming method