4.1 The Idea of Reinforcement
The main idea of the reinforcement (Sutton, 1996) is to
modify the behavior of the neural network depending on
the weight of each keyword candidate. At the beginning
we initialize the weight attribute (with values from the
interval [0,1]) of every word in a document. Each word
that can be found in the previously mentioned trash word
set has a weight equal to 0; all other words have weights
equal to 1. The neural network algorithm is modified in
such a way that words with smaller weights are pushed
away from words with greater weights. This means that
they are pushed aside from the main categories and
therefore have only a small influence on the category
rank. Moreover, we need to add a parent iteration, which
modifies the weights of the words and repeats the neural
network steps until the proper words are selected. After
the word categorization has been performed, a set of
proposed keywords is generated. At this stage we need to
check every keyword for its accuracy. This is done by
randomly selecting a number of articles (10 in our tests)
containing the tested keyword from the repository and
comparing the normalized distances between them and the
analyzed document. If these documents are relatively
close (in terms of the computed distances) to the initial
one, the keyword is rewarded by increasing its weight. If
the distances are relatively large, the weight is
decreased. In other words, if a selected keyword is good,
the network is rewarded. With this improvement, the
algorithm continues with the steps of neural network
learning.
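The weight initialization and the reward step described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function names, the representation of an article as a set of words, the `threshold` that separates "close" from "far" documents, and the fixed `step` by which a weight changes are all assumptions, since the text does not specify them. The normalized distances are assumed to be computed elsewhere and passed in as numbers from [0,1].

```python
import random


def init_weights(words, trash_words):
    """Trash words start at weight 0, every other word at weight 1."""
    return {w: (0.0 if w in trash_words else 1.0) for w in words}


def sample_articles(repository, keyword, k=10, rng=random):
    """Pick up to k random articles (sets of words) containing the keyword."""
    containing = [doc for doc in repository if keyword in doc]
    return rng.sample(containing, min(k, len(containing)))


def reinforce(weights, keyword, distances, threshold=0.5, step=0.1):
    """Reward or penalize one keyword, keeping its weight inside [0, 1].

    distances: normalized distances between the analyzed document and the
    sampled articles. threshold and step are hypothetical parameters.
    """
    if not distances:
        return weights[keyword]  # nothing to compare against, no change
    mean_distance = sum(distances) / len(distances)
    delta = step if mean_distance <= threshold else -step
    weights[keyword] = min(1.0, max(0.0, weights[keyword] + delta))
    return weights[keyword]
```

In the parent iteration, a call such as `reinforce(weights, "network", distances)` would be made for every proposed keyword before the neural network steps are repeated with the updated weights.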
4.2 Validity of the Results
The methodology of creating the repository, which is
described in (Zyglarski et al., 2008), guarantees that
the collected documents cover various subjects. They are
gathered using the most frequent words appearing in each
document. Additional variety results from the collection
that initiates the repository, which contains articles
from various areas of interest. This means that there is
a good chance that articles containing the tested keyword
are really connected with the same subject. If a keyword
candidate is not really a keyword, these documents will
probably differ from the tested one and the network will
not be reinforced.
5 THE COMPARISON OF RESULTS
The presented method gives better results than the
simple statistical method. Table 2 shows the keywords
found for this article by all three methods (the actual
keywords, subjectively selected by the authors, are
marked in italics). It is clear that the methods based on
Kohonen Networks give better results than the statistical
method, and that the reinforcement also has a very good
influence on the final results.
Figure 13: Effects of the statistical method. The X axis
shows effectiveness; the Y axis shows the number of
documents processed with this effectiveness.
Figure 14: Effects of the neural network method. The X
axis shows effectiveness; the Y axis shows the number of
documents processed with this effectiveness.
In our tests we used about 200 different articles.
In most cases the results given by our approach were more
accurate than those of the other approaches. The accuracy
was checked manually and is subjective. According to the
executed tests, the statistical method gave very poor
results (see Figure 13). In the best case the list of
proposed keywords achieved 65% accuracy; in the worst
case it was about 5%. Figure 13 presents the accuracies
of the results for the tested articles (for example, in
66% of the articles the accuracy of the keywords was
between 20% and 40%). Better results, achieved with
Kohonen Networks without the reinforcement, are presented
in Figure 15. In the best case the list of proposed
keywords achieved almost 80% accuracy. Accuracy in the
10%-40% range was much less frequent in this case.
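The figures quoted above can be read as a per-article accuracy followed by a count of the articles falling into an accuracy band. The sketch below illustrates one possible way to compute both. It assumes that accuracy is the fraction of proposed keywords that a human judge accepted; the text only states that accuracy was checked manually, so this definition and the function names are hypothetical.

```python
def keyword_accuracy(proposed, accepted):
    """Fraction of proposed keywords that were manually accepted."""
    if not proposed:
        return 0.0
    hits = sum(1 for keyword in proposed if keyword in accepted)
    return hits / len(proposed)


def share_in_band(accuracies, low, high):
    """Fraction of articles whose accuracy lies in the band [low, high)."""
    if not accuracies:
        return 0.0
    inside = sum(1 for a in accuracies if low <= a < high)
    return inside / len(accuracies)
```

For instance, `share_in_band(accuracies, 0.2, 0.4)` would give the share of tested articles with keyword accuracy between 20% and 40%.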
The best results were generated using Reinforced
Kohonen Networks, where the best results reached
KMIS 2009 - International Conference on Knowledge Management and Information Sharing