however, the parameter does not influence the results significantly. We use a linear kernel function for the SVM. All features for a key are normalized so that their maximum value equals 1.
In addition to the features explained in Section 3, we also use the squares of outlink_2, category_2, relate, define_2, and webHit. Thus, we use 19 features in total.
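A minimal sketch of this preprocessing step is given below. The feature names follow the paper; the per-key dict-of-arrays layout and the function name are our assumptions.

import numpy as np

# The five features whose squares are appended (14 base features -> 19).
SQUARED = ["outlink_2", "category_2", "relate", "define_2", "webHit"]

def preprocess(features_by_key):
    """features_by_key: {key: {feature_name: np.array over candidate words}}"""
    for key, feats in features_by_key.items():
        # Append squared versions of the selected features.
        for name in SQUARED:
            feats[name + "_sq"] = feats[name] ** 2
        # Normalize each feature so that its maximum over this key's
        # candidate words equals 1.
        for name, values in feats.items():
            m = np.max(np.abs(values))
            if m > 0:
                feats[name] = values / m
    return features_by_key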
Table 1 shows that the methods using SVM are superior to the others on all evaluation measures. RankBoost and Coordinate Ascent are relatively good according to NDCG@10. RankNet performs no better than Base Line, and AdaRank improves on it only marginally.
Table 1: Results of the proposed method.

Method             MAP    NDCG@10  P@10
Base Line          0.672  0.803    0.624
RankNet            0.672  0.800    0.624
AdaRank            0.680  0.805    0.634
SVM-MAP            0.723  0.830    0.682
Coordinate Ascent  0.740  0.838    0.682
SVM Regression     0.744  0.844    0.689
Ranking SVM        0.744  0.843    0.692
RankBoost          0.746  0.844    0.696
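For reference, the three measures reported in Table 1 can be computed as follows. This is a sketch using the standard formulas; the paper does not specify implementation details such as how queries with no relevant words are handled.

import math

def precision_at_k(rels, k=10):
    """rels: list of 0/1 relevance labels in ranked order."""
    return sum(rels[:k]) / k

def average_precision(rels):
    """Assumes all relevant words appear somewhere in the ranked list."""
    hits, score = 0, 0.0
    for i, r in enumerate(rels, 1):
        if r:
            hits += 1
            score += hits / i
    return score / max(hits, 1)

def ndcg_at_k(gains, k=10):
    """gains: graded relevance values in ranked order."""
    dcg = sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], 1))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal[:k], 1))
    return dcg / idcg if idcg > 0 else 0.0

MAP is then the mean of average_precision over all keys, and P@10 and NDCG@10 are likewise averaged over keys.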
Table 2 shows a comparison between Wikipedia thesaurus and Ranking SVM. We obtain 300 words related to a word via the Web API of Wikipedia thesaurus, and 30 related words from the Wikipedia thesaurus Web page. Accordingly, we regard both the 30 words from the Web page and the 300 words from the Web API as the related words from Wikipedia thesaurus.
Table 2: Comparison with Wikipedia thesaurus.

Method               MAP    NDCG@10  P@10
Wikipedia thesaurus  0.670  0.820    0.500
Ranking SVM          0.761  0.853    0.561

There are several keys for which we cannot obtain related words from Wikipedia thesaurus. Additionally, there are many words that Ranking SVM evaluates but Wikipedia thesaurus does not cover. Therefore, for the comparison, we use only the words that both Wikipedia thesaurus and Ranking SVM handle. Table 2 shows that Ranking SVM is superior to Wikipedia thesaurus. This result should be interpreted with caution, however, because many words are excluded from the comparison. According to the literature (Nakayama et al., 2009), Wikipedia thesaurus also uses a machine learning technique, but with training data different from ours. Thus, there is room for further investigation.
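A sketch of how such a restricted comparison could be set up, assuming each system's output is a dict mapping a key to its ranked related words (the function name and data layout are our assumptions):

def restrict_to_common(thesaurus_words, svm_words):
    """Keep, per key, only the related words returned by both systems,
    preserving each system's original ranking."""
    common_keys = set(thesaurus_words) & set(svm_words)
    restricted = {}
    for key in common_keys:
        shared = set(thesaurus_words[key]) & set(svm_words[key])
        restricted[key] = (
            [w for w in thesaurus_words[key] if w in shared],
            [w for w in svm_words[key] if w in shared],
        )
    return restricted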
4.2 Effect of Features
Table 3 shows the difference between the effect of each individual feature and that of each collection of features. All values in the table are ten times their original values.
The last row (ALL) shows the evaluations with all the features described in Section 3. The other rows show the evaluations with all features except the specific feature or collection of features indicated in the first column. The number in parentheses indicates the difference between that evaluation and the evaluation of ALL. The outlinks collection consists of outlink_1, outlink_2, outlink_3, and define_1. The inlinks collection consists of inlink_1 and define_2. The search collection consists of ngd and webHit.
The table shows that inlink_1 is the most effective feature and define_2 is the second most effective feature. Then webHit, relate, outlink_3, category_2, and morpSim follow. The outlinks collection is the most effective collection of features because its sum of differences is the lowest. The inlinks collection and the search collection are also effective in general because their sums of differences are small.
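The ablation behind Table 3 amounts to a simple leave-one-out loop over features and feature collections. Below is a sketch; train_and_evaluate is a hypothetical helper that trains Ranking SVM on the given features and returns (MAP, NDCG@10, P@10).

# Feature collections as defined in the text.
GROUPS = {
    "outlinks": ["outlink_1", "outlink_2", "outlink_3", "define_1"],
    "inlinks":  ["inlink_1", "define_2"],
    "search":   ["ngd", "webHit"],
}

def ablation(all_features, train_and_evaluate):
    baseline = train_and_evaluate(all_features)  # the ALL row
    rows = {"ALL": (baseline, (0.0, 0.0, 0.0))}
    # Remove one feature at a time, then one collection at a time.
    removals = [[f] for f in all_features] + list(GROUPS.values())
    for removed in removals:
        kept = [f for f in all_features if f not in removed]
        scores = train_and_evaluate(kept)
        # Difference from ALL, as reported in parentheses in Table 3.
        diff = tuple(s - b for s, b in zip(scores, baseline))
        rows["-" + "+".join(removed)] = (scores, diff)
    return rows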
We tried several normalizations. Table 4 shows the evaluations of the five normalizations, all obtained with Ranking SVM. The table shows that the normalization in which the maximum of each feature is 1 for each key is the best. This suggests that rank learning over a key and the set of its related words is well suited to the task of extracting good related words.
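The paper does not list the five normalization schemes compared in Table 4; as an illustration only, here are a few common per-key candidates, including the per-key max scaling that performed best.

import numpy as np

def norm_max_per_key(x):
    """Best in Table 4: scale so the maximum within each key equals 1."""
    m = np.max(x)
    return x / m if m > 0 else x

def norm_minmax_per_key(x):
    lo, hi = np.min(x), np.max(x)
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)

def norm_zscore_per_key(x):
    s = np.std(x)
    return (x - np.mean(x)) / s if s > 0 else x - np.mean(x)

# x is the vector of one feature's values over a single key's candidate
# words; a global variant would pool x across all keys instead.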
5 CONCLUSIONS
In this study, we extracted and scored various features from Wikipedia pages and proposed a method for extracting related words by rank learning. As resources, we use not only Wikipedia but also information provided by a search engine. Our proposed method is able to find a suitable combination of features through machine learning. The results indicate that Ranking SVM, combining various features, achieves the best accuracy. The normalization experiments show that the framework of rank learning is effective for extracting related words. Compared with Base Line and Wikipedia thesaurus, the best-performing learning method improves accuracy significantly.
In future research, we plan to extract the relatedness between two words and semantic relationships from the Web using machine learning, probabilistic models, and Web ontologies.