Table 5: Classification results (MAP).
Method                               MAP
cosine                               0.45
cosine + JST Thesaurus               0.46
cosine + expanded thesaurus          0.46
Naive Bayes                          0.63
Naive Bayes + JST Thesaurus          0.64
Naive Bayes + expanded thesaurus     0.64
k-NN (BOLA1; Kim et al., 2005)       0.69
Naive Bayes (JSPAT2)                 0.66
k-NN (WGLAB9)                        0.62
VSM (FXDM3)                          0.49
We used Mean Average Precision (MAP) to compare
these results. MAP is defined by the following equation:

\[
\mathrm{MAP}(Q) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} \mathrm{Precision}(R_{jk}) \tag{4}
\]
where $Q$ is the set of test documents, $m_j$ is the number of
relevant documents for document $j$, and $R_{jk}$ is the set of
ranked retrieval results for document $j$ from the top result
down to the $k$th relevant document.
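As an illustration of Equation (4), the following is a minimal Python sketch, not the evaluation code used in the experiments. The function names and the toy data are hypothetical. It assumes each ranking contains no duplicate items, that a relevant item missing from the ranking contributes a precision of 0, and that a test document with no relevant items scores 0.

```python
from typing import Hashable, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Inner sum of Eq. (4): mean of Precision(R_jk) over the m_j relevant items."""
    if not relevant:
        return 0.0  # assumption: no relevant items means a score of 0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision of the result list cut off at the k-th relevant item.
            precision_sum += hits / rank
    # Dividing by m_j means relevant items that were never retrieved count as 0.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
) -> float:
    """Outer sum of Eq. (4): mean of the average precision over all of Q."""
    if len(rankings) != len(relevants):
        raise ValueError("rankings and relevants must have the same length")
    if not rankings:
        return 0.0
    total = sum(average_precision(r, rel) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


# Toy data (hypothetical): two test documents with ranked candidate labels.
rankings = [["A", "B", "C"], ["X", "Y", "Z"]]
relevants = [{"A", "C"}, {"Y"}]
print(f"{mean_average_precision(rankings, relevants):.4f}")  # prints 0.6667
```

For the first toy document the relevant items appear at ranks 1 and 3, giving (1/1 + 2/3)/2 = 5/6; for the second the single relevant item appears at rank 2, giving 1/2. The mean of the two is 2/3.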
Table 5 shows the results of document classification.
In Table 5, BOLA1, JSPAT2, WGLAB9 and FXDM3
are run IDs from the NTCIR-5 Patent Retrieval Task. BOLA1
used k-NN and the structure of patent documents.
JSPAT2 used Naive Bayes. WGLAB9 used k-NN
with either BM11 or the vector space model as the
retrieval model. FXDM3 used the vector space model.
4 DISCUSSION
We expanded a technical term thesaurus using
Japanese patent documents. To confirm that our thesaurus
is useful for text classification, we compared the results
obtained with the thesaurus against the results obtained
without it. As a result, we found that our thesaurus
is effective for document classification: in Table 5 it
raises MAP from 0.45 to 0.46 with cosine similarity and
from 0.63 to 0.64 with Naive Bayes. We also compared
our method with the methods in the NTCIR-5 patent
classification task. Although our method is very simple,
its best MAP of 0.64 is competitive with those systems,
whose MAP ranges from 0.49 to 0.69. We classified six
semantic tags in the experiments and applied word
expansion to the “purpose” tag.
Future work includes (i) applying the method to
other data for quantitative evaluation, and (ii) comparing
the method with other classification techniques to
evaluate its effectiveness.
ACKNOWLEDGEMENTS
The authors would like to thank the referees for their
comments on an earlier version of this paper. This
work was partially supported by The Telecommunications
Advancement Foundation.
REFERENCES
Fellbaum, C. (1998). WordNet: An Electronic Lexical
Database. Bradford Books.
Hagiwara, M., Ogawa, Y., and Toyama, K. (2006). Selection
of effective contextual information for automatic
synonym acquisition. In Proc. of the 21st International
Conference on Computational Linguistics and
44th Annual Meeting of the ACL, pages 353–360.
Hindle, D. (1990). Noun classification from predicate-argument
structures. In Proceedings of the 28th Annual
Meeting of the Association for Computational
Linguistics, pages 268–275.
Iwayama, M., Fujii, A., and Kando, N. (2005). Overview
of classification subtask at NTCIR-5 patent retrieval task.
In Proceedings of NTCIR-5 Workshop Meeting.
Japan Science and Technology Agency (1999).
JST (JICST) Thesaurus 1999.
http://jois.jst.go.jp/JOIS/html/thesaurus index.htm.
Kim, J.-H., Huang, J.-X., Jung, H.-Y., and Choi, K.-S.
(2005). Patent document retrieval and classification at
KAIST. In Proceedings of NTCIR-5 Workshop Meeting.
Lin, D. (1998). Automatic retrieval and clustering of similar
words. In Proceedings of the 36th Annual Meeting of the
Association for Computational Linguistics and 17th
International Conference on Computational Linguistics,
pages 768–774.
National Language Research Institute (1964). Bunruigoi-
hyo. Shuei publisher (In Japanese).
Tokunaga, T. (1997). Extending a thesaurus by classifying
words. In Proceedings of the ACL-EACL Workshop
on Automatic Information Extraction and Building of
Lexical Semantic Resources, pages 16–21.
Uramoto, N. (1996). Positioning unknown words in a thesaurus
by using information extracted from a corpus.
In Proceedings of COLING’96, pages 956–961.
KEOD 2011 - International Conference on Knowledge Engineering and Ontology Development