Figure 1: The concept (automobile) extended to second-order attributes. This figure shows association words for the concept (automobile).
ies and newspapers. Headwords in dictionaries were assumed to be concepts, and content words in their explanation sentences were assumed to be attributes of those headwords (concepts). A concept A consists of pairs of attributes a_i, which characterize the concept A, and weights w_i, which express the importance of each attribute (i is a natural number for each concept, and znum is the number of attributes of the concept), as in Eq. (1).
$A = \{(a_i, w_i) \mid 0 < i < \mathrm{znum} + 1\}$.  (1)
Attributes for each concept were also defined in the Concept-base as concepts. Therefore, one concept was defined as an attribute chain model of n-th order. In this paper, the Concept-base has about 120,000 concepts, and each concept has about 30 attributes on average. Fig. 1 shows the example of the concept (automobile). "Automobile" has attributes such as "engine", "car", and "tire". "Engine", "car", and "tire" are also defined in the Concept-base. Thus, "engine" has attributes such as "combustion" and "motor".
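As an illustration only (a minimal sketch; the dictionary entries, weights, and helper name below are hypothetical, not taken from the actual Concept-base), a concept can be represented as a mapping from attribute words to weights w_i, and second-order attributes are obtained by looking up each first-order attribute as a concept itself:

# Minimal sketch of the attribute-chain structure (illustrative data only).
# Each concept maps attribute words to weights w_i (Eq. 1).
concept_base = {
    "automobile": {"engine": 0.9, "car": 0.8, "tire": 0.6},
    "engine":     {"combustion": 0.7, "motor": 0.5},
    "car":        {"vehicle": 0.8, "road": 0.4},
    "tire":       {"rubber": 0.6, "wheel": 0.5},
}

def second_order_attributes(concept, base):
    """Expand a concept to its second-order attributes (Fig. 1)."""
    expanded = {}
    for attr, w1 in base.get(concept, {}).items():
        for attr2, w2 in base.get(attr, {}).items():
            # One possible choice: propagate weights multiplicatively.
            expanded[attr2] = max(expanded.get(attr2, 0.0), w1 * w2)
    return expanded

print(second_order_attributes("automobile", concept_base))

Propagating weights multiplicatively is only one possible choice; the paper does not specify how second-order weights are combined.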
In this paper, we aim to construct an automatic learning method for the Concept-base using a search engine.
2.2 Auto Feedback
An undefined concept in the Concept-base was input, and the documents that described the undefined concept were obtained from the retrieval result pages of a search engine. The words included in the retrieval result pages became candidate attributes of the undefined concept. The weight of each attribute was given by tf and idf: tf was the frequency with which the candidate attribute appears in the retrieval result pages, and idf was calculated from the number of retrieval pages and the number of all pages of the search engine. Table 1 shows examples of the obtained attributes of undefined concepts.
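A minimal sketch of this weighting step, assuming the retrieved pages are already tokenized and that idf takes the usual log(N/df) form (the paper does not give the exact formula, so the names and the idf expression below are assumptions):

import math
from collections import Counter

def weight_attributes(result_pages, total_pages_in_engine, top_n=100):
    """Weight candidate attributes from retrieval result pages by tf-idf.

    result_pages: list of token lists, one per retrieved page.
    total_pages_in_engine: (approximate) number of pages indexed by the engine.
    """
    tf = Counter()          # term frequency over all retrieved pages
    df = Counter()          # number of retrieved pages containing the term
    for tokens in result_pages:
        tf.update(tokens)
        df.update(set(tokens))

    weights = {}
    for term, freq in tf.items():
        # Assumed idf form: log(total pages / pages containing the term).
        idf = math.log(total_pages_in_engine / (1 + df[term]))
        weights[term] = freq * idf

    # Keep the 100 highest-weighted candidates, as in Section 2.2.
    return sorted(weights.items(), key=lambda x: x[1], reverse=True)[:top_n]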
In this research, we obtained 100 candidate attributes in descending order of weight by Auto Feedback. Auto Feedback obtains attributes only at the point in time when the undefined word is retrieved. Therefore, the retrieval results are influenced by temporary topics, and it was considered that Auto Feedback could not obtain attributes reliably.
Table 1: The attributes of the undefined concepts "Harrison Ford" and "FinePix".

    Harrison Ford              FinePix
    attribute     weight       attribute    weight
    movie         225.16       digital      331.21
    actor         120.77       camera       326.95
    appearance     87.46       pixel        301.11
2.3 Revision of Morphological Analysis
This paper used MeCab (Kudo et al., 2004) as a morphological analyzer. Japanese has no custom of leaving a space between words as English does. When we used a morphological analyzer with the default MeCab settings, a problem occurred in that sentences were divided into needlessly small units: the analyzer unnecessarily divided a sentence into words. MeCab has its own revision rules for this problem; however, we set two simple rules without using them:
1. Connecting the words and phrases inside brackets.
2. Connecting nouns that are next to each other.
For example, in the case of a sentence containing 「風の谷のナウシカ」 ("NAUSICAA of the Valley of the Wind"), the title was divided into several short morphemes before the setting was revised, so the title of the movie was divided needlessly. After the setting was changed so that adjacent nouns and the words inside the brackets are united, we can extract 「風の谷のナウシカ」 as a single word (Table 2).
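A minimal sketch of the two revision rules as post-processing over MeCab output, assuming the mecab-python3 binding with an IPAdic-style dictionary (the function name and the merging details are ours, not from the paper):

import MeCab  # mecab-python3 binding (assumed environment)

tagger = MeCab.Tagger()

def tokenize_with_rules(sentence):
    """Tokenize with MeCab, then apply the two revision rules of Section 2.3:
    1. unite the words and phrases inside the brackets 「」;
    2. unite nouns that are next to each other."""
    tokens = []  # (surface form, part of speech)
    for line in tagger.parse(sentence).splitlines():
        if line == "EOS" or "\t" not in line:
            continue
        surface, features = line.split("\t", 1)
        tokens.append((surface, features.split(",")[0]))

    merged = []            # resulting word list
    prev_is_noun = False
    in_brackets = False
    for surface, pos in tokens:
        if surface == "「":                    # rule 1: start uniting
            in_brackets = True
            merged.append("")
            prev_is_noun = False
        elif surface == "」":                  # rule 1: stop uniting
            in_brackets = False
            prev_is_noun = False
        elif in_brackets:
            merged[-1] += surface
        elif pos == "名詞" and prev_is_noun:   # rule 2: adjacent nouns
            merged[-1] += surface
        else:
            merged.append(surface)
            prev_is_noun = (pos == "名詞")
    return merged

print(tokenize_with_rules("映画「風の谷のナウシカ」を見た。"))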
2.4 Proposal Method
Auto Feedback was a method that learns an undefined concept on the spot. Consequently, the method paid no attention to how the Internet changes over time. The proposed method repeats the Auto Feedback trial many times and refines the attributes and weights of undefined concepts statistically (Fig. 2).
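A minimal sketch of this repetition step; aggregating by averaging the weights over trials is our assumption, since the paper only states that attributes and weights are refined statistically (auto_feedback stands for one trial of Section 2.2):

from collections import defaultdict

def refine_by_repetition(auto_feedback, concept, n_trials=30):
    """Repeat the Auto Feedback trial and aggregate the results.

    auto_feedback(concept) -> list of (attribute, weight) for one trial.
    Attributes that appear in only a few trials (e.g. temporary topics)
    end up with low aggregated weights.
    """
    totals = defaultdict(float)
    for _ in range(n_trials):
        for attribute, weight in auto_feedback(concept):
            totals[attribute] += weight
    # Average over all trials, so one-off topical words are discounted.
    averaged = {a: w / n_trials for a, w in totals.items()}
    return sorted(averaged.items(), key=lambda x: x[1], reverse=True)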
2.5 Evaluation Method
In this section, we explain the evaluation method of
our work.
Evaluation Method. Three subjects evaluated all of the acquired attributes (about 20,000 words). We adopted as correct words the attributes that two or three subjects judged to be suitable. For all Auto Feedback trials, we calculated precision (Eq. 3), recall (Eq. 4), and F-measure (Eq. 5).
Regarding recall, there may be correct attributes other than the attributes acquired by Auto Feedback. However, it was difficult to collect, by questionnaire, all of the attributes that human beings considered suitable. In this research, we adopted the correct attributes evaluated by