the system detects a nondominant notational variant
of a word in an input sentence, the variant is underlined
and turns red, and the system shows the frequency
information for the notational variants of the word,
giving users a chance to consider why they used a
nondominant variant. In Figure 6 (a), a user gives an
input sentence, tabako wo yameru no ha muzukashii
[it is hard to stop smoking], to the system. Then,
as shown in Figure 6 (b), the system detects a non-
dominant notational variant, muzukashii [hard], in the
input sentence. The word muzukashii [hard] is underlined
and turns red, and its frequency information is shown.
Notational variant dictionaries are thus the key to
detecting nondominant notational variants. Section 3.2
describes how we developed these dictionaries.
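The detection step described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function name `find_nondominant`, the dictionary layout (variant label mapped to surface-form frequencies), the token format, and all frequency counts are assumptions made for the example.

```python
from collections import Counter

# Hypothetical dictionary: variant label -> frequency of each surface form.
# The counts are invented for illustration; they are not from the paper.
NEWSPAPER_DICT = {
    "muzukashii": Counter({"難しい": 950, "むずかしい": 50}),
}


def find_nondominant(tokens, variant_dict):
    """Return (surface, dominant, frequencies) for each nondominant variant.

    `tokens` is a list of (surface, variant_label) pairs. The label is None
    for words treated here as having no notational variants (a simplification).
    """
    hits = []
    for surface, label in tokens:
        freqs = variant_dict.get(label)
        if not freqs:
            continue  # word is not covered by this dictionary
        dominant = max(freqs, key=freqs.get)  # most frequent surface form
        if surface != dominant:
            hits.append((surface, dominant, dict(freqs)))
    return hits


# tabako wo yameru no ha muzukashii [it is hard to stop smoking]
tokens = [
    ("たばこ", None),
    ("を", None),
    ("やめる", None),
    ("のは", None),
    ("むずかしい", "muzukashii"),
]
hits = find_nondominant(tokens, NEWSPAPER_DICT)
for surface, dominant, freqs in hits:
    print(f"nondominant: {surface} (dominant: {dominant}) {freqs}")
```

Passing a different dictionary as `variant_dict` corresponds to checking the same sentence against another document domain.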
3.2 Development of Notational Variant
Dictionaries
In this study, we assumed that suitable notational
variants are used dominantly in official, business, or
technical documents, whereas unsuitable ones are less
frequent or not found in these documents. If this
assumption holds, unsuitable notational variants can be
detected by checking whether they are used dominantly
in official, business, or technical documents.
In order to confirm whether notational variants are
used dominantly, we extracted examples of notational
variants from
• 296364 newspaper articles published in the
Mainichi Newspaper from January 2006 to June
2006 (Mainichi 07).
• 319 technical reports published in the 12th Annual
Meeting of the Association for Natural Language
Processing (2006).
and developed notational variant dictionaries. We used
newspaper articles because we aimed to acquire notational
variants of words that are used in various domains. We
used technical reports, on the other hand, because we
aimed to acquire notational variants of words in a
specific domain and to develop domain-specific
dictionaries of notational variants. We developed
domain-specific dictionaries because dominant notational
variants may vary with the document domain. By switching
between domain-specific dictionaries, our system can
check whether notational variants are suitable for
documents in a given domain. In this study, we acquired notational
variants in a specific domain from technical reports
published in the Annual Meeting of the Association
for Natural Language Processing (2006). Some of these
technical reports were given as reference works to the
students who took part in the experiment described in
Section 4; this is one reason why we extracted examples
of notational variants from the technical reports.
Sentences in the newspaper articles and technical reports
were segmented into words using the Japanese morphological
analyzer JUMAN (Kurohashi 05). When JUMAN finds a
notational variant, it assigns a variant label to it,
and all notational variants of a word receive the same
variant label. Using these variant labels, we extracted
notational variants and developed two dictionaries of
• notational variants in newspaper articles, and
• notational variants in technical reports of natural
language processing.
Table 1 shows the results of the notational variant
extraction from newspaper articles and technical
documents. The most frequent notational variant of each
word was regarded as its dominant notational variant.
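The extraction procedure can be sketched as follows: group surface forms by variant label, count them, and take the most frequent form as dominant. This is only a sketch under stated assumptions. The input format (a list of `(surface, variant_label)` pairs) and the function names are hypothetical, and the code does not call JUMAN itself.

```python
from collections import Counter, defaultdict


def build_variant_dictionary(labeled_tokens):
    """Count surface forms per variant label.

    `labeled_tokens` is an iterable of (surface, variant_label) pairs, as
    might be derived from a morphological analyzer's output (assumed format).
    Tokens whose label is None carry no variant label and are skipped.
    """
    counts = defaultdict(Counter)
    for surface, label in labeled_tokens:
        if label is not None:
            counts[label][surface] += 1
    return {label: dict(c) for label, c in counts.items()}


def dominant_variant(freqs):
    """Return the most frequent surface form of a word."""
    return max(freqs, key=freqs.get)


# Toy corpus; the counts are invented for illustration.
corpus = (
    [("難しい", "muzukashii")] * 3
    + [("むずかしい", "muzukashii")]
    + [("は", None)]
)
dictionary = build_variant_dictionary(corpus)
print(dictionary)
print(dominant_variant(dictionary["muzukashii"]))
```

Building one such dictionary per corpus yields the separate newspaper and technical-report dictionaries described above.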
As shown in Table 1, notational variants of 27988
and 9211 words were extracted from the newspaper
articles and technical documents, respectively. These
words can be classified into two types:
TYPE I a word of this type actually has two or more
notational variants, but only one of them was found
in the newspaper articles or technical documents.
TYPE II a word of this type has two or more notational
variants that were found in the newspaper articles
or technical documents.
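The two types can be told apart by counting how many variants of a word were actually observed in a corpus. The sketch below assumes a hypothetical per-word frequency table (surface form mapped to count); the function name and the sample counts are illustrative only.

```python
def classify_word(freqs):
    """Classify a word by how many of its variants occur in the corpus.

    `freqs` maps each observed surface form to its frequency (assumed layout).
    One observed variant -> "TYPE I"; two or more -> "TYPE II".
    """
    observed = [surface for surface, count in freqs.items() if count > 0]
    if not observed:
        raise ValueError("word has no observed variants")
    return "TYPE II" if len(observed) >= 2 else "TYPE I"


# Invented counts for illustration.
type_one = classify_word({"難しい": 12})
type_two = classify_word({"難しい": 12, "むずかしい": 2})
print(type_one, type_two)
```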
Table 2 shows the unique and total numbers of
notational variants of TYPE II words in the newspaper
articles and technical documents. To quantify how
strongly the dominant notational variant of a word
dominates its other variants, we introduced the dominant
degree. Suppose that a word has notational variants i
(i = 1, ..., N). The dominant degree of the word is
calculated as follows:
$$ d = \frac{f_d}{\sum_{i=1}^{N} f_i} $$

where $d$ is the dominant degree of the word, and $f_i$
and $f_d$ are the frequencies of notational variant $i$
and the dominant notational variant of the word, respectively.
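As a worked illustration of the definition above, the dominant degree is the frequency of the most frequent variant divided by the summed frequency of all variants of the word. The function name and the sample counts below are assumptions made for this sketch.

```python
def dominant_degree(freqs):
    """Compute d = f_d / sum_i f_i for one word.

    `freqs` maps each notational variant to its frequency; f_d is the
    frequency of the most frequent (dominant) variant.
    """
    total = sum(freqs.values())
    if total == 0:
        raise ValueError("word has no observed occurrences")
    return max(freqs.values()) / total


# Invented counts: 3 occurrences of the dominant variant out of 4 in total.
d = dominant_degree({"難しい": 3, "むずかしい": 1})
print(d)  # 0.75
```

A word with only one observed variant has a dominant degree of 1, and the value approaches 1/N as the N variants become equally frequent.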
Figure 7 shows the histograms of the dominant degrees
of TYPE II words in the newspaper articles and technical
documents. In Figure 7, the broken lines show the
histograms of the dominant degrees of all the TYPE II
words in the newspaper articles and technical documents.
On the other hand, the thick lines show