we have only a small amount of training data. SVM
performs well under these circumstances, so we adopt
SVM as the classification algorithm.
We choose 12 features for classifying comma roles
as shown in Table 2. The features are the number of
words, part-of-speech (POS), the number of commas,
the ordinal number of the comma in the sentence, word
forms such as past/present participle, and the existence
of coordinate conjunctions. For example, the sentence
“My frequent uses of the Internet is sending e-mail, surfing
the Web, and using chat room” has two commas. For the
first comma, [My frequent uses of the Internet is sending
e-mail] is the left side and [surfing the Web] is the right
side. For the second comma, [surfing the Web] is the left
side and [and using chat room] is the right side. As a
result, the features for the first comma are as follows:
l_length=9, r_length=3, l_first_POS=ADJ,
l_last_POS=NOUN, r_first_POS=VERB, c_count=2,
c_ord=1, c_count_ord=21, r_first_coordConj=0,
r_first_pastp=0, r_first_three_presp=1,
r_coordConj_exist=0. They are represented as integer
values as follows: 9, 3, 1, 3, 4, 2, 1, 21, 0, 0, 1, 0. In the
same manner, the features for the second comma are as
follows: 3, 4, 4, 3, 8, 2, 2, 22, 1, 0, 1, 1. These vectors are
the input for SVM training and testing.
Table 2: Features for Classifying Comma Roles.
l_length: # of words in the left side of the comma
r_length: # of words in the right side of the comma
l_first_POS: POS of the first word in the left side of the comma
l_last_POS: POS of the last word in the left side of the comma
r_first_POS: POS of the first word in the right side of the comma
c_count: # of commas in the sentence
c_ord: order of the comma in the sentence
c_count_ord: combination of the total comma count and the order of the comma
r_first_coordConj: whether the POS of the first word in the right side of the comma is a coordinate conjunction or not
r_first_pastp: whether the first word in the right side of the comma is a past participle form or not
r_first_three_presp: whether one of the first three words in the right side of the comma is a present participle form or not
r_coordConj_exist: whether one of the words in the right side of the comma is a coordinate conjunction or not
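The count-based features of Table 2 can be illustrated with a minimal Python sketch. It is not the implementation used in the system: the function name `comma_features` and the small `COORD_CONJ` word list (standing in for a POS tagger) are hypothetical, the POS and participle features are omitted because they need a tagger, and computing `c_count_ord` as `c_count * 10 + c_ord` is an assumption that merely reproduces the values 21 and 22 of the worked example.

```python
# Assumption: a short closed word list stands in for the POS tagger's
# coordinate-conjunction tag.
COORD_CONJ = {"and", "or", "but", "nor"}


def comma_features(sentence):
    """Return one dict of surface features per comma in the sentence."""
    # The words between two commas form the right side of one comma
    # and the left side of the next.
    segments = [seg.split() for seg in sentence.split(",")]
    c_count = len(segments) - 1
    features = []
    for c_ord in range(1, c_count + 1):
        left, right = segments[c_ord - 1], segments[c_ord]
        features.append({
            "l_length": len(left),
            "r_length": len(right),
            "c_count": c_count,
            "c_ord": c_ord,
            # Assumption: the combined feature concatenates the two
            # digits (2 and 1 give 21), as in the worked example.
            "c_count_ord": c_count * 10 + c_ord,
            "r_first_coordConj": int(bool(right) and right[0].lower() in COORD_CONJ),
            "r_coordConj_exist": int(any(w.lower() in COORD_CONJ for w in right)),
        })
    return features
```

Applied to the example sentence, the sketch yields l_length=9 and r_length=3 for the first comma and l_length=3 and r_length=4 for the second, in agreement with the feature vectors given above.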
4 COMMA PROCESSING METHOD
Figure 1 shows the translation process of the
English-Korean machine translation system in this paper.
The system uses a sentence segmentation method for
efficient analysis. An input sentence is segmented at
the comma positions in the 1st segmentation. Among
the resulting segments, long segments (currently
longer than 15 words) are split again in the 2nd
segmentation step. In the parsing step, each segment is
parsed, and the resulting structures are combined in the
parse tree combination step to generate the final
sentence structure. In order to generate an accurate
translation, the 1st segmentation and parse tree
combination steps are performed differently
according to the identified role of the commas. This
section describes the comma processing methods
according to the roles of the commas. The comma
role classification step, explained in Section 3, lies
between lexical analysis and the 1st segmentation, as
shown in Figure 1.
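The two segmentation passes described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the sketch ignores comma roles (which change how the 1st segmentation behaves), and it only flags segments for the 2nd segmentation, since the splitting rule of that step is not described in this section. The 15-word limit is the value quoted in the text.

```python
def first_segmentation(sentence):
    """Split a sentence at every comma and drop empty pieces."""
    return [seg.strip() for seg in sentence.split(",") if seg.strip()]


def needs_second_segmentation(segment, max_words=15):
    """Return True if the segment is longer than the word limit."""
    return len(segment.split()) > max_words


def plan_segments(sentence, max_words=15):
    """Pair each first-pass segment with a flag for the second pass."""
    return [(seg, needs_second_segmentation(seg, max_words))
            for seg in first_segmentation(sentence)]
```

For instance, `plan_segments("a b c, d e", max_words=2)` marks the first segment for further splitting and leaves the second one as it is.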
In Table 1, we classify comma uses into 3 types:
connection, separation, and special pattern. Tables 3, 4
and 5 present the comma processing methods for each
type, respectively.
Table 3: Comma Processing Method 1.
Type: Connection of sentence elements
Processing Methods and Examples:
Method: Rewrite commas into “and”
(usage1) My frequent uses of the Internet is sending e-mail and surfing the Web and using chat rooms.
(usage7) It was a long and noisy and nauseating flight.
Method: Split into independent translation units -> translate each translation unit -> combine the translation results
(usage2) ① The public seems eager for some kind of gun control legislation. ② but the congress is obviously too timid to enact any truly effective measures.
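The two processing methods of Table 3 can be sketched at the string level as follows. The function names are hypothetical, and the sketch assumes that the comma roles have already been identified by the classifier; the actual system operates on analysed sentences rather than on raw strings.

```python
import re


def rewrite_connection_commas(sentence):
    """Replace each comma, and an 'and' directly after it, with ' and '."""
    return re.sub(r"\s*,\s*(?:and\s+)?", " and ", sentence)


def split_translation_units(sentence):
    """Split a sentence at its first comma into two translation units."""
    left, _, right = sentence.partition(",")
    return [left.strip(), right.strip()]
```

With the usage 1 sentence as input, `rewrite_connection_commas` returns "My frequent uses of the Internet is sending e-mail and surfing the Web and using chat room". For usage 2, `split_translation_units` separates the clause before the comma from the clause beginning with "but", so that each unit can be translated on its own.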
Commas with the connection role (usage 1, 7) can be
rewritten as “and”. As a result, the elements separated
by commas are analysed together instead of being
treated independently. These elements are then not
segmented in the 1st segmentation step. For usage 2,
the input sentence is split into two translation units.
The translation process (from the 1st segmentation to
the parse tree combination step) is performed on each
translation unit. In this case two translation process
ICAART 2019 - 11th International Conference on Agents and Artificial Intelligence