have not made detailed definitions such as (Okumura,
2016) for considering symmetry, it is not suitable for
grouping.
The author’s team is developing a classification
method for Kaomoji using neural networks and cosine
similarity (Okumura and Okumura, 2018) (Okumura,
2017). As a result, the single original form inferred
from an arbitrary Kaomoji can be extracted with an
accuracy of about 70%. However, multiclass
classification for Kaomoji composed of multiple
primitives has not yet been implemented.
In this paper, we aim to construct a system that
can estimate the original form even when a Kaomoji
belongs to multiple classes, by labeling each string
that can be regarded as a Kaomoji with a class called
an original form. We also examine the number of units
in the middle layer of the neural network that is
necessary to estimate the original form from a
character string that can be regarded as a Kaomoji.
3 THE METHOD TO ESTIMATE THE ORIGINAL FORM OF KAOMOJI CORRESPONDING TO MULTI-CLASS CLASSIFICATION
Our conventional method aims to output exactly one
of 3,110 original forms of Kaomoji. However, for a
Kaomoji such as
( ) " o(* *)o ( )
the system cannot decide which original form to
extract, because the string includes three character
sequences that can each be regarded as a face.
Extracting just one original form is therefore
insufficient for grouping; it is necessary to identify
every face included in the Kaomoji-like character
sequence, together with its original form.
In this paper, we implement multiclass classification
using a feed-forward neural network with fixed-length
input and character embedding. For the previous
example, the model is correct if it extracts the three
original forms ( ( ) , ( ) , ( ) ; strictly
speaking, ( ) appears twice, so the system has to
extract two types of original form of Kaomoji).
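One way to picture the target for such an input is a multi-hot vector over the 3,110 original forms, in which a face that appears twice still sets only one entry. The following is a minimal sketch and not the authors' implementation: the class indices, the 0.5 threshold, and the function names `multi_hot` and `decode` are assumptions introduced for illustration.

```python
NUM_CLASSES = 3110  # number of original forms, as stated in the text


def multi_hot(class_ids, num_classes=NUM_CLASSES):
    """Build a 0/1 target vector from the original-form indices of one Kaomoji.

    A face that occurs more than once contributes a single 1,
    because only the set of original forms matters.
    """
    target = [0] * num_classes
    for class_id in set(class_ids):
        target[class_id] = 1
    return target


def decode(scores, threshold=0.5):
    """Return the indices whose score reaches the (assumed) threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical indices: three faces, two of which share original form 7.
target = multi_hot([7, 42, 7])
predicted_forms = decode(target)
```

Here `predicted_forms` holds two indices, matching the remark that three faces with one repetition yield two types of original form.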
3.1 Multiclass Classification of Kaomoji
The neural network used in this paper is a simple
model with only one middle layer. However, for
Kaomoji, which have linguistic features, a one-hot
vector is known to give the input layer insufficient
information. This system therefore vectorizes the
input (Kaomoji) using character embedding. At the
time of writing, the longest Kaomoji registered in the
collected emoticon database is 65 characters, so the
input to the embedding layer has 65 units. Each
character input to the embedding layer is converted to
a 100-dimensional vector, in order from the left end of
the Kaomoji, and the vectors are concatenated in that
order. For a Kaomoji shorter than 65 characters, a
fixed-length input vector is generated by zero-padding
the shortfall (with NULL characters). Figure 1 shows
the configuration of the neural network adopted in
this paper.
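The input encoding described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the vocabulary construction, the random initialization range, the choice to map the padding ID to an all-zero vector, and all function names are hypothetical. Only the length of 65 and the 100 dimensions come from the text.

```python
import random

MAX_LEN = 65     # longest Kaomoji in the database, per the text
EMBED_DIM = 100  # embedding size per character, per the text
PAD_ID = 0       # assumed ID for the NULL padding character


def build_vocab(kaomoji_list):
    """Assign each distinct character an ID starting at 1 (0 is padding)."""
    vocab = {}
    for kaomoji in kaomoji_list:
        for ch in kaomoji:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 1
    return vocab


def encode(kaomoji, vocab):
    """Character IDs from left to right, padded on the right to MAX_LEN."""
    if len(kaomoji) > MAX_LEN:
        raise ValueError("Kaomoji is longer than MAX_LEN")
    ids = [vocab[ch] for ch in kaomoji]
    return ids + [PAD_ID] * (MAX_LEN - len(ids))


def init_embedding(vocab_size, seed=0):
    """Embedding table; row 0 (padding) is all zeros by assumption."""
    rng = random.Random(seed)
    table = [[0.0] * EMBED_DIM]
    for _ in range(vocab_size):
        table.append([rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)])
    return table


def embed(ids, table):
    """Concatenate the per-character vectors in order (65 * 100 values)."""
    flat = []
    for i in ids:
        flat.extend(table[i])
    return flat


# Hypothetical example string, not taken from the authors' database.
vocab = build_vocab(["(^_^)"])
ids = encode("(^_^)", vocab)
x = embed(ids, init_embedding(len(vocab)))
```

In a trained model the embedding table would be learned together with the other weights; the random table here only shows the shapes involved.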
In this paper, we vary the number of units in the
middle layer shown in Figure 1 and derive an
appropriate number of units based on both the correct
answer rate (described later) and the learning time.
Some researchers have noted that, although
(number of input units + number of output units) ×
2/3 is sometimes proposed as a standard for the number
of units, the appropriate number is influenced by
various factors such as the complexity of the problem
to be solved and the size of the learning data, so
this standard does not go beyond the range of
heuristics.
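As a rough illustration of that rule of thumb, the arithmetic can be written out as below. The unit counts are assumptions: the text does not say whether the input count should be the 65 character positions or the 6,500 concatenated embedding values (65 × 100), and 3,110 output units is assumed from the number of original forms. The function name `two_thirds_rule` is hypothetical.

```python
def two_thirds_rule(n_inputs, n_outputs):
    """Rule-of-thumb middle-layer size: (inputs + outputs) * 2/3, rounded."""
    return round((n_inputs + n_outputs) * 2 / 3)


N_OUTPUTS = 3110  # assumed: one output unit per original form

# Counting the concatenated embedding values as inputs (65 * 100 = 6500).
units_from_embedding = two_thirds_rule(65 * 100, N_OUTPUTS)  # 6407

# Counting only the 65 character positions as inputs.
units_from_positions = two_thirds_rule(65, N_OUTPUTS)  # 2117
```

The two readings differ by a factor of about three, which is one reason such a rule can only serve as a starting point for the experiments on the number of units.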
3.2 Evaluation Method
In our past studies, an output was regarded as
correct as long as at least one correct original form
was included in the group of outputted original forms.
In multiclass classification, however, we have to
evaluate whether the outputs of our system include all
the original forms. In this paper, we therefore use
three evaluation methods. Suppose a Kaomoji has three
correct original forms. The conventional evaluation
method (Easy) counts the output as correct if at least
one of the outputs is a correct answer. The Normal
evaluation method takes the ratio of the number of
correct answers in the output to the number of correct
answers. The Hard evaluation method counts the output
as correct only if the output covers all of the
correct answers.
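For a single Kaomoji, the three criteria can be sketched as set operations. This is an illustrative reading of the prose, not a restatement of the paper's formulas: the function names and the placeholder labels "A", "B", "C", "X" are hypothetical, and Hard is interpreted here as "every correct answer appears in the output" (requiring exact equality of the two sets would be a stricter alternative).

```python
def easy(predicted, gold):
    """1.0 if at least one predicted original form is correct, else 0.0."""
    return 1.0 if set(predicted) & set(gold) else 0.0


def normal(predicted, gold):
    """Fraction of the correct original forms that appear in the output."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold must contain at least one original form")
    return len(set(predicted) & gold) / len(gold)


def hard(predicted, gold):
    """1.0 only if every correct original form appears in the output."""
    return 1.0 if set(gold) <= set(predicted) else 0.0


# Placeholder labels: three correct forms, two of them found by the system.
gold = ["A", "B", "C"]
predicted = ["A", "B", "X"]
scores = (easy(predicted, gold), normal(predicted, gold), hard(predicted, gold))
```

With two of three correct forms found, Easy gives 1.0, Normal gives 2/3, and Hard gives 0.0.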
For example, if there are three types of correct
answers and the estimated results are these, then the
Easy evaluation method judges the output correct, the
Normal evaluation method calculates a correct rate of
2/3, and the Hard evaluation method judges the output
incorrect. Evaluation is performed by the following
formulas 1, 2, and 3.
KMIS 2019 - 11th International Conference on Knowledge Management and Information Systems