5 DISCUSSION
The emotion estimation experiments suggest a large gap between natural and acted voice in terms of acoustic features for the same emotion. Accuracy was higher when the training and test data were of the same type. However, the natural utterances were classified with an accuracy of only about 30% compared with the acted voice, which indicates that the acoustic features of natural utterances vary more than those of acted voice, even for data with the same emotion labels. This may be because the emotional evaluation of the natural utterances in OGVC (the Online Gaming Voice Chat corpus) was limited to only three raters, which makes the ratings unstable.
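The cross-condition gap can be made explicit with a simple check of this kind: train a classifier on one utterance type and test it on the other. The following is a minimal sketch, assuming the acoustic features have already been exported to CSV files and using an SVM classifier from scikit-learn; the file names and column layout are hypothetical, and this is not the exact experimental setup of this study.

# Minimal sketch: cross-condition evaluation of an emotion classifier.
# Assumes acoustic features (e.g., openSMILE output) were exported to CSV,
# with the emotion label in the last column; file names are hypothetical.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def load_features(path):
    # Read a CSV of acoustic features; the last column is the emotion label.
    data = np.genfromtxt(path, delimiter=",", dtype=str, skip_header=1)
    return data[:, :-1].astype(float), data[:, -1]

X_act, y_act = load_features("ogvc_acted_features.csv")    # acted voice
X_nat, y_nat = load_features("ogvc_natural_features.csv")  # natural voice

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Train on acted voice and test on natural voice, then the reverse.
clf.fit(X_act, y_act)
print("acted -> natural:", accuracy_score(y_nat, clf.predict(X_nat)))
clf.fit(X_nat, y_nat)
print("natural -> acted:", accuracy_score(y_act, clf.predict(X_act)))

Comparing these cross-condition scores with same-condition cross-validation scores exposes how differently the two utterance types behave in feature space.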
6 CONCLUSION
This research aims to develop a system that, in addition to converting voice input into text, estimates the user's emotions from the acoustic information and automatically appends emoticons matching those emotions to the end of the text (a minimal illustration of this step is sketched below). In particular, this paper presented comparative experiments on the natural and acted utterances of the model voice data (OGVC) used to train an emotion classification model. The results show that the assessment scores of natural utterances vary more than those of acted voice, even when the emotions are the same. Since the proposed system aims to estimate emotions from the user's natural utterances rather than acted voice, it would be preferable to train the model on natural utterances collected in comparable situations. Based on the experimental results, it is necessary to explore new methods for evaluating the emotion assessment data and to consider a new data model with emotion labels.
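The final step of the proposed system, appending an emoticon that matches the estimated emotion to the transcribed text, could look like the following sketch. The emotion labels and emoticon choices are placeholders for illustration, not the actual mapping used by the system; the transcription and the estimated emotion are assumed to come from the earlier pipeline stages.

# Hypothetical sketch of the emoticon insertion step.
EMOTICONS = {
    "joy": "(^o^)",
    "sadness": "(;_;)",
    "anger": "(-_-#)",
    "surprise": "(*_*)",
}

def append_emoticon(text: str, emotion: str) -> str:
    # Append the emoticon registered for the estimated emotion;
    # leave the text unchanged if no emoticon is registered.
    return f"{text} {EMOTICONS[emotion]}" if emotion in EMOTICONS else text

print(append_emoticon("I passed the exam", "joy"))  # -> I passed the exam (^o^)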
As future work, after reviewing the data model, we will consider appropriate feature selection to achieve higher classification accuracy, as well as a method for selecting emoticons.
REFERENCES
Arimoto, Y., & Kawatsu, H. (2013). Online Gaming Voice Chat Corpus with Emotional Label (OGVC) [Dataset]. Speech Resources Consortium, National Institute of Informatics. https://doi.org/10.32130/src.OGVC
Cambridge University Press. (2023). Cambridge Dictionary. https://dictionary.cambridge.org/ja/dictionary/
Edwards, E., Dognin, C., Bollepalli, B., & Singh, M. K. (2020, October). Multiscale System for Alzheimer's Dementia Recognition Through Spontaneous Speech. In INTERSPEECH (pp. 2197–2201).
Eyben, F., Wöllmer, M., & Schuller, B. (2010). openSMILE – The Munich Versatile and Fast Open-Source Audio Feature Extractor. In Proceedings of the ACM Multimedia Conference (MM), pp. 1459–1462.
MIC Information and Communications Policy Institute. (2021). Report on Survey on Information and Communication Media Usage Time and Information Behaviour in FY2020, p. 66. https://www.soumu.go.jp/main_content/000765258.pdf
Mizuho Information & Research Institute, Inc. (2015). Survey Research on People's Awareness of New ICT Services and Technologies for Solving Social Issues, p. 36. https://www.soumu.go.jp/johotsusintokei/linkdata/h27_06_houkoku.pdf
Okuma. (2020, December 2). Ledge Tech Blog (Hatena Blog). https://tech.ledge.co.jp/entry/metrics
Plutchik, R. (2001). The Nature of Emotions. American Scientist, Vol. 89, No. 4 (July–August 2001), pp. 344–356.
Schuller, B., Steidl, S., Batliner, A., Hirschberg, J., Burgoon, J. K., Baird, A., ... & Evanini, K. (2016). The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language. In 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), pp. 2001–2005. ISCA.
Yu, H., & Kim, S. (2012). SVM Tutorial – Classification, Regression and Ranking. Handbook of Natural Computing, 1, pp. 1–13.