the val label without considering spontaneity is 0.761,
but when spontaneity is considered, it is 0.749. Therefore,
we showed that considering spontaneity contributes to
emotion estimation.
In this study, we experimented with a simple three-layer
DNN. If the network architecture were changed, the optimal
hyperparameter settings could also change. Moreover, we
chose the hyperparameters empirically; selecting them with
established optimization algorithms is left for future work.
Further, we used a normal distribution with variance 1 as
the distribution for measuring KL divergence, but it may be
possible to improve the accuracy of emotion estimation by
changing the variance and verifying the resulting change in
VAT accuracy. Furthermore, in this study we conducted a
cross-corpus experiment between Japanese and English; as a
future task, we will investigate how VAT improves robustness
by conducting cross-corpus experiments in other languages
and cultural areas. In addition, we compared the estimation
accuracy using IEMOCAP, which was used for both training and
evaluation. Evaluating the contribution of spontaneity to
estimation accuracy on corpora in other languages is
therefore also left for future work. Finally, we need to
conduct subjective assessment experiments to understand how
the estimation error affects human perception.
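To illustrate why the variance of the reference distribution may matter, the following is a minimal sketch, not the implementation used in this study. It assumes that the regression output is treated as the mean of a normal distribution with a fixed variance, so that the KL divergence between the clean and perturbed predictions reduces to a scaled squared difference. The function name gaussian_kl and the example values are hypothetical.

```python
def gaussian_kl(mu_p, mu_q, variance=1.0):
    """KL(N(mu_p, variance) || N(mu_q, variance)) for a shared, fixed variance.

    With equal variances the log-ratio and trace terms cancel, leaving
    (mu_p - mu_q)^2 / (2 * variance).
    """
    return (mu_p - mu_q) ** 2 / (2.0 * variance)


# Hypothetical predictions for a clean input and its perturbed counterpart.
clean_pred = 0.60
perturbed_pred = 0.75

# A smaller variance penalizes the same prediction gap more strongly.
for variance in (0.25, 1.0, 4.0):
    print(variance, gaussian_kl(clean_pred, perturbed_pred, variance))
```

Under this assumption, changing the variance only rescales the penalty, so its effect is intertwined with the weight given to the VAT term in the loss.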
This work was supported by JSPS KAKENHI Grant
Numbers JP17H04705, JP18H03229, JP18H03340,
18K19835, JP19H04113, JP19K12107.
