sion with star generative adversarial networks. CoRR,
abs/1806.02169.
Kaneko, T. and Kameoka, H. (2017). Parallel-data-free
voice conversion using cycle-consistent adversarial
networks. CoRR, abs/1711.11293.
Kawahara, H., Morise, M., Takahashi, T., Nishimura, R., Irino, T., and Banno, H. (2008). TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3933–3936.
Kingma, D. P. and Ba, J. (2015). Adam: A method for
stochastic optimization. ICLR.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324.
Liu, D., Domoto, K., Inoue, Y., and Utsuro, T. (2014). Emotional voice conversion utilizing F0 contour and duration of word accent type. IEICE Tech. Rep. Speech, 114(52):159–164.
Miyoshi, H., Saito, Y., Takamichi, S., and Saruwatari, H. (2017). Voice conversion using sequence-to-sequence learning of context posterior probabilities. In Lacerda, F., editor, Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pages 1268–1272. ISCA.
Nakashika, T. and Minami, Y. (2017). Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion. EURASIP Journal on Audio, Speech, and Music Processing, 2017(1):16.
Orihara, R., Narasaki, R., Yoshinaga, Y., Morioka, Y.,
and Kokojima, Y. (2018). Approximation of time-
consuming simulation based on generative adversarial
network. Proc. 42nd IEEE International Conference
on Computer Software and Applications, pages 171–
176.
Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434.
Sakurai, A. and Kimura, S. (2013). The use of speech
technologies in call centers - including para- and non-
linguistic information. IPSJ SIG Technical Reports,
2013(2):1–6.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., and Chen, X. (2016). Improved techniques for training GANs. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems 29, pages 2234–2242. Curran Associates, Inc.
Sekii, Y., Orihara, R., Kojima, K., Sei, Y., Tahara, Y., and Ohsuga, A. (2017). Fast many-to-one voice conversion using autoencoders. In ICAART (2), pages 164–174. SciTePress.
Toda, T., Black, A. W., and Tokuda, K. (2007). Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Trans. Audio, Speech, and Language Processing, 15(8):2222–2235.
Ulyanov, D., Vedaldi, A., and Lempitsky, V. S. (2016). Instance normalization: The missing ingredient for fast stylization. CoRR, abs/1607.08022.
Yasuda, K., Orihara, R., Sei, Y., Tahara, Y., and Ohsuga, A. (2018a). An experimental study on transforming the emotion in speech using CycleGAN. Joint Agent Workshops and Symposium, page 5B.
Yasuda, K., Orihara, R., Sei, Y., Tahara, Y., and Ohsuga, A. (2018b). An experimental study on transforming the emotion in speech using GAN. IEICE Tech. Rep. SP, 118(198):19–22.
Yasuda, K., Orihara, R., Sei, Y., Tahara, Y., and Ohsuga, A. (2018c). Transforming the emotion in speech using CycleGAN. IEICE Tech. Rep. AI, 118(116):61–66.
Yin, W., Fu, Y., Sigal, L., and Xue, X. (2017). Semi-latent GAN: Learning to generate and modify facial images from attributes. CoRR, abs/1704.02166.
Zhou, T., Krähenbühl, P., Aubry, M., Huang, Q., and Efros, A. A. (2016). Learning dense correspondence via 3D-guided cycle consistency. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 117–126. IEEE Computer Society.
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV).