
for multi-corpus speech emotion recognition. IEEE Transactions on Affective Computing.
Gideon, J., McInnis, M. G., and Provost, E. M. (2019). Improving cross-corpus speech emotion recognition with adversarial discriminative domain generalization (ADDoG). IEEE Transactions on Affective Computing, 12(4):1055–1068.
Goel, S. and Beigi, H. (2020). Cross lingual cross corpus speech emotion recognition. arXiv preprint arXiv:2003.07996.
Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., and Smola, A. (2006). A kernel method for the two-sample-problem. Advances in Neural Information Processing Systems, 19.
Hu, J., Ruder, S., Siddhant, A., Neubig, G., Firat, O., and Johnson, M. (2020). XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalisation. In International Conference on Machine Learning, pages 4411–4421. PMLR.
Huang, Z., Dong, M., Mao, Q., and Zhan, Y. (2014). Speech emotion recognition using CNN. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 801–804.
Huijuan, Z., Ning, Y., and Ruchuan, W. (2023). Improved cross-corpus speech emotion recognition using deep local domain adaptation. Chinese Journal of Electronics, 32(3):1–7.
Kexin, Z. and Yunxiang, L. (2023). Speech emotion recognition based on transfer emotion-discriminative features subspace learning. IEEE Access.
Khalil, R. A., Jones, E., Babar, M. I., Jan, T., Zafar, M. H., and Alhussain, T. (2019). Speech emotion recognition using deep learning techniques: A review. IEEE Access, 7:117327–117345.
Latif, S., Qadir, J., and Bilal, M. (2019). Unsupervised adversarial domain adaptation for cross-lingual speech emotion recognition. In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pages 732–737. IEEE.
Latif, S., Rana, R., Khalifa, S., Jurdak, R., and Schuller, B. W. (2022). Self-supervised adversarial domain adaptation for cross-corpus and cross-language speech emotion recognition. IEEE Transactions on Affective Computing.
Latif, S., Rana, R., Younis, S., Qadir, J., and Epps, J. (2018). Transfer learning for improving speech emotion classification accuracy. arXiv preprint arXiv:1801.06353.
Lech, M., Stolar, M., Best, C., and Bolia, R. (2020). Real-time speech emotion recognition using a pre-trained image classification network: Effects of bandwidth reduction and companding. Frontiers in Computer Science, 2:14.
Liu, J., Zheng, W., Zong, Y., Lu, C., and Tang, C. (2020). Cross-corpus speech emotion recognition based on deep domain-adaptive convolutional neural network. IEICE Transactions on Information and Systems, 103(2):459–463.
Liu, S., Zhang, M., Fang, M., Zhao, J., Hou, K., and Hung, C.-C. (2021). Speech emotion recognition based on transfer learning from the FaceNet framework. The Journal of the Acoustical Society of America, 149(2):1338–1345.
Livingstone, S. R. and Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5):e0196391.
Lugović, S., Dunder, I., and Horvat, M. (2016). Techniques and applications of emotion recognition in speech. In 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 1278–1283. IEEE.
Ma, A., Filippi, A. M., Wang, Z., and Yin, Z. (2019). Hyperspectral image classification using similarity measurements-based deep recurrent neural networks. Remote Sensing, 11(2):194.
McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., and Nieto, O. (2015). librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, volume 8, pages 18–25.
Padi, S., Sadjadi, S. O., Sriram, R. D., and Manocha, D. (2021). Improved speech emotion recognition using transfer learning and spectrogram augmentation. In Proceedings of the 2021 International Conference on Multimodal Interaction, pages 645–652.
Parry, J., Palaz, D., Clarke, G., Lecomte, P., Mead, R., Berger, M., and Hofer, G. (2019). Analysis of deep learning architectures for cross-corpus speech emotion recognition. In Interspeech, pages 1656–1660.
Rezaeianjouybari, B. and Shang, Y. (2020). Deep learning for prognostics and health management: State of the art, challenges, and opportunities. Measurement, 163:107929.
Scheidwasser-Clow, N., Kegler, M., Beckmann, P., and Cernak, M. (2022). SERAB: A multi-lingual benchmark for speech emotion recognition. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7697–7701. IEEE.
Senthilkumar, N., Karpakam, S., Devi, M. G., Balakumaresan, R., and Dhilipkumar, P. (2022). Speech emotion recognition based on bi-directional LSTM architecture and deep belief networks. Materials Today: Proceedings, 57:2180–2184.
Sharma, M. (2022). Multi-lingual multi-task speech emotion recognition using wav2vec 2.0. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6907–6911. IEEE.
Singh, P., Saha, G., and Sahidullah, M. (2021). Non-linear frequency warping using constant-Q transformation for speech emotion recognition. In 2021 International Conference on Computer Communication and Informatics (ICCCI), pages 1–6. IEEE.
Van der Maaten, L. and Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11).