REFERENCES
Bosselut, A., Rashkin, H., Sap, M., Malaviya, C., Celikyil-
maz, A., and Choi, Y. (2019). COMET: Common-
sense transformers for automatic knowledge graph
construction. In Proceedings of the 57th Annual Meet-
ing of the Association for Computational Linguis-
tics, pages 4762–4779, Florence, Italy. Association
for Computational Linguistics.
Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower,
E., Kim, S., Chang, J. N., Lee, S., and Narayanan,
S. S. (2008). IEMOCAP: Interactive emotional dyadic
motion capture database. Language Resources and
Evaluation, 42(4):335–359.
Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014).
Empirical evaluation of gated recurrent neural net-
works on sequence modeling. In NIPS 2014 Workshop
on Deep Learning, December 2014.
Defferrard, M., Bresson, X., and Vandergheynst, P. (2016).
Convolutional neural networks on graphs with fast
localized spectral filtering. In Lee, D., Sugiyama,
M., Luxburg, U., Guyon, I., and Garnett, R., editors,
Advances in Neural Information Processing Systems,
volume 29. Curran Associates, Inc.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2019). BERT: Pre-training of deep bidirectional
transformers for language understanding. In Pro-
ceedings of the 2019 Conference of the North Amer-
ican Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume
1 (Long and Short Papers), pages 4171–4186, Min-
neapolis, Minnesota. Association for Computational
Linguistics.
Ghosal, D., Majumder, N., Gelbukh, A., Mihalcea, R., and
Poria, S. (2020a). COSMIC: COmmonSense knowl-
edge for eMotion identification in conversations. In
Findings of the Association for Computational Lin-
guistics: EMNLP 2020, pages 2470–2481, Online.
Association for Computational Linguistics.
Ghosal, D., Majumder, N., Mihalcea, R., and Poria, S.
(2020b). Utterance-level dialogue understanding: An
empirical study. CoRR, abs/2009.13902.
Ghosal, D., Majumder, N., Poria, S., Chhaya, N., and Gel-
bukh, A. (2019). DialogueGCN: A graph convolu-
tional neural network for emotion recognition in con-
versation. In Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing
and the 9th International Joint Conference on Nat-
ural Language Processing (EMNLP-IJCNLP), pages
154–164, Hong Kong, China. Association for Com-
putational Linguistics.
Guibon, G., Labeau, M., Flamein, H., Lefeuvre, L., and
Clavel, C. (2021). Few-shot emotion recognition in
conversation with sequential prototypical networks.
In Proceedings of the 2021 Conference on Empiri-
cal Methods in Natural Language Processing, pages
6858–6870, Online and Punta Cana, Dominican Re-
public. Association for Computational Linguistics.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural Comput., 9(8):1735–1780.
Kim, T. and Vossen, P. (2021). EmoBERTa: Speaker-aware
emotion recognition in conversation with RoBERTa.
CoRR, abs/2108.12009.
Lee, B. and Choi, Y. S. (2021). Graph based network with
contextualized representations of turns in dialogue.
In Proceedings of the 2021 Conference on Empiri-
cal Methods in Natural Language Processing, pages
443–455, Online and Punta Cana, Dominican Repub-
lic. Association for Computational Linguistics.
Li, Y., Su, H., Shen, X., Li, W., Cao, Z., and Niu, S.
(2017). DailyDialog: A manually labelled multi-turn
dialogue dataset. In Proceedings of the Eighth Inter-
national Joint Conference on Natural Language Pro-
cessing (Volume 1: Long Papers), pages 986–995,
Taipei, Taiwan. Asian Federation of Natural Language
Processing.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D.,
Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov,
V. (2019). RoBERTa: A robustly optimized BERT pre-
training approach.
Ma, X. and Hovy, E. (2016). End-to-end sequence label-
ing via bi-directional LSTM-CNNs-CRF. In Proceed-
ings of the 54th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers),
pages 1064–1074, Berlin, Germany. Association for
Computational Linguistics.
Majumder, N., Poria, S., Hazarika, D., Mihalcea, R., Gel-
bukh, A., and Cambria, E. (2019). DialogueRNN: An
attentive RNN for emotion detection in conversations.
Proceedings of the AAAI Conference on Artificial In-
telligence, 33(01):6818–6825.
Poria, S., Cambria, E., Hazarika, D., Majumder, N., Zadeh,
A., and Morency, L.-P. (2017). Context-dependent
sentiment analysis in user-generated videos. In Pro-
ceedings of the 55th Annual Meeting of the Associa-
tion for Computational Linguistics (Volume 1: Long
Papers), pages 873–883, Vancouver, Canada. Associ-
ation for Computational Linguistics.
Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria,
E., and Mihalcea, R. (2019). MELD: A multimodal
multi-party dataset for emotion recognition in conver-
sations. In Proceedings of the 57th Annual Meeting of
the Association for Computational Linguistics, pages
527–536, Florence, Italy. Association for Computa-
tional Linguistics.
Song, X., Zang, L., Zhang, R., Hu, S., and Huang, L.
(2022). EmotionFlow: Capture the dialogue level emo-
tion transitions. In ICASSP 2022 - 2022 IEEE Inter-
national Conference on Acoustics, Speech and Signal
Processing (ICASSP), pages 8542–8546.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. In Guyon, I.,
Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R.,
Vishwanathan, S., and Garnett, R., editors, Advances
in Neural Information Processing Systems 30, pages
5998–6008. Curran Associates, Inc.
Zahiri, S. and Choi, J. D. (2018). Emotion Detection
on TV Show Transcripts with Sequence-based Con-
volutional Neural Networks. In Proceedings of the