problem of data scarcity and reduce reliance on
extensive datasets.
Another important direction is the exploration of
general interaction theories that can capture the
complex dynamics of discourse flows, especially in
settings involving multiple speakers. A systematic
framework for modeling these inter-speaker
relationships could substantially improve multimodal
sentiment analysis in conversational settings.
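To make this direction more concrete, the following is a minimal, hypothetical sketch (in PyTorch) of speaker-aware context modeling in the spirit of conversational models such as DialogueRNN: each speaker keeps a recurrent state that is updated alongside a shared discourse state, and both are used to classify the sentiment of each utterance. The feature dimensions, module names, and the assumption that fused multimodal utterance features are precomputed are all illustrative, not a prescribed implementation.

    import torch
    import torch.nn as nn

    class SpeakerAwareSentimentModel(nn.Module):
        """Sketch: one recurrent state per speaker plus a shared discourse
        state; each utterance is classified from both. Fused utterance
        features (text + audio + visual) are assumed to be precomputed
        vectors of size feat_dim."""

        def __init__(self, feat_dim=128, hidden_dim=64, num_classes=3):
            super().__init__()
            # shared state tracking the overall discourse flow
            self.context_cell = nn.GRUCell(feat_dim, hidden_dim)
            # per-speaker state, updated from the utterance and the shared context
            self.speaker_cell = nn.GRUCell(feat_dim + hidden_dim, hidden_dim)
            self.classifier = nn.Linear(2 * hidden_dim, num_classes)
            self.hidden_dim = hidden_dim

        def forward(self, utterances, speaker_ids):
            # utterances: (seq_len, feat_dim) fused features for one conversation
            # speaker_ids: one speaker index per utterance
            context = torch.zeros(1, self.hidden_dim)
            speaker_states = {}
            logits = []
            for feat, spk in zip(utterances, speaker_ids):
                feat = feat.unsqueeze(0)
                context = self.context_cell(feat, context)
                prev = speaker_states.get(spk, torch.zeros(1, self.hidden_dim))
                speaker_states[spk] = self.speaker_cell(
                    torch.cat([feat, context], dim=-1), prev)
                logits.append(self.classifier(
                    torch.cat([speaker_states[spk], context], dim=-1)))
            return torch.cat(logits, dim=0)  # (seq_len, num_classes)

    # toy usage: five utterances from three speakers taking turns
    model = SpeakerAwareSentimentModel()
    conv = torch.randn(5, 128)
    speakers = [0, 1, 0, 2, 1]
    print(model(conv, speakers).shape)  # torch.Size([5, 3])

Separating each participant's emotional trajectory from the overall discourse flow in this way is one simple form of the structured interaction modeling that such a framework would need to provide.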
In summary, multimodal sentiment analysis is poised
for major advances. By addressing the challenges
discussed above and pursuing the directions suggested
here, researchers can develop more sophisticated,
ethical, and practical solutions for understanding
human emotions.