Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks
Atitaya Yakaew, Matthew Dailey, Teeradaj Racharak
2021
Abstract
Real-time sentiment analysis on video streams involves classifying a subject’s emotional expressions over time based on visual and/or audio information in the data stream. Sentiment can be analyzed using various modalities such as speech, mouth motion, and facial expression. This paper proposes a deep learning approach based on multiple modalities in which features extracted from an audiovisual data stream are fused in real time for sentiment classification. The proposed system comprises four small deep neural network models that analyze visual features and audio features concurrently. We fuse the visual and audio sentiment features into a single stream and accumulate evidence over time using an exponentially weighted moving average to make a final prediction. Our work provides a promising solution to the problem of building real-time sentiment analysis systems under constrained software or hardware capabilities. Experiments on the Ryerson audio-video database of emotional speech (RAVDESS) show that deep audiovisual feature fusion yields substantial improvements over analysis of either modality alone. We obtain an accuracy of 90.74%, which is better than baselines of 11.11% – 31.48% on a challenging test dataset.
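The idea of accumulating evidence over time can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each modality already produces a per-class score vector for every time step, and it swaps in a simple weighted average of those scores in place of the deep feature fusion described in the abstract. The function names (`fuse_scores`, `ewma_predict`) and the parameters `visual_weight` and `alpha` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Sequence, Tuple


def fuse_scores(visual: Sequence[float], audio: Sequence[float],
                visual_weight: float = 0.5) -> List[float]:
    """Combine two per-class score vectors by weighted averaging.

    Assumption: a stand-in for the paper's feature fusion, which the
    abstract does not specify in detail.
    """
    if len(visual) != len(audio):
        raise ValueError("score vectors must have the same length")
    w = visual_weight
    return [w * v + (1.0 - w) * a for v, a in zip(visual, audio)]


def ewma_predict(frames: Iterable[Tuple[Sequence[float], Sequence[float]]],
                 alpha: float = 0.3) -> Tuple[int, List[float]]:
    """Accumulate fused scores with an exponentially weighted moving average.

    alpha (hypothetical value) controls how strongly the newest
    observation outweighs the accumulated history.
    Returns the index of the top-scoring class and the final averaged scores.
    """
    state: List[float] = []
    for visual, audio in frames:
        fused = fuse_scores(visual, audio)
        if not state:
            # The first observation initializes the running average.
            state = fused
        else:
            state = [alpha * f + (1.0 - alpha) * s
                     for f, s in zip(fused, state)]
    if not state:
        raise ValueError("no frames were provided")
    label = max(range(len(state)), key=state.__getitem__)
    return label, state


if __name__ == "__main__":
    # Made-up scores for a two-class problem over three time steps.
    stream = [
        ([0.6, 0.4], [0.5, 0.5]),
        ([0.2, 0.8], [0.1, 0.9]),
        ([0.3, 0.7], [0.2, 0.8]),
    ]
    label, scores = ewma_predict(stream, alpha=0.5)
    print(label, scores)
```

A larger `alpha` makes the prediction react faster to the most recent frames, while a smaller one smooths over momentary misclassifications in either modality.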
Paper Citation
in Harvard Style
Yakaew A., Dailey M. and Racharak T. (2021). Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks. In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-486-2, pages 442-451. DOI: 10.5220/0010304404420451
in Bibtex Style
@conference{icpram21,
author={Atitaya Yakaew and Matthew Dailey and Teeradaj Racharak},
title={Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks},
booktitle={Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2021},
pages={442-451},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010304404420451},
isbn={978-989-758-486-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks
SN - 978-989-758-486-2
AU - Yakaew A.
AU - Dailey M.
AU - Racharak T.
PY - 2021
SP - 442
EP - 451
DO - 10.5220/0010304404420451