LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION

Matthias Wimmer; Björn Schuller; Dejan Arsic; Gerhard Rigoll; Bernd Radig

Topics: 2D and 3D Scene Understanding; Cognitive & Biologically Inspired Vision; Early Vision and Image Representation; Face and Gesture Recognition; Feature Extraction; Human Activity Recognition; Pattern Recognition in Image Understanding; Real-Time Vision; Segmentation and Grouping; Statistical Approach; Time-Frequency Analysis; Video Analysis

In Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, 145-151, 2008, Funchal, Madeira, Portugal

Authors: Matthias Wimmer ¹; Björn Schuller ²; Dejan Arsic ²; Gerhard Rigoll ² and Bernd Radig ³

Affiliations: ¹ Perceptual Computing Lab, Faculty of Science and Engineering, Waseda University, Japan; ² Institute for Human-Machine Communication, Technische Universität München, Germany; ³ Technische Universität München, Germany

Keyword(s): Emotion Recognition, Audio-visual Processing, Multi-modal Fusion.

Related Ontology Subjects/Areas/Topics: Applications ; Artificial Intelligence ; Biomedical Engineering ; Biomedical Signal Processing ; Computer Vision, Visualization and Computer Graphics ; Data Manipulation ; Early Vision and Image Representation ; Feature Extraction ; Features Extraction ; Health Engineering and Technology Applications ; Human-Computer Interaction ; Image and Video Analysis ; Informatics in Control, Automation and Robotics ; Methodologies and Methods ; Motion, Tracking and Stereo Vision ; Neurocomputing ; Neurotechnology, Electronics and Informatics ; Pattern Recognition ; Physiological Computing Systems ; Real-Time Vision ; Segmentation and Grouping ; Sensor Networks ; Signal Processing, Sensors, Systems Modeling and Control ; Soft Computing ; Software Engineering ; Statistical Approach ; Time-Frequency Analysis ; Video Analysis

Abstract: Bimodal emotion recognition through audiovisual feature fusion has been shown in the past to be superior to each individual modality. Still, synchronization of the two streams is a challenge, because many vision approaches work on a frame basis, whereas audio is processed on a turn or chunk basis. Therefore, late fusion schemes such as simple logic or voting strategies are commonly used for the overall estimation of the underlying affect. However, early fusion is known to be more effective in many other multimodal recognition tasks. We therefore suggest a combined analysis by descriptive statistics of audio and video low-level descriptors for subsequent static SVM classification. This strategy also allows for a combined feature-space optimization, which is discussed herein. The high effectiveness of this approach is shown on a database of 11.5 h containing six emotional situations in an airplane scenario.
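The early-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of functionals (mean, standard deviation, minimum, maximum), the function names `functionals` and `fuse_turn`, and the toy numbers are all assumptions made for illustration. The point it shows is that applying descriptive statistics per modality turns two streams of different lengths and frame rates into one fixed-length vector, which a static classifier such as an SVM could then consume.

```python
from statistics import fmean, pstdev


def functionals(frames):
    """Map a variable-length sequence of low-level-descriptor frames
    to a fixed-length vector.

    frames: list of equal-length lists, one list per time step.
    Returns [mean, std, min, max] for each descriptor dimension.
    The set of four statistics is an assumption, not taken from the paper.
    """
    if not frames:
        raise ValueError("need at least one frame")
    stats = []
    for dim in zip(*frames):  # iterate over descriptor dimensions
        stats.extend([fmean(dim), pstdev(dim), min(dim), max(dim)])
    return stats


def fuse_turn(audio_llds, video_llds):
    """Early (feature-level) fusion: concatenate per-modality functionals.

    The two inputs may have different numbers of frames; only the
    statistics over each stream enter the fused vector.
    """
    return functionals(audio_llds) + functionals(video_llds)


# Hypothetical toy turn: 4 audio frames x 2 descriptors,
# 2 video frames x 3 descriptors.
audio = [[0.1, 200.0], [0.3, 210.0], [0.2, 190.0], [0.4, 220.0]]
video = [[1.0, 0.0, 2.0], [3.0, 0.0, 4.0]]

fused = fuse_turn(audio, video)
print(len(fused))  # (2 + 3) descriptors x 4 statistics
```

Because every turn yields a vector of the same length, feature selection can be run over the audio and video statistics jointly, which is one way to read the "combined feature-space optimization" mentioned in the abstract.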

CC BY-NC-ND 4.0

Paper citation in several formats:

Wimmer, M.; Schuller, B.; Arsic, D.; Rigoll, G. and Radig, B. (2008). LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION. In Proceedings of the Third International Conference on Computer Vision Theory and Applications (VISIGRAPP 2008) - Volume 1: VISAPP; ISBN 978-989-8111-21-0; ISSN 2184-4321, SciTePress, pages 145-151. DOI: 10.5220/0001082801450151

@conference{visapp08,
author={Matthias Wimmer and Björn Schuller and Dejan Arsic and Gerhard Rigoll and Bernd Radig},
title={LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION},
booktitle={Proceedings of the Third International Conference on Computer Vision Theory and Applications (VISIGRAPP 2008) - Volume 1: VISAPP},
year={2008},
pages={145--151},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001082801450151},
isbn={978-989-8111-21-0},
issn={2184-4321},
}

TY  - CONF
JO  - Proceedings of the Third International Conference on Computer Vision Theory and Applications (VISIGRAPP 2008) - Volume 1: VISAPP
TI  - LOW-LEVEL FUSION OF AUDIO AND VIDEO FEATURE FOR MULTI-MODAL EMOTION RECOGNITION
SN  - 978-989-8111-21-0
IS  - 2184-4321
AU  - Wimmer, M.
AU  - Schuller, B.
AU  - Arsic, D.
AU  - Rigoll, G.
AU  - Radig, B.
PY  - 2008
SP  - 145
EP  - 151
DO  - 10.5220/0001082801450151
PB  - SciTePress
ER  - 