Authors: Maxim Sidorov (1); Evgenii Sopov (2); Ilia Ivanov (2) and Wolfgang Minker (1)

        
Affiliations: (1) Ulm University, Germany; (2) Siberian State Aerospace University, Russian Federation
        
        
        
        
        
Keywords: Emotion Recognition, Speech, Vision, PCA, Neural Network, Human-Computer Interaction (HCI), Feature Level Fusion, Decision Level Fusion.
        
        
            
Related Ontology Subjects/Areas/Topics: Human-Machine Interfaces; Hybrid Learning Systems; Image Processing; Informatics in Control, Automation and Robotics; Intelligent Control Systems and Optimization; Neural Networks Based Control Systems; Robotics and Automation; Vision, Recognition and Reconstruction
            
        
        
            
Abstract: Speech-based emotion recognition has already been investigated by many authors, and reasonable results have been achieved. This article focuses on applying an audio-visual data fusion approach to emotion recognition. Two state-of-the-art classification algorithms were applied to one audio and three visual feature datasets. Feature-level data fusion was used to build a multimodal emotion classification system, which increased emotion classification accuracy by 4% compared to the best accuracy achieved by the unimodal systems. The class precisions obtained by applying the algorithms to the unimodal and multimodal datasets revealed that different data-classifier combinations are good at recognizing certain emotions. These data-classifier combinations were then fused at the decision level using several approaches, which increased the accuracy by a further 3% compared to the best accuracy achieved by feature-level fusion.
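
The two fusion levels named in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifiers, features, or decision-level schemes were used, so everything below is an assumption made for illustration: the function names (`fuse_features`, `majority_vote`, `weighted_score_fusion`), the toy feature values, the emotion labels, and the choice of majority voting and weighted score summation as decision-level rules.

```python
from collections import Counter


def fuse_features(audio_vec, visual_vec):
    """Feature-level fusion (illustrative): concatenate the audio and
    visual feature vectors of one sample into a single vector, which a
    single classifier would then be trained on."""
    return list(audio_vec) + list(visual_vec)


def majority_vote(labels):
    """Decision-level fusion (assumed scheme): return the most frequent
    predicted label. Ties go to the label that appears first in the list."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def weighted_score_fusion(score_dicts, weights):
    """Decision-level fusion (assumed scheme): sum the per-class scores of
    several classifiers, each scaled by a hypothetical reliability weight,
    and return the class with the highest total."""
    total = {}
    for scores, weight in zip(score_dicts, weights):
        for cls, score in scores.items():
            total[cls] = total.get(cls, 0.0) + weight * score
    return max(total, key=total.get)


# Toy data, invented for this sketch only.
audio = [0.2, 0.7]
visual = [0.1, 0.4, 0.9]
fused = fuse_features(audio, visual)  # one 5-dimensional vector

votes = ["anger", "joy", "anger"]  # labels from three unimodal classifiers
voted = majority_vote(votes)

scores = [
    {"anger": 0.6, "joy": 0.4},  # e.g. an audio-based classifier
    {"anger": 0.3, "joy": 0.7},  # e.g. a visual-based classifier
]
weighted = weighted_score_fusion(scores, weights=[0.5, 1.0])
```

Feature-level fusion combines the modalities before classification, whereas decision-level fusion combines the outputs of separately trained classifiers. The latter is what lets a system exploit the observation that particular data-classifier combinations recognize particular emotions well, for instance by giving such a combination a larger weight.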