Authors: Stephan Repp and Christoph Meinel
        
        
Affiliation: Hasso-Plattner-Institut for Software System Engineering (HPI), University of Potsdam, Germany
                
        
        
        
        
        
Keyword(s): Topic segmentation, recorded lecture videos, imperfect and erroneous transcripts, indexing, retrieval.
        
        
            
Related Ontology Subjects/Areas/Topics: Computer-Supported Education; Distributed Multimedia Systems; e-Learning; e-Learning, e-Commerce and e-Society Applications; Multimedia; Multimedia Databases, Indexing, Recognition and Retrieval; Multimedia Systems and Applications; Telecommunications
                    
            
        
        
            
Abstract:
In the past decade, the availability of online academic lecture videos has increased dramatically. Two technical problems hinder the use of recorded lectures for learning: providing easy access to the multimedia lecture content, and finding semantically appropriate information quickly. The first step toward a semantic lecture browser is to segment the large video corpus into smaller, cohesive units. The task of breaking documents into topically coherent subparts is called topic segmentation. In this paper, we present a segmentation algorithm for recorded lecture videos based on their imperfect transcripts. The recorded lectures are transcribed by out-of-the-box speech recognition software with an accuracy of approximately 70%-80%. The words, together with a time stamp for each word, are stored in a database. This data serves as the input to our algorithm. We show that clustering similar words, generating vectors from the cluster values, and calculating the cosine measure of adjacent vectors leads to a better segmentation result than a standard algorithm.
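
The pipeline outlined in the abstract (cluster similar words, build one vector per stretch of transcript, compare adjacent vectors by cosine) can be illustrated with a minimal sketch. This is not the authors' algorithm: the prefix-based `cluster_key`, the fixed time windows, the `threshold` value, and all function names are assumptions made only for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter


def cluster_key(word, prefix_len=5):
    """Hypothetical stand-in for word clustering: words that share a
    lowercase prefix fall into the same cluster."""
    return word.lower()[:prefix_len]


def window_vectors(words, window_seconds):
    """Group (word, timestamp) pairs into fixed-length time windows and
    count cluster occurrences per window (assumed windowing scheme)."""
    windows = {}
    for word, timestamp in words:
        index = int(timestamp // window_seconds)
        windows.setdefault(index, Counter())[cluster_key(word)] += 1
    if not windows:
        return []
    # Fill gaps so that list positions correspond to consecutive windows.
    return [windows.get(i, Counter()) for i in range(max(windows) + 1)]


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_boundaries(words, window_seconds=60.0, threshold=0.1):
    """Return the start times (in seconds) of windows whose vector is
    dissimilar to the preceding window's vector."""
    vectors = window_vectors(words, window_seconds)
    boundaries = []
    for i in range(len(vectors) - 1):
        if cosine(vectors[i], vectors[i + 1]) < threshold:
            boundaries.append((i + 1) * window_seconds)
    return boundaries


# Toy transcript: (recognized word, time stamp in seconds).
transcript = [
    ("network", 5), ("networks", 20), ("protocol", 40),
    ("network", 70), ("protocols", 100),
    ("database", 130), ("databases", 150), ("query", 170),
]
print(segment_boundaries(transcript))
```

In this toy transcript the first two windows share the "network" and "protocol" clusters, while the third window shares none with its predecessor, so a single boundary is reported at the start of the third window. A real system would need a more robust clustering of similar words and a principled way to choose the window length and threshold.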