Authors:
Stephan Repp and Christoph Meinel
Affiliation:
Hasso-Plattner-Institut for Software System Engineering (HPI), University of Potsdam, Germany
Keyword(s):
Topic segmentation, recorded lecture videos, imperfect and erroneous transcripts, indexing, retrieval.
Related Ontology Subjects/Areas/Topics:
Computer-Supported Education; Distributed Multimedia Systems; e-Learning; e-Learning, e-Commerce and e-Society Applications; Multimedia; Multimedia Databases, Indexing, Recognition and Retrieval; Multimedia Systems and Applications; Telecommunications
Abstract:
In the past decade, we have witnessed a dramatic increase in the availability of online academic lecture videos. Two technical problems hinder the use of recorded lectures for learning: gaining easy access to the multimedia lecture video content, and finding semantically appropriate information quickly. The first step toward a semantic lecture browser is segmenting the large video corpus into smaller, cohesive areas. The task of breaking documents into topically coherent subparts is called topic segmentation. In this paper, we present a segmentation algorithm for recorded lecture videos based on their imperfect transcripts. The recorded lectures are transcribed by out-of-the-box speech recognition software with an accuracy of approximately 70%-80%. Each word is stored in a database together with its time stamp, and this data serves as the input to our algorithm. We show that clustering similar words, generating vectors from the cluster values, and calculating the cosine measure of adjacent vectors leads to better segmentation results than a standard algorithm.
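The pipeline outlined in the abstract (time-stamped words, word clusters, cluster vectors, cosine measure of adjacent vectors) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prefix-based `cluster_id`, the fixed 60-second windows, the similarity threshold of 0.1, and all function names are assumptions made only for this illustration, since the abstract does not specify how words are clustered or how boundaries are chosen.

```python
import math
from collections import Counter


def cluster_id(word):
    # Hypothetical stand-in for the paper's word clustering:
    # words sharing their first five lowercase letters form one cluster.
    return word.lower()[:5]


def window_vectors(words, window_seconds):
    # words: list of (word, timestamp_in_seconds) pairs.
    # Returns (window_start_time, Counter of cluster counts) per non-empty window.
    windows = {}
    for word, t in words:
        index = int(t // window_seconds)
        windows.setdefault(index, Counter())[cluster_id(word)] += 1
    return [(i * window_seconds, windows[i]) for i in sorted(windows)]


def cosine(a, b):
    # Cosine similarity of two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_boundaries(words, window_seconds=60.0, threshold=0.1):
    # Assumed boundary rule: mark a topic boundary at the start of a window
    # whose vector has low cosine similarity to the preceding non-empty window.
    vectors = window_vectors(words, window_seconds)
    boundaries = []
    for (_, left), (start, right) in zip(vectors, vectors[1:]):
        if cosine(left, right) < threshold:
            boundaries.append(start)
    return boundaries


# Toy transcript: (word, time stamp in seconds), as it might be read from the database.
transcript = [
    ("network", 5.0), ("networks", 20.0), ("protocol", 40.0),
    ("protocols", 70.0), ("network", 90.0),
    ("database", 130.0), ("databases", 150.0), ("query", 170.0),
]
print(segment_boundaries(transcript))  # prints [120.0]
```

In this toy input the first two windows share the "netwo" and "proto" clusters and stay in one segment, while the third window shares no cluster with its predecessor, so a boundary is reported at 120 seconds. Grouping words into clusters before comparison is what lets "network" and "networks" count as the same term, which matters when the transcript contains recognition errors.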