Authors: Antti E. Ainasoja; Antti Hietanen; Jukka Lankinen and Joni-Kristian Kämäräinen
Affiliation: Tampere University of Technology, Finland
Keyword(s): Video Summarization, Visual Bag-of-Words, Region Descriptors, Optical Flow Descriptors.
Related Ontology Subjects/Areas/Topics: Computer Vision, Visualization and Computer Graphics; Features Extraction; Image and Video Analysis; Motion, Tracking and Stereo Vision; Optical Flow and Motion Analyses
Abstract:
In this work, we focus on the popular keyframe-based approach to video summarization. Keyframes represent
important and diverse content of an input video, and a summary is generated by temporally expanding the
keyframes into key shots, which are merged into a continuous dynamic video summary. In our approach, keyframes
are selected from scenes that represent semantically similar content. For scene detection, we propose a simple
yet effective dynamic extension of a video Bag-of-Words (BoW) method, which provides over-segmentation
(high recall) for keyframe selection. For keyframe selection, we investigate two effective approaches: local
region descriptors (visual content) and optical flow descriptors (motion content). We provide several interesting
findings. 1) While scenes (visually similar content) can be effectively detected by region descriptors,
optical flow (motion changes) provides better keyframes. 2) However, the suitable parameters of the
motion-descriptor-based keyframe selection vary from one video to another, and average performance remains low.
To avoid more complex processing, we introduce a human-in-the-loop step where a user selects keyframes produced
by the three best methods. 3) Our human-assisted and learning-free method achieves superior accuracy
to learning-based methods and, for many videos, is on par with average human accuracy.
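
The pipeline outlined in the abstract (scene detection, one keyframe per scene, temporal expansion into key shots, merging) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes per-frame BoW histograms are already computed, and the L1 distance, the fixed threshold, the "closest to the scene mean" keyframe rule, and all function names are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

Histogram = List[float]


def l1_distance(a: Histogram, b: Histogram) -> float:
    """L1 distance between two histograms (an assumed, illustrative metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_scenes(histograms: List[Histogram], threshold: float) -> List[Tuple[int, int]]:
    """Cut a new scene wherever consecutive histograms differ by more than threshold.

    Returns inclusive (start, end) frame index pairs. A low threshold
    over-segments, which favours recall as described in the abstract.
    """
    if not histograms:
        return []
    scenes = []
    start = 0
    for i in range(1, len(histograms)):
        if l1_distance(histograms[i - 1], histograms[i]) > threshold:
            scenes.append((start, i - 1))
            start = i
    scenes.append((start, len(histograms) - 1))
    return scenes


def select_keyframe(histograms: List[Histogram], scene: Tuple[int, int]) -> int:
    """Pick the frame closest to the scene's mean histogram (hypothetical rule)."""
    start, end = scene
    count = end - start + 1
    dim = len(histograms[start])
    mean = [sum(histograms[i][d] for i in range(start, end + 1)) / count
            for d in range(dim)]
    return min(range(start, end + 1), key=lambda i: l1_distance(histograms[i], mean))


def expand_to_shots(keyframes: List[int], half_width: int, num_frames: int) -> List[Tuple[int, int]]:
    """Expand each keyframe by half_width frames per side; merge touching shots."""
    shots: List[Tuple[int, int]] = []
    for k in sorted(keyframes):
        lo = max(0, k - half_width)
        hi = min(num_frames - 1, k + half_width)
        if shots and lo <= shots[-1][1] + 1:
            # Overlapping or adjacent to the previous shot: extend it.
            shots[-1] = (shots[-1][0], max(shots[-1][1], hi))
        else:
            shots.append((lo, hi))
    return shots


# Toy input: six frames with two-bin histograms and one clear content change.
frames = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0], [0.1, 0.9]]
scenes = detect_scenes(frames, threshold=1.0)
keyframes = [select_keyframe(frames, s) for s in scenes]
summary = expand_to_shots(keyframes, half_width=1, num_frames=len(frames))
```

In the human-in-the-loop setting described above, a user would choose among candidate keyframes produced by several such selectors rather than accepting the output of a single automatic rule.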