broadcast, which uses both low-level and high-level features to generate pictorial summaries of news stories. As high-level features, we choose to work with textual information; however, we avoid the text-extraction problem from which NLP approaches suffer.
The rest of the paper is organized as follows. In Section 2, we discuss work related to news segmentation. Section 3 describes the genetic algorithm we propose to summarize stories. Results of our approach are shown in Section 4. We conclude with directions for future work.
Figure 1: User interface for news segmentation and story summarization. Stories are selected from the list on the left side; the summary of each story appears on the right side.
2 NEWS SEGMENTATION
Much work has been done in the field of extracting stories from news video. The core idea is to detect anchor shots. The first anchor shot detector dates back to 1995: proposed in (Zhang et al., 1995), it classifies shots based on an anchorperson shot model. As part of the Informedia project (Informedia Project, 2006), the authors of (Yang et al., 2005) use high-level information (speech, text transcript, and facial information) to classify persons appearing in the news program into three types: anchor, reporter, or person involved in a news event.
News story segmentation has also been studied extensively in the news story segmentation sessions of the TRECVID workshops (TRECVID 2004, 2003). From TRECVID 2004, we can cite the work in (Hoashi et al., 2004), in which the authors proposed an SVM-based story segmentation method using low-level audio-video features. In our work, we use the approach proposed in (Zhai et al., 2005), in which the news program is segmented by detecting and classifying body regions to find groups of anchor shots. It rests on the assumption that the anchor's clothing remains the same throughout the entire program.
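To make this concrete, the following sketch (our illustration, not the code of (Zhai et al., 2005)) groups shots by the colour histogram of a detected clothing region and takes the largest recurring group as the anchor. The body_detector helper and the L1 distance threshold are assumptions made for the sake of the example.

import numpy as np

def colour_histogram(region, bins=16):
    # Normalised per-channel colour histogram of an RGB region.
    hist = [np.histogram(region[..., c], bins=bins, range=(0, 255))[0]
            for c in range(3)]
    hist = np.concatenate(hist).astype(float)
    return hist / (hist.sum() + 1e-9)

def group_anchor_shots(shot_key_frames, body_detector, threshold=0.4):
    # shot_key_frames: list of (shot_id, frame) pairs, frame an HxWx3
    # uint8 array. body_detector is an assumed helper returning the
    # clothing region of a frame (e.g. a box below a detected face).
    groups = []  # each entry: [reference_histogram, list_of_shot_ids]
    for shot_id, frame in shot_key_frames:
        h = colour_histogram(body_detector(frame))
        for ref, members in groups:
            if np.abs(ref - h).sum() < threshold:  # L1 distance
                members.append(shot_id)
                break
        else:
            groups.append([h, [shot_id]])
    # The anchor wears the same clothes for the whole programme, so
    # the most frequently recurring group is taken as the anchor.
    return max(groups, key=lambda g: len(g[1]))[1] if groups else []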
3 STORIES SUMMARIZATION
3.1 Problem Description
Summarizing a video consists in providing another version that contains the pertinent and important items needed for quick content browsing. Our approach aims at accelerating browsing by producing pictorial summaries that help the viewer judge whether a news video is interesting. In the web context, such summaries indicate to users connected to online archives whether a given news video is interesting and worth downloading.
3.2 Classical Solutions for Key-frame Extraction
Many approaches have been proposed in the field of video pictorial summarization. As defined in (Ma, Zhang, 2005), a pictorial summary must meet three criteria: first, it must be structured so as to give the viewer a clear image of the entire video; second, it must be well filtered; finally, it must have a suitable visualization form. The authors of (Taniguchi et al., 1997) summarized video using a 2-D packing of “panoramas”, large images formed by compositing video pans. In this work, key-frames were extracted from every shot and used for a 2-D representation of the video content. Because frame sizes were not adjusted for better packing, much white space appears in the resulting summaries.
Moreover, no effective filtering mechanism was defined. The authors of (Uchihachi et al., 1999) proposed to summarize video by a set of key-frames of different sizes. Key-frame selection is based on eliminating uninteresting and redundant shots, and the selected key-frames are sized according to the importance of the shots from which they were extracted. In this pictorial summary, temporal order is not preserved, owing to the arrangement of pictures of different sizes.
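As an illustration of this importance-based sizing, the toy sketch below scores each shot by its length weighted by the rarity of its visual cluster, filters out low-scoring shots, and quantises the remaining scores into a few discrete frame sizes. The formula and thresholds are our simplified stand-ins for the measure used in (Uchihachi et al., 1999).

import math

def size_keyframes(shots, num_sizes=3, min_fraction=0.05):
    # shots: list of dicts with 'length' (seconds) and 'cluster_weight'
    # (fraction of total footage occupied by the shot's visual cluster,
    # in (0, 1]). Returns (shot, size) pairs, size in {1, ..., num_sizes}.
    scored = [(s, s['length'] * math.log(1.0 / s['cluster_weight']))
              for s in shots]
    top = max((imp for _, imp in scored), default=0.0) or 1.0
    sized = []
    for s, imp in scored:
        if imp < min_fraction * top:
            continue  # drop uninteresting / redundant shots
        sized.append((s, 1 + round((num_sizes - 1) * imp / top)))
    return sized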
Later, in (Ma, Zhang, 2005), the authors proposed a pictorial summary called the video snapshot. In this approach, a summary is evaluated against four criteria: it must be visually pleasurable, representative, informative, and distinctive. A weighting scheme is proposed to score every candidate summary. Although this approach offers a genuine filtering mechanism, it relies only on low-level features (color, saturation, etc.).
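A minimal sketch of such a weighted evaluation, assuming every criterion has already been scored in [0, 1]; the equal weights and the criterion ordering are our placeholders, not the actual scheme of (Ma, Zhang, 2005):

def summary_score(criteria, weights=(0.25, 0.25, 0.25, 0.25)):
    # criteria: (pleasurability, representativeness, informativeness,
    # distinctiveness), each already normalised to [0, 1].
    return sum(w * c for w, c in zip(weights, criteria))

# The best candidate summary maximises the weighted score, e.g.:
# best = max(candidates, key=lambda cand: summary_score(cand.criteria))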