tures. Given the results of rally-rank evaluation, users can watch a short video summary composed only of important rally shots. Additionally, we present an efficient viewing system, “RSViewer,” whose name has a double meaning: “RSV viewer” and “Rally Scene Viewer.” With its functions for playing back important rally scenes and for fast-forwarding, RSViewer gives users a satisfying way to watch RSV while focusing on rally scenes.
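To make the two viewing modes concrete, here is a minimal Python sketch. It assumes that rally shots have already been detected as time intervals and that rally-rank evaluation yields one numeric score per rally. The `Rally` type, the `top_k` cut-off, and the 8x fast-forward speed are hypothetical choices for illustration, not the parameters of RSViewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rally:
    start: float  # seconds from the start of the broadcast
    end: float
    rank: float   # hypothetical importance score from rally-rank evaluation


def summarize(rallies, top_k):
    """Keep the top_k highest-ranked rallies, then restore chronological order."""
    best = sorted(rallies, key=lambda r: r.rank, reverse=True)[:top_k]
    return sorted(best, key=lambda r: r.start)


def playback_plan(important, video_end, fast_speed=8.0):
    """Cover [0, video_end] with (start, end, speed) segments: important
    rallies play at 1x and everything between them is fast-forwarded.
    Assumes the important rallies do not overlap."""
    plan = []
    cursor = 0.0
    for r in sorted(important, key=lambda r: r.start):
        if r.start > cursor:
            plan.append((cursor, r.start, fast_speed))
        plan.append((r.start, r.end, 1.0))
        cursor = r.end
    if cursor < video_end:
        plan.append((cursor, video_end, fast_speed))
    return plan


def viewing_time(plan):
    """Wall-clock seconds needed to watch a playback plan."""
    return sum((end - start) / speed for start, end, speed in plan)


rallies = [Rally(10, 20, 0.2), Rally(30, 45, 0.9), Rally(60, 70, 0.5)]
summary = summarize(rallies, top_k=2)
plan = playback_plan(summary, video_end=100.0)
print(viewing_time(plan))  # 34.375 seconds instead of 100
```

Watching `summary` alone takes 25 seconds; the playback plan costs about 9 seconds more but keeps the rest of the match visible at high speed.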
2 RELATED WORK
There are two main approaches to watching a video efficiently: scene-based summarization and fast-forward viewing. Scene-based summarization aims to generate a highlight video, called a “video summary,” and has been applied to various sports videos (Liu et al., 2009), (Tjondronegoro et al., 2004), (Zhao et al., 2012). Liu et al. proposed a method for rally shot detection based on unsupervised shot clustering and supervised audio classification using Support Vector Machines (Liu et al., 2009). This method can detect rally shots with high accuracy. However, to create the summary, video editors must manually annotate audio labels for the first 30 minutes of the video. Tjondronegoro et al. proposed highlight scene detection for various sports videos based on cheers, whistles, and text information (Tjondronegoro et al., 2004). Zhao et al. extracted replay scenes by searching for logos located before and after replays (Zhao et al., 2012).
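The logo-based replay extraction attributed to Zhao et al. can be illustrated with a minimal Python sketch. It assumes that a hypothetical per-frame logo detector has already produced the timestamps (in seconds) of frames containing the replay logo, and that logos appear strictly in opening/closing pairs. `max_gap` is an illustrative parameter, not a value from the cited work.

```python
def group_detections(times, max_gap):
    """Merge logo-frame timestamps closer than max_gap into (first, last) events."""
    events = []
    for t in sorted(times):
        if events and t - events[-1][1] <= max_gap:
            events[-1][1] = t  # still the same logo animation
        else:
            events.append([t, t])  # a new logo appearance
    return [tuple(e) for e in events]


def replay_intervals(times, max_gap=0.5):
    """Pair consecutive logo events: a replay runs from the end of an opening
    logo to the start of the next (closing) logo. An unpaired trailing logo
    is ignored."""
    events = group_detections(times, max_gap)
    return [(events[i][1], events[i + 1][0]) for i in range(0, len(events) - 1, 2)]


logo_frames = [10.0, 10.2, 10.4, 18.0, 18.2, 40.0, 40.1, 52.0]
print(replay_intervals(logo_frames))  # [(10.4, 18.0), (40.1, 52.0)]
```

The strict pairing assumption is the fragile part: a single missed logo shifts every later pair, so a practical detector would need to validate interval lengths.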
Highlights and replays attract viewers and are very important scenes because they are selected by skilled editors. Such scenes let viewers watch a specific motion attentively, but they lack information such as scores, which makes it difficult to understand the flow of the game. Hence, highlights and replays are inadequate for a video summary that conveys the course of the game.
The fast-forward viewing approach aims to let viewers watch an entire video in a short time without removing any scenes (Cheng et al., 2009), (Kurihara, 2012). Cheng et al. presented a system for watching a video while positively or negatively accelerating the playback speed (Cheng et al., 2009). Users can watch scenes of interest at low playback speed and skip uninteresting ones at high playback speed. However, this system has some limitations; for example, if users do not understand the scene structure of the video, controlling the playback speed is difficult since they cannot predict when scenes of interest will start.
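The cost of not knowing when a scene of interest starts can be estimated with a back-of-the-envelope model. This model is our own assumption, not Cheng et al.'s control scheme: a viewer fast-forwarding at speed v0 notices that a scene of interest has begun only after a reaction delay, then decelerates linearly at a rate a (speed units per wall-clock second) down to 1x. While decelerating, the video advances by the integral of the speed, (v0^2 - 1) / (2a) seconds, so the part of the scene skimmed before normal playback resumes is delay * v0 + (v0^2 - 1) / (2a).

```python
def missed_video_seconds(v0, reaction_delay, decel):
    """Video seconds of a scene of interest that play faster than 1x.

    v0: fast-forward speed when the scene starts (e.g. 8.0 for 8x)
    reaction_delay: wall-clock seconds before the viewer starts slowing down
    decel: hypothetical linear deceleration, in speed units per wall-clock second
    """
    if v0 <= 1.0:
        return 0.0  # already at normal speed, nothing is skimmed
    during_reaction = reaction_delay * v0             # still at full speed
    during_braking = (v0 ** 2 - 1.0) / (2.0 * decel)  # integral of v0 - decel*t
    return during_reaction + during_braking


for speed in (2.0, 4.0, 8.0, 16.0):
    print(speed, missed_video_seconds(speed, reaction_delay=1.0, decel=4.0))
```

Under these assumed values, about 16 video seconds of the scene are skimmed at 8x before normal playback resumes, and the loss grows quadratically with the fast-forward speed.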
Kurihara proposed a two-level fast-forwarding system for movies based on subtitles (Kurihara, 2012). However, this method is not suited to RSV because RSV usually has no subtitles and contains less speech than movie content.
Table 1: Representative scenes in each period of a broadcast RSV.
Period                                 Scenes
broadcast start ∼ before game start    commentator’s talk, player introduction, practice, fan
game start ∼ game set                  rally, change court, fan, replay, player’s zoom, court maintenance
after game set ∼ broadcast end         commentator’s talk, fan, interview, ceremony
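Why subtitle-driven two-level fast-forwarding does not separate rallies from other scenes in RSV can be shown with a minimal Python sketch. It assumes, as a simplification of Kurihara's system, that segments with subtitles play at a moderate speed and segments without them at a high speed; both speed values here are hypothetical.

```python
def two_level_speeds(segments, subtitle_speed=1.5, plain_speed=8.0):
    """segments: list of (duration_seconds, has_subtitle) pairs."""
    return [subtitle_speed if has_subtitle else plain_speed
            for _, has_subtitle in segments]


def two_level_viewing_time(segments, subtitle_speed=1.5, plain_speed=8.0):
    """Wall-clock seconds needed to watch all segments at their assigned speeds."""
    speeds = two_level_speeds(segments, subtitle_speed, plain_speed)
    return sum(duration / speed for (duration, _), speed in zip(segments, speeds))


movie = [(60.0, True), (120.0, False), (30.0, True)]
# In RSV there are usually no subtitles, so rallies and breaks look the same.
rsv = [(20.0, False), (40.0, False), (15.0, False)]  # rally, court change, rally

print(two_level_speeds(movie))  # [1.5, 8.0, 1.5]
print(two_level_speeds(rsv))    # [8.0, 8.0, 8.0]
```

For the movie, dialogue stays watchable while silent stretches are skipped quickly; for RSV, every segment receives the same high speed, so rallies are fast-forwarded just like court changes.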
As related work on racquet sports recognition, methods to detect events such as serves and net play have been proposed (Chang et al., 2012), (Chen and Zhang, 2006), (Huang et al., 2012). While event detection is helpful for evaluating rally importance, it is inadequate for RSV summarization because considering many events makes the summarization process much more complicated. To detect events in RSV, rally scene detection is essential and has been discussed in previous studies (Kijak et al., 2003), (Liu et al., 2009), (Zhong and Chang, 2001). However, these methods require models adapted to each input video. Our approach overcomes this problem and enables automatic detection and summarization of rally scenes from the video input alone.
Compared with prior studies, our main contributions are the following two: i) our method is fully automatic while preserving the quality of the video summary, and ii) our system is the first to present a user interface specialized for RSV that includes summarization and fast-forwarding functions.
3 RSV’S STRUCTURE
We first describe the structure of the RSV treated in this study. Based on our observation, the broadcast period of an RSV is divided into three parts: “broadcast start ∼ before game start,” “game start ∼ game set,” and “after game set ∼ broadcast end.” Table 1 shows the main scenes included in each of the three periods. Among the scenes in Table 1, those that allow viewers to understand the racquet sports match are the “rally scene” and the “replay scene.” In rally scenes, a scoreboard is always displayed and all of the players’ ball hits are included, whereas replay scenes contain only a few hits and no scoreboard. Therefore, we assume that the most important scene for understanding the match is the rally scene, and we aim to generate a video summary composed of only important rally scenes.
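The scoreboard observation suggests a simple filter for rally candidates. The Python sketch below assumes a hypothetical scoreboard detector that returns one boolean per frame of a shot, and the 0.9 ratio is an illustrative threshold. The text only states that a scoreboard is always present in rally scenes, not that it is absent from every other scene, so the filter treats it as a necessary condition: it discards shots such as replays but is not a complete rally detector.

```python
def scoreboard_ratio(flags):
    """Fraction of frames in a shot where the scoreboard is visible."""
    return sum(flags) / len(flags) if flags else 0.0


def rally_candidates(shots, min_ratio=0.9):
    """shots: list of (shot_id, per-frame scoreboard flags).

    A rally shot should show the scoreboard in (nearly) every frame, so shots
    below min_ratio, such as replays, are discarded. Shots that pass still
    need further checks, e.g. for ball hits.
    """
    return [shot_id for shot_id, flags in shots
            if scoreboard_ratio(flags) >= min_ratio]


shots = [
    ("shot_1", [True] * 120),                # rally: scoreboard throughout
    ("shot_2", [False] * 90),                # replay: no scoreboard
    ("shot_3", [True] * 30 + [False] * 60),  # scoreboard fades out mid-shot
]
print(rally_candidates(shots))  # ['shot_1']
```

Using a ratio rather than requiring every frame tolerates occasional detector misses on individual frames.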