bration and player detection. The acquisition is most
commonly based on multiple fixed cameras placed around the stadium or sports hall, covering the entire playfield (Figueroa et al., 2006; Ren et al., 2009; Choi and Seo, 2011; Iwase and Saito, 2004). With these topologies, the spatial segmentation task can easily be performed with an approach based on background subtraction. On the other hand, simpler image acquisition architectures, such as a single camera (Lu et al., 2009; Dearden et al., 2006) or TV broadcast sequences (Ekin et al., 2003; Khatoonabadi and Rahmati, 2009), require more complex processing for background/foreground segmentation, as well as in the subsequent stages, mainly player detection and camera calibration. The most common techniques for image segmentation range from background subtraction using a background model built from the initial frames (Iwase and Saito, 2004) to more complex dynamic models using a representation in a specific color space that exploits a dominant and homogeneous field color (Figueroa et al., 2006; Ren et al., 2009; Ekin et al., 2003). However, when the background is not static and there is no dominant field color, as in indoor sports, these basic methods are not suitable for player segmentation.
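As an illustration of the simplest of these approaches, the sketch below builds a background model from initial frames and segments the foreground by thresholded differencing. All names, the threshold, and the synthetic data are our own illustrative assumptions, not taken from any of the cited works:

```python
import numpy as np

def build_background(frames):
    """Estimate a static background as the per-pixel median of initial frames."""
    return np.median(np.stack(frames), axis=0)

def foreground_mask(frame, background, thresh=25):
    """Pixels deviating strongly from the background model are foreground."""
    return np.abs(frame.astype(np.int16) - background.astype(np.int16)) > thresh

# Synthetic example: a nearly static 8x8 background plus a bright moving blob.
rng = np.random.default_rng(0)
bg_frames = [np.full((8, 8), 100, np.uint8) + rng.integers(0, 3, (8, 8), dtype=np.uint8)
             for _ in range(10)]
background = build_background(bg_frames)

frame = np.full((8, 8), 100, np.uint8)
frame[2:4, 2:4] = 200                 # "player" entering the scene
mask = foreground_mask(frame, background)
print(mask.sum())                     # only the blob's 4 pixels are foreground
```

Such a model fails exactly in the conditions discussed above: a moving camera or a non-homogeneous background quickly invalidates the per-pixel statistics.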
The relation between image coordinates and world coordinates is a fundamental part of the sports analysis problem, solved through the calibration of the camera with respect to the field. Knowing the camera parameters, it is possible to relate the position of the players in the image with their actual position on the field (Figueroa et al., 2006; Lu et al., 2013a). When fixed cameras are used, this stage is trivial and can be performed manually; when the camera moves, dynamic and automatic methods are required (Lu et al., 2013b).
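For the fixed-camera case, the image-to-field relation reduces to a planar homography that can be estimated once from a few manually marked landmarks. The sketch below fits it with the standard Direct Linear Transform; the correspondences and the 40 m x 20 m pitch size are hypothetical values of our own:

```python
import numpy as np

def fit_homography(img_pts, field_pts):
    """Direct Linear Transform: estimate H mapping image points to field points."""
    rows = []
    for (x, y), (X, Y) in zip(img_pts, field_pts):
        rows.append([x, y, 1, 0, 0, 0, -X * x, -X * y, -X])
        rows.append([0, 0, 0, x, y, 1, -Y * x, -Y * y, -Y])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows, float))
    return vt[-1].reshape(3, 3)

def to_field(H, pt):
    """Project an image point to field (world) coordinates."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

# Hypothetical correspondences: four pitch corners clicked once in the image,
# paired with metric coordinates of a 40 m x 20 m indoor pitch.
img_corners = [(100, 400), (540, 400), (600, 80), (40, 80)]
field_corners = [(0, 0), (40, 0), (40, 20), (0, 20)]
H = fit_homography(img_corners, field_corners)
print(np.round(to_field(H, (320, 240)), 2))   # field position of an image point
```

With a moving camera, the same mapping must instead be re-estimated over time, which is what motivates the dynamic methods cited above.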
The detection of the players has been addressed with different techniques. Some of these methods rely on feature extraction followed by classification (Lu et al., 2009; Liu et al., 2009). However, if temporal tracking is not taken into account, false positives and missed detections are frequent. The dynamics of the players, together with complex observation models, are therefore also used to improve the detection and tracking of the players. To this end, mean shift (Kheng, 2011), Kalman filters (Welch and Bishop, 1995), and particle filters (Doucet and Johansen, 2011) are the most commonly adopted solutions. Recently, Linear Programming based methods (Shitrit et al., 2011) and Conditional Random Fields (Lu et al., 2013b) have been used to overcome the difficulties of multiple people tracking.
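As a concrete instance of the filtering approach, a constant-velocity Kalman filter over a single player's image position can be sketched as follows. The matrices and noise levels are illustrative assumptions of our own, not values from the cited works:

```python
import numpy as np

# Constant-velocity Kalman filter tracking one player's image position.
# State [x, y, vx, vy]; measurements are (x, y) detections.
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)          # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)          # measurement model
Q = 0.01 * np.eye(4)                         # process noise (assumed)
R = 4.0 * np.eye(2)                          # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle given a detection z = (x, y)."""
    x, P = F @ x, F @ P @ F.T + Q            # predict
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)                  # correct with the innovation
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Player moving right at 2 px/frame along the line y = 50.
x, P = np.array([0.0, 50.0, 0.0, 0.0]), 100.0 * np.eye(4)
for t in range(1, 20):
    x, P = kalman_step(x, P, np.array([2.0 * t, 50.0]))
print(np.round(x, 2))                        # vx should converge near 2
```

The filter's predict step is what bridges missed detections, which is precisely the weakness of purely per-frame classification noted above.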
Most of the work found in the literature focuses mainly on players' positions, trajectories, and high-level collective performance information. In contrast, only a few research works include the detection of the ball, goals, passes, and set pieces (Santiago et al., 2010). From the technological point of view, relevant aspects are left out; for example, computation time and real-time constraints are rarely considered. Finally, all of the image acquisition architectures use one or more fixed cameras, and there is no relevant work using portable or moving systems for image acquisition.
2 A FRAMEWORK FOR VIDEO ANALYSIS USING AN UAV
In this section we present the proposed methodology, designed to automatically capture, process, and extract player and team performance statistics from a soccer video stream captured with an UAV. Only at the beginning of the processing does the system interactively query the user, to obtain initial information that supports the automatic processing of the whole video. A block diagram of the main steps of the proposed methodology is presented in Figure 1. The main stages of the framework are: video stabilization, camera calibration, player detection and tracking, and high-level interpretation of the game.
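The overall flow can be summarized as a simple processing loop; the stage functions below are placeholders of our own standing in for the components described in the following subsections, not the paper's actual implementations:

```python
# Hypothetical stage stubs; the real algorithms are described in Sections 2.2-2.5.
def stabilize(frame, prev): return frame
def calibrate(frame): return "H"                  # image -> field mapping
def detect_and_track(frame): return [(10, 20)]    # player image positions
def interpret(players, H): return {"positions": players}

def analyze_stream(frames):
    """Run the four framework stages over each frame of the video stream."""
    stats, prev = [], None
    for frame in frames:
        frame = stabilize(frame, prev)        # 1. video stabilization
        H = calibrate(frame)                  # 2. camera calibration
        players = detect_and_track(frame)     # 3. player detection and tracking
        stats.append(interpret(players, H))   # 4. high-level interpretation
        prev = frame
    return stats

print(len(analyze_stream(range(3))))          # one statistics record per frame
```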
2.1 Image Acquisition
The images from indoor soccer games used in this
research are shot by the Ar.Drone’s frontal camera.
The drone is controlled using Parrot's commercial application for mobile devices¹. The drone was programmed to hover at a static position, 5 to 7 meters above the floor, close to the side line of the pitch (see Figure 2). The structure of the Ar.Drone was modified to make its frontal camera point 30 degrees down, in order to capture the game action while avoiding occlusions with other objects.
2.2 Video Stabilization
Due to unavoidable drone’s motion, the image se-
quences will suffer from undesired global movement.
Since most of the techniques for the following pro-
cessing stages assume a static background, it is fun-
damental to stabilize the video. This step will severely
impact the global performance of the framework.
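The underlying idea — estimating the dominant inter-frame motion and compensating it — can be sketched as follows. For simplicity this illustration assumes a pure translation and uses phase correlation as a stand-in for the feature-matching method described in the text:

```python
import numpy as np

def global_shift(prev, cur):
    """Estimate the corrective (dy, dx) that realigns `cur` with `prev`,
    via phase correlation (a translation-only simplification)."""
    Fp, Fc = np.fft.fft2(prev), np.fft.fft2(cur)
    cross = Fp * np.conj(Fc)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-9)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap the circular peak location into a signed shift.
    if dy > prev.shape[0] // 2:
        dy -= prev.shape[0]
    if dx > prev.shape[1] // 2:
        dx -= prev.shape[1]
    return dy, dx

# Synthetic check: shift a random frame by (3, -5) pixels and recover
# the compensating shift.
rng = np.random.default_rng(1)
prev = rng.random((64, 64))
cur = np.roll(prev, (3, -5), axis=(0, 1))
dy, dx = global_shift(prev, cur)
print(dy, dx)                    # applying np.roll(cur, (dy, dx)) realigns it
```

A feature-based method, as used here, additionally handles rotation and perspective changes, which a translation-only model cannot.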
We designed a method based on feature match-
ing between two consecutive frames. Feature extrac-
¹ https://play.google.com/store/apps/details?id=com.parrot.freeflight
ICPRAM 2015 - International Conference on Pattern Recognition Applications and Methods