SIFT APPROACH FOR BALL RECOGNITION IN SOCCER IMAGES
M. Leo, T. D’Orazio, N. Mosca and A. Distante
Institute of Intelligent Systems for Automation, Via Amendola 122/D, 70126 Bari, Italy
Keywords: Pattern Recognition, Hough Transform, Motion Analysis, Invariant Features.
Abstract: In this paper a new method for ball recognition in soccer images is proposed. It detects the ball position in
each frame but, unlike previous related approaches, it does not require a long and tedious phase of building
several positive training sets to handle the great variance in ball appearance. Moreover, it does not need any
negative training set, which is difficult to build when, as in the soccer context, negative examples abound.
A large number of experiments have been carried out on image sequences acquired during real matches of the
Italian “Serie A” soccer championship. In the reported experiments the proposed approach recognizes the ball
in about 89% of the evening images and about 81% of the natural-light images that contain it.
1 INTRODUCTION
Automatic ball recognition in image sequences is a
fundamental task: many doubtful situations arise
during a game, such as deciding whether the ball has
left the field or a goal has been scored. An automatic
method that detects the ball position in each image
of the sequence is the first and most important step
towards a (non invasive) vision based decision
support tool for the referee committee during the
game.
In the last decade different methods to
automatically recognize the ball have been proposed.
They can be divided into two categories:
appearance based methods and motion based
methods.
Motion based methods do not search for the ball
in each frame but distinguish the ball from other
objects by means of a priori knowledge about its
motion.
In (Tong, Lu & Liu, 2004) a strategy based on
non-ball elimination is applied using a coarse-to-fine
process. The Condensation algorithm is used for ball
tracking, and a confidence measure representing the
reliability of the ball region is used to guide possible
ball re-detection for continuous tracking.
In (Yu, Leong, Xu & Tian, 2006) the ball
recognition task is achieved by a trajectory
verification procedure based on a Kalman filter
rather than on low-level features. Two different
procedures (trajectory discrimination and extension)
run iteratively to identify the ball trajectory.
In (Ren, Orwell, Jones & Xu, 2004) soccer ball
estimation and tracking using trajectory modeling
from multiple image sequences is proposed.
Motion based methods seem to be well suited for
video indexing applications but not for real time
event detection (for example, solving the goal line
crossing problem). Their intrinsic limitation is that
they do not detect the ball in each frame: they
analyze all the objects in the scene over a period of
time and only at the end recognize the ball on the
basis of some a priori assumption about its motion.
Appearance based methods, instead, perform ball
recognition in each frame using information such as
shape, size and color.
From this point of view ball recognition in
soccer images is an instance of the more general
problem of object recognition, where the most
common approach classifies the pattern images
after a suitable pre-processing.
This approach has been applied in many contexts
(face recognition, people detection, car detection,…)
using histogram equalization, wavelet based
preprocessing, parametric eigenspace decomposition
(Murase, 1995; Papageorgiou, Oren & Poggio, 1998;
Rowley, Baluja & Kanade, 1998; Jones & Poggio,
1997; Mohan, Papageorgiou & Poggio, 2001) to
obtain a new target representation in a more suitable
vector space.
Leo M., D’Orazio T., Mosca N. and Distante A. (2008).
SIFT APPROACH FOR BALL RECOGNITION IN SOCCER IMAGES.
In Proceedings of the Tenth International Conference on Enterprise Information Systems - AIDSS, pages 207-212
DOI: 10.5220/0001688702070212
Copyright © SciTePress
In particular in (D’Orazio, Guaragnella, Leo &
Distante, 2004; Leo, D’Orazio & Distante, 2003)
wavelet and independent component analysis (ICA)
were used in the soccer context in combination with
a Neural Network classifier.
Unfortunately, existing appearance based methods
for ball recognition are limited in accuracy unless a
long and tedious learning procedure is carried out, in
which multiple positive training sets (covering all
the possible appearances of the ball) are built by
selecting representative patterns for each set by
hand.
This drawback is due to the great variance of the
ball appearance over frames, which depends on many
factors including the view, the speed of the ball, the
lighting conditions and possible partial occlusion.
Moreover, appearance based methods may
require negative examples (no-ball examples), which
abound on soccer fields (player’s socks, pants or
shirts, advertising posters, etc.). For this reason
collecting negative training examples requires hard
work and caution: negative training examples must
uniformly represent everything outside the positive
class and, at the same time, they should not heavily
outnumber the positive examples.
In this paper a new appearance based ball
detection approach is proposed. It consists of three
steps: first, a circle detection algorithm, the Circle
Hough Transform (CHT) (Atherton & Kerbyson,
1999), is applied to the moving objects in the scene
to select the region that is the best candidate to
contain the ball, considering only the edge
information; then the Scale Invariant Feature
Transform (SIFT) (Lowe, 2004) is applied to the
candidate region to extract representative feature
vectors (keypoints); finally, these keypoints are
compared, by nearest neighbour search, with those
in a database of keypoints extracted from a small set
of positive images.
Considering that both CHT and SIFT are
invariant to image scale, rotation, affine distortion
(only in a limited range), addition of noise, and
changes in illumination, the proposed method does
not require long and tedious work to build different
and huge positive training sets. Moreover it does not
require any negative training set.
A large number of experiments have been carried
out on real image sequences acquired during the
Italian “Serie A” soccer championship, and
satisfactory ball recognition results were obtained
using only a few positive training images.
The rest of the paper is organized as follows:
section 2 gives an overview of the proposed system,
section 3 explains experimental setup and section 4
reports experimental results. Finally, discussion and
conclusions are reported in section 5.
2 SYSTEM OVERVIEW
Our system operates in three stages (see Figure 1). It
first applies circle detection (implemented as
convolutions on the edge magnitude image) to all
the moving regions of the image, in order to select
the area that best fits the sought pattern, as proposed
by (D’Orazio, Guaragnella, Leo & Distante, 2004).
Then a two-step validation procedure checks
whether the pattern corresponding to the highest
peak in the accumulator space is really a ball or a
wrong pattern has been erroneously detected. This
procedure is necessary because the Hough
Transform cannot detect occlusions or the absence
of the ball: in such situations it still returns the
position of the highest peak in the accumulator
space, so the ball position is wrongly determined.
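The candidate selection stage can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the implementation used in the experiments: it correlates an edge magnitude image with a single annular mask covering radii from 9 to 11 pixels and takes the highest accumulator value as the candidate centre. The function names and the synthetic test circle are hypothetical.

```python
import math

def annulus_offsets(r_min, r_max):
    """Offsets (dy, dx) whose distance from the mask centre lies in [r_min, r_max]."""
    size = int(math.ceil(r_max))
    return [(dy, dx)
            for dy in range(-size, size + 1)
            for dx in range(-size, size + 1)
            if r_min <= math.hypot(dy, dx) <= r_max]

def circle_accumulator(edges, r_min=9, r_max=11):
    """Correlate an edge magnitude image (list of rows) with an annular mask."""
    h, w = len(edges), len(edges[0])
    offsets = annulus_offsets(r_min, r_max)
    acc = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    total += edges[yy][xx]
            acc[y][x] = total
    return acc

def best_candidate(acc):
    """Return (row, col, score) of the highest accumulator peak."""
    return max(((y, x, v) for y, row in enumerate(acc) for x, v in enumerate(row)),
               key=lambda t: t[2])

def synthetic_circle(size, cy, cx, radius):
    """Hypothetical test helper: a square image with a one-pixel-wide circle outline."""
    img = [[0.0] * size for _ in range(size)]
    for k in range(360):
        a = math.radians(k)
        row = int(round(cy + radius * math.sin(a)))
        col = int(round(cx + radius * math.cos(a)))
        img[row][col] = 1.0
    return img

edges = synthetic_circle(48, 24, 20, 10)
row, col, score = best_candidate(circle_accumulator(edges))
print(row, col, score)
```

Note that `best_candidate` always returns some peak, even for an image with no circle at all, which is why a separate validation stage is required.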
In the first step of the validation procedure, from
the sub-image containing the result of the previous
detection procedure, distinctive invariant features
(SIFT keypoints) are extracted by a cascade filtering
approach consisting of four stages as proposed by
(Lowe, 2004).
[Figure 1 block diagram: I(t) -> Circle Hough Transform -> Candidate Region -> SIFT Feature Extraction -> SIFT Feature Matching (against the SIFT Feature Database) -> Ball / No-Ball]
Figure 1: The ball recognition system. I(t) is the frame
acquired at time t.
After Scale Invariant Feature Transform
application, each extracted set of features is
individually compared to those extracted from a
small set of reference ball images stored in a
database (keypoint matching).
The matching procedure is based on the nearest
neighbour algorithm, and incorrect matches are
identified by taking the ratio of the distance to the
closest neighbour to the distance to the second
closest. If this ratio is lower than a suitable
threshold the keypoint match is considered correct,
otherwise it is discarded.
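A minimal sketch of this distance-ratio test follows. It uses the formulation of (Lowe, 2004), in which a small ratio indicates a distinctive match and a ratio close to 1 indicates an ambiguous one. The threshold value 0.8 is the one suggested in that work and is only an assumption here, since the value used in our experiments is not stated; the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(test_desc, ref_desc, max_ratio=0.8):
    """Return (test_index, ref_index) pairs that pass the distance-ratio test.

    max_ratio=0.8 is an assumed value taken from (Lowe, 2004).
    """
    matches = []
    if len(ref_desc) < 2:
        return matches  # a second-closest neighbour is needed for the ratio
    for i, d in enumerate(test_desc):
        dists = sorted((euclidean(d, r), j) for j, r in enumerate(ref_desc))
        (best, j), (second, _) = dists[0], dists[1]
        # Keep the match only if the closest neighbour is clearly closer
        # than the second closest one.
        if second > 0 and best / second < max_ratio:
            matches.append((i, j))
    return matches

# Toy 2-D descriptors (real SIFT descriptors have 128 elements).
reference = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
test = [[0.5, 0.0],   # close to reference[0] only: accepted
        [5.0, 0.0]]   # equally far from reference[0] and [1]: rejected
print(ratio_test_matches(test, reference))
```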
3 EXPERIMENTAL SETUP
Experiments were performed on real image
sequences acquired at the Friuli Stadium in Udine
(Italy) during different matches of the “Serie A”
Italian Soccer Championship 2006/2007.
Images were acquired using a DALSA TM 6740
monochrome camera able to record up to 200
frames/sec with a resolution of 640x480 pixels.
The camera was placed on the stands of the
stadium with its optical axis lying on the goal-mouth
plane (see Figure 2).
Figure 2: a) The DALSA TM 6740 camera placed on the
stands of the stadium and used to acquire the image
sequence during real soccer matches. In the figure it is
protected by an enclosure. b) An image acquired by the
camera during a match.
Different matches in different light conditions
(evening matches with artificial lighting conditions,
afternoon matches in both cloudy and sunny days)
were acquired.
The acquired images demonstrate the great
variance in the appearance of the ball depending on
lighting conditions, ball speed, ball position etc. as
visible in Figure 3 where three different ball
appearances are shown.
In the acquired images, the ball radius varies
from 9 to 11 pixels (depending on the distance from
the camera) so two convolution masks of dimension
23x23 pixels are used to perform Circle Hough
Transform and, consequently, a candidate region
having size 23x23 pixel is given as input to the
validation step based on Scale Invariant Feature
Transform.
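The mask size follows directly from the largest expected radius: a circle of radius 11 pixels needs a square of side 2 x 11 + 1 = 23 pixels. The sketch below shows this computation and one possible way of cropping the candidate region around the accumulator peak. Shifting the patch to keep it inside the image near the borders is an assumption made for the example, and the function names are hypothetical.

```python
def mask_side(max_radius):
    """Side of the smallest odd-sized square mask containing a circle of this radius."""
    return 2 * max_radius + 1

def crop_candidate(image, centre_row, centre_col, side):
    """Crop a side x side patch centred on the peak.

    Near the borders the patch is shifted so that it stays inside the
    image (an assumed policy). The image must be at least side x side.
    """
    h, w = len(image), len(image[0])
    half = side // 2
    top = min(max(centre_row - half, 0), h - side)
    left = min(max(centre_col - half, 0), w - side)
    return [row[left:left + side] for row in image[top:top + side]]

side = mask_side(11)                        # 23 for radii up to 11 pixels
frame = [[0] * 640 for _ in range(480)]     # a blank 640x480 frame
patch = crop_candidate(frame, 240, 320, side)
print(side, len(patch), len(patch[0]))
```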
Figure 3: Three different ball appearances. a) The ball on a
sunny day. b) The ball during an evening match. c) The
ball inside the goal. In this case the goal net is between
the camera and the ball.
In the validation step, the positive training set
consists of only 17 ball patches acquired during an
evening match.
The system needs several positive ball examples
(not just one) due to the different texture of the
considered ball under different views. Theoretically,
using a uniformly textured ball, the training set
could be reduced to just one image.
In Figure 4, six of the seventeen training images
are shown.
For each test image the keypoint descriptors
(each descriptor is a vector of 128 elements
normalized to unit length) are compared with those
of the reference images by means of the matching
procedure.
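As a small illustration of what unit-length normalization implies for the comparison, the sketch below normalizes a descriptor and measures the Euclidean distance between two normalized descriptors. Because SIFT descriptor elements are non-negative histogram values, the distance between two unit-length descriptors lies between 0 and the square root of 2. The function names are hypothetical and the 128-element vectors are synthetic.

```python
import math

def normalize(descriptor):
    """Scale a descriptor to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in descriptor))
    if norm == 0:
        return list(descriptor)
    return [v / norm for v in descriptor]

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two synthetic 128-element descriptors with no overlapping energy.
d1 = normalize([1.0] * 64 + [0.0] * 64)
d2 = normalize([0.0] * 64 + [2.0] * 64)
print(euclidean(d1, d1))   # identical descriptors: distance 0
print(euclidean(d1, d2))   # disjoint descriptors: distance close to sqrt(2)
```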
Figure 4: Six of the seventeen training images used in the
experiments.
The keypoint descriptor matching procedure is
based on Euclidean distance. For each test image, it
yields the number of matched keypoints for each
training image.
Finally, to validate the tested patch, the mean
number of successful matches between the tested
patch and the 17 training patches is used. If the mean
value is higher than an experimentally determined
threshold the test image is labelled as ball,
otherwise it is labelled as no-ball.
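The decision rule can be sketched as follows. The match counts and the threshold value in the example are hypothetical, since the experimentally determined threshold is not reported; the function name is also an assumption.

```python
def validate_patch(matches_per_training_image, threshold):
    """Label a candidate patch from its match counts against each training image.

    Returns the label and the mean number of matches. The patch is a ball
    only if the mean is strictly higher than the threshold.
    """
    counts = list(matches_per_training_image)
    if not counts:
        raise ValueError("at least one training image is required")
    mean_matches = sum(counts) / len(counts)
    label = "ball" if mean_matches > threshold else "no-ball"
    return label, mean_matches

# Hypothetical match counts against the 17 training patches.
counts = [3, 2, 4, 1, 0, 2, 3, 5, 1, 2, 0, 3, 2, 4, 1, 2, 3]
print(validate_patch(counts, threshold=1.0))   # threshold value is illustrative only
```

Averaging over all training patches, rather than taking the best single match, keeps one accidental match against a single training image from validating a wrong candidate.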
Figure 5: a) The keypoints localized on a ball image.
b) Three keypoints matched between a test patch and a
training patch.
In Figure 5.a the keypoints localized on a ball
image are shown.
In Figure 5.b the keypoints matched between a
test patch and a training patch are connected.
Figure 6: Two ball instances not detected by CHT (first
row) and the corresponding regions erroneously chosen as
ball candidates (second row).
4 EXPERIMENTAL RESULTS
The experimental phase has been divided in two
parts: in the first one the ball recognition approach
has been evaluated on the sequences acquired during
evening matches (with artificial light conditions); in
the second part the proposed approach has been
applied on images acquired with natural light
conditions. In the first experiment a set of 3560
images was used; 1945 of these images contain the
ball (and nearly always some players and the goal-
keeper) and the remaining 1615 do not contain the
ball but only some players and the goal-keeper.
Table 1 reports a confusion matrix describing the
performance of the Circle Hough Transform in
correctly extracting the region containing the ball
(when it is present in the scene).
Table 1: The performance of the Circle Hough Transform
for ball candidate detection in soccer images acquired
during evening matches.
Input images | Extracted patches containing the ball | Extracted patches not containing the ball
Images containing the ball (1945) | 98.76% (1921/1945) | 1.23% (24/1945)
Images not containing the ball (1615) | 0% (0/1615) | 100% (1615/1615)
When the ball was in the scene the CHT was
almost always able to correctly extract the patch
around it as a candidate ball region. The CHT failed
only when the ball was heavily occluded.
Figure 6 reports two occurrences where the ball
region was not identified by CHT (first row) and the
corresponding regions erroneously chosen as ball
candidates (second row). In these cases the peaks in
the accumulator space associated with the objects in
the second row (a shoulder of the goal-keeper and a
shoe of a player) exceed the one associated with
the ball because of the occlusion. In Table 2 a
confusion matrix describing the performance of the
validation step by matching of SIFT keypoints is
reported.
Table 2: The performance of the SIFT keypoints matching
for ball candidate validation in soccer images acquired
during evening matches.
Candidate region | Classified as Ball | Classified as No-Ball
Ball (1921) | 90.3% (1734/1921) | 9.7% (187/1921)
No-Ball (1639) | 10.92% (179/1639) | 89.08% (1460/1639)
More than 90% of the candidate regions
containing the ball were correctly validated. At the
same time, about 89% of the candidate regions that
do not contain the ball were correctly discarded.
Figure 7: A false positive occurs when, under a certain
perspective, the texture of the player’s shoe matches some
areas of the ball.
In Figure 7 a case of incorrect validation is
shown. Keypoints on the texture of the player’s
shoe match some areas of the ball, causing a false
positive.
Table 3 summarizes the ball recognition
performance of the proposed approach under
artificial light conditions. It combines the
performance of the Circle Hough Transform for
candidate ball detection (Table 1) and of SIFT
matching for candidate ball validation (Table 2). As
reported, the overall system correctly recognizes the
ball in almost 90% of the images that contain it
(89.15%).
Table 3: The global performance of the proposed approach
for recognizing the ball in soccer images acquired during
evening matches.
Input images | Classified as Ball | Classified as No Ball
Ball (1945) | 89.15% (1734/1945) | 10.84% (211/1945)
No Ball (1615) | 11.08% (179/1615) | 88.91% (1436/1615)
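The way the two stages combine can be checked with a few lines of arithmetic on the counts reported in Tables 1 to 3. The sketch below only recomputes figures that follow from those counts; the variable names are hypothetical.

```python
BALL_IMAGES = 1945       # evening images containing the ball
NO_BALL_IMAGES = 1615    # evening images not containing the ball
CHT_HITS = 1921          # Table 1: patches extracted around the ball
SIFT_TRUE_POS = 1734     # Table 2: ball patches correctly validated
SIFT_FALSE_POS = 179     # Table 2: no-ball patches wrongly validated

def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole

cht_misses = BALL_IMAGES - CHT_HITS        # balls lost by the CHT stage
sift_misses = CHT_HITS - SIFT_TRUE_POS     # balls lost by the SIFT validation
total_misses = cht_misses + sift_misses    # balls lost by the whole system

overall = percent(SIFT_TRUE_POS, BALL_IMAGES)
# The overall rate equals the product of the two stage rates,
# because the CHT hit count cancels out.
product = percent(CHT_HITS, BALL_IMAGES) * percent(SIFT_TRUE_POS, CHT_HITS) / 100.0
false_positives = percent(SIFT_FALSE_POS, NO_BALL_IMAGES)

print(cht_misses, sift_misses, total_misses)
print(round(overall, 2), round(product, 2), round(false_positives, 2))
```

The 211 missed balls in Table 3 are therefore the 24 balls missed by the Circle Hough Transform plus the 187 ball patches rejected by the SIFT validation.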
In the second experiment a set of 2147 images
acquired with natural light conditions was used.
1034 of these images contain the ball (and nearly
always some players and the goal-keeper) and the
remaining 1113 do not contain the ball but only
some players and the goal-keeper.
Table 4 reports the confusion matrix describing
the performance of the Circle Hough Transform in
correctly extracting the region containing the ball
(when it is present in the scene).
Table 4: The performance of the Circle Hough Transform
for ball candidate detection in soccer images acquired with
natural light conditions.
Input images | Extracted patches containing the ball | Extracted patches not containing the ball
Images containing the ball (1034) | 97.19% (1005/1034) | 2.80% (29/1034)
Images not containing the ball (1113) | 0% (0/1113) | 100% (1113/1113)
Comparing Table 1 and Table 4, it is possible to
conclude that performance is similar to that
obtained in the first experiment, even if some
additional misdetections of the ball region occur
(2.80% against 1.23%) due to the presence of self
shadow on the ball, which reduces edge matching in
the convolution with the oriented masks.
In Table 5 the performance of the validation step
by matching of SIFT keypoints is reported.
The presence of self shadows and the saturation of
some areas of the ball due to sunlight reflection
(see Figure 3.a) reduce ball recognition
performance with respect to Table 2.
Table 5: The performance of the SIFT keypoints
matching for ball candidate validation in soccer images
acquired with natural light conditions.
Candidate region | Classified as Ball | Classified as No Ball
Ball (1005) | 83.08% (835/1005) | 16.91% (170/1005)
No-Ball (1142) | 10.07% (115/1142) | 89.93% (1027/1142)
Table 6 summarizes ball recognition
performance by the proposed approach with natural
light conditions. It combines the performance of the
circle Hough transform for candidate ball detection
(Table 4) and SIFT matching for candidate ball
validation (Table 5).
Table 6: The global performance of the proposed approach
for recognizing the ball in soccer images acquired with
natural light conditions.
Input images | Classified as Ball | Classified as No Ball
Ball (1034) | 80.75% (835/1034) | 19.25% (199/1034)
No Ball (1113) | 10.33% (115/1113) | 89.67% (998/1113)
In Figure 8 an example of misrecognition of the
ball due to reflection effects on the ball surface is
shown. Nevertheless, performance remains
satisfactory (more than 80%) even though the small
training set does not contain balls acquired in the
tested natural lighting conditions.
Figure 8: An example of misrecognition of the ball due to
reflection effects on the ball surface.
As will be highlighted in the conclusions, this is,
in our opinion, a very encouraging result, because
the manual extraction of a small set of positive
examples yields good results even in light
conditions that were poorly represented or not
represented at all in the training set. This can make
the proposed approach preferable to methods that
are more accurate but also more demanding in terms
of user intervention.
5 DISCUSSION AND
CONCLUSIONS
Experimental results demonstrate the capability of
the proposed approach to recognize the ball in
soccer image sequences acquired in different light
conditions.
Ball recognition performance is comparable to
that obtained in previous works using appearance
based methods involving more complex training
procedures.
Unlike the other appearance based methods
found in the literature, however, the proposed one
does not require a long and tedious phase to build
different training sets to manage different light
conditions. Moreover, it does not require any
negative training set, avoiding the difficulty of
balancing the number of negative and positive
examples that arises when, as in the soccer context,
negative examples abound (player’s socks, pants or
shirts, advertising posters, etc.).
This makes it possible to use the method in real
systems with little user intervention during the
setup phase of the system installation. Last but not
least, because it does not rely on acquiring negative
examples and balancing them with the positive
ones, it is less prone to errors when dealing with
previously unknown external objects.
In the reported experiments only one set of 17
ball images, acquired during an evening match, was
used to perform ball recognition in all lighting
conditions. This is a very desirable characteristic for
a ball recognition system, considering that ball
texture is not uniform and that ball appearance can
also change depending on the stadium.
In conclusion, the proposed approach seems to be a
good trade-off between performance, portability
and ease of setup. Future work will aim to improve
classification performance both by using newer
vision tools able to avoid saturation effects on the
ball surface and by introducing different keypoint
matching strategies.
REFERENCES
Atherton, T. J., Kerbyson, D. J., 1999. Size invariant circle
detection. Image and Vision Computing.
D’Orazio, T., Guaragnella, C., Leo, M., Distante, A.,
2004. A new algorithm for ball recognition using
circle Hough Transform and neural classifier. Pattern
Recognition, 37, 393-408.
Jones, M., Poggio, T., 1997. Model based Matching by
Linear Combinations of Prototypes. Proceedings of
Image Understanding workshop.
Leo, M., D’Orazio, T., Distante, A., 2003. Independent
Component Analysis for Ball Recognition in Soccer
Images. Proceedings of the Intelligent Systems and
Control.
Lowe, D. G., 2004. Distinctive Image Features from
Scale-Invariant Keypoints. International Journal of
Computer Vision, 60, 2, 91-110.
Mohan, A., Papageorgiou, C., Poggio, T., 2001.
Example-based Object Detection in Images by Components.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 23, 4, 349-361.
Murase, H., 1995. Visual Learning and Recognition of 3-
D Objects from Appearance. International Journal of
Computer Vision, 14, 5-24.
Papageorgiou, C., Oren, M., Poggio, T., 1998. A general
framework for Object Detection. Proceedings of the
International Conference for Computer Vision.
Ren, J., Orwell, J., Jones, G. A., Xu, M., 2004. A general
framework for 3d soccer ball estimation and tracking,
ICIP 2004, 1935-1938.
Rowley, H., Baluja, S., Kanade, T., 1998. Neural
Network-Based Face Detection. IEEE Trans. On
Pattern analysis and Machine Intelligence, 20, 1, 23-
38.
Tong, X. F., Lu H. Q., Liu Q. S., 2004. An effective and
fast soccer ball detection and tracking method. Pattern
Recognition, ICPR 2004, Proceedings of the 17th
International Conference, 4, 795-798.
Yu, X., Leong, H.W., Xu C., Tian, Q., 2006. Trajectory-
based ball detection and tracking in broadcast soccer
video. IEEE Transactions on Multimedia, 8, 6, 1164-
1178.