intended to satisfy two main requirements: speed and
robustness. First, the system needs to execute quickly,
in real time or close to it. Second, the system needs to
be robust to changes in the environment, to deal with
other moving objects, to track a fast-moving and er-
ratically moving object, to deal with occlusion, and to
track an object whose screen-space size varies signif-
icantly during a single test, owing to changes in dis-
tance to the camera (the ball ranged in apparent size
from 5 pixels to 60 pixels during our tests). As we
will see, we met some of these criteria, but not all;
nonetheless, the techniques we developed perform
considerably better than the existing techniques
we compared against.
The paper is organized as follows. Section 2
discusses previous work, including existing methods
that we incorporate into our algorithm. Section 3
describes our algorithms. Section 4 describes our ex-
periments, analysis of the data, and evaluation of the
technique. Finally, we close in Section 5 with con-
cluding remarks and suggestions for future work.
2 PREVIOUS WORK
A number of object detection and tracking techniques
have been developed in the last two decades for track-
ing humans (Rano et al., 2004) and cars (Stauffer
and Grimson, 1999). More recently, researchers have
examined computer vision techniques for tracking
sporting events (Han et al., 2002; Assfalg et al., 2002;
Sudhir et al., 1998). Here we review the related liter-
ature on computer vision based object detection and
tracking techniques.
Viola and Jones (Viola and Jones, 2001) introduced
classifier cascades for object recognition. They
trained a set of weak classifiers on a set of very sim-
ple features, one classifier per feature; the classifiers
are used in sequence to detect the presence of the tar-
get object, and since the weak classifiers are able to
reject most non-target objects quickly, the majority
of the computational effort is spent on difficult cases.
Lienhart and Maydt (Lienhart and Maydt, 2002) ex-
tended this work by proposing a richer set of features
(Haar-like features, including edge, line, and center-
surround features) and showing a lower false positive
rate than was achieved by the simple feature set of
Viola and Jones (Viola and Jones, 2001).
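The early-rejection behaviour of a classifier cascade can be illustrated with a minimal sketch. The two feature functions and their thresholds below are hypothetical stand-ins chosen for illustration; they are not trained Haar-like classifiers, and the flat lists of intensities stand in for image windows.

```python
from typing import Callable, List, Sequence, Tuple

# A stage pairs a feature function with a rejection threshold.
# Both are hypothetical stand-ins, not trained Haar-like classifiers.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def run_cascade(window: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated) for one window."""
    for evaluated, (feature, threshold) in enumerate(stages, start=1):
        if feature(window) < threshold:
            # Rejected: the remaining stages are never evaluated.
            return False, evaluated
    return True, len(stages)


def mean_intensity(window: Sequence[float]) -> float:
    return sum(window) / len(window)


def contrast(window: Sequence[float]) -> float:
    return max(window) - min(window)


# Cheap test first, so that most windows are discarded after one stage.
STAGES: List[Stage] = [(mean_intensity, 100.0), (contrast, 50.0)]

WINDOWS = {
    "dark": [10, 20, 15, 12],
    "flat_bright": [200, 205, 210, 198],
    "ball_like": [120, 220, 240, 90],
}

RESULTS = {name: run_cascade(pixels, STAGES) for name, pixels in WINDOWS.items()}

for name, (accepted, evaluated) in RESULTS.items():
    print(f"{name}: accepted={accepted}, stages evaluated={evaluated}")
```

Only the window that passes every stage is accepted; the dark window is discarded after a single stage, which is where the cascade saves computation.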
Stauffer and Grimson (Stauffer and Grimson,
1999) proposed a background model in which each
pixel is a mixture of Gaussian distributions; pixels
which fit into some existing distribution are consid-
ered background, while pixels which lie outside all
distributions are considered foreground. The method
allows the distributions to adapt to new samples, so
that only parts of the image which change faster than
a set learning rate are still considered foreground, and
portions which change more slowly are incorporated
into the background.
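The adaptive behaviour described above can be sketched for a single grayscale pixel. This is a simplified illustration, not the full method of Stauffer and Grimson: the parameter values are assumptions, a single learning rate is used for all updates, and the test "matched component has weight of at least min_bg_weight" is a simplified stand-in for their ranking of distributions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gaussian:
    mean: float
    var: float
    weight: float


class PixelModel:
    """Simplified mixture-of-Gaussians model for one grayscale pixel."""

    def __init__(self, k: int = 3, alpha: float = 0.05,
                 match_sigmas: float = 2.5, init_var: float = 225.0,
                 min_var: float = 4.0, min_bg_weight: float = 0.3) -> None:
        # All defaults are illustrative assumptions.
        self.k = k
        self.alpha = alpha
        self.match_sigmas = match_sigmas
        self.init_var = init_var
        self.min_var = min_var
        self.min_bg_weight = min_bg_weight
        self.components: List[Gaussian] = []

    def update(self, x: float) -> bool:
        """Feed one sample; return True if it is classified as foreground."""
        matched: Optional[Gaussian] = None
        for g in self.components:
            if (x - g.mean) ** 2 <= self.match_sigmas ** 2 * g.var:
                matched = g
                break

        if matched is None:
            # No distribution fits: start a new, low-weight one.
            is_background = False
            matched = Gaussian(mean=x, var=self.init_var, weight=self.alpha)
            if len(self.components) < self.k:
                self.components.append(matched)
            else:
                weakest = min(range(self.k),
                              key=lambda i: self.components[i].weight)
                self.components[weakest] = matched
        else:
            # Judge the sample before adapting the matched distribution.
            is_background = matched.weight >= self.min_bg_weight
            matched.mean += self.alpha * (x - matched.mean)
            new_var = matched.var + self.alpha * ((x - matched.mean) ** 2 - matched.var)
            matched.var = max(self.min_var, new_var)

        # Matched component gains weight; all others decay. Then renormalise.
        for g in self.components:
            g.weight = (1.0 - self.alpha) * g.weight + (self.alpha if g is matched else 0.0)
        total = sum(g.weight for g in self.components)
        for g in self.components:
            g.weight /= total
        return not is_background


model = PixelModel()
steady = [model.update(100.0) for _ in range(50)]   # static background
first_change = model.update(200.0)                  # sudden new intensity
persisting = [model.update(200.0) for _ in range(40)]
print(steady[-1], first_change, persisting[-1])
```

A new intensity is flagged as foreground when it first appears, and is absorbed into the background once it has persisted long enough for its distribution to gain weight; the learning rate alpha controls how quickly this happens.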
Ren et al. (Ren et al., 2004) devised K-ZONE,
a system for tracking baseball pitches. They used
a mixture of Gaussians for background discrimina-
tion; their method uses trajectory information to re-
ject some ball candidates. They report good results
for their context, but the trajectory of the baseballs
is considerably constrained compared to the variation
we can expect in a tennis match.
D’Orazio et al. (D’Orazio et al., 2002) propose
a system for tracking soccer balls using a modified
Hough transform. They use the parametric represen-
tation of a circle to transform the image and deter-
mine points which are on the soccer ball. They show
that the circular Hough transform is effective in
detecting the soccer ball. However, their algorithm
requires too much processing to be viable as a
real-time ball tracking technique.
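To make the voting step and its cost concrete, the following is a minimal sketch of a standard circular Hough transform for a single, known radius. It is not the modified transform of D’Orazio et al., whose details are not reproduced here; the function name, the angular resolution, and the synthetic edge points are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Point = Tuple[int, int]


def hough_circle_center(edge_points: Iterable[Point], radius: float,
                        n_angles: int = 360) -> Tuple[Point, int]:
    """Vote for circle centres of a known radius; return (centre, votes)."""
    votes: Counter = Counter()
    for x, y in edge_points:
        # Every centre at distance `radius` from this edge point is a
        # candidate. A set ensures one vote per cell per edge point.
        cells = set()
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            a = round(x - radius * math.cos(theta))
            b = round(y - radius * math.sin(theta))
            cells.add((a, b))
        votes.update(cells)
    return votes.most_common(1)[0]


# Synthetic edge points on a circle of radius 8 centred at (20, 15).
edge = [(round(20 + 8 * math.cos(math.radians(d))),
         round(15 + 8 * math.sin(math.radians(d))))
        for d in range(0, 360, 10)]

center, n_votes = hough_circle_center(edge, radius=8)
print(center, n_votes)
```

The inner loop runs once per edge point per angle, and an unknown radius adds a further loop over candidate radii, which indicates why the transform is expensive on full video frames.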
In (Sudhir et al., 1998) the authors perform an au-
tomatic analysis of tennis video to facilitate content-
based retrieval. They generate an image model for
the tennis court-lines based on the knowledge of the
dimensions and connectivity of a tennis court and typ-
ical geometry used when capturing a tennis video.
They use this model to track the tennis players over
a sequence of images.
In (Pingali et al., 2000) the authors use multiple
cameras to track the 3D trajectory of the ball using
stereo matching algorithms. A multi-threaded approach
is taken to track the ball using motion, intensity and
shape. However, they do not give enough details of
their implementation to compare their approach with
ours.
Throughout this paper we use various image pro-
cessing techniques, including median filtering and
shape feature extraction (Shapiro and Stockman,
2001). The median filter is used to reduce noise in
the image, while shape features, including aspect
ratio, compactness, and roughness, are used to check
whether a region’s properties resemble those of a ball.
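As a minimal sketch of these two steps, the code below applies a 3x3 median filter to a small synthetic image and then computes the bounding-box aspect ratio of the bright region. The 3x3 window, the threshold of 128, and the test image are assumptions; compactness and roughness are omitted because their exact definitions are not given in this section.

```python
from statistics import median
from typing import List

Image = List[List[int]]


def median_filter_3x3(img: Image) -> Image:
    """Replace each interior pixel by the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def bbox_aspect_ratio(img: Image, threshold: int = 128) -> float:
    """Width / height of the bounding box of pixels above `threshold`.

    Raises ValueError if no pixel exceeds the threshold.
    """
    ys = [y for y, row in enumerate(img) if any(v > threshold for v in row)]
    xs = [x for row in img for x, v in enumerate(row) if v > threshold]
    return (max(xs) - min(xs) + 1) / (max(ys) - min(ys) + 1)


# Synthetic 9x9 frame: a 3x3 bright blob plus one isolated noise pixel.
noisy: Image = [[0] * 9 for _ in range(9)]
for y in range(3, 6):
    for x in range(3, 6):
        noisy[y][x] = 255
noisy[4][7] = 255  # isolated noise pixel to the right of the blob

cleaned = median_filter_3x3(noisy)
raw_ratio = bbox_aspect_ratio(noisy)
clean_ratio = bbox_aspect_ratio(cleaned)
print(raw_ratio, clean_ratio)
```

Without filtering, the isolated noise pixel stretches the bounding box and distorts the aspect ratio; after filtering, the remaining region has the roughly square bounding box expected of a ball. The filter also trims the corners of the blob, which is one reason very small balls are hard to preserve.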
3 ALGORITHMS AND INITIAL RESULTS
Complex algorithms, such as boosted classifiers based
on Haar-like features and circular Hough transforms,
have been brought to bear on the problem of tennis
ball tracking. However, many problems arise when they
are applied to the tennis tracking system.