framework based on short-term and long-term memory
that allows an incremental processing of data
streams. However, the tennis model used includes
only one variable (the ball landing position) and
only eight different locations. Chu and Tsai (Chu and
Tsai, 2009) use symbolic sequences to tackle tactics
analysis. They use the players' location (four areas), the
players' movement direction (up, down, left, right, still)
and the players' speed (fast, medium, still) to find frequent
movement patterns.
3 FORMALIZATION
In this section we explain how we formalize a tennis
match between two players, 1 and 2. For the rules of
tennis, the reader is referred to (International Tennis
Federation, 2010).
Although many computerized systems exist for
collecting and managing observational data, our need
to record the exact position of the players and the ball
on the court forced us to develop a standalone
application that calculates those positions on
a reference court model by means of computer vision
algorithms and camera calibration techniques. Detailing
the methods and algorithms used to obtain the data is
beyond the scope of this paper; the interested
reader is referred to (Hartley and Zisserman, 2003;
Hayet et al., 2005) for further information. Along
with player and ball positions, other relevant variables
were also collected as part of our sequential data.
3.1 Definitions
We will consider an event as a single stroke episode.
This event will contain all attributes that characterize
the stroke, i.e., the player that hits the stroke, the type
of stroke, the position of both players at the time of
hitting the ball, the position of the ball landing on the
opponent’s side after the stroke, the generated speed
of the ball, etc. A rally, on the other hand, refers to the
sequence or series of events that completely describe
the strokes exchanged by the players during a game
point. In other words, a rally will always start with a
service and will end with the final stroke that leads to
the conclusion of the point.
We will also define a partial rally as a subsequence
of a rally. Partial rallies are made of consecutive
events, with players alternating. For instance,
looking at rally ⟨A, B, C, D, E⟩, then ⟨B, C, D⟩ is a
partial rally, whereas ⟨B, D⟩ is not.
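The definition of a partial rally can be illustrated with a minimal sketch.
The function name is_partial_rally is hypothetical, events are stood in for
by plain labels, and treating an empty sequence as not being a partial rally
is an assumption made here for simplicity.

```python
from typing import Sequence


def is_partial_rally(candidate: Sequence, rally: Sequence) -> bool:
    """Return True if `candidate` appears as a run of consecutive events in `rally`."""
    n, m = len(rally), len(candidate)
    # Assumption: an empty sequence is not counted as a partial rally.
    if m == 0 or m > n:
        return False
    # Slide a window of length m over the rally and compare element-wise.
    return any(
        list(rally[i:i + m]) == list(candidate)
        for i in range(n - m + 1)
    )


rally = ["A", "B", "C", "D", "E"]
print(is_partial_rally(["B", "C", "D"], rally))  # True
print(is_partial_rally(["B", "D"], rally))       # False
```

Because consecutive events in a rally are hit by alternating players, any
such window inherits the alternation automatically.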
3.2 Reference Model
All integer coordinate pairs of events will be in the set
C = {0, 1, . . . , 316} × {0, 1, . . . , 768}. The positions
between (0, 0) and (316, 768) represent coordinates
both inside and outside of the court, with (50, 150)
and (266, 618) being the coordinates of the top left corner
and the bottom right corner of the doubles court,
respectively. This reference system gives us 2.5 m of
space beyond each doubles sideline and 7.5 m
beyond each baseline, which is sufficient to capture
all the action within a match.
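As a rough illustration of the reference model, the sketch below converts
model coordinates to metres and tests whether a point lies inside the doubles
court. The scale of 20 units per metre is an assumption inferred from the
stated margins (50 units for 2.5 m, 150 units for 7.5 m). At that scale the
court rectangle measures 10.8 m by 23.4 m, slightly less than a regulation
doubles court, so the scale should be read as approximate. All names are
hypothetical.

```python
# Assumed scale: 50 model units span the 2.5 m sideline margin.
UNITS_PER_METRE = 20

# Doubles court corners as given in the reference model.
COURT_LEFT, COURT_TOP = 50, 150
COURT_RIGHT, COURT_BOTTOM = 266, 618


def to_metres(point):
    """Convert a model coordinate pair to (approximate) metres."""
    x, y = point
    return (x / UNITS_PER_METRE, y / UNITS_PER_METRE)


def in_doubles_court(point):
    """Return True if the point lies inside or on the doubles court lines."""
    x, y = point
    return COURT_LEFT <= x <= COURT_RIGHT and COURT_TOP <= y <= COURT_BOTTOM


print(to_metres((50, 150)))          # (2.5, 7.5)
print(in_doubles_court((163, 267)))  # True
print(in_doubles_court((231, 56)))   # False
```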
Because the players change sides every couple of
games, a transformation in the coordinates is needed
so that the data is always coherent.
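The transformation itself is not specified here. One plausible choice,
sketched below purely as an assumption, is a 180-degree rotation about the
centre of the reference model, which maps each end of the court onto the
other. The function name switch_ends is hypothetical.

```python
# Upper bounds of the coordinate set C.
X_MAX, Y_MAX = 316, 768


def switch_ends(point):
    """Rotate a point by 180 degrees about the model centre (158, 384).

    Assumption: this is one way to normalise coordinates after a change
    of ends; the transformation actually used may differ.
    """
    x, y = point
    return (X_MAX - x, Y_MAX - y)


# The top left corner of the doubles court maps onto the bottom right one.
print(switch_ends((50, 150)))  # (266, 618)
```

Applying the function twice returns the original point, so the same routine
can be used whenever the players swap ends.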
3.3 Attributes Considered
We first focus on the stroke level and rally
level, where we have the following attributes (the
possible values are given for each attribute):
• pl: player hitting the ball, {1, 2};
• st: stroke type, {FS, SS, FH, FHS, BH, BHS, VOL,
SM, LOB, DSH}, corresponding to: first serve,
second serve, forehand, forehand sliced, back-
hand, backhand sliced, volley, smash, lob and
drop shot, respectively;
• P₁ = (x₁, y₁): position of the player when the ball
is hit, C;
• P₂ = (x₂, y₂): position of the opponent when the
ball is hit, C;
• P₃ = (x₃, y₃): position of the ball when it bounces
on the opponent’s half of the court, C;
• sb: speed of the ball generated after the stroke,
{slow, normal, fast};
• us: unbalancing stroke that breaks the exchange
equilibrium, {0, 1, 2, 3}.
As an example, a sequence including the first
events within a rally might look like this:
⟨(2, FS, (142, 618), (231, 56), (163, 267), fast, 1),
(1, BHS, (191, 64), (134, 610), (103, 566), slow, 0),
(2, FH, (78, 608), (173, 55), (108, 239), fast, 2), . . .⟩
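One way to hold such events in code is sketched below. The class name Event,
the field names p1, p2 and p3, and the helper players_alternate are
hypothetical; only the attribute domains and the three example events are
taken from the text.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[int, int]

STROKE_TYPES = {"FS", "SS", "FH", "FHS", "BH", "BHS", "VOL", "SM", "LOB", "DSH"}
BALL_SPEEDS = {"slow", "normal", "fast"}


def in_model(p: Point) -> bool:
    """Check that a point belongs to C = {0..316} x {0..768}."""
    x, y = p
    return 0 <= x <= 316 and 0 <= y <= 768


@dataclass(frozen=True)
class Event:
    pl: int    # player hitting the ball
    st: str    # stroke type
    p1: Point  # position of the hitting player
    p2: Point  # position of the opponent
    p3: Point  # landing position of the ball
    sb: str    # ball speed
    us: int    # unbalancing stroke indicator

    def __post_init__(self):
        # Reject values outside the domains listed for each attribute.
        if self.pl not in (1, 2):
            raise ValueError("pl must be 1 or 2")
        if self.st not in STROKE_TYPES:
            raise ValueError("unknown stroke type")
        if not all(in_model(p) for p in (self.p1, self.p2, self.p3)):
            raise ValueError("position outside the reference model")
        if self.sb not in BALL_SPEEDS:
            raise ValueError("unknown ball speed")
        if self.us not in (0, 1, 2, 3):
            raise ValueError("us must be in {0, 1, 2, 3}")


def players_alternate(events) -> bool:
    """Return True if consecutive events are hit by different players."""
    return all(a.pl != b.pl for a, b in zip(events, events[1:]))


# The first three events of the example rally.
rally_start = [
    Event(2, "FS", (142, 618), (231, 56), (163, 267), "fast", 1),
    Event(1, "BHS", (191, 64), (134, 610), (103, 566), "slow", 0),
    Event(2, "FH", (78, 608), (173, 55), (108, 239), "fast", 2),
]
print(players_alternate(rally_start))  # True
```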
Most attributes are self-explanatory. Attribute us
represents the intention of one player to attack and
destabilize the rally with his/her stroke. The non-zero
values indicate whether the stroke is the first, second
or third of an attack. Very rarely will a player need more
than three strokes to finish an attack, and in such a case
one could argue that the opponent did recover from the
initial attack and lost the point later on due to a new
and different attack.