Automatic Recognition of Sport Events from Spatio-temporal Data: An
Application for Virtual Reality-based Training in Basketball
Alberto Cannavò, Davide Calandra, Gianpaolo Basilicò and Fabrizio Lamberti
Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Keywords:
Machine Learning, Event Recognition, Virtual Reality, Sport Training.
Abstract:
Data analysis in the field of sport is growing rapidly due to the availability of datasets containing spatio-
temporal positional data of the players and other sport equipment collected during the game. This paper inves-
tigates the use of machine learning for the automatic recognition of small-scale sport events in a basketball-
related dataset. The results of the method discussed in this paper have been exploited to extend the func-
tionality of an existing Virtual Reality (VR)-based tool supporting training in basketball. The tool allows the
coaches to draw game tactics on a touchscreen, which can then be visualized and studied in an immersive VR environment by multiple players. Events recognized by the proposed system can also be used to let the tool handle previous matches, which can be automatically recreated by activating different animations for the virtual
players and the ball based on the particular game situation, thus increasing the realism of the simulation.
1 INTRODUCTION
In recent years, thanks also to advancements in tracking technology, the use of spatio-temporal data collected during matches or training sessions has grown significantly in many competitive sports (Richly et al., 2016). A number of so-
lutions based on different sensing techniques have
been presented in the literature, which allow the movement of the players and other equipment (a tennis ball, a baseball bat, etc.) to be recorded at high sampling rates (von der Grün et al., 2011; Jiang and Yin, 2015;
D’Orazio et al., 2010). The analysis of tracking data
concerning the players, the ball, etc. can provide
coaches with helpful insights about the game, which
can be used for the automatic recognition of the op-
posing team’s strategy (Varriale and Tafuri, 2016), the
generation of commentaries for matches (Zheng and
Kudenko, 2012), etc.
Based on these observations, this paper investi-
gates the use of machine learning for the automatic
recognition of players’ activity or actions from
spatio-temporal data for VR-based basketball appli-
cations (though the extracted information could be exploited in other contexts, like those above). The paper
builds on a previous work targeted to soccer (Richly
et al., 2016). With respect to (Richly et al., 2016), in
this paper new features are extracted, which permit a)
to consider aspects that were not taken into consid-
eration in that work, b) to integrate data not present
in the reference dataset (like, for instance, the vertical
position of the ball), and c) to account for different
characteristics of basketball w.r.t. soccer, with the
final goal of improving recognition accuracy.
The recognition method proposed in this work has
been integrated in an immersive VR tool to allow the
visualization of animated reconstructions of previous
basketball matches for tactic analysis and training.
Specifically, events identified through machine learn-
ing are provided in input to the VR system, which
uses them to activate the proper players’ animations.
2 BACKGROUND
A few methods have already been experimented with for the automatic recognition of sport events. For in-
stance, in (Zheng and Kudenko, 2012), inductive
learning techniques are used for the automatic gen-
eration of commentaries for football matches within a
management simulation game named Championship
Manager. Three classification techniques (Decision
Tree, KNN, and Naïve Bayes) are exploited to find
the mapping between game states and commentaries.
In (Teachabarikiti et al., 2010), an algorithm for track-
ing the players and the ball in tennis is proposed to
enable automatic footage annotation. By analyzing
the motion patterns of the players and the ball, the
algorithm is able to classify a player’s action as either a backhand or a forehand stroke with high preci-
sion and recall rates. The authors of (McQueen et al.,
2014) exploit players’ tracking data to recognize of-
fensive strategies in basketball through a linear SVM
classifier and a rule-based algorithm. In (Richly et al.,
2016), three machine learning approaches, namely
SVM, KNN, and RF, are experimented for the pur-
pose of classifying events in a soccer match, like
passes or receptions. The dataset used therein refers
to matches of the German Bundesliga, and contains
the timestamp, the two-dimensional coordinates of
the ball, a list of game events (e.g., fouls, substitu-
tions, offsides, etc.) and the player involved. Event clas-
sification is accomplished by working with several
features computed by considering the raw position
data for the ball. To train the classifiers, the dataset
is annotated by manually identifying the events of in-
terest in the footage of three matches.
By building upon the works found in the literature,
this paper presents the design and evaluation of an im-
proved technique for the automatic classification of
sport events from spatio-temporal data. In particular,
given the promising results reported in (Richly et al.,
2016), this paper takes the methodology developed in that work as a reference, and extends it to target a different sport, i.e., basketball. After experimenting with the same algorithms and the
same set of features used in the reference work on a
dataset containing position data from National Bas-
ketball Association (NBA) matches, this paper addi-
tionally proposes a new set of features, which proved
to significantly boost the performance of basketball
event recognition and classification. Finally, the pa-
per explores how the automatically extracted information
can be used to support the job of both coaches and
players by enhancing the functionality of an existing
VR-based tool for tactics analysis.
3 METHODOLOGY
This section describes the dataset as well as the fea-
tures that have been developed/used in this paper.
3.1 Dataset
The original dataset refers to the 2015–16 season
of the NBA (https://github.com/sealneaward/nba-
movement-data/tree/master/data), and contains
spatio-temporal data collected at 20 Hz. Data are
structured in matches and actions (for a given match).
For each action, the position of the ball and of the
players is recorded. The dataset, stored as a .csv file,
consists of the following values:
- team_id: identifier of the team to which the player belongs; -1 if the tracked object is the ball;
- player_id: identifier of the tracked object; -1 if the tracked object is the ball;
- x_loc, y_loc, z_loc: 3D spatial position of the tracked object (the z coordinate is provided only for the ball);
- game_clock: remaining time of the match;
- shot_clock: remaining time of the 24 seconds granted to a team to finalize an offensive action;
- quarter: quarter of the game;
- game_id: identifier of the match;
- event_id: identifier of the action in the game.
The coordinate system used for x_loc and y_loc is normalized in the 0–100 and 0–50 range, respectively for the x and y axis; the bottom-left corner is represented by the point with (0, 0) coordinates. To create
the annotated dataset, sports events were manually
identified in the footage of the San Antonio Spurs vs
Minnesota Timberwolves match that was played on
December 23rd, 2015. Like in the reference work,
passes and receptions were considered. Other events,
like shots, dribbles, etc. were marked with the label
“other”. Part of the events belonging to the latter cat-
egory were randomly deleted, in order to balance the
frequency of the three events. At the end of the pro-
cess, the annotated dataset included 180 entries per
event category.
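As an illustration of how the raw data can be accessed, the following sketch loads one match and isolates the ball trajectory. It assumes the .csv exposes the columns listed above and that ball rows are flagged with player_id = -1, as in the referenced NBA movement dataset; the file name and the pandas-based approach are only examples, since the paper does not prescribe a specific toolchain.

```python
import pandas as pd

# Hypothetical file name: the dataset linked above stores one .csv per match.
df = pd.read_csv("nba_movement_match.csv")

# Ball samples are assumed to be flagged with player_id == -1.
ball = df[df["player_id"] == -1]

# Keep the columns used to compute the features described in Section 3.2;
# data are sampled at 20 Hz, so consecutive rows are 0.05 s apart.
ball = ball[["game_id", "event_id", "quarter", "game_clock",
             "shot_clock", "x_loc", "y_loc", "z_loc"]]
print(ball.head())
```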
3.2 Features
According to the reference work, a sport event can
be recognized in a dataset containing spatio-temporal
data by analyzing the values of several features that
characterize it. Features have been extracted by run-
ning a script on the above data. For each time t in
the dataset, a vector is obtained containing a value for
each feature. Features used in this work can be categorized into five groups. The first group contains the “two-dimensional” features directly derived from
the reference work. The remaining groups host the
new features that have been introduced in this paper.
In particular, the features in the second group are cal-
culated by considering only the movement of the ball
along the z axis; hence, they are referred to as “ver-
tical”. Features in the third group are those in the
second group, but adapted to a “three-dimensional”
space. For features in the fourth group, the posi-
tion of the players is also considered; thus, they are
called “players”’ features. Lastly, features in the fifth
group are computed by aggregating data (and comput-
ing mean and variance values for them within given
time windows): hence, they are referred to as “ag-
gregated” features. The position of a tracked object $o$ at time $t$ will be represented as $p(o,t)$. Similarly, $p_x(o,t)$ and $p_y(o,t)$ will be used to refer to the position along the x and y axes. The distance between two consecutive positions will be defined as:

$$d(o, t_1) = p(o, t_2) - p(o, t_1) \quad (1)$$

where $t_1$ and $t_2$ are different time samples and $t_1 < t_2$.
3.3 Two-dimensional Features
The features in this group consider only the position
of the ball in two dimensions (hence, a subscript 2D
will be used). In the reference work, the z dimension
was not considered because the soccer dataset con-
tained only two coordinates for the ball.
3.3.1 Velocity
The two-dimensional velocity, introduced since it is an indicator of the ball’s momentum, is calculated by dividing the length of the direction vector $d(o, t_1)$ by the time interval between two adjacent samples:

$$Vel_{2D}(o, t_1) = \frac{|d(o, t_1)|}{t_2 - t_1} \quad (2)$$
3.3.2 Acceleration
The acceleration, like the velocity, was introduced as an indicator of the ball’s momentum, and it is computed as:

$$Acc_{2D}(o, t_1) = \frac{Vel_{2D}(o, t_2) - Vel_{2D}(o, t_1)}{t_2 - t_1} \quad (3)$$
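A minimal sketch of how Eqs. (1)–(3) can be computed on the ball trajectory is given below, assuming the pandas DataFrame `ball` introduced in Section 3.1 and a fixed 20 Hz sampling period; the helper names are illustrative and not part of the original implementation.

```python
import numpy as np

DT = 1.0 / 20.0  # sampling period in seconds (20 Hz)

def add_2d_kinematics(ball):
    """Append displacement (Eq. 1), velocity (Eq. 2) and acceleration (Eq. 3)."""
    dx = ball["x_loc"].diff()
    dy = ball["y_loc"].diff()
    ball["d_len"] = np.hypot(dx, dy)              # |d(o, t1)|
    ball["vel_2d"] = ball["d_len"] / DT           # Eq. (2)
    ball["acc_2d"] = ball["vel_2d"].diff() / DT   # Eq. (3)
    return ball
```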
3.3.3 Acceleration Peaks
Given the sampling rate of the data, the same acceler-
ation could be captured in consecutive time samples.
Therefore, the authors of the reference work intro-
duced two features referred to as acceleration peaks,
that combine consecutive acceleration values by se-
lecting the highest and the lowest ones among adja-
cent values, respectively. The computation of actual
maximum and minimum peaks can be split in two
steps. In the first step, the sum of two consecutive
accelerations is computed ignoring negative and pos-
itive values by setting them to 0 for the computation
of the former and the latter, respectively:
$$AP_{2D\,max}(o, t_2) = \sum_{x \in \{t_1, t_2\}} \max(0, Acc_{2D}(o, x)) \quad (4)$$

$$AP_{2D\,min}(o, t_2) = \sum_{x \in \{t_1, t_2\}} \min(0, Acc_{2D}(o, x)) \quad (5)$$
In the second step, in order to avoid the detection of a peak in two consecutive samples, the actual (real) acceleration peaks $AP_{2D\,max}^{real}(o, t_2)$ and $AP_{2D\,min}^{real}(o, t_2)$ are computed by setting them to $AP_{2D\,max}(o, t_2)$ and $AP_{2D\,min}(o, t_2)$ only if the value of the feature at time $t_2$ is higher than the values at $t_1$ and $t_3$, otherwise they are set to 0.
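A possible implementation of the two-step peak extraction is sketched below, continuing the hypothetical DataFrame of the previous snippet; for the minimum peaks, "higher" is interpreted as a local minimum (larger in magnitude), which is an assumption on our side.

```python
def acceleration_peaks(acc):
    """Two-step acceleration peaks (Eqs. 4-5 plus the 'real' peak selection).

    acc: pandas Series of Acc_2D values, one per sample.
    Returns (ap_max_real, ap_min_real) Series.
    """
    # Step 1: sum of two consecutive accelerations, keeping only the
    # positive (max peaks) or negative (min peaks) contributions.
    ap_max = acc.clip(lower=0).rolling(2).sum()
    ap_min = acc.clip(upper=0).rolling(2).sum()

    # Step 2: keep a value only where it is a local extremum with respect
    # to its neighbours (our reading of the rule in the text), otherwise
    # set it to 0 so that the same peak is not reported twice.
    def keep_local_extrema(s, maximum=True):
        prev, nxt = s.shift(1), s.shift(-1)
        cond = (s > prev) & (s > nxt) if maximum else (s < prev) & (s < nxt)
        return s.where(cond, 0.0)

    return keep_local_extrema(ap_max, True), keep_local_extrema(ap_min, False)
```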
3.3.4 Direction Change
This feature considers the variations in the trajectory of the ball during the game, taking into account the angle between two consecutive direction vectors. It was added to improve the recognition of events, like passes or shots, characterized by a high value of this metric. The direction change $DC_{2D}(o, t_2)$ of object $o$ at time $t_2$ is obtained by applying the arccos() function as follows:

$$DC_{2D}(o, t_2) = \arccos\left(\frac{d(o, t_1) \cdot d(o, t_2)}{|d(o, t_1)|\,|d(o, t_2)|}\right) \quad (6)$$
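The angle in Eq. (6) can be obtained from the dot product of consecutive displacement vectors; a small sketch (the clipping and the handling of zero-length vectors are our additions for numerical safety) could look as follows:

```python
import numpy as np

def direction_change(d1, d2):
    """Angle (radians) between two consecutive direction vectors, Eq. (6)."""
    denom = np.linalg.norm(d1) * np.linalg.norm(d2)
    if denom == 0.0:
        return 0.0  # undefined when the ball does not move; 0 is our choice
    cos_angle = np.dot(d1, d2) / denom
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))
```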
3.3.5 Distance to Target
During a match, the ball should be thrown into one of the baskets (nets, in the reference work) in order to earn points. Therefore, it is reasonable to assume that the ball moves towards one of these targets. For this reason, this metric is used to distinguish passes from shots. The distance of object $o$ at time $t$
from the target is calculated as:

$$DT_{2D}(o, t) = |p(o, t) - b(o, t)| \quad (7)$$
where $b(o,t)$ represents the target position assigned depending on the direction of the ball w.r.t. the x axis. This position could be either the point $T_1$ with coordinates (0, 25) if the ball moves towards the left side of the court, or $T_2$ with coordinates (100, 25) if the ball moves towards the right side, as shown in Figure 1a. The figure also shows different distances (represented by solid lines) computed depending on the direction of the ball (represented by an arrow at each data point). For an object at point $P_1$, which is characterized by a horizontal velocity equal to 0, the target cannot be determined; hence, the feature value is set to $\infty$.
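A sketch of this feature, under the assumption that the target is chosen from the sign of the horizontal velocity and that an undefined target is encoded as infinity (our reading of the description above), is:

```python
import numpy as np

T1 = np.array([0.0, 25.0])    # left target position in normalized court coordinates
T2 = np.array([100.0, 25.0])  # right target position

def distance_to_target(pos, vel_x):
    """Eq. (7): distance between the ball and the target it is moving towards."""
    if vel_x < 0:
        target = T1
    elif vel_x > 0:
        target = T2
    else:
        return np.inf  # horizontal velocity is 0: target cannot be determined
    return float(np.linalg.norm(pos - target))
```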
3.3.6 Cross on Target Line
This feature is defined by considering the distance between the target and the position in which the ball would cross the end line should the current trajectory be maintained up to the line.

Figure 1: Calculation of a) Distance to Target, b) Cross on Target Line, and c) Cross on Target Line features.

Figure 1b shows a data point $P_1$ and its direction vector $d_1$. Should the ball continue to move without any direction change (dashed line), it would reach the end line in $C_1$. The distance between $C_1$ and the target position $T_1$ is the actual value of this feature. The position of $C_1$ can be calculated as:
$$b_x(o, t)_{ctl} = p(o, t) + s \cdot d(o, t) \quad (8)$$

where $s$ is a factor that, when multiplied by the direction vector of the object $o$ at time $t$ and added to the position of the object $o$ at time $t$, allows the end line to be reached. From all of the above it is possible to
compute $CTL_{2D}(o, t)$ as:

$$CTL_{2D}(o, t) = p_y(o, t) + d_y(o, t)\,\frac{b_x(o, t) - p_x(o, t)}{d_x(o, t)} \quad (9)$$

where the subscript identifies the axis considered.
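Assuming $b_x$ denotes the x coordinate of the end line behind the chosen target (0 or 100 in the normalized court, which is our reading of Eq. (8)), Eq. (9) can be sketched as:

```python
def cross_on_target_line_2d(p, d, b_x):
    """Eq. (9): y coordinate at which the ball, moving along d from p,
    would cross the vertical line x = b_x (the end line)."""
    if d[0] == 0.0:
        return float("inf")      # ball not moving horizontally: no crossing
    s = (b_x - p[0]) / d[0]      # scale factor s of Eq. (8)
    return p[1] + d[1] * s
```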
3.4 Vertical Features
Features in this and in the following groups are those defined in the current work. In particular, this group contains some of the features in the previous group recomputed considering only the z coordinate: $Vel_{V}(o,t)$, $Acc_{V}(o,t)$, $AP_{V\,max}^{real}(o,t)$, and $AP_{V\,min}^{real}(o,t)$. The remaining features cannot be recalculated, since the single dimension considered does not allow the direction of the ball to be identified.
3.5 Three-dimensional Features
In this group, the features $Vel_{3D}(o,t)$, $Acc_{3D}(o,t)$, $AP_{3D\,max}^{real}(o,t)$, $AP_{3D\,min}^{real}(o,t)$, and $CTV_{3D}(o,t)$ are calculated by considering the three coordinates. Hence, the subscript 3D is used. For $CTL_{3D}(o,t)$, a parabolic trajectory is assumed (as shown in Figure 1c), and the feature is computed as:

$$CTL_{3D}(o, t) = \frac{1}{2}\,g\,t_{line}^2 + Vel_z(o, t)\,t_{line} + p_z(o, t) \quad (10)$$

where $g$ is the gravity acceleration, $Vel_z(o,t)$ is the component along the z axis of $Vel_{3D}(o,t)$, and $t_{line}$ is the time that is required for the ball to reach the end line; $t_{line}$ is defined as:

$$t_{line} = \frac{b_x(o, t) - p_x(o, t)}{Vel_x(o, t)} \quad (11)$$

where $Vel_x(o,t)$ is the component of $Vel_{3D}(o,t)$ along the x axis.
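Eqs. (10)–(11) can be sketched as below; note that the value of the gravity acceleration must be expressed in the same units as the court coordinates, which the paper does not specify, so the default used here is only a placeholder.

```python
def cross_on_target_line_3d(p_z, p_x, vel_x, vel_z, b_x, g=-9.81):
    """Eqs. (10)-(11): predicted height of the ball when it reaches the
    end line x = b_x, assuming a parabolic (ballistic) trajectory.

    g defaults to -9.81 (m/s^2); adjust it to the dataset's units."""
    if vel_x == 0.0:
        return float("inf")                 # end line never reached
    t_line = (b_x - p_x) / vel_x            # Eq. (11)
    return 0.5 * g * t_line**2 + vel_z * t_line + p_z   # Eq. (10)
```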
3.6 Players’ Features
This group contains two features that take into ac-
count the relationship between the position of the ball
and the players. These features have been introduced
because the way the ball position changes in close prox-
imity to a player could be a valid descriptor especially
for some basketball events.
3.6.1 Ball-player Distance
This feature computes the distance between the ball
and the closest player at time t. It is defined as:
$$BPD(o, t) = \left|p(o, t) - p_{player}(o, t)\right| \quad (12)$$

where $p(o,t)$ is the two-dimensional position of the ball at time $t$ and $p_{player}(o,t)$ is the two-dimensional position of the closest player.
3.6.2 Team of Closest Player
This feature represents the team of the player closest
to the ball at time t.
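Both features in this group can be derived from a single nearest-player query per time sample. A sketch is given below; the column names follow the dataset description in Section 3.1, while the pandas/NumPy-based grouping logic is our own choice.

```python
import numpy as np

def ball_player_features(frame):
    """Eq. (12) plus the team of the closest player, for one time sample.

    frame: pandas DataFrame with all tracked objects at a given time t."""
    ball = frame[frame["player_id"] == -1].iloc[0]
    players = frame[frame["player_id"] != -1]
    d = np.hypot(players["x_loc"] - ball["x_loc"],
                 players["y_loc"] - ball["y_loc"])
    closest = players.loc[d.idxmin()]
    return d.min(), closest["team_id"]   # BPD(o,t), Team of Closest Player
```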
3.7 Aggregated Features
This group includes a set of features computed by
aggregating consecutive samples. The aggregation
considers the average and the variance values calcu-
lated in two time windows, named before-window and
after-window. In this way, the aggregation allows the features’ dynamics to be taken into account. The size of the two windows has been experimentally defined and includes 20 samples (i.e., one second) before and after the current time. The features considered for the aggregation are: $p_z(o,t)$, $Vel_{V}(o,t)$, $Acc_{V}(o,t)$, $DC_{2D}(o,t)$, and $BPD(o,t)$.
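With tabular data, the windowed statistics can be obtained through rolling windows; the sketch below (pandas-based, with our own interpretation of the before/after windows) illustrates the idea for one feature series:

```python
def aggregate(series, window=20):
    """Mean and variance of a feature in the 20 samples before and
    after each time sample (before-window / after-window)."""
    before = series.rolling(window)          # samples t-19 .. t
    after = series[::-1].rolling(window)     # samples t .. t+19 (reversed trick)
    return {
        "mean_before": before.mean(),
        "var_before": before.var(),
        "mean_after": after.mean()[::-1],
        "var_after": after.var()[::-1],
    }
```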
Table 1: Recognition of basketball events using KNN.

            Pass    Reception   Other
Precision   0.69    0.68        0.93
Recall      0.65    0.67        1.00
F-measure   0.67    0.67        0.96
Accuracy    76.67%
4 PERFORMANCE EVALUATION
Features described in the previous section have been
used in combination with the three machine learning
algorithms considered in (Richly et al., 2016). For ev-
ery time t, a vector was created containing the values
of all corresponding features. Each vector represents
an event that occurs during the game and it is char-
acterized by particular values of the defined features.
For example, passes are characterized by a significant acceleration peak and present a high value for the direction change feature, whereas in the case of receptions, the ball shows a strong negative acceleration and the distance to the closest player, probably the ball’s owner, remains almost constant. The data science software platform named RapidMiner was used to run the algorithms. As said, the paper focuses on the recognition of three events (pass, reception, and other ball events), though in basketball rather than in soccer. In order to assess the quality of the results
achieved, accuracy, precision, recall and F-measure
were calculated. To cope with the reduced size of the
dataset, cross-validation with 20 partitions and linear sampling was used. Evaluation was carried out by
considering different combinations of the features in
the five groups. Initially, only the first group was con-
sidered, to qualitatively compare results obtained on
the new dataset with those in (Richly et al., 2016).
Afterwards, the vertical and the players’ features were
added. The next experiment consisted in replacing the
two-dimensional features with the three-dimensional
ones. Lastly, the aggregated features were integrated.
At every change in the set of features considered, the
overall accuracy improved: from the initial value of
33.68% obtained when using only the first group of
features (and comparable to that obtained in the ref-
erence work for soccer events), it reached a value of
76.67% when using the last set of features. Table 1 re-
ports recognition results for each event obtained with
KNN, which achieved the best performance.
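The experiments reported here were run in RapidMiner; an equivalent sketch with scikit-learn is shown below, where the feature matrix X and the labels y are assembled from the vectors described above, and the hyper-parameters (not reported in this paper) are left at illustrative values.

```python
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# X: (n_samples, n_features) array of feature vectors, one per annotated event
# y: labels in {"pass", "reception", "other"}
def evaluate_knn(X, y, n_neighbors=5):
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    # 20 partitions with linear (non-shuffled) sampling, as described above
    cv = KFold(n_splits=20, shuffle=False)
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    print(classification_report(y, y_pred, digits=2))
```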
5 APPLICATION SCENARIO
The method illustrated in the previous sections has
been used to extend the functionalities of an existing
tool for VR-based training in basketball. The tool,
named VR Playbook (Cannavò et al., 2018), was de-
signed to let coaches and players create tactics and
visualize previous basketball games in an immersive
environment. The VR Playbook tool offers coaches
several graphics means for drawing a tactics in 2D
with a tablet device by moving players and defining
actions for them (passes, stops, throws, etc.) on a
timeline (Figure 2a). The tool then creates the cor-
responding 3D animation that can be visualized at the
same time by multiple players wearing VR headsets
(Figures 2b and 2c). To this purpose, the timing and
type of manually defined events are used to activate
realistic players’ animations which were previously
recorded using motion capture. Tactics could also
be saved (exported) and reloaded (imported) for later
use. In the native implementation of the tool, in order to visualize the actions of a previous match, coaches had to manually add players’ events to the timeline,
e.g., based on available game footage or by resort-
ing to their memory. Players’ trajectories could be ei-
ther defined by drawing arrows on the touchscreen be-
tween the starting and ending points of a given action,
or by adding many intermediate points to the timeline
to avoid straight paths. Alternatively, they could load
a dataset like the one used in this paper and recreate
actual displacements. However, without annotations
concerning events’ timing and type, animations cre-
ated would be poorly realistic, since positional data
could only be used to activate a run cycle animation
for players. In this paper, the devised methodology
has been used to extract players’ events from a dataset
containing only spatio-temporal positional data and to
store them in a format ready to be parsed and imported
in the VR Playbook tool. In this way, the quality (re-
alism) of the simulation can be improved, since the
exact time a given animation shall begin/end is auto-
matically defined, and a more correct relationship be-
tween the players’ hands and the ball can be identified
(and used for blending the run and pass animations).
The integration of the devised methodology (the mod-
ule named Event Recognizer) in the architecture of
the VR Playbook tool is depicted in Figure 3. It can
be easily observed that integration is transparent to the
users, since automatically extracted events are treated
as manually defined ones, and coaches are allowed to
further modify them using the tablet-based interface.
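As a purely illustrative example (the actual exchange format used by the VR Playbook parser is not detailed here, so both the field names and the file layout below are hypothetical), the recognized events could be serialized as an additional .csv accompanying the positional data:

```python
import csv

def export_events(events, path="recognized_events.csv"):
    """events: list of (game_clock, event_id, player_id, label) tuples
    produced by the classifier, with label in {pass, reception, other}."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["game_clock", "event_id", "player_id", "label"])
        writer.writerows(events)
```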
An example of the quality of animations that could
be created using only dataset’s raw data is given in
Figure 4a. Improvements that could be obtained us-
ing the proposed automatic event recognition are il-
lustrated in Figure 4b. A video is also available for
download at https://goo.gl/ucDzH7.
Figure 2: VR Playbook tool: a) tablet interface for drawing tactics, and b)-c) animations displayed on VR headsets.
Figure 3: Integration of the devised event recognition methodology into the VR Playbook tool.
Figure 4: Frames of a 3D animation created using a) only raw positional data, and b) automatically recognized events.
6 CONCLUSIONS
Results reported in this paper confirm the suitabil-
ity of machine learning techniques for the identifi-
cation of small-scale sport events in spatio-temporal
data collected during basketball games. In particular, features leading to good performance in the considered conditions are identified. Besides quantitative measurements concerning the accuracy of event recognition, preliminary evidence on the effectiveness of the devised methodology has also been collected through qualitative observations on the realism of the animations that can be generated by integrating automatic event recognition into a computer animation tool.
of new features and classification methods (e.g., based
on deep learning) as well as the recognition of other
small-scale basketball events (like throws, screens,
cuts, etc.) and of large-scale phenomena occurring
during the game (e.g., to predict dangerous actions,
to identify tactics, to spot mistakes made by a player
in executing a tactic, etc.). The introduction of these
new aspects and the development of improved tech-
niques for animation blending could help to further
enhance the quality of the animations that can be pro-
duced, making (VR-based) visualization systems suitable also for sport applications other than training.
Moreover, a user study will be planned with coaches
and players of a basketball team to validate the effec-
tiveness of the VR training system.
ACKNOWLEDGEMENTS
This work has been partially supported by the VR@Polito initiative. The authors wish to thank Francesco Raho, the technical manager of the youth sector of the Auxilium CUS Torino basketball club, Italy.
REFERENCES
Cannavò, A., Musto, M., Pratticò, F. G., Raho, F., and Lam-
berti, F. (2018). A participative system for tactics anal-
ysis in sport training based on immersive virtual real-
ity. In 4th Workshop on Everyday Virtual Reality.
D’Orazio, T., Leo, M., Mazzeo, P. L., and Spagnolo, P.
(2010). Soccer player activity recognition by a multi-
variate features integration. In 7th IEEE Int. Conf. on
Advanced Video and Signal Based Surveillance.
Jiang, W. and Yin, Z. (2015). Human activity recognition
using wearable sensors by deep convolutional neural
networks. In 23rd ACM Int. Conf. on Multimedia,
pages 1307–1310.
McQueen, A., Wiens, J., and Guttag, J. (2014). Automati-
cally recognizing on-ball screens. In 2014 MIT Sloan
Sports Analytics Conference.
Richly, K., Rohloff, T., Bothe, M., and Schwarz, C.
(2016). Recognizing compound events in spatio-
temporal football data. In Int. Conf. on Internet of
Things and Big Data.
Teachabarikiti, K., Chalidabhongse, T. H., and Thammano,
A. (2010). Players tracking and ball detection for an
automatic tennis video annotation. In 11th Int. Conf.
on Control Automation Robotics & Vision.
Varriale, L. and Tafuri, D. (2016). Technology for soccer
sport: The human side in the technical part. In Int.
Conf. on Exploring Services Science.
von der Grün, T., Franke, N., Wolf, D., Witt, N., and Eid-
loth, A. (2011). A real-time tracking system for foot-
ball match and training analysis. In Microelectronic
Systems, pages 199–212. Springer.
Zheng, M. and Kudenko, D. (2012). Automated event
recognition for football commentary generation. In In-
terdisciplinary Advancements in Gaming, Simulations
and Virtual Environments: Emerging Trends, pages
300–315. IGI Global.