Automatic Recognition of Sport Events from Spatio-temporal Data: An Application for Virtual Reality-based Training in Basketball

Alberto Cannavò, Davide Calandra, Gianpaolo Basilicò and Fabrizio Lamberti

Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129 Torino, Italy

Keywords:

Machine Learning, Event Recognition, Virtual Reality, Sport Training.

Abstract:

Data analysis in the ﬁeld of sport is growing rapidly due to the availability of datasets containing spatio-

temporal positional data of the players and other sport equipment collected during the game. This paper inves-

tigates the use of machine learning for the automatic recognition of small-scale sport events in a basketball-

related dataset. The results of the method discussed in this paper have been exploited to extend the func-

tionality of an existing Virtual Reality (VR)-based tool supporting training in basketball. The tool allows the

coaches to draw game tactics on a touchscreen, which can be then visualized and studied in an immersive VR

environment by multiple players. Events recognized by the proposed system can be used to let the tool manage

also previous matches, which can be automatically recreated by activating different animations for the virtual

players and the ball based on the particular game situation, thus increasing the realism of the simulation.

1 INTRODUCTION

In previous research, thanks also to recent advance-

ments in tracking technology, the use of spatio-

temporal data collected during matches or training

sessions has grown signiﬁcantly in many competi-

tive sports (Richly et al., 2016). A number of so-

lutions based on different sensing techniques have

been presented in the literature, which make it possible to record

the movement of the players and other equipment (a

tennis ball, a baseball bat, etc.) at high sampling

rates (von der Grün et al., 2011; Jiang and Yin, 2015;

D’Orazio et al., 2010). The analysis of tracking data

concerning the players, the ball, etc. can provide

coaches with helpful insights about the game, which

can be used for the automatic recognition of the op-

posing team’s strategy (Varriale and Tafuri, 2016), the

generation of commentaries for matches (Zheng and

Kudenko, 2012), etc.

Based on these observations, this paper investi-

gates the use of machine learning for the automatic

recognition of players’ activity – or actions – from

spatio-temporal data for VR-based basketball appli-

cations (though information extracted could be ex-

ploited in other contexts, like those above). The paper

builds on a previous work targeted to soccer (Richly

et al., 2016). With respect to (Richly et al., 2016), in

this paper new features are extracted, which permit a)

to consider aspects that were not taken into consid-

eration in that work, b) to integrate data not present

in the reference dataset (like, for instance, the vertical

position of the ball), and c) to account for different

characteristics of basketball w.r.t. soccer, with the

ﬁnal goal of improving recognition accuracy.

The recognition method proposed in this work has

been integrated in an immersive VR tool to allow the

visualization of animated reconstructions of previous

basketball matches for tactic analysis and training.

Speciﬁcally, events identiﬁed through machine learn-

ing are provided in input to the VR system, which

uses them to activate the appropriate players’ animations.

2 BACKGROUND

A few methods have already been proposed for

the automatic recognition of sport events. For in-

stance, in (Zheng and Kudenko, 2012), inductive

learning techniques are used for the automatic gen-

eration of commentaries for football matches within a

management simulation game named Championship

Manager. Three classiﬁcation techniques (Decision

Tree, KNN, and Naïve Bayes) are exploited to find

the mapping between game states and commentaries.

In (Teachabarikiti et al., 2010), an algorithm for track-

ing the players and the ball in tennis is proposed to

enable automatic footage annotation. By analyzing

the motion patterns of the players and the ball, the


Cannavò, A., Calandra, D., Basilicò, G. and Lamberti, F.

Automatic Recognition of Sport Events from Spatio-temporal Data: An Application for Virtual Reality-based Training in Basketball.

DOI: 10.5220/0007524203100316

In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019), pages 310-316

ISBN: 978-989-758-354-4

algorithm is able to classify a player’s action as either a backhand or a forehand stroke with high precision and recall rates. The authors of (McQueen et al.,

2014) exploit players’ tracking data to recognize of-

fensive strategies in basketball through a linear SVM

classiﬁer and a rule-based algorithm. In (Richly et al.,

2016), three machine learning approaches, namely

SVM, KNN, and RF, are evaluated for the purpose of classifying events in a soccer match, like

passes or receptions. The dataset used therein refers

to matches of the German Bundesliga, and contains

the timestamp, the two-dimensional coordinates of

the ball, a list of game events (e.g., fouls, substitutions, offsides, etc.) and the players involved. Event classification is accomplished by working with several

features computed by considering the raw position

data for the ball. To train the classiﬁers, the dataset

is annotated by manually identifying the events of in-

terest in the footage of three matches.

By building upon the works found in the literature,

this paper presents the design and evaluation of an im-

proved technique for the automatic classiﬁcation of

sport events from spatio-temporal data. In particular,

given the promising results reported in (Richly et al.,

2016), this paper takes the methodology developed in that work as a reference, and extends it to target a different sport, i.e., basketball. After having experimented with the same algorithms and the

same set of features used in the reference work on a

dataset containing position data from National Bas-

ketball Association (NBA) matches, this paper addi-

tionally proposes a new set of features, which proved

to signiﬁcantly boost the performance of basketball

event recognition and classification. Finally, the paper explores how automatically extracted information

can be used to support the job of both coaches and

players by enhancing the functionality of an existing

VR-based tool for tactics analysis.

3 METHODOLOGY

This section describes the dataset as well as the fea-

tures that have been developed/used in this paper.

3.1 Dataset

The original dataset refers to the 2015–16 season

of the NBA (https://github.com/sealneaward/nba-movement-data/tree/master/data), and contains

spatio-temporal data collected at 20 Hz. Data are

structured in matches and actions (for a given match).

For each action, the position of the ball and of the

players is recorded. The dataset, stored as a .csv ﬁle,

consists of the following values:

• team: identifier of the team to which the player belongs, −1 if the tracked object is the ball;

• player: identifier of the tracked object, −1 if the tracked object is the ball;

• x_loc, y_loc, z_loc: 3D spatial position of the tracked object (the z coordinate is provided only for the ball);

• game_clock: remaining time of the match;

• shot_clock: remaining time of the 24 seconds granted to a team to finalize an offensive action;

• quarter: quarter of the game;

• game: identifier of the match;

• event: identifier of the action in the game.
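As an illustration of the layout just listed, the following minimal sketch parses a few rows and separates the ball from the players. The header names, the sample values, and the helper names `load_samples` and `split_ball_and_players` are assumptions made for this example; they are not taken from the dataset or from the script used in this work.

```python
import csv
import io

# Made-up rows in the layout listed above; the header names are assumptions.
SAMPLE_CSV = """team,player,x_loc,y_loc,z_loc,game_clock,shot_clock,quarter,game,event
-1,-1,50.0,25.0,4.5,720.00,24.00,1,1,1
10,201,48.0,24.0,0.0,720.00,24.00,1,1,1
20,305,55.0,30.0,0.0,720.00,24.00,1,1,1
-1,-1,51.0,25.5,4.8,719.95,23.95,1,1,1
"""

INT_FIELDS = ("team", "player", "quarter", "game", "event")
FLOAT_FIELDS = ("x_loc", "y_loc", "z_loc", "game_clock", "shot_clock")


def load_samples(text):
    """Parse CSV text into a list of dicts with numeric values."""
    samples = []
    for raw in csv.DictReader(io.StringIO(text)):
        sample = {name: int(raw[name]) for name in INT_FIELDS}
        sample.update({name: float(raw[name]) for name in FLOAT_FIELDS})
        samples.append(sample)
    return samples


def split_ball_and_players(samples):
    """Separate ball rows (player == -1) from player rows."""
    ball = [s for s in samples if s["player"] == -1]
    players = [s for s in samples if s["player"] != -1]
    return ball, players


samples = load_samples(SAMPLE_CSV)
ball, players = split_ball_and_players(samples)
print(len(ball), "ball samples,", len(players), "player samples")
```

Keeping the ball rows apart is convenient because most of the features introduced later are computed on the ball trajectory only.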

The coordinate system used for x_loc and y_loc is normalized in the 0–100 and 0–50 ranges for the x and y axes, respectively; the bottom-left corner is represented by the point with coordinates (0, 0). To create

the annotated dataset, sports events were manually

identiﬁed in the footage of the San Antonio Spurs vs

Minnesota Timberwolves match that was played on

December 23rd, 2015. Like in the reference work,

passes and receptions were considered. Other events,

like shots, dribbles, etc. were marked with the label

“other”. Part of the events belonging to the latter cat-

egory were randomly deleted, in order to balance the

frequency of the three events. At the end of the pro-

cess, the annotated dataset included 180 entries per

event category.
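The balancing step can be sketched as a random undersampling of the majority class. This is only a plausible reading of the procedure described above: the function name `balance_by_undersampling`, the fixed seed, and the choice of matching the largest minority class are assumptions.

```python
import random
from collections import Counter


def balance_by_undersampling(labels, majority="other", seed=0):
    """Return sorted indices that keep every minority sample and a random
    subset of the majority class, sized like the largest minority class."""
    counts = Counter(labels)
    target = max((n for lab, n in counts.items() if lab != majority), default=0)
    majority_idx = [i for i, lab in enumerate(labels) if lab == majority]
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    kept_majority = rng.sample(majority_idx, min(target, len(majority_idx)))
    kept_minority = [i for i, lab in enumerate(labels) if lab != majority]
    return sorted(kept_minority + kept_majority)


# Toy labels: 3 passes, 3 receptions, 10 "other" events.
labels = ["pass"] * 3 + ["reception"] * 3 + ["other"] * 10
kept = balance_by_undersampling(labels)
print(Counter(labels[i] for i in kept))
```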

3.2 Features

According to the reference work, a sport event can

be recognized in a dataset containing spatio-temporal

data by analyzing the values of several features that

characterize it. Features have been extracted by run-

ning a script on the above data. For each time t in

the dataset, a vector is obtained containing a value for

each feature. Features used in this work can be cate-

gorized in ﬁve groups. The ﬁrst group contains the

(“two-dimensional”) features directly derived from

the reference work. The remaining groups host the

new features that have been introduced in this paper.

In particular, the features in the second group are cal-

culated by considering only the movement of the ball

along the z axis; hence, they are referred to as “ver-

tical”. Features in the third group are those in the

second group, but adapted to a “three-dimensional”

space. For features in the fourth group, the posi-

tion of the players is also considered; thus, they are


called “players’” features. Lastly, features in the ﬁfth

group are computed by aggregating data (and comput-

ing mean and variance values for them within given

time windows): hence, they are referred to as “aggregated” features. The position of a tracked object o at time t will be represented as $p(o, t)$. Similarly, $p_x(o, t)$ and $p_y(o, t)$ will be used to refer to the position along the x and y axes. The displacement (direction vector) between two consecutive positions will be defined as:

$$d(o, t_i) = p(o, t_i) - p(o, t_{i-1}) \tag{1}$$

where $t_{i-1}$ and $t_i$ are different time samples and $t_{i-1} < t_i$.

3.3 Two-dimensional Features

The features in this group consider only the position

of the ball in two dimensions (hence, a subscript 2D

will be used). In the reference work, the z dimension

was not considered because the soccer dataset con-

tained only two coordinates for the ball.

3.3.1 Velocity

The two-dimensional velocity, introduced since it is an indicator of the ball’s momentum, is calculated by dividing the length of the direction vector $d(o, t_i)$ by the time interval between two adjacent samples $t_{i-1}$ and $t_i$:

$$Vel_{2D}(o, t_i) = \frac{\lVert d(o, t_i) \rVert}{t_i - t_{i-1}} \tag{2}$$

3.3.2 Acceleration

The acceleration, like the velocity, was introduced as

an indicator of the ball’s momentum, and it is com-

puted as:

$$Acc_{2D}(o, t_i) = \frac{Vel_{2D}(o, t_i) - Vel_{2D}(o, t_{i-1})}{t_i - t_{i-1}} \tag{3}$$
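Equations (1) to (3) can be sketched as finite differences over consecutive ball positions. The snippet below is a minimal illustration, assuming a constant sampling interval of 1/20 s (the 20 Hz rate reported for the dataset); the function names are hypothetical, and the resulting values are in court units per second and per second squared.

```python
import math

SAMPLE_RATE_HZ = 20.0          # sampling rate reported for the dataset
DT = 1.0 / SAMPLE_RATE_HZ      # time between two adjacent samples, in seconds


def direction_vectors(positions):
    """d(t_i) = p(t_i) - p(t_{i-1}) for every i >= 1 (Equation 1)."""
    return [tuple(c - p for p, c in zip(prev, curr))
            for prev, curr in zip(positions, positions[1:])]


def velocities(positions, dt=DT):
    """Length of each direction vector divided by the interval (Equation 2)."""
    return [math.sqrt(sum(c * c for c in d)) / dt
            for d in direction_vectors(positions)]


def accelerations(vels, dt=DT):
    """Difference of consecutive velocities divided by the interval (Equation 3)."""
    return [(v1 - v0) / dt for v0, v1 in zip(vels, vels[1:])]


# Three made-up 2D ball positions; the same code works with (x, y, z) tuples.
ball_xy = [(0.0, 0.0), (3.0, 4.0), (9.0, 12.0)]
vel = velocities(ball_xy)
acc = accelerations(vel)
print(vel, acc)
```

Since the functions iterate over all coordinates of each tuple, passing only the z coordinate or the full (x, y, z) position would yield the vertical and three-dimensional variants discussed later.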

3.3.3 Acceleration Peaks

Given the sampling rate of the data, the same acceler-

ation could be captured in consecutive time samples.

Therefore, the authors of the reference work intro-

duced two features referred to as acceleration peaks,

that combine consecutive acceleration values by se-

lecting the highest and the lowest ones among adja-

cent values, respectively. The computation of actual

maximum and minimum peaks can be split in two

steps. In the ﬁrst step, the sum of two consecutive

accelerations is computed ignoring negative and pos-

itive values by setting them to 0 for the computation

of the former and the latter, respectively:

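$$AP^{2D\,max}(o, t_i) = \sum_{x \in \{t_{i-1},\, t_i\}} \max\bigl(0, Acc_{2D}(o, x)\bigr) \tag{4}$$

$$AP^{2D\,min}(o, t_i) = \sum_{x \in \{t_{i-1},\, t_i\}} \min\bigl(0, Acc_{2D}(o, x)\bigr) \tag{5}$$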

In the second step, in order to avoid the detection of a peak in two consecutive samples, the actual (real) acceleration peaks, $AP^{2D\,max}_{real}(o, t_i)$ and $AP^{2D\,min}_{real}(o, t_i)$, are computed by setting them to $AP^{2D\,max}(o, t_i)$ and $AP^{2D\,min}(o, t_i)$ only if the value of the feature at time $t_i$ is higher than the values at $t_{i-1}$ and $t_{i+1}$; otherwise they are set to 0.
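The two steps can be sketched as follows. Two points are assumptions of this example rather than statements of the reference work: the pair of consecutive samples is taken as the current and the previous one, and a minimum peak is kept when it is lower (more negative) than both neighbours, mirroring the rule given for the maximum.

```python
def summed_peaks(acc):
    """Step 1: for each sample, sum the positive parts (max peak) and the
    negative parts (min peak) of the previous and current accelerations."""
    ap_max, ap_min = [], []
    for i in range(len(acc)):
        pair = acc[max(0, i - 1):i + 1]  # the first sample has no predecessor
        ap_max.append(sum(max(0.0, a) for a in pair))
        ap_min.append(sum(min(0.0, a) for a in pair))
    return ap_max, ap_min


def real_peaks(values, is_max=True):
    """Step 2: keep a value only where it is a strict local extremum with
    respect to both neighbours; every other sample is set to 0."""
    out = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if is_max and cur > prev and cur > nxt:
            out[i] = cur
        elif not is_max and cur < prev and cur < nxt:
            out[i] = cur
    return out


acc = [0.0, 1.0, 3.0, -2.0, -4.0, 0.0]   # made-up acceleration values
ap_max, ap_min = summed_peaks(acc)
print(real_peaks(ap_max, is_max=True))
print(real_peaks(ap_min, is_max=False))
```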

3.3.4 Direction Change

This feature considers the variations in the trajectory

of the ball during the game, taking into account the

angle between two consecutive direction vectors. It

was added to improve the recognition of events like passes or shots, which are characterized by a high value of this metric. The direction change $DC_{2D}(o, t_i)$ of object o at time $t_i$ is obtained by applying the arccos() function as follows:

$$DC_{2D}(o, t_i) = \arccos\left(\frac{d(o, t_i) \cdot d(o, t_{i+1})}{\lVert d(o, t_i) \rVert \, \lVert d(o, t_{i+1}) \rVert}\right) \tag{6}$$
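A minimal sketch of Equation (6) is given below. Returning 0 when one of the two vectors has zero length (a stationary ball) and clamping the cosine to [-1, 1] against rounding errors are choices made for this example only.

```python
import math


def direction_change(d_prev, d_next):
    """Angle, in radians, between two consecutive 2D direction vectors."""
    norm_prev = math.hypot(d_prev[0], d_prev[1])
    norm_next = math.hypot(d_next[0], d_next[1])
    if norm_prev == 0.0 or norm_next == 0.0:
        return 0.0  # assumption: a stationary ball shows no direction change
    dot = d_prev[0] * d_next[0] + d_prev[1] * d_next[1]
    cosine = max(-1.0, min(1.0, dot / (norm_prev * norm_next)))
    return math.acos(cosine)


# A right-angle turn and a complete reversal of the trajectory.
print(direction_change((1.0, 0.0), (0.0, 1.0)))
print(direction_change((1.0, 0.0), (-1.0, 0.0)))
```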

3.3.5 Distance to Target

During a match, the ball should be thrown into one of the baskets (nets, in the reference work) in order to earn points. Therefore, it is reasonable to assume that the ball moves towards one of these targets. For this reason, this metric is used to distinguish passes from shots. The distance of object o at time t from the target is calculated as:

$$DT_{2D}(o, t) = \lVert p(o, t) - b(o, t) \rVert \tag{7}$$

where $b(o, t)$ represents the target position, assigned depending on the direction of the ball w.r.t. the x axis. This position could be either the left target T, with coordinates (0, 25), if the ball moves towards the left side of the court, or the right target T, with coordinates (100, 25), if the ball moves towards the right side, as shown in Figure 1a. The figure also shows different distances (represented by solid lines) computed depending on the direction of the ball (represented by an arrow at each data point). For an object at point P, which is characterized by a horizontal velocity equal to 0, the target cannot be determined; hence, the feature value is set to ∞.
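The target selection and the distance computation can be sketched as below, using the target coordinates (0, 25) and (100, 25) given in the text. The function names are hypothetical.

```python
import math

LEFT_TARGET = (0.0, 25.0)     # target on the left end line
RIGHT_TARGET = (100.0, 25.0)  # target on the right end line


def target_for(direction):
    """Pick the target from the sign of the horizontal direction component."""
    if direction[0] < 0:
        return LEFT_TARGET
    if direction[0] > 0:
        return RIGHT_TARGET
    return None  # no horizontal motion: the target cannot be determined


def distance_to_target(position, direction):
    """Euclidean distance from the ball to its target, or infinity."""
    target = target_for(direction)
    if target is None:
        return math.inf
    return math.hypot(position[0] - target[0], position[1] - target[1])


print(distance_to_target((3.0, 29.0), (-1.0, 0.0)))   # moving left
print(distance_to_target((50.0, 25.0), (0.0, 1.0)))   # no horizontal motion
```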

3.3.6 Cross on Target Line

This feature is deﬁned by considering the distance be-

tween the target and the position in which the ball

would cross the end line should the current trajec-

tory be maintained up to the line. Figure 1b shows

a data point P and its direction vector d. Should the

GRAPP 2019 - 14th International Conference on Computer Graphics Theory and Applications


Figure 1: Calculation of a) Distance to Target, b) Cross on Target Line, and c) three-dimensional Cross on Target Line features.

ball continue to move without any direction change (dashed line), it would reach the end line in C. The distance between C and the target position T is the actual value of this feature. The position of C can be calculated as:

$$\begin{pmatrix} b_x(o, t) \\ ctl \end{pmatrix} = p(o, t) + s \cdot d(o, t) \tag{8}$$

where s is the factor by which the direction vector of object o at time t must be multiplied so that, once added to the position of o at time t, the end line is reached. From the above, it is possible to compute $CTL_{2D}(o, t)$ as:

$$CTL_{2D}(o, t) = p_y(o, t) + d_y(o, t)\,\frac{b_x(o, t) - p_x(o, t)}{d_x(o, t)} \tag{9}$$

where the subscript identiﬁes the axis considered.
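Equations (8) and (9) amount to extending the current trajectory as a straight line up to the end line. The sketch below is one way to write this; returning `None` when the ball has no horizontal motion and taking the absolute offset from y = 25 as the distance from the target are assumptions of this example.

```python
def cross_on_target_line(position, direction, target_x):
    """y coordinate at which the straight trajectory crosses x = target_x."""
    px, py = position
    dx, dy = direction
    if dx == 0:
        return None  # assumption: without horizontal motion no crossing exists
    s = (target_x - px) / dx  # scaling factor s of Equation (8)
    return py + s * dy        # Equation (9)


def offset_from_target(crossing_y, target_y=25.0):
    """Distance along the end line between the crossing point and the target."""
    return abs(crossing_y - target_y)


crossing = cross_on_target_line((90.0, 20.0), (2.0, 1.0), target_x=100.0)
print(crossing, offset_from_target(crossing))
```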

3.4 Vertical Features

Features in this and in the following groups are

those deﬁned in the current work. In particular, this

group contains some of the features in the previous group recomputed considering only the z coordinate: $Vel_V(o, t)$, $Acc_V(o, t)$, $AP^{V\,max}_{real}(o, t)$, and $AP^{V\,min}_{real}(o, t)$. The remaining features cannot be recalculated, since the single dimension considered does not allow the direction of the ball to be identified.

3.5 Three-dimensional Features

In this group, features $Vel_{3D}(o, t)$, $Acc_{3D}(o, t)$, $AP^{3D\,max}_{real}(o, t)$, $AP^{3D\,min}_{real}(o, t)$, and $CTL_{3D}(o, t)$ are calculated by considering the three coordinates. Hence, the subscript 3D is used. For $CTL_{3D}(o, t)$, a parabolic trajectory is assumed (as shown in Figure 1c), and the feature is computed as:

$$CTL_{3D}(o, t) = -\frac{1}{2}\, g\, t_{line}^{2} + Vel_z(o, t)\, t_{line} + p_z(o, t) \tag{10}$$

where g is the gravity acceleration, $Vel_z(o, t)$ is the component along the z axis of $Vel_{3D}(o, t)$, and $t_{line}$ is the time that is required for the ball to reach the end line; $t_{line}$ is defined as:

$$t_{line} = \frac{b_x(o, t) - p_x(o, t)}{Vel_x(o, t)} \tag{11}$$

where $Vel_x(o, t)$ is the component of $Vel_{3D}(o, t)$ along the x axis.
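The parabolic extrapolation of Equations (10) and (11) can be sketched as follows. The value of g must be expressed in the same length unit as the positions; since the unit of the court coordinates is not stated here, g is left as a required parameter, and the value used in the usage lines is arbitrary.

```python
def time_to_end_line(px, vel_x, target_x):
    """Time needed to reach x = target_x at constant horizontal velocity."""
    if vel_x == 0:
        return None  # assumption: the end line is never reached
    return (target_x - px) / vel_x


def height_at_end_line(pz, vel_z, t_line, g):
    """Height of the ball after t_line seconds of ballistic motion."""
    return -0.5 * g * t_line ** 2 + vel_z * t_line + pz


# Made-up values; g = 10.0 is only a round number for the example.
t_line = time_to_end_line(px=90.0, vel_x=5.0, target_x=100.0)
print(t_line, height_at_end_line(pz=2.0, vel_z=10.0, t_line=t_line, g=10.0))
```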

3.6 Players’ Features

This group contains two features that take into ac-

count the relationship between the position of the ball

and the players. These features have been introduced

because the way the ball position changes in close proximity to a player could be a valid descriptor, especially

for some basketball events.

3.6.1 Ball-player Distance

This feature computes the distance between the ball

and the closest player at time t. It is deﬁned as:

$$BPD(o, t) = \lVert p(o, t) - p_{player}(o, t) \rVert \tag{12}$$

where $p(o, t)$ is the two-dimensional position of the ball at time t and $p_{player}(o, t)$ is the two-dimensional position of the closest player.

3.6.2 Team of Closest Player

This feature represents the team of the player closest

to the ball at time t.
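Both players’ features can be obtained with a single nearest-neighbour search over the players tracked at time t. In the sketch below, the tuple layout `(team, player, x, y)` and the sample values are assumptions made for illustration.

```python
import math


def ball_player_features(ball_xy, players):
    """Return (distance to the closest player, team of that player).

    `players` is a non-empty list of (team, player, x, y) tuples.
    """
    def distance(p):
        return math.hypot(p[2] - ball_xy[0], p[3] - ball_xy[1])

    closest = min(players, key=distance)
    return distance(closest), closest[0]


players = [(10, 201, 48.0, 24.0), (20, 305, 53.0, 29.0)]  # made-up sample
print(ball_player_features((50.0, 25.0), players))
```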

3.7 Aggregated Features

This group includes a set of features computed by

aggregating consecutive samples. The aggregation

considers the average and the variance values calcu-

lated in two time windows, named before-window and

after-window. In this way, the aggregation makes it possible to take into account the features’ dynamics. The size of the two windows has been experimentally defined and includes 20 samples (i.e., one second) before and

after the current time. The features considered for

the aggregation are: p

(o,t), V el

(o,t), Acc

(o,t),

(o,t), BPD(o,t).
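The windowed aggregation can be sketched as below for a single feature series. Excluding the current sample from both windows, using the population variance, and returning `None` for an empty window at the borders of the series are assumptions of this example.

```python
from statistics import mean, pvariance

WINDOW = 20  # samples, i.e. one second at 20 Hz


def window_stats(values):
    """Mean and population variance of a window, or (None, None) if empty."""
    if not values:
        return None, None
    return mean(values), pvariance(values)


def aggregate(series, i, window=WINDOW):
    """Aggregated features of `series` around sample i."""
    before = series[max(0, i - window):i]      # before-window
    after = series[i + 1:i + 1 + window]       # after-window
    before_mean, before_var = window_stats(before)
    after_mean, after_var = window_stats(after)
    return {
        "before_mean": before_mean, "before_var": before_var,
        "after_mean": after_mean, "after_var": after_var,
    }


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # made-up feature values
print(aggregate(series, 3, window=2))
```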


Table 1: Recognition of basketball events using KNN.

            Pass    Reception   Other
Precision   0.69    0.68        0.93
Recall      0.65    0.67        1.00
F-measure   0.67    0.67        0.96
Accuracy    76.67%

4 PERFORMANCE EVALUATION

Features described in the previous section have been

used in combination with the three machine learning

algorithms considered in (Richly et al., 2016). For ev-

ery time t, a vector was created containing the values

of all corresponding features. Each vector represents

an event that occurs during the game and it is char-

acterized by particular values of the deﬁned features.

For example, passes are characterized by a signiﬁcant

acceleration peak and present a high value for the direction change feature, whereas in the case of receptions, the ball shows a strong negative acceleration and the distance from the closest player, probably the ball’s owner, remains almost the same. The data science software platform named RapidMiner was used to run the algorithms. As said, the paper focused on the recognition of three events (pass, reception, and other ball events), though in basketball rather than in soccer. In order to assess the quality of results

achieved, accuracy, precision, recall and F-measure

were calculated. To cope with the reduced size of the

dataset, cross validation with 20 partitions and linear

sampling were used. Evaluation was carried out by

considering different combinations of the features in

the ﬁve groups. Initially, only the ﬁrst group was con-

sidered, to qualitatively compare results obtained on

the new dataset with those in (Richly et al., 2016).

Afterwards, the vertical and the players’ features were

added. The next experiment consisted in replacing the

two-dimensional features with the three-dimensional

ones. Lastly, the aggregated features were integrated.

At every change in the set of features considered, the

overall accuracy improved: from the initial value of

33.68% obtained when using only the ﬁrst group of

features (and comparable to that obtained in the ref-

erence work for soccer events), it reached a value of

76.67% when using the last set of features. Table 1 re-

ports recognition results for each event obtained with

KNN, which achieved the best performance.
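For reference, the four metrics can be derived from a confusion matrix as sketched below. The matrix in the usage lines is made up for illustration and is not the one obtained in the experiments; the function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(confusion, labels):
    """confusion[i][j] counts samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    metrics = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum
        actual = sum(confusion[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure(precision, recall),
        }
    accuracy = correct / total if total else 0.0
    return metrics, accuracy


labels = ["pass", "reception", "other"]
confusion = [[8, 2, 0], [3, 7, 0], [0, 0, 10]]  # made-up counts
metrics, accuracy = per_class_metrics(confusion, labels)
print(metrics["pass"], accuracy)
```

As a consistency check, `f_measure(0.69, 0.65)` rounds to 0.67, in line with the Pass column of Table 1.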

5 APPLICATION SCENARIO

The method illustrated in the previous sections has

been used to extend the functionalities of an existing

tool for VR-based training in basketball. The tool,

named VR Playbook (Cannavò et al., 2018), was designed to let coaches and players create tactics and

visualize previous basketball games in an immersive

environment. The VR Playbook tool offers coaches

several graphical means for drawing tactics in 2D

with a tablet device by moving players and deﬁning

actions for them (passes, stops, throws, etc.) on a

timeline (Figure 2a). The tool then creates the cor-

responding 3D animation that can be visualized at the

same time by multiple players wearing VR headsets

(Figures 2b and 2c). To this purpose, the timing and

type of manually deﬁned events are used to activate

realistic players’ animations which were previously

recorded using motion capture. Tactics could also

be saved (exported) and reloaded (imported) for later

use. In the native implementation of the tool, in order

to visualize the actions of a previous match coaches

had to manually add players’ events to the timeline,

e.g., based on available game footage or by resort-

ing to their memory. Players’ trajectories could be ei-

ther deﬁned by drawing arrows on the touchscreen be-

tween the starting and ending points of a given action,

or by adding many intermediate points to the timeline

to avoid straight paths. Alternatively, they could load

a dataset like the one used in this paper and recreate

actual displacements. However, without annotations

concerning events’ timing and type, the animations created would lack realism, since positional data

could only be used to activate a run cycle animation

for players. In this paper, the devised methodology

has been used to extract players’ events from a dataset

containing only spatio-temporal positional data and to

store them in a format ready to be parsed and imported

in the VR Playbook tool. In this way, the quality (re-

alism) of the simulation can be improved, since the

exact time a given animation shall begin/end is auto-

matically deﬁned, and a more correct relationship be-

tween the players’ hands and the ball can be identiﬁed

(and used for blending the run and pass animations).

The integration of the devised methodology (the mod-

ule named Event Recognizer) in the architecture of

the VR Playbook tool is depicted in Figure 3. It can

be easily observed that integration is transparent to the

users, since automatically extracted events are treated

as manually deﬁned ones, and coaches are allowed to

further modify them using the tablet-based interface.

An example of the quality of animations that could

be created using only the dataset’s raw data is given in

Figure 4a. Improvements that could be obtained us-

ing the proposed automatic event recognition are il-

lustrated in Figure 4b. A video is also available for

download at https://goo.gl/ucDzH7.


Figure 2: VR Playbook tool: a) tablet interface for drawing tactics, and b)-c) animations displayed on VR headsets.

Figure 3: Integration of the devised event recognition methodology into the VR Playbook tool. Diagram labels: dataset (NBA, CSV), file parser, event recognizer, coach application, network, players’ application; exchanged data: spatio-temporal data, feature vectors, players’ and ball’s positions, recognized events.


Figure 4: Frames of a 3D animation created using a) only

raw positional data, and b) automatically recognized events.

6 CONCLUSIONS

Results reported in this paper conﬁrm the suitabil-

ity of machine learning techniques for the identiﬁ-

cation of small-scale sport events in spatio-temporal

data collected during basketball games. In particu-

lar, features leading to good performances in the con-

sidered conditions are identiﬁed. Besides quantita-

tive measurements concerning the accuracy of event

recognition, preliminary evidence of the effectiveness of the devised methodology has also been collected through qualitative observations on the realism

of animations that can be generated by integrating

automatic event recognition in a computer animation

tool. Future work will be devoted to the exploration

of new features and classiﬁcation methods (e.g., based

on deep learning) as well as the recognition of other

small-scale basketball events (like throws, screens,

cuts, etc.) and of large-scale phenomena occurring

during the game (e.g., to predict dangerous actions,

to identify tactics, to spot mistakes made by a player

in executing a tactic, etc.). The introduction of these

new aspects and the development of improved tech-

niques for animation blending could help to further

enhance the quality of the animations that can be produced, making (VR-based) visualization systems suitable also for sport applications other than training.

Moreover, a user study will be planned with coaches

and players of a basketball team to validate the effec-

tiveness of the VR training system.

ACKNOWLEDGEMENTS

This work has been partially supported by the

VR@Polito initiative. The authors wish to thank

Francesco Raho, the technical manager of the

Auxilium CUS basketball Torino’s youth sector,

Italy.

REFERENCES

Cannavò, A., Musto, M., Pratticò, F. G., Raho, F., and Lamberti, F. (2018). A participative system for tactics analysis in sport training based on immersive virtual reality. In 4th Workshop on Everyday Virtual Reality.

D’Orazio, T., Leo, M., Mazzeo, P. L., and Spagnolo, P.

(2010). Soccer player activity recognition by a multi-

variate features integration. In 7th IEEE Int. Conf. on

Advanced Video and Signal Based Surveillance.

Jiang, W. and Yin, Z. (2015). Human activity recognition

using wearable sensors by deep convolutional neural

networks. In 23rd ACM Int. Conf. on Multimedia,

pages 1307–1310.

McQueen, A., Wiens, J., and Guttag, J. (2014). Automati-

cally recognizing on-ball screens. In 2014 MIT Sloan

Sports Analytics Conference.

Richly, K., Rohloff, T., Bothe, M., and Schwarz, C.

(2016). Recognizing compound events in spatio-

temporal football data. In Int. Conf. on Internet of

Things and Big Data.

Teachabarikiti, K., Chalidabhongse, T. H., and Thammano, A. (2010). Players tracking and ball detection for an automatic tennis video annotation. In 11th Int. Conf. on Control Automation Robotics & Vision.

Varriale, L. and Tafuri, D. (2016). Technology for soccer

sport: The human side in the technical part. In Int.

Conf. on Exploring Services Science.

von der Grün, T., Franke, N., Wolf, D., Witt, N., and Eidloth, A. (2011). A real-time tracking system for football match and training analysis. In Microelectronic Systems, pages 199–212. Springer.

Zheng, M. and Kudenko, D. (2012). Automated event

recognition for football commentary generation. In In-

terdisciplinary Advancements in Gaming, Simulations

and Virtual Environments: Emerging Trends, pages

300–315. IGI Global.
