location in both basketball (Miller et al. 2014; Franks
et al. 2015) and hockey (Becker, Woolford, and Dean
2020). Those papers estimated a spatial point process
for each player, then treated these estimates
as images to determine how players utilized regions of
the basketball court or hockey rink. The results illu-
minated similarities and differences between players
that would have been very difficult to discern with the
naked eye. The authors all used this analysis to create
some measure for shot optimality. In what follows we
describe an image recognition technique and demon-
strate that applying it to estimates of a sequence of
spatial models is a useful visualization technique.
2 PRELIMINARIES
2.1 Non-negative Matrix Factorization (NMF)
Non-negative Matrix Factorization (NMF) is a dimen-
sion reduction technique that decomposes a data ma-
trix into a matrix of basis functions and a matrix of
coefficients for those basis functions. The technique was
popularized as an image recognition technique by Lee
and Seung (1999), and has been used in a wide variety
of applications since (see, e.g., Gillis 2014).
For our purposes, NMF has the attractive feature
of being purely additive. That is, the estimated basis
functions as well as their coefficients are both non-
negative, so a linear combination of the bases can only
add to the estimate. In PCA, the bases and coefficients
are allowed to be negative, so one basis may
counteract the effect of another. With
purely additive bases, each basis function represents
a single feature. This restriction makes the estimated
basis functions directly interpretable.
The NMF algorithm works by approximately factorizing
an n × m matrix V with non-negative entries into an
n × r matrix W and an r × m matrix H, where r is the
number of basis functions and must be specified prior
to estimation. The columns of W represent the basis
functions. Each row of the matrix H contains the
coefficients for the corresponding basis in W. With
these matrices, we can approximate the ith column of V,
which we will denote $V_{\cdot i}$, as the matrix
product $W H_{\cdot i}$, where $H_{\cdot i}$ is the
ith column of H.
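The factorization described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the multiplicative update rules associated with Lee and Seung for the Frobenius-norm objective, which is one standard choice and not necessarily the algorithm used by the NMF package in R. The function names (`nmf`, `matmul`, `frobenius_error`) and the defaults for `n_iter`, `seed`, and `eps` are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for row_v, row_wh in zip(V, WH)
               for v, wh in zip(row_v, row_wh)) ** 0.5


def nmf(V, r, n_iter=200, seed=0, eps=1e-9):
    """Approximate an n x m non-negative V by W (n x r) times H (r x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start, so every entry stays non-negative.
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(r)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(n)]
    return W, H


# Small 7 x 2 non-negative matrix, factorized with a single basis (r = 1).
V = [[1, 2], [2, 3], [3, 5], [4, 7], [3, 5], [2, 3], [1, 2]]
W, H = nmf(V, r=1)
print(len(W), "x", len(W[0]), "basis matrix W")
print(len(H), "x", len(H[0]), "coefficient matrix H")
print("reconstruction error:", frobenius_error(V, W, H))
```

Because W and H are identified only up to a rescaling (multiplying W by a constant and dividing H by the same constant leaves WH unchanged), the individual entries from a sketch like this need not match those reported by another implementation, even when the product WH is nearly the same.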
Under certain constraints, NMF is particularly
useful for feature recognition in images (Gillis
2014). Assuming all images have the same pixel
dimensions and the colour of each pixel is a single,
non-negative number (e.g., grey scale), a matrix of
images can be constructed such that each column rep-
resents a single image. For instance, suppose we have
a collection of grey scale images of faces, all of which
have the same pixel dimensions and the faces are all
aligned in the same way (e.g. all eyes and mouths are
at the same location of the image). Each image can
be represented by a vector of non-negative numbers.
The NMF algorithm will estimate basis functions that
correspond to facial features. The coefficient matrix
will determine how much of each basis function is re-
quired to construct a face.
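As a minimal sketch of how such an image matrix might be assembled, the following plain-Python helper flattens equally sized grey-scale images and stacks them as columns. The helper names `images_to_matrix` and `column_to_image`, and the row-major flattening order, are assumptions made for illustration; any fixed pixel ordering would serve, provided it is the same for every image.

```python
def images_to_matrix(images):
    """Stack equally sized grey-scale images as the columns of a matrix V.

    Each image is a list of rows of non-negative pixel values.
    """
    shape = (len(images[0]), len(images[0][0]))
    columns = []
    for img in images:
        if (len(img), len(img[0])) != shape:
            raise ValueError("all images must share the same pixel dimensions")
        flat = [pixel for row in img for pixel in row]  # row-major flatten
        if any(p < 0 for p in flat):
            raise ValueError("pixel values must be non-negative")
        columns.append(flat)
    # Transpose so that each image occupies one column of V.
    return [list(row) for row in zip(*columns)]


def column_to_image(V, j, height, width):
    """Reshape column j of V back into a height x width image."""
    col = [row[j] for row in V]
    return [col[r * width:(r + 1) * width] for r in range(height)]


# Two tiny 2 x 3 "images" with made-up pixel values.
img_a = [[0, 1, 0], [2, 3, 2]]
img_b = [[1, 1, 1], [0, 5, 0]]
V = images_to_matrix([img_a, img_b])
print(len(V), "pixels per image,", len(V[0]), "images")
```

The same reshaping, applied to a column of an estimated basis matrix W, is what allows a basis function to be displayed as an image.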
To make this process clearer, consider the following
example. Suppose we have a matrix V with n = 7 rows
and m = 2 columns, resulting in 14 entries total. The
first column is $[1, 2, 3, 4, 3, 2, 1]^T$ and the
second column is $[2, 3, 5, 7, 5, 3, 2]^T$. Clearly,
both columns have a similar pattern (or feature). If
we want to characterize this feature, we could use an
NMF decomposition, where we choose r = 1 since we know
there is a single feature. Doing so results in
W = $[1.259, 2.099, 3.359, 4.618, 3.359, 2.099, 1.259]^T$,
which is a matrix with seven rows (n) and one column
(r). The coefficient matrix is H = [0.886, 1.496].
From this, the approximation for the first column of V
is W × 0.886 =
$[1.116, 1.86, 2.977, 4.093, 2.977, 1.86, 1.116]^T$,
which is quite close to the original first column (but
with some approximation error). Note that W and H
together have 9 entries compared to the original 14
and that inspection of W tells us about both columns
of the original matrix simultaneously.
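The arithmetic in this example can be checked directly. The short sketch below uses the rounded values of W and H printed above, so its products may differ from the reported approximation in the third decimal place; the variable names are chosen here for illustration only.

```python
# Rounded factors as reported in the text (r = 1).
W = [1.259, 2.099, 3.359, 4.618, 3.359, 2.099, 1.259]
H = [0.886, 1.496]

# Original matrix V, stored here column by column for convenience.
V_columns = [
    [1, 2, 3, 4, 3, 2, 1],
    [2, 3, 5, 7, 5, 3, 2],
]

# The ith column of V is approximated by W scaled by the ith entry of H.
approx_columns = [[w * h for w in W] for h in H]

# Largest absolute approximation error in each column.
max_abs_error = [
    max(abs(a - v) for a, v in zip(approx, original))
    for approx, original in zip(approx_columns, V_columns)
]

entries_original = len(V_columns) * len(V_columns[0])  # n * m
entries_factors = len(W) + len(H)                      # n * r + r * m, r = 1

for i, col in enumerate(approx_columns, start=1):
    print("column", i, [round(x, 3) for x in col])
print("max absolute error per column:", [round(e, 3) for e in max_abs_error])
print("entries:", entries_original, "original vs", entries_factors, "in W and H")
```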
The choice of r, the number of bases, also known
as the rank, is non-trivial. Too many basis functions
and the algorithm will simply be modelling the noise.
Too few and the approximations will not be accurate.
In some contexts, prior knowledge will be sufficient
for choosing the number of bases. In other contexts
there are numerous heuristic approaches. Techniques
have been proposed by Brunet et al. (2004), Hutchins
et al. (2008), and Frigyesi and Höglund (2008). A
properly motivated choice of r is imperative whenever
NMF is being used for analysis. As a visualization
technique, however, the choice of rank is dependent
on the usefulness of the visualizations.
Estimation of NMF models has been shown to be
NP-hard (Vavasis 2007). Many algorithms have been
developed to estimate the matrices (Wang and Zhang
2013), and these methods have been implemented in
multiple software packages. We use the NMF package
for the R statistical software; details can be found
in Gaujoux and Seoighe (2010).
IVAPP 2021 - 12th International Conference on Information Visualization Theory and Applications