proach to discover sequential patterns from spatio-temporal data. PLSM is a topic-model-based approach to activity mining in videos, similar to probabilistic latent semantic analysis (PLSA) (Hofmann, 2001) and Latent Dirichlet Allocation (LDA) (Blei et al., 2003). However, PLSM addresses the disadvantages of the bag-of-words assumption in PLSA and performs temporal modeling at two levels: a) within motifs, to identify when words occur, i.e., at which relative time with respect to the motif beginning; and b) within video segments (temporal documents), to identify when a motif actually starts in the document (more details in Sec. 3.1). Temporal modeling in PLSM has several advantages: a) it helps in understanding how an activity unfolds over time, enabling a time-sensitive visualization of the discovered activity patterns; and b) it makes it possible to precisely identify when an activity begins in a video, which can be used for tasks such as event counting. Furthermore, PLSM relies on an elegant generative model combined with well-established inference techniques to uncover the latent variables. This allows an intuitive semantic interpretation of the observed and latent variables, making it an attractive choice despite a few recent deep-learning-based approaches to activity analysis (Xu et al., 2015; Hasan et al., 2016).
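The two-level temporal structure can be illustrated with a toy generative sketch (a minimal illustration only; the motif tables, distributions, and names below are assumed for exposition and are not the model's actual parameterization):

```python
import random

# Toy two-level temporal generative process in the spirit of PLSM
# (illustrative only; the event names and distributions are made up).
# A "motif" maps each (word, relative_time) pair to a probability:
# it encodes WHICH words occur and WHEN, relative to the motif start.
MOTIFS = {
    0: {("car_enters", 0): 0.5, ("car_stops", 2): 0.5},
    1: {("person_crosses", 0): 0.6, ("person_exits", 1): 0.4},
}

def sample_observation(p_z, p_ts, rng):
    """Generate one (word, absolute_time) observation for a document.

    p_z:  motif probabilities for this document, p(z | d)
    p_ts: start-time probabilities of the motif in the document
    """
    z = rng.choices(list(p_z), weights=list(p_z.values()))[0]
    ts = rng.choices(list(p_ts), weights=list(p_ts.values()))[0]
    # Within the motif: pick a word and its relative time tr.
    pairs = MOTIFS[z]
    (w, tr) = rng.choices(list(pairs), weights=list(pairs.values()))[0]
    return w, ts + tr  # the word is observed at absolute time ts + tr

rng = random.Random(0)
obs = [sample_observation({0: 0.7, 1: 0.3}, {0: 0.5, 5: 0.5}, rng)
       for _ in range(5)]
```

The key point is the last line of `sample_observation`: an observed word's absolute time decomposes into the motif's start time in the document plus the word's relative offset within the motif, which is exactly the two levels of temporal modeling described above.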
Earlier PLSM implementations (Varadarajan et al., 2010) use complex dimensionality-reduction steps based on LDA (Blei et al., 2003) and PLSA (Hofmann, 2001) to bring down the vocabulary size and thereby the running time of PLSM, but these steps are themselves cumbersome and time-consuming. For instance, it takes nearly 4.5 hours to apply PLSA to a 90-minute video. So while this reduces the running time of PLSM itself, the overall pipeline remains inefficient due to the time spent in pre-processing. Furthermore, the additional pre-processing layers introduce difficulties in motif visualization and in higher-level tasks such as abnormal event detection: with multiple pre-processing steps, it is difficult to reason out which low-level feature caused an anomalous event. On the other hand, applying PLSM directly on videos is complex and time-consuming, due to the high-dimensional nature of videos combined with the complex nested loops in the PLSM EM procedure. However, thanks to the cheap availability of GPUs today, it is feasible to run PLSM directly on the low-level visual features while still achieving superior running-time performance.
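To make the cost of those nested loops concrete, the structure of a PLSM-style E-step can be sketched as follows (a simplified schematic only; the actual model and update equations are those of Varadarajan et al., 2010, while the array names, shapes, and normalization shown here are our own illustrative assumptions). Every observation requires a sum over every motif and every candidate start time, and it is this inner work that a GPU can parallelize:

```python
import numpy as np

# Schematic of the nested-loop E-step of a PLSM-style model
# (illustrative; shapes and variable names are assumed).
def e_step(counts, p_z_d, p_ts_zd, p_wtr_z):
    """counts[d, w, ta]  : word/absolute-time counts per document
                           (used here only for its shape; the counts
                           themselves weight the M-step)
       p_z_d[z, d]       : motif probability in a document
       p_ts_zd[ts, z, d] : motif start-time probability
       p_wtr_z[w, tr, z] : word/relative-time probability in a motif
       Returns responsibilities r[d, w, ta, z, ts]."""
    D, W, Ta = counts.shape
    Ts, Z, _ = p_ts_zd.shape
    r = np.zeros((D, W, Ta, Z, Ts))
    for d in range(D):                    # documents
        for w in range(W):                # vocabulary
            for ta in range(Ta):          # absolute times
                for z in range(Z):        # motifs
                    for ts in range(Ts):  # candidate start times
                        tr = ta - ts      # relative time in the motif
                        if 0 <= tr < p_wtr_z.shape[1]:
                            r[d, w, ta, z, ts] = (p_z_d[z, d]
                                                  * p_ts_zd[ts, z, d]
                                                  * p_wtr_z[w, tr, z])
                norm = r[d, w, ta].sum()
                if norm > 0:
                    r[d, w, ta] /= norm   # normalize over (z, ts)
    return r
```

On a CPU this costs O(D · W · Ta · Z · Ts) per EM iteration, which with a video-scale vocabulary is exactly the bottleneck described above; the iterations of the inner loops are independent per observation and therefore map naturally onto GPU threads.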
In this paper, we propose two different GPU-based implementations of PLSM: i) Dense GPU-PLSM and ii) Sparse GPU-PLSM. We perform the entire evaluation on the GPU in an efficient manner, minimizing data transfers and providing good performance with high scalability.

Figure 1: Flowchart for discovering sequential activity motifs from videos using PLSM, as presented in (Varadarajan et al., 2010). [Pipeline: video → bg. subtraction / optical flow → visual words (location, motion) → connected comp. → PLSA on temporal window → TSLA patterns (PLSM words) → PLSM → sequential motifs.]

To ensure that our implementation is scalable, we ran an exhaustive set of experiments using different generations of GPUs with increasing numbers of cores and memory, while varying the input dimensionality. We achieve peak speedups of nearly 265X with the dense approach and 366X with the sparse approach.
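The intuition behind the two variants can be sketched with the data layouts they process (a schematic only; the actual GPU kernels and data structures are not shown here, and the array sizes are made up). Video word counts are extremely sparse, so visiting only the nonzero observations avoids almost all of the work:

```python
import numpy as np

# Dense layout: the full documents x vocabulary x time count array.
# Most cells are zero, but a dense kernel still iterates over them all.
dense = np.zeros((4, 1000, 50))        # 200,000 cells
dense[0, 17, 3] = 2.0
dense[2, 998, 41] = 1.0

# Sparse (COO-style) layout: only the nonzero observations are stored,
# so a kernel assigning one thread per entry touches 2 cells, not 200,000.
sparse = [(d, w, ta, dense[d, w, ta])
          for d, w, ta in zip(*np.nonzero(dense))]

# Any accumulation (e.g. a weighted E-step sum) gives the same result
# on either layout; the sparse one simply skips all the zero work.
total_dense = dense.sum()
total_sparse = sum(c for (_, _, _, c) in sparse)
```

Both variants compute the same quantities; the sparse variant trades a less regular memory-access pattern for a working set proportional to the number of observations rather than to the full vocabulary-time grid, which is why it achieves the higher peak speedup.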
2 RELATED WORK

Motion and appearance features have been used for video-based activity analysis for several years. For instance, several methods have been proposed (Xiang and Gong, 2008; Li et al., 2008; Wang et al., 2009) to extract semantic activity patterns from low-level features.
More recently, topic models such as PLSA (Hofmann, 2001) and LDA (Blei et al., 2003), originally proposed for text processing, have been successfully used with simple image features to discover scene-level activity patterns and detect abnormal events (Varadarajan and Odobez, 2009; Li et al., 2008; Wang et al., 2009). These bag-of-words methods assume that words are exchangeable and that their co-occurrence is sufficient to capture latent patterns in the data. Using topic models such as PLSA allows the use of different abnormality measures based on the interpretation of the model (Varadarajan and Odobez, 2009; Emonet et al., 2011). Generative topic models tend to consume too much computation time on large document collections with large vocabularies, and there have been efforts to speed up such probabilistic models. For instance, Hong et al. (Hong et al., 2008) proposed a CPU-based parallel algorithm for PLSA and achieved a 6x speedup on 8-core CPU machines. Yu et al. applied GPUs to Gibbs sampling for motif finding and achieved a 10x speedup (Yu and Xu, 2009). Yan et al. proposed a parallel inference method for Latent Dirichlet Allocation (LDA) on the GPU and achieved a 20x speedup (Yan et al., 2009). However, there have been no such efficient implementations of the topic models that are popular for video-based activity analysis. Therefore, in this paper, we consider the PLSM model, which can be applied to video data, and propose two different GPU implementations.
VISAPP 2018 - International Conference on Computer Vision Theory and Applications