events by comparing the test trajectory to representa-
tive trajectories of known classes of events.
The remainder of the paper is organized as fol-
lows. In Section 2, we outline related work on
trajectory-based video content analysis. In Section 3,
we introduce the local differential features consid-
ered to represent 2D trajectories. We show that they
are invariant to 2D translation, 2D rotation and scale
transformation, and we also describe their computa-
tion. Section 4 presents our HMM-based framework
to model trajectories. It can be viewed as a (statistical)
quantization of the local features while accounting for
their temporal evolution. We also describe the HMM-
based similarity measure used to compare or to clas-
sify trajectories. Section 5 deals with the detection of
unexpected events. Section 6 introduces other classification
methods which are used in the comparative
experimental evaluation of the proposed method.
In Section 7, we present the two data sets used to
test and compare the methods. The first one is composed
of typical classes of synthetic (noisy) trajectories
(such as parabolas or clothoids), and the second
one includes trajectories computed in sports videos.
Results are then reported and discussed. Concluding
remarks are given in Section 8.
2 RELATED WORK
Trajectory analysis can help recognize events, actions,
or interactions between people and objects.
Early methods considered point coordinates and local
orientations on image trajectories as input features
(Bashir et al., 2007; Buzan et al., 2004; Chan et al.,
2004; Porikli, 2004). Such features can only express
strict spatial similarity between trajectories.
Other methods use velocities as features to compare
2D trajectories (Porikli, 2004; Hu et al., 2007; Wang
et al., 2006), but visual velocity still depends on the
distance of the viewed action to the camera.
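The dependence of velocity features on viewing distance can be checked numerically. The sketch below is only an illustration under stated assumptions: the parabola, the transform parameters and the helper names are hypothetical, and the turning angle is used here merely as one example of a quantity unaffected by translation, rotation and scale, not as the feature defined in Section 3.

```python
import math

def similarity_transform(points, scale, angle, tx, ty):
    """Apply a 2D scale + rotation + translation to a list of (x, y) points."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]

def speeds(points):
    """Finite-difference speed between consecutive points (unit time step)."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def turning_angles(points):
    """Signed change of heading between consecutive displacement vectors."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ux, uy = x1 - x0, y1 - y0
        vx, vy = x2 - x1, y2 - y1
        # atan2(cross, dot) is the signed angle from u to v
        angles.append(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))
    return angles

# Hypothetical trajectory: a sampled parabola y = x^2
traj = [(0.1 * t, (0.1 * t) ** 2) for t in range(20)]
# Same motion seen "closer" (scale 2.5), rotated and shifted in the image
moved = similarity_transform(traj, scale=2.5, angle=0.7, tx=30.0, ty=-12.0)

# Speeds are multiplied by the scale factor ...
speed_ratio = [b / a for a, b in zip(speeds(traj), speeds(moved))]
# ... whereas turning angles are unchanged up to rounding error
angle_gap = max(abs(a - b)
                for a, b in zip(turning_angles(traj), turning_angles(moved)))
```

Every entry of `speed_ratio` equals the scale factor, so two identical motions observed at different distances yield different velocity features, while `angle_gap` stays at rounding-error level.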
Different methods have been developed to com-
pare and cluster trajectories in order to analyze the
content of video sequences. Buzan et al. (Buzan et al.,
2004) resorted to the Longest Common Subsequence
(LCSS) distance (Vlachos et al., 2002) to classify trajectories
computed in an image sequence acquired by
a single stationary camera for video surveillance. Their method is
based on a hierarchical unsupervised clustering of trajectories,
where trajectory features are vectors of 2D
coordinates of the trajectory points. Wang et al. introduced
a novel similarity measure based on a modified
Hausdorff distance and a comparison confidence measure.
They compare the distributions of the spatial coordinates
of the trajectory points, and also use other
attributes, such as velocity and object size (Wang et
al., 2006). Bashir et al. presented a trajectory-based
real-time indexing method (Bashir et al., 2007), us-
ing PCA and spectral clustering. A system that learns
patterns of activity from trajectories, and hierarchi-
cally classifies sequences using a codebook was de-
veloped by Stauffer and Grimson (Stauffer and Grim-
son). Li et al. considered statistical distributions of
trajectory orientations exploited in a clustering algo-
rithm (Li et al., 2006). Recent work has explored
modeling frameworks such as DPN (Dynamic Proba-
bilistic Network) and HMM (Hidden Markov Model)
to express the temporal information (causality) em-
bedded in video trajectories and the semantic mean-
ing that they convey. Hongeng et al. (Hu et al., 2007)
described a complex event recognition method based
on the definition of scenarios and on the use of a Semi-Markov
Chain (SMC). Chan et al. (Chan et al., 2006)
coped with fragmented tracks that occur when using
mean-shift tracking. They attempted to jointly solve
the problem of linking these “tracklets” and recognizing
complex events using a DBN (Dynamic Bayes Net).
They also proposed a method for detecting rare events
by representing motions and space-time relations be-
tween objects using HMMs (Chan et al., 2004). A
recognition method for group activities was defined
by Gong and Xiang (Gong and Xiang, 2003) relying
on a DPN to model and detect actions involving multiple
objects. DPNs are specifically used to model the temporal
relationships among different temporal events
in the scene. Porikli defined distances to compare trajectories,
in particular an HMM-based distance (Porikli,
2004). The methods based on HMMs, SMCs or DPNs
developed so far are unable to handle short trajectories
(see subsection 4.1). Let us also stress that all the
aforementioned methods exploit features invariant to
translation or scale transformation only.
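To make the coordinate-based comparison mentioned above more concrete, here is a minimal sketch of an LCSS-style distance between two 2D trajectories. It is a simplified illustration, not the implementation of Vlachos et al. or Buzan et al.: the function names and the threshold `eps` are assumptions, and the temporal matching window and the search over translations used in the original LCSS formulation are left out.

```python
def lcss_length(a, b, eps):
    """Length of the longest common subsequence of two 2D trajectories.

    Two points are considered a match when both coordinate differences
    are smaller than eps (hypothetical matching threshold).
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of a[:i] and b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            if abs(ax - bx) < eps and abs(ay - by) < eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]

def lcss_distance(a, b, eps):
    """1 - LCSS / min(length): 0 when the shorter trajectory is fully matched."""
    if not a or not b:
        return 1.0
    return 1.0 - lcss_length(a, b, eps) / min(len(a), len(b))

# Hypothetical trajectories for illustration
line = [(float(t), float(t)) for t in range(10)]
shifted = [(t + 0.2, t - 0.1) for t in range(10)]   # small perturbation
far = [(float(t), t + 50.0) for t in range(10)]     # same shape, elsewhere

d_near = lcss_distance(line, shifted, eps=0.5)
d_far = lcss_distance(line, far, eps=0.5)
```

In this sketch `d_near` is 0 while `d_far` is 1 although `far` has exactly the same shape as `line`. Because points are matched on raw coordinates, the measure expresses strict spatial similarity, which is the limitation stressed in the text.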
The approach we have designed differs from
those proposed so far in several respects. First, we introduce
local differential trajectory features which are
able to jointly capture information on the trajectory
shape and on the object speed. Moreover, they are invariant
to translation, rotation and scale transformations.
We have also developed a procedure to compute
them which is efficient and robust to noise. Second,
the temporal evolution of these features along the trajectory
curve is explicitly accounted for by means of
an original and effective HMM scheme. Specifically,
the HMM states are obtained by suitably quantizing the
real feature values. Our HMM method is also able to
process trajectories of any size (especially short trajectories).
Moreover, we have adopted an HMM distance
which can be exploited both for clustering and
recognizing dynamic video contents and for detecting