Interest Area Localization using Trajectory Analysis in Surveillance Scenes

D. P. Dogra (1), A. Ahmed (2) and H. Bhaskar (3)
(1) School of Electrical Sciences, Indian Institute of Technology, Bhubaneswar, India
(2) Department of Computer Sc. & Engineering, Haldia Institute of Technology, Haldia, India
(3) Department of Electrical and Computer Engineering, Khalifa University of Science, Technology and Research, Abu Dhabi, U.A.E.
Keywords:
Interest Area Detection, Trajectory Analysis, Surveillance, Clustering.
Abstract:
In this paper, a method for detecting and localizing interest areas in a surveillance scene by analyzing the motion trajectories of multiple interacting targets is proposed. Our method is based on a theoretical model representing the importance distribution of different areas (represented as rectangular blocks) present in a surveillance scene. The importance of each block is modeled as a function of the total time spent by multiple targets and their relative velocity whilst passing through the blocks. Extensive experimentation and statistical validation with empirical data have shown that the proposed method follows the theoretical model. The accuracy of our method in localizing interest areas has been verified, and its superiority is demonstrated against baseline methods using the publicly available CAVIAR and ViSOR datasets and a scenario-specific in-house surveillance dataset.
1 INTRODUCTION
Scene understanding in videos for surveillance applications has received significant attention in recent years. A considerable amount of research has been carried out on improving models for the detection and tracking of moving targets in a scene and, by analyzing the motion characteristics of such moving targets, performing behavioral analysis (Brun and Vento, 2014), anomaly detection (Suzuki and Fujino, 2007; Xu and Chen, 2013; Zhou and Huang, 2007; Piciarelli and Foresti, 2008), trajectory clustering (Piciarelli and Foresti, 2006), semantics analysis (Wang and Grimson, 2006) and even scene classification (Morris and Trivedi, 2008). Whilst such methods have been largely successful in human and insect behavior understanding, not much research effort has been spent on modeling the underlying semantic relationships between moving targets and static objects for scene understanding.
Moving target detection is widely supported by the development of state-of-the-art algorithms (Dinh and Medioni, 2011). Static objects in the scene, on the other hand, have either been neglected in video analysis and usually left out as part of the background, or detected using object-specific global models. It is important to acknowledge that the movements of targets in any unconstrained environment are often governed by the presence of static objects in the scene. For example, in a common room environment, the presence of a coffee vending machine and its location influence a specific motion pattern in that scene. By localizing such interest areas, where static objects are located and accessed by moving targets, it is often possible to make more informed decisions on the behavior and activities of moving targets in that scene. The purpose of localizing interest areas may intuitively be equivalent to motion-invariant generic object detection, which would be ideal but is often impractical, as it would require models for the countless number of objects that can be present in any surveillance environment. In this paper, we propose a technique for the detection and localization of those interest areas in the scene containing static objects that influence the motion characteristics of moving targets during autonomous surveillance. The structure of the paper is as follows. We begin by outlining some of the main contributions and distinguishing aspects of the paper in comparison to baseline strategies in Section 2. A theoretical model of target behavior with an appropriate research hypothesis and a novel trajectory analysis technique with relevant modeling and parametrization are presented
in Section 3. We then conduct experiments on
the chosen datasets (in Section 4) to investigate the
effect of key system parameters and demonstrate the
superiority of the proposed strategy in comparison to
other baseline techniques. Finally, the research hy-
pothesis is verified in Section 4 before we conclude
in Section 5.
2 CONTRIBUTIONS &
DISTINGUISHING ASPECTS
The fundamental purpose of the proposed research
is to detect interest areas that are potential loca-
tions of those static objects which influence mov-
ing targets and localize them using the trajectories
of multiple moving targets, for natural scene under-
standing. In other words, while the areas of scene
labeling (Farabet and LeCun, 2013), object detec-
tion (Sharma and Nevatia, 2013) and object track-
ing (Dinh and Medioni, 2011) have been the focus
of scene understanding research in the recent past, it
is proposed that significant impact can be made by
solving these problems jointly. The localization of
such interest areas can facilitate automatic scene re-
construction and help to make informed decisions on
the behavior of moving targets. A key novelty of our
method is the integration of target behavioral seman-
tics into a theoretical distribution of area importance.
Such an integration allows building a realistic statis-
tical model of target motion and interaction in an un-
restricted environment. In addition, the importance estimation of areas in a surveillance scene is based on a novel "entry-exit" motion model of multiple targets, through the measurement of the time spent between entry and exit and of the relative velocity changes. Furthermore, we also provide detailed analysis and comparison of our results on different datasets, which permits benchmarking of similar strategies in the future.
Given the breadth of techniques for scene understanding, we highlight some of the critical distinguishing aspects of our proposed method against other available state-of-the-art strategies, particularly from the trajectory analysis and salient object detection perspectives. In the light of trajectory analysis
techniques: a) our method does not explicitly model
motion flow as in (Morris and Trivedi, 2008), b) our
method is invariant to partial trajectories and the sam-
ple size that constitutes the motion trajectory of the
targets in a scene in comparison to (Wang and Grim-
son, 2006), c) our method does not build variances by measuring similarities or distances between different motion trajectories of multiple targets, like the method of (Zhou and Huang, 2007), but instead builds essential statistics by considering all the motion trajectories of multiple targets, and d) the proposed method does not adopt any computationally intensive multi-scale trajectory analysis technique such as (Yang and Shah, 2009). In another dimension, from the salient object detection point of view: a) our method does not perform generic global salient object detection such as in (Rahtu and Heikkila, 2010), but instead attempts to localize salient areas in the scene that influence the motion dynamics of targets, b) target interaction in our method is modeled using trajectory analysis and not through the typical connected component analysis (Pan and Pankanti, 2011), and c) our method is deterministic and independent of learning or training for interest area localization, with minimal parametrization, as against (Saleemi and Shah, 2009; Xu and Chen, 2013; Yang and Shah, 2009).

Figure 1: (a) A typical representation of the surveillance environment with one object of interest. (b) Rectangular subdivision of the surveillance environment assuming the object of interest is located at the center. (c-f) Typical motion pattern followed by a target while approaching the static object of interest. (g-h) Estimation of the importance of a block as a function of the motion dynamics.
3 PROPOSED METHODOLOGY
In the proposed methodology, we assume that moving targets are restricted only by the presence of boundary walls and other static objects in the scene, and not otherwise. Some of the static objects present inside the surveillance scene are of much interest to moving targets, and it is assumed that any normal target will be more attracted towards these specific static objects than towards other areas of the scene.
Theoretical Model and a Hypothesis
For simplicity, let us consider a single static object
of interest to be located inside a chosen surveillance
scene. Assuming that a target can reach the static
InterestAreaLocalizationusingTrajectoryAnalysisinSurveillanceScenes
479
object of interest from any direction, it is obvious
to expect its movement to follow the shortest path.
This typical scenario is illustrated in Fig. 1(a). The
intermediate space is unconstrained and the target can
move freely while heading towards the static object
of interest. A block based geometry, as depicted in
Fig. 1(b), is used to illustrate the different motion
configurations possible within such an environment.
That is, let us consider that the environment is partitioned into a rectangular grid of a predefined size. Therefore, as a target moves towards the static object of interest, its path passes through the intermediate blocks (C1, C2, C3), considering that its initial position is in one of the outer layer blocks, e.g. B1, ..., B5. There are different ways in which a target can reach a particular inner block from a set of outer blocks. For example, if the initial location of the target is in B1, and if the shortest path assumption holds, then the target must go through C1, as depicted in Fig. 1(c).
Other possibilities are described in the successive images. Under these conditions, a theoretical model can be used to describe the movement. Suppose the probability of a target being in one of the outermost blocks is represented as P. It can easily be verified from the images shown in Figs. 1(c-f) that a target can reach an inner block from an outer block in three possible ways. Therefore, the probability of reaching any of the inner blocks becomes three times that of an outer block. Since the static object of interest is at the center and is surrounded by eight equiprobable blocks, the number of possible ways of reaching it is eight times higher than that of its immediate neighboring blocks. Figs. 1(g-h) illustrate the method for a three-layer geometry. However, this can easily be extended to any desired number of layers.
Figure 2: (a) Division of a space assuming the object of interest is located at the center, and the corresponding frequency of a block being used while a target approaches the object of interest. (b) Probability density function (pdf) of the importance of a block in terms of the number of times a block is accessed while a target approaches the object of interest located at the center.
The Hypothesis. Suppose x is a random variable that denotes the probability of a target visiting a particular block while approaching an object of interest. If we plot the number of different ways in which an inner block can be reached from an outer block, the computed normalized probability represents a distribution of the importance of the blocks. Now, we hypothesize that, given a scenario where a sufficient number of targets approach an object of interest, if the surveillance space is divided into rectangular blocks as shown in Fig. 2(a), then the target motion model will usually follow the theoretical distribution shown in Fig. 2(b).
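To make the counting argument concrete, the following minimal Python sketch constructs this theoretical importance distribution for a square grid centered on the object of interest. The multiply-by-three-per-ring rule used to extend the three-layer case to more layers, and the function name theoretical_importance, are our own illustrative assumptions rather than details given in the paper.

import numpy as np

def theoretical_importance(layers=3):
    """Theoretical block-importance pdf for a (2*layers - 1) x (2*layers - 1) grid
    with the object of interest at the center (a sketch of the model above)."""
    size = 2 * layers - 1
    center = layers - 1
    grid = np.zeros((size, size))
    for r in range(size):
        for c in range(size):
            ring = max(abs(r - center), abs(c - center))   # 0 = center, layers-1 = outermost layer
            grid[r, c] = 3.0 ** (layers - 1 - ring)        # each inner ring assumed 3x more likely
    grid[center, center] = 8.0 * grid[center, center - 1]  # center reachable from its 8 neighbors
    return grid / grid.sum()                               # normalize to a pdf

print(theoretical_importance(3))  # reproduces the 1 : 3 : 24 weighting of the three-layer case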
Proposed Trajectory Analysis. The proposed
trajectory analysis method has been designed with
the knowledge of the theoretical model described
earlier. We have used the target detection and
tracking algorithm proposed in (Dinh and Medioni,
2011) to extract the trajectories of moving targets.
A spatial domain heuristic, wherein a point on
the trajectory is removed if it deviates abruptly
from its usual path, has been applied to remove
noise/outliers from the trajectories. Further, the
importance of a block is estimated from the cleaned
trajectories using the steps detailed below. During
pre-processing, the surveillance scene is divided
into a rectangular grid of uniform dimension as
shown in Fig. 1(b) where the total number of blocks
is denoted by M. Assuming that N trajectories of multiple targets are available for analysis, they can be represented as a set, say T = {t_1, t_2, ..., t_N}, such that t_i = (<x_1, y_1>, <x_2, y_2>, ..., <x_{m_i}, y_{m_i}>) is a trajectory of length m_i. The importance of a block is estimated as:
Step I. The average velocity (V^{O_i}_{avg}) of a target (O_i) is estimated from the uniformly sampled segments of its trajectory. First, the minimum and maximum values of the velocity of the target are calculated using (1) and (2), where p_j and p_{j+1} denote successive points on the trajectory t_i and j is bounded by 0 < j < |t_i|.

V^{O_i}_{min} = min_j |p_j - p_{j+1}|   (1)

V^{O_i}_{max} = max_j |p_j - p_{j+1}|   (2)

Next, the range of the velocity, [V^{O_i}_{min}, V^{O_i}_{max}], is divided into R uniform segments and a histogram of the instantaneous velocity is generated. Finally, the mean of all instantaneous velocities under the largest bin is taken as the average velocity of the target. This removes any bias that may occur due to fast moving segments within a given trajectory.
Step II. Next, the total number of times a block is visited by the various targets is computed. We call this the global visit count (G_{M_k}), where M_k is the block. Initially, the global visit parameter of every block is set to zero; whenever a target enters a new block, its global visit value is incremented by one.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
480
Step III. The global visit parameter is then used to filter out some of the unimportant blocks. We calculate the minimum and maximum values of G_{M_k} and apply a methodology similar to the one described in Step I to construct a histogram. Now, all blocks whose G_{M_k} value is less than the average of the G_{M_k} values in the largest bin are discarded. This essentially removes the blocks that targets rarely visited or did not visit at all.
Step IV. Next, the per-visit weight (W_{M_l}) of the remaining blocks, represented in terms of the relative average velocity of the moving targets, is calculated. Initially, the per-visit weight of every block is set to zero. It is assumed that a target usually moves slower than its average velocity when it approaches the location of a static object of interest. That is, the instantaneous velocity (V^{O_i}_j) is expected to be lower than the average velocity V^{O_i}_{avg}, and the weight of that particular cell is updated using (3).

W_{M_l} = W_{M_l} + (V^{O_i}_{avg} - V^{O_i}_j) / V^{O_i}_{avg}   (3)

The above step guarantees that, if a target moves relatively slower than its average velocity in a particular block, it contributes more to the average weight of that block.
Step V. Using the per-visit weight of a block and the global visit count described in the previous steps, a probabilistic estimate of the presence of a target inside a block is made. For this, we define a term called Importance (I), computed using (4). To filter out blocks where the global visit count is high due to unintended visits (e.g. blocks that cover entry/exit areas are expected to have large global visit values), we discard all blocks where I is less than the average importance (I_{avg}). The importance of such discarded blocks is expected to be small because targets usually do not spend much time in them. Finally, the normalized importance is taken as the probability of a block.

I(M_k) = W_{M_l} / G_{M_k}   (4)
Therefore, a peak in the distribution of importance can be indicative of the presence of an object of interest, and several peaks may be observed if multiple interesting objects are present inside the surveillance area.
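To tie Steps I-V together, the following minimal Python sketch estimates the histogram-based average velocity of Step I and then accumulates the global visit counts, per-visit weights, and block importance of (4) over a set of trajectories. The grid-mapping helper, the default bin count R = 10, and the simplified mean-based thresholds used in place of the histogram rules of Steps III and V are illustrative assumptions, not details prescribed by the paper.

import numpy as np

def average_velocity(traj, R=10):
    """Step I (sketch): histogram-based average velocity of one trajectory."""
    inst = np.linalg.norm(np.diff(traj, axis=0), axis=1)    # instantaneous velocities
    counts, edges = np.histogram(inst, bins=R)              # R uniform velocity bins
    b = np.argmax(counts)                                   # most populated bin
    mask = (inst >= edges[b]) & (inst <= edges[b + 1])
    return inst[mask].mean()                                # mean velocity within that bin

def block_index(p, frame_size, grid):
    """Map an <x, y> point to a block index (assumed helper; the paper only
    states that the scene is divided into a uniform rectangular grid)."""
    col = min(int(p[0] / frame_size[0] * grid[1]), grid[1] - 1)
    row = min(int(p[1] / frame_size[1] * grid[0]), grid[0] - 1)
    return row * grid[1] + col

def block_importance(trajectories, frame_size, grid=(9, 9)):
    """Steps II-V (sketch): global visits, per-visit weights and importance."""
    M = grid[0] * grid[1]
    G = np.zeros(M)                                   # global visit counts (Step II)
    W = np.zeros(M)                                   # per-visit weights   (Step IV)
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        v_avg = average_velocity(traj)
        prev = None
        for j in range(len(traj) - 1):
            k = block_index(traj[j], frame_size, grid)
            if k != prev:                             # target entered a new block
                G[k] += 1
                prev = k
            v_j = np.linalg.norm(traj[j + 1] - traj[j])
            if v_j < v_avg:                           # slower than average: eq. (3)
                W[k] += (v_avg - v_j) / v_avg
    G[G < G[G > 0].mean()] = 0                        # Step III filtering (simplified)
    I = np.where(G > 0, W / np.maximum(G, 1.0), 0.0)  # eq. (4)
    I[I < I[I > 0].mean()] = 0                        # Step V filtering (simplified)
    return I / I.sum() if I.sum() > 0 else I          # normalized importance

Peaks in the returned importance map would then correspond to candidate interest areas, mirroring the interpretation given above.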
4 RESULTS & ANALYSIS
We have selected the CAVIAR (Context Aware Vision using Image-based Active Recognition) and ViSOR (www.openvisor.org) datasets, which contain several videos of surveillance scenes, to benchmark our model against baselines. Videos under the "Browsing" category of the CAVIAR dataset were chosen for analysis as they typically represent the events mentioned in the research hypothesis. Each of these videos is a clip of 240 seconds on average, with 2 or 3 freely moving targets randomly accessing a vending machine and an ATM present within the open environment. From the ViSOR dataset, we have selected videos from the "Outdoor Unimore D.I.I. Setup - Multicamera - Disjoint Views" set (Vezzani and Cucchiara, 2010). These videos are of longer duration
(typically in the order of 40-60 minutes). We have
extracted trajectories of several moving targets from
these videos and used them in our analysis. We have
also collected 100+ trajectories from the Fish dataset
and applied them to our proposed methodology to an-
alyze the fish behavior in underwater videos (Beyan
and Fisher, 2013). In addition, a custom in-house dataset, which mainly contains free movement of human targets accessing a center table within a laboratory environment, has been used for testing. The above-mentioned datasets have been used for independent as well as comparative analysis.
The baseline algorithms used for comparison are carefully selected and can be categorized into two classes: a) saliency based techniques, including discriminative regional feature integration (DRFI) (Jiang and Li, 2013), principal component analysis (PCA) (Margolin and Zelnik-Manor, 2013) and saliency map (SM) (Rahtu and Heikkila, 2010), and b) abandoned object detection (AOD), based on the MATLAB example at http://www.mathworks.in/help/vision/examples/abandoned-object-detection.html.
The saliency based methods used in our comparison are mainly image based techniques; hence, we have extended them to detect salient objects in video frames by combining them with a trajectory density (TD) estimation method. Thus, the salient objects detected on a frame-by-frame basis are correlated with the spatial locations where the trajectory density is maximum, and the interest areas are further localized.
The abandoned object detection algorithm used for
comparison in this paper is based on connected com-
ponent analysis. We present the comparative results
with all four state-of-the-art techniques to demon-
strate the superiority of the proposed algorithm.
In Figure 3, the results of the proposed method in
localizing interest areas from the ViSOR, CAVIAR, and in-house datasets are presented first.

Figure 3: Results and analysis using videos from the datasets. (a-b) Localization of interest areas of the surveillance scene of the CAVIAR Browsing dataset. (c-d) Results of interest area localization using ViSOR dataset trajectories. (e-f) Results using the in-house dataset.

Table 1: Performance of the proposed interest area localization technique with respect to ground truths. IA: Interest Area, V: Visited, NV: Not-Visited, D-Rate: Detection Rate.

Dataset    Interest Area   Trajectories (Total / V / NV)   D-Rate
ViSOR      IA 1            62 / 28 / 34                    100%
           IA 2            62 / 11 / 51
CAVIAR     IA 1            4 / 2 / 2                       75%
           IA 2            4 / 2 / 2
           IA 3            4 / 3 / 1
In-house   IA 1            17 / 9 / 8                      100%
It is evident from Figure 3 that the proposed tech-
nique was successful in localizing interest areas. The
corresponding locations of the peaks are highlighted
with a bounding box and, when compared to the ground truth, the proposed technique is found to be accurate in localizing the areas. A summary of the results on the three datasets is presented in Table 1. It can be observed that the proposed algorithm has mis-classified one area of the CAVIAR scene as interesting, marked as peak-3 in Figure 3(b) or interest area 3 in the table. This mis-classification results from the protocol behavior (motion pattern) of targets in the CAVIAR dataset. Although we perceive this as a "non-interest area" from a behavior analysis point of view, this disagreement, between targets stopping in the middle of a room and the lack of sufficient evidence of an interest area, can itself be useful for detecting suspicious activity. However, from an interest area localization point of view, we classify the third peak as a false positive or falsely detected interest area.
Our observations with the chosen datasets confirm that the proposed importance metric, composed of velocity changes and time spent, is robust, and our assumptions for interest-area localization hold true for a large selection of scenarios in surveillance applications.

Figure 4: Additional results using ViSOR dataset videos that were captured from another viewing angle of the same surveillance area at a different time instance. (a) Scene with a rectangular grid. (b) Time-varying trajectories of some of the targets. (c) Trajectories superimposed on the original scene. (d) Peak detected using the proposed algorithm.
Additional localization results using a set of videos from the ViSOR dataset are presented in Figure 4. It may be observed that the peak representing the interest area has a higher density of trajectory points, which is quite natural since many targets are likely to have visited that area.
Figure 5: (a) Evidence of the presence of an interest point in the temporal domain on a set of trajectories selected from the in-house dataset. (b) A few snapshots of the surveillance scene where the interest point was automatically marked when a person reached the location of interest.
Our algorithm was successful in detecting the correct interest areas and in rejecting false positives (e.g. locations with high trajectory density) where the baseline techniques failed. In Figure 5(b), we present some of the video frames of the in-house data where the interest area was localized automatically. Localization of the interest area in the temporal domain is shown in Figure 5(a). We have considered 5354 frames of the in-house dataset videos to highlight the
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
482
relevant time-sequences that correspond to the interest areas represented by block numbers. This experiment demonstrates the effectiveness of our algorithm even when applied to a sufficiently large number of frames.

Figure 6: Results using the scene from the Fish trajectory dataset and the localized interest area.
Our analysis on the Fish video dataset suggests that a conclusion based on a single peak is difficult because the movement of fish inside water may be governed by the rules of their own world. Several peaks were detected, as shown in Figure 6, where a high density of the fish population was found. However, the environmental significance of the interest areas needs to be studied in more detail before conclusive remarks can be made on such datasets.
Figure 7: (a-d) Effect of block size on performance of the interest area localization process, e.g. block size = 5 × 5, 9 × 9, 15 × 15, and 20 × 20.
In our study, we have found that the proposed localization algorithm is sensitive to block size. In Figure 7, distributions of I(M_k) using varying block sizes, e.g. 5 × 5, 9 × 9, 15 × 15, and 20 × 20, are presented. It is evident that the peaks become more prominent with the increase in the number of divisions (or the reduction of grid size). As expected, a larger block size effectively reduces the total number of inter-block movements and hence results in the inclusion of the regions surrounding the interest area within the peak proximity. On the contrary, if the block size is reduced beyond a scene-specific threshold, a larger number of closely located peaks are produced, thus making the segmentation of the interest area difficult. After sufficient validation, a grid size of 9 × 9 was found to be optimal for the choice of videos in our datasets.
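As a usage illustration of this sensitivity, the earlier block_importance sketch could be re-run with different grid sizes; trajectories and frame_size below are placeholders for the tracker output and the frame dimensions, not data from the paper.

# hypothetical usage of the block_importance sketch from Section 3;
# `trajectories` and `frame_size` are placeholders for tracker output
for grid in [(5, 5), (9, 9), (15, 15), (20, 20)]:
    I = block_importance(trajectories, frame_size, grid=grid)
    print(grid, I.max(), int((I > 0).sum()))  # peak strength and number of retained blocks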
Figure 8: (a) Abandoned object detection on CAVIAR
dataset. (b) Trajectory density based interest point local-
ization on another in-house dataset. (c) Various techniques
of saliency based interest point segmentation on CAVIAR
dataset.
Next, we present the results of comparing the proposed approach against the baseline algorithms mentioned at the beginning of this section. Figure 8 presents the results of interest area localization obtained when applied on the CAVIAR dataset. It may be observed that, although the saliency based techniques (Jiang and Li, 2013; Margolin and Zelnik-Manor, 2013; Rahtu and Heikkila, 2010), when combined with the trajectory density estimation method, are able to localize some of the interest areas, these baselines were found to produce a larger number of false positives and are highly influenced by noise in the measurements. In addition, the localization of the interest areas was only possible after the integration of temporal analysis through trajectory density estimation; in its absence, the baselines could not closely match the proposed method. Figure 8(a) illustrates the interest areas obtained using the abandoned object detection algorithm based on connected component analysis, which was successful in detecting the reading desk only. In addition, in Figure 8(b), we have also applied the trajectory density based interest area localization algorithm on our in-house dataset, which was able to identify two locations, as denoted by the peaks in the density map presented. A summary of the localization achieved using the selected baseline algorithms is presented in Table 2.
Table 2: Localization of interest areas using baseline algorithms on CAVIAR dataset videos. GT: Ground Truth, TP: True Positive (Correctly Detected Interest Areas), TN: True Negative (Interest Areas Missed or Omitted), FP: False Positive (Areas Incorrectly Detected as Interest Locations).

Method     GT   TP   TN   FP
TD         2    2    2    0
DRFI       2    0    0    2
DRFI+TD    2    2    0    1
PCA        2    0    0    2
PCA+TD     2    1    1    0
SM         2    0    0    2
SM+TD      2    1    1    0
AOD        2    1    1    0
Proposed   2    2    0    1
Table 3: Comparison of computational complexity of the
proposed method against chosen baselines calculated on
average per-frame on the sequences from the CAVIAR
dataset.
Method Time (sec.)
Proposed Method 0.015
TD 3.3452
DRFI 8.5640
PCA 5.3923
SM 8.2116
AOD 0.0307
In Table 3, we provide a comparison of the computational complexity of the proposed method against the chosen baseline strategies. The proposed method is a seamless integration of tracking and trajectory analysis for interest area localization that offers a clear time advantage, requiring only 0.015 sec per frame. This time gain is primarily attributed to the spatial independence of the dynamic motion features that the proposed algorithm relies on for localizing interest areas. In contrast, the other baseline methods use a computationally intensive study of spatio-temporal relationships for detecting interest areas. The experiments have been performed on an Intel i7 2.8 GHz machine running MATLAB 2014b.
Verification of the Hypothesis
Let the importance of a block based on the theoretical analysis be represented using a discrete random variable, say y. A histogram considering a window of M × N around the center of an object of interest is generated. The histogram represents the discrete probability distribution of the random variable. Assume that this distribution is denoted as Q(y). Similarly, a histogram around the peak of the computed probability values is produced. Let this distribution be represented as P(y). According to the hypothesis, a target usually approaches the location of a static object following the shortest path. Therefore, given an adequate number of such trajectories, the probabilistic model computed from the data is expected to follow the theoretical model. We have used the Kullback-Leibler Divergence (KLD) (dis)similarity metric to validate our hypothesis. Theoretically, D_{KL}(P(y)||Q(y)) is a measure of the loss of information when P(y) is approximated by Q(y), and it is defined using (5).

D_{KL}(P||Q) = \sum_y P(y) \ln(P(y)/Q(y))   (5)
Since it is a non-symmetric measure, we have used the symmetric version of the KLD given in (6). This quantity is often used for feature selection in classification problems, where P(y) and Q(y) represent the conditional pdfs of a feature under two different classes.

D_{KL} = \sum_y P(y) \ln(P(y)/Q(y)) + \sum_y Q(y) \ln(Q(y)/P(y))   (6)
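A minimal sketch of how the symmetric divergence in (6) can be computed from two discrete distributions is given below; the small epsilon added to avoid division by zero for empty bins is our own assumption.

import numpy as np

def symmetric_kld(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler divergence between two discrete
    distributions, as in (6); eps guards against zero-probability bins."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))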
To verify the hypothesis, the distribution shown in Figure 2(b) was taken as P(y) and the distributions shown in Figure 7 were taken as Q(y). The following values of D_{KL} were recorded for the various grid configurations: 0.6956 (5 × 5), 0.5497 (9 × 9), 0.9619 (15 × 15), and 0.8668 (20 × 20). This confirms that the proposed algorithm produces the best result with respect to the theoretical model when a 9 × 9 grid size is used. We also performed correlation analysis between P(y) and Q(y) and recorded the coefficients for different grid sizes. The values are 0.01 (5 × 5), 0.77 (9 × 9), 0.63 (15 × 15), and 0.37505 (20 × 20). It is found that both metrics behave consistently and 9 × 9 has been found to be optimum, validating our test observation that "a very large or small sized grid may not be ideal since it deviates more from the theoretical model".
5 CONCLUSION & FUTURE
WORK
In this paper, we proposed a technique for localizing interest areas in a video using the motion trajectories of multiple moving targets in the environment. The proposed method was based on a theoretical model of natural human behavior in an unconstrained environment, which was further statistically verified and validated using various datasets. The results have demonstrated the method's ability to localize key elements in the scene that govern changes in the motion characteristics of targets in the environment. We anticipate that this work will provide
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
484
insight into natural behavior understanding in surveillance, taking into consideration not just the motion of targets but equally the knowledge of the other elements in an unconstrained scene that govern such movements within that environment.
REFERENCES
Beyan, C. and Fisher, R. (2013). Detection of abnormal fish
trajectories using a clustering based hierarchical clas-
sifier. In British Machine Vision Conference, pages
21.1–21.11.
Brun, L., S. A. and Vento, M. (2014). Dynamic scene un-
derstanding for behavior analysis based on string ker-
nels. In IEEE Transactions on Circuits and Systems
for Video Technology.
Dinh, T., V. N. and Medioni, G. (2011). Context tracker:
Exploring supporters and distracters in unconstrained
environments. In Computer Vision and Pattern Recog-
nition (CVPR), pages 1177–1184.
Farabet, C., C. C. N. L. and LeCun, Y. (2013). Learning hi-
erarchical features for scene labeling. In IEEE Trans-
actions on Pattern Analysis and Machine Intelligence,
pages 1915–1929.
Jiang, H., W. J. Y. Z. W. Y. Z. N. and Li, S. (2013). Salient
object detection: A discriminative regional feature in-
tegration approach. In Computer Vision and Pattern
Recognition (CVPR), pages 2083–2090.
Margolin, R., T. A. and Zelnik-Manor, L. (2013). What makes a patch distinct? In Computer Vision and Pattern Recognition (CVPR), pages 1139–1146.
Morris, B. and Trivedi, M. (2008). Learning and clas-
sification of trajectories in dynamic scenes: A gen-
eral framework for live video analysis. In Inter-
national Conference on Advanced Video and Signal
Based Surveillance (AVSS), pages 154–161.
Pan, J., F. Q. and Pankanti, S. (2011). Robust abandoned ob-
ject detection using region-level analysis. In Interna-
tional Conference on Image Processing (ICIP), pages
3597–3600.
Piciarelli, C. and Foresti, G. (2006). On-line trajectory clustering for anomalous events detection. Pattern Recognition Letters, 27(15):1835–1842. Vision for Crime Detection and Prevention.
Piciarelli, C., M. C. and Foresti, G. (2008). Trajectory-
based anomalous event detection. Circuits and Sys-
tems for Video Technology, IEEE Transactions on,
18(11):1544–1554.
Rahtu, E., K. J. S. M. and Heikkila, J. (2010). Segmenting
salient objects from images and videos. In European
Conference on Computer Vision (ECCV), pages 366–
379.
Saleemi, I., S. K. and Shah, M. (2009). Probabilistic mod-
eling of scene dynamics for applications in visual
surveillance. In IEEE Transactions on Pattern Anal-
ysis and Machine Intelligence, number 31(8), pages
1472–1485.
Sharma, P. and Nevatia, R. (2013). Efficient detector adap-
tation for object detection in a video. In Computer
Vision and Pattern Recognition (CVPR), 2013 IEEE
Conference on, pages 3254–3261.
Suzuki, N., H. K. T. K. K. Y. S. Y. and Fujino, Y. (2007).
Learning motion patterns and anomaly detection by
human trajectory analysis. In International Confer-
ence on Systems, Man and Cybernetics, pages 498–
503.
Vezzani, R. and Cucchiara, R. (2010). Video surveillance
online repository (visor): an integrated framework.
In Multimedia Tools and Applications, number 50(2),
pages 359–380.
Wang, X., T. K. and Grimson, E. (2006). Learning seman-
tic scene models by trajectory analysis. In European
Conference on Computer Vision (ECCV), pages 110–
123.
Xu, D., W. X. S. D. L. N. and Chen, Y. (2013). Hierarchical
activity discovery within spatio-temporal context for
video anomaly detection. In International Conference
on Image Processing (ICIP), pages 3597–3601.
Yang, Y., L. J. and Shah, M. (2009). Video scene under-
standing using multi-scale analysis. In Computer Vi-
sion and Pattern Recognition (CVPR), pages 1669–
1676.
Zhou, Y., Y. S. and Huang, T. (2007). Detecting anomaly in
videos from trajectory similarity analysis. In Interna-
tional Conference on Multimedia and Expo (ICME),
pages 1087–1090.
InterestAreaLocalizationusingTrajectoryAnalysisinSurveillanceScenes
485