SEMANTICS-PROVIDED ENVIRONMENT VIEWS FOR NORMALITY ANALYSIS-BASED INTELLIGENT SURVEILLANCE
Lorenzo M. López-López, Javier Albusac, José Jesús Castro-Schez and Luis Jiménez-Linares
Escuela Superior de Informática, Universidad de Castilla-La Mancha, Paseo de la Universidad 4, Ciudad Real, Spain
Keywords: Surveillance system, Intelligent surveillance, Multi-sensor surveillance, Multi-sensor data fusion, Sensor data integration, Sensor processing and control, Multi-agent system.
Abstract: Nowadays, the design and development of intelligent surveillance systems is a hot research topic, thanks to recent advances in related fields such as computer perception, artificial intelligence, and distributed device infrastructures. These systems are gradually evolving from classic passive CCTV surveillance towards systems capable of automatically interpreting the events occurring in a monitored environment and offering decision-support information based on data obtained from a number of heterogeneous perception devices. In this work, we introduce the formal definition of an intermediate layer in the architecture of an intelligent surveillance system, whose purpose is to provide the components responsible for the reasoning processes with the environment data they need. Such data is provided by means of environment views: data objects that contain not only data from different sensors, but also associated semantics that depend on the particular context in which the normality analysis of a concept is performed.
1 INTRODUCTION
The problem of intelligent surveillance deals with the perception, interpretation, and identification of the activities and situations that occur in a monitored environment.
The technological evolution of surveillance systems started with CCTV (Closed-Circuit Television) based surveillance systems (M. Valera and S. A. Velastin, 2005). These employed a set of video cameras distributed throughout the environment, usually connected to a central security department where security personnel continuously watch what is shown on the monitors. This approach is often referred to as passive surveillance.
The second generation of surveillance systems has been made possible by recent advances in the computer vision and artificial intelligence fields, such as new image processing techniques (L. Rodriguez-Benitez et al., 2008) or behaviour and activity recognition. Intelligent visual surveillance systems, as they are commonly known (W. Hu et al., 2004), are able to automatically learn and recognise activity patterns or situations that occur in an environment from the data provided by a collection of video cameras distributed throughout it (P. Remagnino et al., 2004) (R. T. Collins et al., 2000). Although these systems represent a clear improvement over the first-generation ones, some of the employed algorithms and techniques are still not mature enough, which results in systems that often take a long time to respond and trigger too many false-positive alarms (M. Valera and S. A. Velastin, 2005).
Finally, the third generation of surveillance systems refers to those systems which are able to use the information provided by a heterogeneous set of sensors distributed across the environment. Such a set of sensors is composed of devices of relatively low cost, such as radio-frequency identification (RFID) sensors, audio sensors, presence detection sensors, and video cameras. As a consequence of this heterogeneity, these systems are able to obtain more accurate knowledge of the environment (R. C. Luo et al., 2002). However, some aspects of the design and development of this third generation of systems are still not well defined, so they are currently the focus of considerable research activity (D. Smith and S. Singh, 2006).
This paper introduces the formalism for an intermediate architectural layer aimed at fusing and providing with semantics the sensory data obtained from the environment, according to the particular needs of the reasoning components in the upper layers. The fusion of heterogeneous sensory data (D. L. Hall and J. Llinas, 1997) (H. B. Mitchell, 2007) is an important issue in this respect. It is managed by means of environment views, which are objects comprising sensory data that is fused and provided with semantics according to the particular needs of the reasoning components that require them.

M. López-López L., Albusac J., Jesús Castro-Schez J. and Jiménez-Linares L. (2009). SEMANTICS-PROVIDED ENVIRONMENT VIEWS FOR NORMALITY ANALYSIS-BASED INTELLIGENT SURVEILLANCE. In Proceedings of the International Conference on Agents and Artificial Intelligence, pages 161-166. DOI: 10.5220/0001542701610166. Copyright © SciTePress.
The remainder of this paper is organised as follows. The motivation for introducing an intermediate semantic layer is presented in Section 2. The formal model of the proposed semantic layer is described in Section 3. Finally, Section 4 concludes the paper.
2 MOTIVATION FOR A SEMANTIC LAYER
Our proposed semantic layer bridges the gap between the lower and the upper layers in the general architecture of an intelligent surveillance system. The lower layers are usually responsible for distributing sensors throughout the environment and communicating the signals they produce. Such signals are composed of highly device-dependent data. On the other hand, the upper layers are usually in charge of inferring, from the captured sensory data, a cognitive model of the situation of the environment. In order to isolate the upper layers from the details of the lower ones, we introduce an intermediate semantic layer whose purpose is to process the sensory data, fusing it and providing it with the proper semantics according to the particular needs of the reasoning components that demand it. Next in this section, we briefly introduce a formal model to detect abnormal behaviours or activities in a monitored environment based on the analysis of the normality of defined concepts (J. Albusac et al., 2008).
When designing an intelligent surveillance system that is able to distinguish between normal and anomalous situations, three possible approaches can be considered: (1) defining anomalous situations with the collaboration of a human expert; (2) defining normal situations with the collaboration of a human expert; (3) defining normal situations together with the most frequently occurring anomalous ones.
According to the first approach, the expert must be able to define every possible anomalous situation that may occur. This turns out to be impossible in real scenarios, where abnormality is unpredictable in most cases. According to the second approach, only the situations considered normal are defined, as these are usually easier to enumerate than the anomalous ones. However, although every non-recognised activity would be marked as anomalous, it would be impossible for the system to establish the degree of potential risk that the situation implies. Our work is based on the last approach. In this case, the expert must define not only the set of normal situations, but also the set of the most frequently occurring anomalous ones. Thus, the system is able to recognise every normal and anomalous situation that has been previously defined; the worst case arises when an anomalous situation that has not been previously defined occurs. As with the second approach, the system would not be able to establish the degree of risk that such a situation implies; however, it would still be possible to trigger some kind of alarm.
To formalise the normality of a monitored environment, we have elaborated a model in which we define the problem of surveillance as the interpretation of a set of perceptions provided by a collection of sensors. This model is proposed in (J. Albusac et al., 2008) and is briefly included here for convenience.

P = \{E_1, E_2, \ldots, E_n\}    (1)

where P denotes the problem of surveillance in a global monitored environment and each E_i denotes a portion of that environment whose features are captured by means of a set of sensors. Therefore, each E_i is considered a monitored environment as well.
Also, we can define an environment E_i as a four-element tuple as follows:

E_i = \langle V; O; C; O \times C \rangle    (2)

where:
V is the set of variables extracted from the data provided by the sensors. This extraction task is performed by the semantic layer.
O is the set of object classes to which the system must pay special attention.
C is the set of concepts upon which normality analysis processes are performed, such as normal object trajectories or normal object speed.
O \times C refers to the correspondence between the set of objects and that of the concepts. This defines which concepts must be used to analyse the behaviour of each object class, such as normal pedestrian trajectories or normal vehicle speeds.
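As an illustrative sketch (all variable, class, and concept names below are hypothetical, not taken from the paper), the tuple E_i, and in particular the O × C correspondence, can be represented as a mapping from object classes to the concepts used to analyse them:

```python
# Sketch of the environment tuple E_i = <V; O; C; O x C>.
V = {"position", "speed", "heading"}       # variables extracted from sensors
O = {"pedestrian", "vehicle"}              # object classes of special interest
C = {"normal_trajectory", "normal_speed"}  # normality concepts

# O x C: which concepts must be used to analyse each object class
object_concepts = {
    "pedestrian": {"normal_trajectory"},
    "vehicle": {"normal_trajectory", "normal_speed"},
}

def concepts_for(object_class):
    """Return the normality concepts to analyse for a given object class."""
    return object_concepts.get(object_class, set())
```

Under this sketch, a detected vehicle would be analysed against both trajectory and speed normality, while an object class outside O yields no analyses.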
Next, we define the normality N_{C_i} of some given concept C_i as follows:

N_{C_i} = \langle V_i; DDV_i; \Phi_i \rangle    (3)

where:
V_i \subseteq V is the subset of variables needed to perform the normality analysis.
DDV_i is the set of domains of definition for each variable in V_i.
\Phi_i is the set of constraints used to analyse the normality of an object's behaviour according to a concept C_i.
The constraints in \Phi_i are functions f_{ij} which take a set of variables from V_i as input and return a value within the interval [0, 1] as output:

f_{ij} : \mathcal{P}(V_i) \to [0, 1]    (4)

where 1 implies the maximum degree of constraint satisfaction and 0 the opposite case.
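For illustration, one possible constraint f_{ij} in \Phi_i for a normal-speed concept could map an object's speed to a satisfaction degree in [0, 1]; the variable name and thresholds below are hypothetical:

```python
def speed_constraint(variables, v_max=50.0, v_hard=80.0):
    """Hypothetical constraint f_ij for a normal-speed concept: returns 1.0
    for fully normal speeds, 0.0 at or above a hard limit, and decreases
    linearly in between."""
    u = variables["speed"]
    if u <= v_max:
        return 1.0
    if u >= v_hard:
        return 0.0
    return (v_hard - u) / (v_hard - v_max)
```

A speed of 30 would be fully normal (degree 1.0), while a speed of 65 would satisfy the constraint only to degree 0.5.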
In short, the above definitions mean that, to perform the normality analysis of an object's behaviour according to a concept C_i, a set of variables V_i extracted from the sensor data V is needed. Besides, some variables in V may be included in several different V_i, which means that such variables are needed to perform several different normality analysis processes. Moreover, for each normality analysis, each variable in its corresponding V_i has an associated domain of definition in DDV_i. Such domains of definition provide each variable in V_i with particular semantics according to the context in which N_{C_i} is performed. Therefore, a given variable may have different semantics in the different contexts in which it is needed to analyse normality.
3 PROPOSED SEMANTIC LAYER
The objectives of the proposed semantic layer are:
To verify and interpret the different data flows provided by the sensors in the environment.
To fuse the captured sensory data into the environment views required by the normality analysis components.
To provide the sensory data with semantics according to the specific context in which it is needed.
As shown in Figure 1, the semantic layer is composed of two sub-levels:
Interpretation Sub-level. Its objectives are obtaining, verifying, and interpreting the data flows supplied by the sensors. The output of this sub-level is the data extracted from the sensors, represented in a device-independent common form.
Data Fusion and Distribution Sub-level. Its objective is to provide the components in the reasoning layer with the environment data they require, according to the semantics they demand. That is, to provide each normality analysis component with the environment views they request.

Figure 1: The Semantic layer links the Perceptual and Reasoning layers.
3.1 Interpretation Sub-level
Data provided by each sensor in the environment comes to the semantic layer through the signal channels. Components in the interpretation sub-level are then responsible for obtaining, verifying, and interpreting such data. These components, named sensor interpreters, may be viewed as agents specialised in interpreting the data provided by some kind of sensor.
First, a sensor interpreter must verify and interpret the data provided by the sensor in which it is specialised. Such data is highly device-dependent, so we can state that any sensor s_i installed in the environment provides strings of data according to some language which generates them.
That is, sensor s_i generates strings of the form:

\alpha \in A_{s_i}^*    (5)

which means that the strings \alpha are made up of symbols from some alphabet A_{s_i}. Besides, we can state that for any sensor s_i there is always a language L_{s_i} associated with it:

\forall s_i \; \exists L_{s_i} \text{ such that } L_{s_i} \subseteq A_{s_i}^*    (6)
Moreover, each language L_{s_i} is described by means of a formal grammar G_{s_i}, defined as:

G_{s_i} = (N_{s_i}, T_{s_i}, P_{s_i}, S_{s_i})    (7)

where N_{s_i}, T_{s_i}, P_{s_i} and S_{s_i} are the set of non-terminal symbols, the set of terminal symbols, the set of productions, and the start symbol of the grammar G_{s_i}, respectively.
Therefore, we can state that a sensor interpreter is a five-element tuple defined as:

SI_{s_i} = \langle V_{s_i}; SR_{s_i}; G_{s_i}; I_{s_i}; SC_{s_i} \rangle    (8)

where:
V_{s_i} = \{v_1^{s_i}, \ldots, v_n^{s_i}\} is the set of variables extracted from the data provided by sensor s_i.
SR_{s_i} = \{sr_1^{s_i}, \ldots, sr_n^{s_i}\} is the set of semantic rules defined for the variables extracted from sensor s_i. This is a set of constraints which the incoming data must satisfy in order to determine that sensor s_i works properly. An example of such a rule might be to check whether some variable's value is within some range of definition.
G_{s_i} is the grammar that generates the language L_{s_i} describing the strings of data provided by sensor s_i (7).
I_{s_i} is a function which transforms the strings \alpha provided by sensor s_i into strings \beta from a common intermediate language IL:

I_{s_i} : \alpha \mapsto \beta    (9)

such that \alpha \in L_{s_i} and \beta \in IL. Besides, such a transformation satisfies

m(\alpha) = m(\beta)    (10)

where m(x) is a function which returns the meaning of x in the semantic domain.
SC_{s_i} is the set of signal channels to which the interpreter SI_{s_i} is subscribed. A signal channel is the mechanism by which sensor interpreter SI_{s_i} obtains the data provided by sensor s_i.
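A sensor interpreter could be sketched as follows; the presence-sensor string format is invented for illustration, and a regular expression stands in for the formal grammar G_{s_i} (a real interpreter would use a parser for the sensor's grammar):

```python
import re

class SensorInterpreter:
    """Sketch of SI_si = <V; SR; G; I; SC>. The sensor format and the
    intermediate-language representation below are hypothetical."""

    # Stand-in for G_si: a hypothetical presence sensor emitting lines
    # such as "PRES:3:0.93" (zone identifier and detection confidence).
    GRAMMAR = re.compile(r"^PRES:(?P<zone>\d+):(?P<conf>0\.\d+|1\.0)$")

    def interpret(self, raw):
        # Verify the string belongs to L_si
        m = self.GRAMMAR.match(raw)
        if m is None:
            raise ValueError("string not in L_si: sensor may be faulty")
        variables = {"zone": int(m.group("zone")),
                     "confidence": float(m.group("conf"))}
        # Semantic rule sr: the confidence must lie in its domain [0, 1]
        if not 0.0 <= variables["confidence"] <= 1.0:
            raise ValueError("semantic rule violated")
        # I_si: emit the event in the common intermediate language IL,
        # preserving the meaning of the original string (m(alpha) = m(beta))
        return {"type": "presence", "variables": variables}
```

A malformed string (one outside L_{s_i}) is rejected during verification, which is how the interpreter detects a misbehaving sensor.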
Once the interpretation process is finished, the sensor interpreters send the extracted data to the data fusion and distribution sub-level through the events channel. At this point, the data is represented according to a common intermediate language IL, which allows the fusion of data extracted from several kinds of sensors and the construction of the environment view objects requested by the set of normality analysis components.
3.2 Data Fusion and Distribution Sub-level
Components in this sub-level are responsible for supplying the normality analysis components in the reasoning layer with the environment data they require, in the form they require it. This leads us to define the concept of environment view. An environment view is a data object consisting of a set of data variables together with the associated semantics required by the normality analysis component that requests it. Each normality analysis component must request an environment view in order to be provided with the environment data it needs.
Let S = \{SI_1, \ldots, SI_n\} be the set of sensor interpreters plugged into the system. Then,

\forall SI_i \in S, \; V_{SI_i} = \{v_1^{SI_i}, \ldots, v_n^{SI_i}\}    (11)

defines V_{SI_i} as the set of data variables provided by sensor interpreter SI_i.
Let V be the set of data variables provided by all the sensor interpreters in S:

V = \bigcup_{i=1}^{n} V_{SI_i}    (12)
We define an environment view as a two-element tuple as follows:

EV_j = \langle V_j; DDV_j \rangle    (13)

where:
V_j = \{v_1^j, \ldots, v_n^j\} \subseteq V refers to the set of variables included in environment view EV_j.
DDV_j = \{DDV_1^j, \ldots, DDV_n^j\} refers to the set of domains of definition for the variables in V_j. These elements specify the proper semantics for their corresponding elements in V_j.
Next, we define how to obtain the variables V_j and the form of the elements DDV_i^j in DDV_j.
Let V_{j,SI_i} \subseteq V_j be the subset of variables from V_j provided by sensor interpreter SI_i, named the component view of EV_j. Then,

V_j = \bigcup_{i=1}^{n} V_{j,SI_i} \text{ and } V_{j,SI_i} \subseteq V_{SI_i}    (14)

which means that the set of variables V_j of an environment view EV_j is the aggregation of the component views V_{j,SI_i} extracted from the sets of variables V_{SI_i} provided by each sensor interpreter SI_i. Of course, the component view V_{j,SI_i} for a given sensor interpreter SI_i may be the empty set, which means that the environment view EV_j does not require any data variable from that sensor interpreter.
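A minimal sketch of assembling the variable set V_j of an environment view from the component views, under the assumption that interpreters report their current variables as dictionaries (all names hypothetical):

```python
def build_environment_view(required, component_views):
    """Sketch of eq. (14): V_j is the union of the component views V_j,SIi.
    `component_views` maps interpreter ids to the variables each currently
    provides; `required` names the variables EV_j needs. Variables that no
    interpreter currently provides are reported as missing (to be fetched
    from the history database)."""
    view = {}
    for si_id, variables in component_views.items():
        for name, value in variables.items():
            if name in required:
                view[name] = value
    missing = required - set(view)
    return view, missing
```

An interpreter whose variables do not intersect `required` simply contributes an empty component view, as the text allows.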
Moreover, the data provided by sensor interpreters may be given as crisp or vague values. For example, a presence detection sensor interpreter may return a crisp variable value indicating that presence has been detected, which may be given as a boolean true value. Also, a video camera may return some linguistic value indicating that a person moving fast has been identified in a monitored area. Thus, we need a common data type which allows us to represent every possible kind of value provided by any sensor interpreter in S. For that purpose, we propose the use of trapezoidal functions, as defined in (15) (J. J. Castro-Schez et al., 2004b).

\mu(u; a, b, c, d) = \begin{cases} 0 & u < a \\ \frac{u-a}{b-a} & a \le u < b \\ 1 & b \le u \le c \\ \frac{d-u}{d-c} & c < u \le d \\ 0 & u > d \end{cases}    (15)

The reason for choosing this function is that it is a suitable way to represent both categorical and numerical values; in both cases such values may be given to the user by means of linguistic terms, which is convenient in order to provide users with information expressed in their own terminology. Besides, the inference rules in the normality analysis components can be expressed using linguistic variables and fuzzy set theory techniques for dealing with the inherent uncertainty of the environment data (J. J. Castro-Schez et al., 2004a).
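The trapezoidal function (15) translates directly into code; note that a crisp value v can be represented as the degenerate trapezoid a = b = c = d = v (the function name below is ours):

```python
def trapezoid(u, a, b, c, d):
    """Trapezoidal membership function of eq. (15), with a <= b <= c <= d.
    Returns the membership degree of u in [0, 1]."""
    if u < a or u > d:
        return 0.0
    if a <= u < b:
        return (u - a) / (b - a)   # rising edge
    if b <= u <= c:
        return 1.0                 # plateau
    return (d - u) / (d - c)       # falling edge, c < u <= d
```

For example, trapezoid(u, 0, 2, 8, 10) gives full membership on [2, 8] and linear slopes on either side, while trapezoid(u, v, v, v, v) behaves as a crisp singleton at v.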
When designing a new normality analysis component, a new environment view must also be specified, indicating its requirements in terms of environment data and the associated semantics.
The main components in this sub-level are the event dispatcher agents (Figure 1), whose main functions are: (1) collecting the data required by the normality analysis components; (2) processing such data according to the requirements of the normality analysis components to be served; and (3) distributing the data to those normality analysis components that demand it.
The event dispatcher agents are thus the components responsible for distributing the sensory data, in the form of environment views, to the normality analysis components that require it. This process is as follows:
1. An event dispatcher agent keeps waiting until some data event is triggered.
2. When a new environment event is triggered, an event dispatcher agent collects the incoming data and asks the environment views agent for the normality analysis components which requested an environment view containing the obtained data.
3. The selected items (environment views) may need to be completed with previously obtained data from the history database, because the incoming data may be insufficient to construct a complete environment view.
4. Once an event dispatcher agent has obtained the complete chunk of data which makes up an environment view, it processes it according to DDV_j in order to provide the data with the requested semantics.
5. Finally, the event dispatcher agent transfers the chunk of data to the normality analysis component (or components) which requested the environment view object just constructed.
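The five steps above can be sketched as a single dispatch cycle; the directory, history, and delivery structures below are deliberately simplified stand-ins for the agents described in the text:

```python
def dispatch(event_vars, directory, history, deliver):
    """Sketch of one event dispatcher cycle (steps 1-5). `directory` maps
    component ids to (required variable names, semantics function);
    `history` maps variable names to their last stored value; `deliver`
    is called with each completed environment view. All names are
    hypothetical."""
    for component_id, (required, apply_semantics) in directory.items():
        if not required & set(event_vars):
            continue                       # view does not involve this event
        data = {k: v for k, v in event_vars.items() if k in required}
        for name in required - set(data):  # step 3: complete from history
            data[name] = history[name]
        # step 4: apply the requested semantics DDV_j; step 5: deliver
        deliver(component_id, apply_semantics(data))
```

In this sketch, the semantics function plays the role of DDV_j, mapping each raw value into the domain of definition the requesting component expects.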
The environment views are stored in the environment views directory. This component is managed by the environment views agent, which is responsible both for publishing new data in the directory and for retrieving existing data at the request of an event dispatcher agent. When a new normality analysis component is plugged into the system, a new environment view must be defined and published in the environment views directory. We define the structure of this component as a collection of entry elements. An entry element is a two-element tuple of the form (N_{C_i}, EV_j). The element N_{C_i} refers to the identifier of some normality analysis component, whereas the element EV_j refers to a defined environment view. This way, an environment view may be associated with several normality analysis components when needed.
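The directory of (N_{C_i}, EV_j) entries can be sketched as follows (identifiers and method names are illustrative):

```python
class EnvironmentViewsDirectory:
    """Sketch of the environment views directory: a collection of
    (component id, view id) entry tuples. A single environment view
    may be shared by several normality analysis components."""

    def __init__(self):
        self._entries = []            # list of (N_Ci, EV_j) tuples

    def publish(self, component_id, view_id):
        """Register a new entry when a component is plugged in."""
        self._entries.append((component_id, view_id))

    def components_for(self, view_id):
        """Components that requested the given environment view."""
        return [c for c, v in self._entries if v == view_id]
```

This is the lookup an event dispatcher agent performs in step 2 of the process above: given the view a new event can contribute to, find every component that requested it.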
When the data contained in a new environment event is insufficient to construct an environment view object, the event dispatcher agent responsible for attending to that event must obtain the missing data elsewhere: from the history database. This component is managed by the history agent, whose responsibilities are both to store data incoming from the events channel in the history database and to retrieve existing data from the database at the request of an event dispatcher agent. The history database is composed of a collection of entry elements. Each entry consists of a three-element tuple (SI_i, V_{SI_i}, Moment), such that SI_i is the identifier of the sensor interpreter which provided the data, V_{SI_i} is the set of variables extracted from sensor interpreter SI_i, and Moment is the time and date when such information was retrieved by the system.
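A minimal sketch of the history database and the retrieval of the most recent value of a variable (method names are hypothetical):

```python
import time

class HistoryDatabase:
    """Sketch of the history database: entries are (SI_i, V_SIi, Moment)
    tuples, appended in arrival order."""

    def __init__(self):
        self._entries = []

    def store(self, interpreter_id, variables, moment=None):
        """Append an entry; Moment defaults to the current time."""
        self._entries.append(
            (interpreter_id, variables,
             moment if moment is not None else time.time()))

    def latest(self, name):
        """Most recently stored value of a variable, or None if absent."""
        for _, variables, _ in reversed(self._entries):
            if name in variables:
                return variables[name]
        return None
```

An event dispatcher agent would call something like latest() in step 3 of the process above to complete an environment view whose incoming event lacked some required variable.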
Once an event dispatcher agent has completed the chunk of data that comprises an environment view, it sends a message containing such data through the environment views distribution channel. This channel supports a broadcast-like communication mode, such that every normality analysis component subscribes to it but receives only the messages addressed to it. The main advantage of this approach is that only one communication point needs to be known by both the event dispatcher agents and the normality analysis components.
4 CONCLUSIONS
The trend in the design and development of intelligent surveillance systems is to use not only the visual information provided by a set of video cameras, but also other kinds of sensors, allowing the system to maintain more accurate knowledge of the monitored environment. In this respect, the fusion of sensory data plays an essential role, as different sensors provide data in a variety of different forms. Besides, intelligent surveillance based on normality analysis requires that the same sensory data can be used in different analysis contexts with different semantics. We have presented an architectural layer that fuses sensory data and provides it with semantics according to the particular requirements of the normality analysis components which require it. We have defined the concept of environment view as an object that contains the data requested by a normality analysis component. Besides, the proposed architecture allows for easy scalability in terms of both the sensors installed in the environment and the normality analysis components plugged into the system. Adding a new kind of sensor entails designing a new sensor interpreter capable of interpreting the data sent by that kind of sensor and making it available to the rest of the components of the system. On the other hand, adding a new normality analysis component entails defining a new environment view, according to the data and semantics requirements of the newly added component.
ACKNOWLEDGEMENTS
This work has been funded by the Regional Government of Castilla-La Mancha under the research projects e-PACTOS (ref. PAC-06-141) and Sarasvati (ref. PBC06-0064), and by the Spanish Ministry of Education and Science under the research project TIN2007-62568.
REFERENCES
D. L. Hall and J. Llinas (1997). An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1):6–23.
D. Smith and S. Singh (2006). Approaches to Multisensor Data Fusion in Target Tracking: A Survey. IEEE Transactions on Knowledge and Data Engineering, 18(12):1696–1710.
H. B. Mitchell (2007). Multi-Sensor Data Fusion: An Introduction. Springer-Verlag.
J. Albusac, D. Vallejo, L. Jimenez, J. J. Castro-Schez, and L. Rodriguez (2008). Intelligent Surveillance based on Normality Analysis to Detect Abnormal Behaviors. Submitted to International Journal of Pattern Recognition and Artificial Intelligence.
J. J. Castro-Schez, J. L. Castro, and J. M. Zurita (2004a). Fuzzy repertory table: a method for acquiring knowledge about input variables to machine learning algorithm. IEEE Transactions on Fuzzy Systems, 12(1):123–139.
J. J. Castro-Schez, N. R. Jennings, X. Luo, and N. Shadbolt (2004b). Acquiring domain knowledge for negotiating agents: a case of study. International Journal of Human-Computer Studies, 61(1):3–31.
L. Rodriguez-Benitez, J. Moreno-Garcia, and J. J. Castro-Schez (2008). Automatic Object Behaviour Recognition from Compressed Video Domain. Image and Vision Computing. doi: 10.1016/j.imavis.2008.07.002.
M. Valera and S. A. Velastin (2005). Intelligent Distributed Surveillance Systems: A Review. IEE Proceedings on Vision, Image, and Signal Processing, 152(2):192–204.
P. Remagnino, A. I. Shihab, and G. A. Jones (2004). Distributed intelligence for multi-camera visual surveillance. Pattern Recognition, 37(4):675–689.
R. C. Luo, C. Yih, and K. L. Su (2002). Multisensor Fusion and Integration: Approaches, Applications, and Future Research Directions. IEEE Sensors Journal, 2(2):107–119.
R. T. Collins, A. J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O. Hasegawa, P. Burt, and L. Wixson (2000). A System for Video Surveillance and Monitoring. Technical report, The Robotics Institute, Carnegie Mellon University.