SEMANTICS-PROVIDED ENVIRONMENT VIEWS FOR NORMALITY ANALYSIS-BASED INTELLIGENT SURVEILLANCE
Lorenzo M. López-López, Javier Albusac, José Jesús Castro-Schez and Luis Jiménez-Linares
Escuela Superior de Informática, Universidad de Castilla-La Mancha, Paseo de la Universidad 4, Ciudad Real, Spain
Keywords: Surveillance system, Intelligent surveillance, Multi-sensor surveillance, Multi-sensor data fusion, Sensor data integration, Sensor processing and control, Multi-agent system.
Abstract: Nowadays, the design and development of intelligent surveillance systems is a hot research topic, thanks to recent advances in related fields such as computer perception, artificial intelligence, and distributed device infrastructures. These systems are gradually evolving from classic passive CCTV surveillance towards systems capable of automatically interpreting the events occurring in a monitored environment and offering decision-support information based on data obtained from a number of heterogeneous perception devices. In this work, we introduce the formal definition of an intermediate layer in the architecture of an intelligent surveillance system, whose purpose is to provide the components responsible for the reasoning processes with the environment data they need. Such data is provided by means of environment views: data objects that contain not only data from different sensors, but also associated semantics that depend on the particular context in which the normality analysis of a concept is performed.
1 INTRODUCTION
The problem of intelligent surveillance deals with the perception, interpretation, and identification of the activities and situations that occur in a monitored environment.
The technological evolution of surveillance systems started with CCTV (Closed-Circuit Television) based surveillance systems (M. Valera and S. A. Velastin, 2005). These employed a set of video cameras distributed throughout the environment, usually connected to a central security department where security personnel continuously watch what is shown on the monitors. This approach is often referred to as passive surveillance.
The second generation of surveillance systems has been made possible by recent advances in the computer vision and artificial intelligence fields, such as new image processing techniques (L. Rodriguez-Benitez et al., 2008) or behaviour and activity recognition. Intelligent visual surveillance systems, as they are commonly known (W. Hu et al., 2004), are able to automatically learn and recognise activity patterns or situations that occur in an environment from the data provided by a collection of video cameras distributed throughout it (P. Remagnino et al., 2004) (R. T. Collins et al., 2000). Although these systems represent a clear improvement over the first-generation ones, some of the employed algorithms and techniques are still not mature enough, which results in systems that often take a long time to respond and trigger too many false-positive alarms (M. Valera and S. A. Velastin, 2005).
Finally, the third generation of surveillance systems refers to those systems which are able to use the information provided by a heterogeneous set of sensors distributed across the environment. Such a set of sensors is composed of devices of relatively low cost, such as radio-frequency identification (RFID) sensors, audio sensors, presence detection sensors, and video cameras. As a consequence of this heterogeneity, these systems are able to obtain more accurate knowledge of the environment (R. C. Luo et al., 2002). However, some aspects of the design and development of this third generation of systems are still not well defined, so they are currently the focus of considerable research activity (D. Smith and S. Singh, 2006).
This paper introduces the formalism for an intermediate architectural layer aimed at fusing and providing with semantics the sensory data obtained from the environment, according to the particular needs of the reasoning components in the upper layers. The fusion of heterogeneous sensory data (D. L. Hall and J. Llinas, 1997) (H. B. Mitchell, 2007) is an important issue in this respect. It is managed by means of environment views, which are objects comprising sensory data that is fused and provided with semantics according to the particular needs of the reasoning components that require them.

M. López-López L., Albusac J., Jesús Castro-Schez J. and Jiménez-Linares L. (2009). SEMANTICS-PROVIDED ENVIRONMENT VIEWS FOR NORMALITY ANALYSIS-BASED INTELLIGENT SURVEILLANCE. In Proceedings of the International Conference on Agents and Artificial Intelligence, pages 161-166. DOI: 10.5220/0001542701610166. Copyright © SciTePress.
The remainder of this paper is organised as follows. The motivation for introducing an intermediate semantic layer is presented in Section 2. The formal model of the proposed semantic layer is described in Section 3. Finally, Section 4 concludes the paper.
2 MOTIVATION FOR A SEMANTIC LAYER
Our proposed semantic layer bridges the gap between the lower and the upper layers in the general architecture of an intelligent surveillance system. The lower layers are usually responsible for distributing sensors throughout the environment and communicating the signals they produce. Such signals are composed of highly device-dependent data. On the other hand, the upper layers are usually in charge of inferring, from the captured sensory data, a cognitive model of the situation of the environment. In order to isolate the upper layers from the details of the lower ones, we introduce an intermediate semantic layer whose purpose is to process the sensory data, fusing it and providing it with the proper semantics according to the particular needs of the reasoning components that demand it. Next in this section, we briefly introduce a formal model to detect abnormal behaviours or activities in a monitored environment based on the analysis of the normality of defined concepts (J. Albusac et al., 2008).
When designing an intelligent surveillance system that is able to distinguish between normal and anomalous situations, three possible approaches can be considered: (1) defining anomalous situations with the collaboration of a human expert; (2) defining normal situations with the collaboration of a human expert; (3) defining normal situations together with the most frequently occurring anomalous ones.
According to the first approach, the expert must be able to define every possible anomalous situation that may occur. This turns out to be impossible in real scenarios, where abnormality is unpredictable in most cases. According to the second approach, only the situations considered normal are defined, as these are usually easier to enumerate than the anomalous ones. However, although every non-recognised activity would be marked as anomalous, it would be impossible for the system to establish the degree of potential risk that the situation implies. Our work is based on the last approach. In this case, the expert must define not only the set of normal situations, but also the set of the most frequently occurring anomalous ones. Thus, the system is able to recognise every normal and anomalous situation that has been previously defined; the worst case arises when an anomalous situation that has not been previously defined occurs. As with the second approach, the system would not be able to establish the degree of risk that such a situation implies; however, it would still be possible to trigger some kind of alarm.
To formalise the normality of a monitored environment, we have elaborated a model in which we define the problem of surveillance as the interpretation of a set of perceptions provided by a collection of sensors. This model is proposed in (J. Albusac et al., 2008) and is briefly included here for convenience.

P = \{E_1, E_2, \ldots, E_n\}    (1)

where P denotes the problem of surveillance in a global monitored environment and each E_i denotes a portion of that environment whose features are captured by means of a set of sensors. Therefore, each E_i is considered a monitored environment as well.
Also, we can define an environment E_i as a four-element tuple as follows:

E_i = \langle V; O; C; O \times C \rangle    (2)

where:
V is the set of variables extracted from the data provided by the sensors. This extraction task is performed by the semantic layer.
O is the set of object classes to which the system must pay special attention.
C is the set of concepts upon which normality analysis processes are performed, such as normal object trajectories or normal object speed.
O \times C refers to the correspondence between the set of objects and that of the concepts. This defines which concepts must be used to analyse the behaviour of each object class, such as normal pedestrian trajectories or normal vehicle speeds.
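As an illustrative sketch (all variable, class, and concept names below are hypothetical, not taken from the paper), the tuple E_i, and in particular the O × C correspondence, can be represented as a mapping from object classes to the concepts used to analyse them:

```python
# Sketch of the environment tuple E_i = <V; O; C; O x C>.
V = {"position", "speed", "heading"}       # variables extracted from sensors
O = {"pedestrian", "vehicle"}              # object classes of special interest
C = {"normal_trajectory", "normal_speed"}  # normality concepts

# O x C: which concepts must be used to analyse each object class
object_concepts = {
    "pedestrian": {"normal_trajectory"},
    "vehicle": {"normal_trajectory", "normal_speed"},
}

def concepts_for(object_class):
    """Return the normality concepts to analyse for a given object class."""
    return object_concepts.get(object_class, set())
```

Under this sketch, a detected vehicle would be analysed against both trajectory and speed normality, while an object class outside O yields no analyses.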
Next, we define the normality N_{C_i} of some given concept C_i as follows:

N_{C_i} = \langle V_i; DDV_i; \Phi_i \rangle    (3)

where:
V_i \subseteq V is the subset of variables needed to perform the normality analysis.
DDV_i is the set of domains of definition for each variable in V_i.
\Phi_i is the set of constraints used to analyse the normality of an object's behaviour according to a concept C_i.
The constraints in \Phi_i are functions f_{ij} which take a set of variables from V_i as input and return a value within the interval [0, 1] as output:

f_{ij} : \mathcal{P}(V_i) \to [0, 1]    (4)

where 1 implies the maximum degree of constraint satisfaction and 0 the opposite case.
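For illustration, one possible constraint f_{ij} in \Phi_i for a normal-speed concept could map an object's speed to a satisfaction degree in [0, 1]; the variable name and thresholds below are hypothetical:

```python
def speed_constraint(variables, v_max=50.0, v_hard=80.0):
    """Hypothetical constraint f_ij for a normal-speed concept: returns 1.0
    for fully normal speeds, 0.0 at or above a hard limit, and decreases
    linearly in between."""
    u = variables["speed"]
    if u <= v_max:
        return 1.0
    if u >= v_hard:
        return 0.0
    return (v_hard - u) / (v_hard - v_max)
```

A speed of 30 would be fully normal (degree 1.0), while a speed of 65 would satisfy the constraint only to degree 0.5.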
In short, the above definitions mean that, to perform the normality analysis of an object's behaviour according to a concept C_i, a set of variables V_i extracted from the sensor data V is needed. Besides, some variables in V may be included in several different V_i, which means that such variables are needed to perform several different normality analysis processes. Moreover, for each normality analysis, each variable in its corresponding V_i has an associated domain of definition in DDV_i. Such domains of definition provide each variable in V_i with particular semantics according to the context in which N_{C_i} is performed. Therefore, a given variable may have different semantics in the different contexts in which it is needed to analyse normality.
3 PROPOSED SEMANTIC LAYER
The objectives of the proposed semantic layer are:
To verify and interpret the different data flows provided by the sensors in the environment.
To fuse the captured sensory data into the environment views required by the normality analysis components.
To provide the sensory data with semantics according to the specific context in which it is needed.
As shown in Figure 1, the semantic layer is composed of two sub-levels:
Interpretation Sub-level. Its objectives are obtaining, verifying, and interpreting the data flows supplied by the sensors. The output of this sub-level is the data extracted from the sensors, represented in a device-independent common form.
Data Fusion and Distribution Sub-level. Its objective is to provide the components in the reasoning layer with the environment data they require, according to the semantics they demand. That is, to provide each normality analysis component with the environment views they request.

Figure 1: The Semantic layer links the Perceptual and Reasoning layers.
3.1 Interpretation Sub-level
Data provided by each sensor in the environment comes to the semantic layer through the signal channels. Components in the interpretation sub-level are then responsible for obtaining, verifying, and interpreting such data. These components, named sensor interpreters, may be viewed as agents specialised in interpreting the data provided by some kind of sensor.
First, a sensor interpreter must verify and interpret the data provided by the sensor in which it is specialised. Such data is highly device-dependent, so we can state that any sensor s_i installed in the environment provides strings of data according to some language which generates them.
That is, sensor s_i generates strings of the form:

\alpha \in A_{s_i}^*    (5)

which means that the strings \alpha are made up of symbols from some alphabet A_{s_i}. Besides, we can state that for any sensor s_i there is always a language L_{s_i} associated with it:

\forall s_i \; \exists L_{s_i} \text{ such that } L_{s_i} \subseteq A_{s_i}^*    (6)
Moreover, each language L_{s_i} is described by means of a formal grammar G_{s_i}, defined as:

G_{s_i} = (N_{s_i}, T_{s_i}, P_{s_i}, S_{s_i})    (7)

where N_{s_i}, T_{s_i}, P_{s_i} and S_{s_i} are the set of non-terminal symbols, the set of terminal symbols, the set of productions, and the start symbol of the grammar G_{s_i}, respectively.
Therefore, we can state that a sensor interpreter is a five-element tuple defined as:

SI_{s_i} = \langle V_{s_i}; SR_{s_i}; G_{s_i}; I_{s_i}; SC_{s_i} \rangle    (8)

where:
V_{s_i} = \{v_1^{s_i}, \ldots, v_n^{s_i}\} is the set of variables extracted from the data provided by sensor s_i.
SR_{s_i} = \{sr_1^{s_i}, \ldots, sr_n^{s_i}\} is the set of semantic rules defined for the variables extracted from sensor s_i. This is a set of constraints which the incoming data must satisfy in order to determine that sensor s_i works properly. An example of such a rule might be to check whether some variable's value is within some range of definition.
G_{s_i} is the grammar that generates the language L_{s_i} describing the strings of data provided by sensor s_i (7).
I_{s_i} is a function which transforms the strings \alpha provided by sensor s_i into strings \beta from a common intermediate language IL:

I_{s_i} : \alpha \mapsto \beta    (9)

such that \alpha \in L_{s_i} and \beta \in IL. Besides, such a transformation satisfies

m(\alpha) = m(\beta)    (10)

where m(x) is a function which returns the meaning of x in the semantic domain.
SC_{s_i} is the set of signal channels to which the interpreter SI_{s_i} is subscribed. A signal channel is the mechanism by which sensor interpreter SI_{s_i} obtains the data provided by sensor s_i.
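A sensor interpreter could be sketched as follows; the presence-sensor string format is invented for illustration, and a regular expression stands in for the formal grammar G_{s_i} (a real interpreter would use a parser for the sensor's grammar):

```python
import re

class SensorInterpreter:
    """Sketch of SI_si = <V; SR; G; I; SC>. The sensor format and the
    intermediate-language representation below are hypothetical."""

    # Stand-in for G_si: a hypothetical presence sensor emitting lines
    # such as "PRES:3:0.93" (zone identifier and detection confidence).
    GRAMMAR = re.compile(r"^PRES:(?P<zone>\d+):(?P<conf>0\.\d+|1\.0)$")

    def interpret(self, raw):
        # Verify the string belongs to L_si
        m = self.GRAMMAR.match(raw)
        if m is None:
            raise ValueError("string not in L_si: sensor may be faulty")
        variables = {"zone": int(m.group("zone")),
                     "confidence": float(m.group("conf"))}
        # Semantic rule sr: the confidence must lie in its domain [0, 1]
        if not 0.0 <= variables["confidence"] <= 1.0:
            raise ValueError("semantic rule violated")
        # I_si: emit the event in the common intermediate language IL,
        # preserving the meaning of the original string (m(alpha) = m(beta))
        return {"type": "presence", "variables": variables}
```

A malformed string (one outside L_{s_i}) is rejected during verification, which is how the interpreter detects a misbehaving sensor.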
Once the interpretation process is finished, the sensor interpreters send the extracted data to the data fusion and distribution sub-level through the events channel. At this point, the data is represented according to a common intermediate language IL, which allows the fusion of data extracted from several kinds of sensors and the construction of the environment view objects requested by the set of normality analysis components.
3.2 Data Fusion and Distribution Sub-level
Components in this sub-level are responsible for supplying the normality analysis components in the reasoning layer with the environment data they require, in the form they require it. This leads us to define the concept of environment view. An environment view is a data object consisting of a set of data variables together with the associated semantics required by the normality analysis component that requests it. Each normality analysis component must request an environment view in order to be provided with the environment data it needs.
Let S = \{SI_1, \ldots, SI_n\} be the set of sensor interpreters plugged into the system. Then,

\forall SI_i \in S, \; V_{SI_i} = \{v_1^{SI_i}, \ldots, v_n^{SI_i}\}    (11)

defines V_{SI_i} as the set of data variables provided by sensor interpreter SI_i.
Let V be the set of data variables provided by all the sensor interpreters in S:

V = \bigcup_{i=1}^{n} V_{SI_i}    (12)
We define an environment view as a two-element tuple as follows:

EV_j = \langle V_j; DDV_j \rangle    (13)

where:
V_j = \{v_1^j, \ldots, v_n^j\} \subseteq V refers to the set of variables included in environment view EV_j.
DDV_j = \{DDV_1^j, \ldots, DDV_n^j\} refers to the set of domains of definition for the variables in V_j. These elements specify the proper semantics for their corresponding elements in V_j.
Next, we define how to obtain the variables V_j and the form of the elements DDV_i^j in DDV_j.
Let V_{j,SI_i} \subseteq V_j be the subset of variables from V_j provided by sensor interpreter SI_i, named the component view of EV_j. Then,

V_j = \bigcup_{i=1}^{n} V_{j,SI_i} \text{ and } V_{j,SI_i} \subseteq V_{SI_i}    (14)

which means that the set of variables V_j of an environment view EV_j is the aggregation of the component views V_{j,SI_i} extracted from the sets of variables V_{SI_i} provided by each sensor interpreter SI_i. Of course, the component view V_{j,SI_i} for a given sensor interpreter SI_i may be the empty set, which means that the environment view EV_j does not require any data variable from that sensor interpreter.
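A minimal sketch of assembling the variable set V_j of an environment view from the component views, under the assumption that interpreters report their current variables as dictionaries (all names hypothetical):

```python
def build_environment_view(required, component_views):
    """Sketch of eq. (14): V_j is the union of the component views V_j,SIi.
    `component_views` maps interpreter ids to the variables each currently
    provides; `required` names the variables EV_j needs. Variables that no
    interpreter currently provides are reported as missing (to be fetched
    from the history database)."""
    view = {}
    for si_id, variables in component_views.items():
        for name, value in variables.items():
            if name in required:
                view[name] = value
    missing = required - set(view)
    return view, missing
```

An interpreter whose variables do not intersect `required` simply contributes an empty component view, as the text allows.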
Moreover, the data provided by sensor interpreters may be given as crisp or vague values. For example, a presence detection sensor interpreter may return a crisp variable value indicating that presence has been detected, which may be given as a boolean true value. Also, a video camera may return some linguistic value indicating that a person moving fast has been identified in a monitored area. Thus, we need a common data type which allows us to represent every possible kind of value provided by any sensor interpreter in S. For that purpose, we propose the use of trapezoidal functions, as defined in (15) (J. J. Castro-Schez et al., 2004b).

\mu(u; a, b, c, d) = \begin{cases} 0 & u < a \\ \frac{u-a}{b-a} & a \le u < b \\ 1 & b \le u \le c \\ \frac{d-u}{d-c} & c < u \le d \\ 0 & u > d \end{cases}    (15)

The reason for choosing this function is that it is a suitable way to represent both categorical and numerical values; in both cases such values may be given to the user by means of linguistic terms, which is convenient in order to provide users with information expressed in their own terminology. Besides, the inference rules in the normality analysis components can be expressed using linguistic variables and fuzzy set theory techniques for dealing with the inherent uncertainty of the environment data (J. J. Castro-Schez et al., 2004a).
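The trapezoidal function (15) translates directly into code; note that a crisp value v can be represented as the degenerate trapezoid a = b = c = d = v (the function name below is ours):

```python
def trapezoid(u, a, b, c, d):
    """Trapezoidal membership function of eq. (15), with a <= b <= c <= d.
    Returns the membership degree of u in [0, 1]."""
    if u < a or u > d:
        return 0.0
    if a <= u < b:
        return (u - a) / (b - a)   # rising edge
    if b <= u <= c:
        return 1.0                 # plateau
    return (d - u) / (d - c)       # falling edge, c < u <= d
```

For example, trapezoid(u, 0, 2, 8, 10) gives full membership on [2, 8] and linear slopes on either side, while trapezoid(u, v, v, v, v) behaves as a crisp singleton at v.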
When designing a new normality analysis component, a new environment view must also be specified, indicating its requirements in terms of environment data and the associated semantics.
The main components in this sub-level are the event dispatcher agents (Figure 1), whose main functions are: (1) collecting the data required by the normality analysis components; (2) processing such data according to the requirements of the normality analysis components to be served; and (3) distributing the data to those normality analysis components that demand it.
The event dispatcher agents are thus the components responsible for distributing the sensory data, in the form of environment views, to the normality analysis components that require it. This process is as follows:
1. An event dispatcher agent keeps waiting until some data event is triggered.
2. When a new environment event is triggered, an event dispatcher agent collects the incoming data and asks the environment views agent for the normality analysis components which requested an environment view containing the obtained data.
3. The selected items (environment views) may need to be completed with previously obtained data from the history database, because the incoming data may be insufficient to construct a complete environment view.
4. Once an event dispatcher agent has obtained the complete chunk of data which makes up an environment view, it processes it according to DDV_j in order to provide the data with the requested semantics.
5. Finally, the event dispatcher agent transfers the chunk of data to the normality analysis component (or components) which requested the environment view object just constructed.
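The five steps above can be sketched as a single dispatch cycle; the directory, history, and delivery structures below are deliberately simplified stand-ins for the agents described in the text:

```python
def dispatch(event_vars, directory, history, deliver):
    """Sketch of one event dispatcher cycle (steps 1-5). `directory` maps
    component ids to (required variable names, semantics function);
    `history` maps variable names to their last stored value; `deliver`
    is called with each completed environment view. All names are
    hypothetical."""
    for component_id, (required, apply_semantics) in directory.items():
        if not required & set(event_vars):
            continue                       # view does not involve this event
        data = {k: v for k, v in event_vars.items() if k in required}
        for name in required - set(data):  # step 3: complete from history
            data[name] = history[name]
        # step 4: apply the requested semantics DDV_j; step 5: deliver
        deliver(component_id, apply_semantics(data))
```

In this sketch, the semantics function plays the role of DDV_j, mapping each raw value into the domain of definition the requesting component expects.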
The environment views are stored in the environment views directory. This component is managed by the environment views agent, which is responsible both for publishing new data in the directory and for retrieving existing data at the request of an event dispatcher agent. When a new normality analysis component is plugged into the system, a new environment view must be defined and published in the environment views directory. We define the structure of this component as a collection of entry elements. An entry element is a two-element tuple of the form (N_{C_i}, EV_j). The element N_{C_i} refers to the identifier of some normality analysis component, whereas the element EV_j refers to a defined environment view. This way, an environment view may be associated with several normality analysis components when needed.
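The directory of (N_{C_i}, EV_j) entries can be sketched as follows (identifiers and method names are illustrative):

```python
class EnvironmentViewsDirectory:
    """Sketch of the environment views directory: a collection of
    (component id, view id) entry tuples. A single environment view
    may be shared by several normality analysis components."""

    def __init__(self):
        self._entries = []            # list of (N_Ci, EV_j) tuples

    def publish(self, component_id, view_id):
        """Register a new entry when a component is plugged in."""
        self._entries.append((component_id, view_id))

    def components_for(self, view_id):
        """Components that requested the given environment view."""
        return [c for c, v in self._entries if v == view_id]
```

This is the lookup an event dispatcher agent performs in step 2 of the process above: given the view a new event can contribute to, find every component that requested it.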
When the data contained in a new environment event is insufficient to construct an environment view object, the event dispatcher agent responsible for attending to that event must obtain the missing data elsewhere: from the history database. This component is managed by the history agent, whose responsibilities are both to store data incoming from the events channel in the history database and to retrieve existing data from the database at the request of an event dispatcher agent. The history database is composed of a collection of entry elements. Each entry consists of a three-element tuple (SI_i, V_{SI_i}, Moment), such that SI_i is the identifier of the sensor interpreter which provided the data, V_{SI_i} is the set of variables extracted from sensor interpreter SI_i, and Moment is the time and date when such information was retrieved by the system.
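A minimal sketch of the history database and the retrieval of the most recent value of a variable (method names are hypothetical):

```python
import time

class HistoryDatabase:
    """Sketch of the history database: entries are (SI_i, V_SIi, Moment)
    tuples, appended in arrival order."""

    def __init__(self):
        self._entries = []

    def store(self, interpreter_id, variables, moment=None):
        """Append an entry; Moment defaults to the current time."""
        self._entries.append(
            (interpreter_id, variables,
             moment if moment is not None else time.time()))

    def latest(self, name):
        """Most recently stored value of a variable, or None if absent."""
        for _, variables, _ in reversed(self._entries):
            if name in variables:
                return variables[name]
        return None
```

An event dispatcher agent would call something like latest() in step 3 of the process above to complete an environment view whose incoming event lacked some required variable.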
Once an event dispatcher agent has completed the chunk of data that comprises an environment view, it sends a message containing such data through the environment views distribution channel. This channel supports a broadcast-like communication mode, such that every normality analysis component subscribes to it but receives only the messages addressed to it. The main advantage of this approach is that only one communication point needs to be known by both the event dispatcher agents and the normality analysis components.
4 CONCLUSIONS
The trend in the design and development of intelligent surveillance systems is to use not only the visual information provided by a set of video cameras, but also other kinds of sensors, allowing the system to maintain more accurate knowledge of the monitored environment. In this respect, the fusion of sensory data plays an essential role, as different sensors provide data in a variety of different forms. Besides, intelligent surveillance based on normality analysis requires that the same sensory data can be used in different analysis contexts with different semantics. We have presented an architectural layer that fuses sensory data and provides it with semantics according to the particular requirements of the normality analysis components which require it. We have defined the concept of environment view as an object that contains the data requested by a normality analysis component. Besides, the proposed architecture allows for easy scalability in terms of both the sensors installed in the environment and the normality analysis components plugged into the system. Adding a new kind of sensor entails designing a new sensor interpreter capable of interpreting the data sent by that kind of sensor and making it available to the rest of the components of the system. On the other hand, adding a new normality analysis component entails defining a new environment view, according to the data and semantics requirements of the newly added component.
ACKNOWLEDGEMENTS
This work has been funded by the Regional Government of Castilla-La Mancha under the research projects e-PACTOS (ref. PAC-06-141) and Sarasvati (ref. PBC06-0064), and by the Spanish Ministry of Education and Science under the research project TIN2007-62568.
REFERENCES
D. L. Hall and J. Llinas (1997). An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1):6–23.
D. Smith and S. Singh (2006). Approaches to Multisensor Data Fusion in Target Tracking: A Survey. IEEE Transactions on Knowledge and Data Engineering, 18(12):1696–1710.
H. B. Mitchell (2007). Multi-Sensor Data Fusion: An Introduction. Springer-Verlag.
J. Albusac, D. Vallejo, L. Jimenez, J. J. Castro-Schez, and L. Rodriguez (2008). Intelligent Surveillance based on Normality Analysis to Detect Abnormal Behaviors. Submitted to International Journal of Pattern Recognition and Artificial Intelligence.
J. J. Castro-Schez, J. L. Castro, and J. M. Zurita (2004a). Fuzzy repertory table: a method for acquiring knowledge about input variables to machine learning algorithm. IEEE Transactions on Fuzzy Systems, 12(1):123–139.
J. J. Castro-Schez, N. R. Jennings, X. Luo, and N. Shadbolt (2004b). Acquiring domain knowledge for negotiating agents: a case of study. International Journal of Human-Computer Studies, 61(1):3–31.
L. Rodriguez-Benitez, J. Moreno-Garcia, and J. J. Castro-Schez (2008). Automatic Object Behaviour Recognition from Compressed Video Domain. Image and Vision Computing. doi: 10.1016/j.imavis.2008.07.002.
M. Valera and S. A. Velastin (2005). Intelligent Distributed Surveillance Systems: A Review. IEE Proceedings on Vision, Image, and Signal Processing, 152(2):192–204.
P. Remagnino, A. I. Shihab, and G. A. Jones (2004). Distributed intelligence for multi-camera visual surveillance. Pattern Recognition, 37(4):675–689.
R. C. Luo, C. Yih, and K. L. Su (2002). Multisensor Fusion and Integration: Approaches, Applications, and Future Research Directions. IEEE Sensors Journal, 2(2):107–119.
R. T. Collins, A. J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O. Hasegawa, P. Burt, and L. Wixson (2000). A System for Video Surveillance and Monitoring. Technical report, The Robotics Institute, Carnegie Mellon University.