or blocks, which are sequences of images associated
with a particular attribute. In such a block, the attribute
concerned satisfies a particular property; for instance,
it belongs to a particular interval. Thus, before
video processing, a set of block models is defined. It
can be seen as a visual vocabulary where each block
is a space-time word.
3.1 Definition of Basic Block Models
A block model is defined by an attribute associated
with a property. The property can be a value (the attribute
has a specific value) or an interval (the attribute
lies within an interval, the most frequent situation). As
a consequence, for a given attribute, there are several
models of blocks corresponding to different values or
intervals. For instance, we define the compactness
$c \in [0, 1]$ of an object as
$c = \frac{\min(\mathrm{width}, \mathrm{height})}{\max(\mathrm{width}, \mathrm{height})}$.
For this attribute, we propose to define three block
models: low compactness (0 to 0.4), average compactness
(between 0.4 and 0.65) and high compactness (between
0.6 and 1). Until now, the different ranges or
values and the number of blocks have been set by expert
knowledge. At the end of this modeling phase, we obtain
a database containing all the basic block models. These
models are relatively generic across different types of
application, because they are linked only to the attributes
and carry no particular semantic meaning. With the 40
attributes, 120 basic block models are defined. This number
should be compared with the size of the visual word
vocabulary used in static image indexation, which is
typically a few thousand.
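As an illustration of the compactness attribute and its three interval models, a minimal Python sketch follows. The class name `IntervalModel`, the model names, and the choice of closed intervals are assumptions made for this example; the ranges are copied from the text as stated. Because the stated ranges share some values, the helper returns every matching model rather than a single one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalModel:
    """Hypothetical basic block model: an attribute plus an interval property."""
    name: str
    attribute: str
    low: float
    high: float

    def matches(self, value: float) -> bool:
        # Assumption: bounds are inclusive (the text does not specify this).
        return self.low <= value <= self.high


def compactness(width: float, height: float) -> float:
    """Compactness c = min(width, height) / max(width, height), in [0, 1]."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return min(width, height) / max(width, height)


# Ranges taken from the text as stated; the names are illustrative.
COMPACTNESS_MODELS = [
    IntervalModel("low_compactness", "compactness", 0.0, 0.4),
    IntervalModel("average_compactness", "compactness", 0.4, 0.65),
    IntervalModel("high_compactness", "compactness", 0.6, 1.0),
]


def matching_models(value, models=COMPACTNESS_MODELS):
    """Return the names of all models whose interval contains the value."""
    return [m.name for m in models if m.matches(value)]
```

For example, a 30 x 60 bounding box has compactness 0.5 and falls in the average model only, while a square object has compactness 1 and falls in the high model.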
The characteristics of all these models are stored in
a database: name, model type (value or interval),
level (sequence, image or object) and finally
the attribute values related to the block. From this
structure, queries can be created dynamically to detect the
occurrences of blocks in the different indexed sequences.
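The stored model description and the dynamic creation of queries could look like the following Python sketch. It is not the authors' schema: the record fields beyond those listed in the text (`attribute`), the table `measurements`, and its columns are hypothetical and chosen only to make the example self-contained with the standard `sqlite3` module.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockModelRecord:
    """Hypothetical database record describing one basic block model."""
    name: str
    model_type: str  # "value" or "interval"
    level: str       # "sequence", "image" or "object"
    attribute: str   # assumed field: the attribute the model refers to
    values: tuple    # (v,) for a value model, (low, high) for an interval


def build_query(model: BlockModelRecord):
    """Build a parameterized SQL query selecting frames that satisfy the model.

    The table and column names are assumptions made for this sketch.
    """
    if model.model_type == "value":
        condition = "value = ?"
    elif model.model_type == "interval":
        condition = "value BETWEEN ? AND ?"
    else:
        raise ValueError(f"unknown model type: {model.model_type}")
    sql = (
        "SELECT sequence_id, frame FROM measurements "
        "WHERE attribute = ? AND level = ? AND " + condition + " "
        "ORDER BY sequence_id, frame"
    )
    return sql, (model.attribute, model.level, *model.values)


def find_frames(conn: sqlite3.Connection, model: BlockModelRecord):
    """Run the generated query and return (sequence_id, frame) rows."""
    sql, params = build_query(model)
    return conn.execute(sql, params).fetchall()
```

The rows returned are ordered by sequence and frame, so that consecutive frames can later be grouped into blocks.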
3.2 Extraction of Basic Blocks
The extraction of the spatio-temporal blocks consists
in looking for occurrences of model blocks within the
videos. In other words, it consists in finding images
or blocks of images within which an attribute satisfies a
model property. A block is defined by a model (an attribute
and a property), an initial image and a final image.
For one model, many basic blocks can be extracted.
Conversely, an image or block of images can belong to
different models. If the attribute
of a block model corresponds to a moving object, an
extracted basic block will correspond to a sequence of
images containing the object and satisfying the property.
As all information is stored in databases and is classified
by sequence, by block and by increasing frame number,
block extraction is carried out using a query mechanism.
Morphological filtering is then performed to eliminate
blocks that are too small and to merge blocks that are
too close.
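The extraction step can be sketched in Python as follows, assuming the attribute has one value per frame. The text does not give the filter parameters or the order of the two filtering operations, so `max_gap`, `min_length`, and the choice to merge before discarding are assumptions. The filtering is written directly on frame intervals, which in one dimension plays the role of a morphological closing (filling short gaps) followed by an opening (removing short runs).

```python
def runs_of_true(flags):
    """Return inclusive (start, end) frame intervals where flags are True."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(flags) - 1))
    return runs


def merge_close(runs, max_gap):
    """Merge consecutive runs separated by at most max_gap frames."""
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def drop_small(runs, min_length):
    """Discard runs shorter than min_length frames."""
    return [(s, e) for s, e in runs if e - s + 1 >= min_length]


def extract_blocks(values, low, high, max_gap=2, min_length=3):
    """Extract blocks where low <= value <= high, then filter them.

    max_gap and min_length are illustrative defaults, not values from the text.
    """
    flags = [low <= v <= high for v in values]
    return drop_small(merge_close(runs_of_true(flags), max_gap), min_length)
```

Each returned pair gives the initial and final image of one extracted block for the model under consideration.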
4 CONCEPT BLOCK MODEL
The indexation process is generally carried out
manually because it relies on indexes with a high semantic
level (denoted as concepts). For instance, a concept can be
the action of a character running. This indexing process is
difficult to automate, because it requires linking the
low-level attributes extracted from the video to concepts
that have a semantic interpretation. We assume that these
links can only be defined with the help of the user. Before
explaining how we propose to build these links, we
define the concept block models.
4.1 Notion of Concept Block Models
A concept block model has a high-level semantic
meaning that fits the user's needs for the indexation
task. In this paper, we propose to build these models by
combining previously defined models: basic block models
or other concept block models.
The construction of such block models is performed
using simple combination rules: simultaneity of blocks
(logical AND operator), presence of at least one block
(logical OR operator), presence of only one block among
two blocks (logical XOR operator), succession of blocks
(sequentiality), and alternation of blocks, which is
composed of several successions of blocks (periodicity).
Initially, these operators are sufficient to define a
number of concepts. Later, it will be interesting to
establish additional operators.
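The combination rules listed above can be illustrated with a short Python sketch operating on blocks given as inclusive (start, end) frame intervals. This is one possible reading of the operators, not the authors' implementation: the function names, the `max_gap` tolerance for succession, and the `min_repeats` threshold for alternation are all assumptions.

```python
def to_frames(blocks):
    """Expand inclusive (start, end) blocks into a set of frame numbers."""
    return {f for s, e in blocks for f in range(s, e + 1)}


def to_blocks(frames):
    """Group frame numbers into sorted inclusive (start, end) blocks."""
    blocks = []
    for f in sorted(frames):
        if blocks and f == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], f)
        else:
            blocks.append((f, f))
    return blocks


def block_and(a, b):
    """Simultaneity: frames covered by both a and b."""
    return to_blocks(to_frames(a) & to_frames(b))


def block_or(a, b):
    """Presence of at least one block: frames covered by a or b."""
    return to_blocks(to_frames(a) | to_frames(b))


def block_xor(a, b):
    """Presence of only one block: frames covered by exactly one of a, b."""
    return to_blocks(to_frames(a) ^ to_frames(b))


def succession(a, b, max_gap=0):
    """Sequentiality: a block of a followed within max_gap frames by one of b."""
    out = []
    for sa, ea in a:
        for sb, eb in b:
            if 0 <= sb - ea - 1 <= max_gap:
                out.append((sa, eb))
    return sorted(out)


def alternation(a, b, min_repeats=2, max_gap=0):
    """Periodicity: chains of at least min_repeats consecutive successions."""
    chains, current = [], []
    for pair in succession(a, b, max_gap):
        if current and 0 <= pair[0] - current[-1][1] - 1 <= max_gap:
            current.append(pair)
        else:
            if len(current) >= min_repeats:
                chains.append((current[0][0], current[-1][1]))
            current = [pair]
    if len(current) >= min_repeats:
        chains.append((current[0][0], current[-1][1]))
    return chains
```

Since every operator takes and returns lists of intervals, the output of one operator can be used as the input of another, which matches the idea of building concept block models from basic block models or from other concept block models.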
4.2 Learning Concept Block Models
The problem is to build relevant links between blocks.
In such a situation, a classical approach consists in using
a neural network or a supervised classification associated
with a set of training data corresponding to already
indexed data (Burgener, 2006). But this manual indexing
task, carried out by users, is very tedious and
time-consuming. Another classical approach is to use
the knowledge of an expert who explicitly defines the
searched links. But in this case, he must be an expert