
Recognition of the 1st-level scenes forms the basis of the method of s-level scene recognition, which proceeds as follows.
Step 1. Each object l_1, l_2, ..., l_r (where {l_1, l_2, ..., l_r} ⊆ {1, ..., L}) that is included in at least one 1st-level scene is recognized by separate comparison with its reference objects, k = 1, ..., K, using the fusion operators A_{l_1}, A_{l_2}, ..., A_{l_r}.
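The fusion operators used in step 1 are defined with respect to a fuzzy measure; a standard operator of this family is the discrete Choquet integral. The sketch below is a minimal illustration under assumed interfaces (per-modality similarity scores and a fuzzy measure supplied by the caller); the paper's exact operators A are not specified in this excerpt.

```python
def choquet(scores, mu):
    """Fuse per-modality similarity scores via a discrete Choquet integral.

    scores: dict mapping modality name -> similarity in [0, 1]
    mu: function mapping a frozenset of modalities to its fuzzy-measure
        value in [0, 1], with mu(empty set) = 0 and mu(all) = 1
        (monotonicity of mu is assumed, not checked).
    """
    # Sort modalities by ascending score: x_(1) <= x_(2) <= ...
    items = sorted(scores.items(), key=lambda kv: kv[1])
    total, prev = 0.0, 0.0
    for i, (_, x) in enumerate(items):
        # Coalition of modalities whose score is at least x_(i).
        coalition = frozenset(m for m, _ in items[i:])
        total += (x - prev) * mu(coalition)
        prev = x
    return total
```

For example, with mu({speech}) = 0.6, mu({gesture}) = 0.5, mu(both) = 1.0 and scores 0.8 (speech), 0.4 (gesture), the fused value is 0.4 * 1.0 + 0.4 * 0.6 = 0.64, which lies between the minimum and maximum input scores, as a Choquet integral must.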
Step 2. Each 1st-level scene for all of whose objects similar reference objects have been found is considered recognized, and a similarity criterion (the value of the fusion operator A) is computed for it. After this, go to step 3. If no such scene is found, then there are no recognized scenes of the first level or higher, and execution stops.
Step 3. The level counter is set to s = 2, and we go to step 4.
Step 4. If s-level scenes have been found for all of whose constituent (s-1)-level scenes nonzero values of the similarity criterion had been obtained, then these scenes are considered recognized and similarity criteria (fusion operator values A) are calculated for them. If any (s+1)-level scenes exist, step 4 is executed again with s = s + 1; otherwise, execution stops.
If no s-level scenes have been found for all of whose (s-1)-level scenes nonzero values of the similarity criterion had been obtained, then there are no recognized s-level scenes, and execution stops.
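Steps 1-4 amount to a single bottom-up pass over the scene hierarchy. The sketch below illustrates this under assumed data structures (each scene is a list of parts: objects at level 1, indices of (s-1)-level scenes above that; `recognize_object` and `fuse` are hypothetical callables standing in for the reference-object comparison and the fusion operators); it is not the paper's implementation.

```python
def recognize_scenes(levels, recognize_object, fuse):
    """Bottom-up hierarchical scene recognition (steps 1-4).

    levels[0] is the list of 1st-level scenes (lists of objects);
    levels[s] for s >= 1 holds s+1-level scenes, each a list of indices
    into the previous level.
    recognize_object(obj) -> similarity in [0, 1]; 0 means no similar
    reference object was found.
    fuse(values) -> similarity criterion (fusion-operator value).
    Returns a list: results[s] maps scene index -> similarity criterion.
    """
    results = []
    # Steps 1-2: recognize every object, then every 1st-level scene
    # for all of whose objects a similar reference object was found.
    first = {}
    for i, scene in enumerate(levels[0]):
        sims = [recognize_object(obj) for obj in scene]
        if all(v > 0 for v in sims):
            first[i] = fuse(sims)
    if not first:
        return results  # no recognized scenes at level 1 or higher
    results.append(first)
    # Steps 3-4: climb the hierarchy while recognized scenes remain.
    for s in range(1, len(levels)):
        current = {}
        for i, scene in enumerate(levels[s]):
            # Recognized iff every constituent lower-level scene has a
            # nonzero similarity criterion.
            if all(j in results[s - 1] and results[s - 1][j] > 0
                   for j in scene):
                current[i] = fuse([results[s - 1][j] for j in scene])
        if not current:
            return results  # no recognized scenes at this level; stop
        results.append(current)
    return results
```

A scene containing even one unrecognized object (or unrecognized sub-scene) is excluded, which in turn blocks every higher-level scene built on it, exactly as the early-stop conditions in steps 2 and 4 prescribe.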
5 CONCLUSIONS
We have presented a method that uses fuzzy fusion operators with a fuzzy measure for data fusion in a multimodal interface. The main advantages of the method over well-known analogs are the following:
- The ability to perform hierarchical multimodal recognition of scenes that consist of static and dynamic (moving) objects.
- The ability to take the importance of each modality into account during hierarchical scene recognition, owing to fusion operators that use a fuzzy measure.
- The ability to serve as a basis for developing control systems for various objects (robots, computers, TV sets, etc.) by means of dynamic patterns.
- The ability to increase the reliability of recognition of individual objects in the scene (for example, a human being) by using relations between these objects and other (background) objects of the scene.
- Promising opportunities for developing intelligent and intuitive human-machine interfaces by using more modalities and relations.
BIODEVICES 2013 - International Conference on Biomedical Electronics and Devices