to time evolution. Detection of the position and shape of the mouth, the eyes (particularly the eyelids) and wrinkles, and the extraction of features related to them, are the targets of techniques applied to still images of humans. It has, however, been shown (Bassili, 1979) that facial expressions can be recognized more accurately from image sequences than from single still images. Bassili's experiments used point-light conditions, i.e. subjects viewed image sequences in which only white dots on a darkened surface of the face were visible. Expressions were recognized at above-chance levels when based on image sequences, whereas only happiness and sadness were recognized at above-chance levels when based on still images. Techniques that attempt to identify facial gestures for emotional expression characterization face the problems of locating or extracting the facial regions or features, computing the spatio-temporal motion of the face through optical flow estimation, and introducing geometric or physical muscle models describing the facial structure or gestures.
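As a rough illustration of the optical-flow step, the sketch below computes dense Farneback flow over a Haar-detected face rectangle using OpenCV and reports the average vertical motion in the lower third of the face, a crude proxy for mouth movement. The video file name, the single-face assumption and the mouth-region heuristic are illustrative assumptions, not part of any specific technique surveyed here.

```python
# Sketch: dense optical flow over a detected face region (assumes
# opencv-python is installed and a face is visible in the first frame).
import cv2

cap = cv2.VideoCapture("subject.avi")  # hypothetical input sequence
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
x, y, w, h = face_cascade.detectMultiScale(prev_gray, 1.3, 5)[0]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Farneback dense flow restricted to the face rectangle
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray[y:y + h, x:x + w], gray[y:y + h, x:x + w],
        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    # Mean vertical motion in the lower third of the face (mouth area)
    print("mouth-area vertical flow: %+.2f px/frame"
          % flow[2 * h // 3:, :, 1].mean())
    prev_gray = gray
```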
In general, facial expressions and emotions are described by a set of measurements and transformations that can be considered atomic with respect to the MPEG-4 standard. In this way, one can describe both the anatomy of a human face (basically through FDPs) and its animation parameters with groups of distinct tokens, eliminating the need to specify the topology of the underlying geometry. These tokens can then be mapped to automatically detected measurements and indications of motion in a video sequence and can thus help approximate a real expression conveyed by the subject by means of a synthetic one.
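To make the token idea concrete, the following sketch maps automatically measured landmark displacements between two frames to MPEG-4-style animation tokens, normalized by a facial animation parameter unit (FAPU) such as the mouth-nose separation. The landmark names, the FAPU value and the exact token set are illustrative assumptions; actual FAPs are defined by the standard and normalized by FAPUs derived from the FDPs of the specific face.

```python
# Sketch: landmark motion -> topology-independent animation tokens.
MNS = 30.0  # mouth-nose separation (a FAPU), measured once per subject


def to_tokens(prev_pts, curr_pts):
    """prev_pts/curr_pts map landmark name -> (x, y) pixel position."""
    dy = lambda name: curr_pts[name][1] - prev_pts[name][1]
    return {
        "open_jaw": dy("chin") / MNS,                    # FAP-3-like token
        "raise_l_i_eyebrow": -dy("l_brow_inner") / MNS,
        "close_t_l_eyelid": dy("l_upper_lid") / MNS,
    }


prev = {"chin": (120, 210), "l_brow_inner": (100, 120), "l_upper_lid": (98, 140)}
curr = {"chin": (120, 218), "l_brow_inner": (100, 116), "l_upper_lid": (98, 141)}
print(to_tokens(prev, curr))
```

Because such tokens carry no geometry of their own, any synthetic face exposing the same parameter set can replay the measured expression.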
5 GESTURES AND POSTURES
The detection and interpretation of hand gestures have become an important part of man-machine interaction (MMI) in recent years (Wu & Huang, 2001). Sometimes a simple hand action, such as placing one's hands over one's ears, can convey the message that one has had enough of what one is hearing more expressively than any spoken phrase. To benefit from the use of gestures in MMI, it is necessary to provide the means by which they can be interpreted by computers. The MMI interpretation of gestures requires that dynamic and/or static configurations of the human hand, arm, and even other parts of the human body be measurable by the machine. First attempts to address this problem resulted in mechanical devices that directly measure hand and/or arm joint angles and spatial position; glove-based devices are the best-known representatives of this group of solutions.
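As a hypothetical sketch of the raw data such a device delivers, one can think of each sample as a vector of joint flexion angles plus the wrist's spatial position; the field names and joint set below are illustrative, not a real device API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class GloveSample:
    timestamp: float                             # seconds
    wrist_position: Tuple[float, float, float]   # metres, tracker frame
    joint_angles: Dict[str, float]               # joint name -> flexion (rad)


sample = GloveSample(
    timestamp=0.04,
    wrist_position=(0.31, 1.02, 0.55),
    joint_angles={"index_MCP": 0.42, "index_PIP": 0.65, "index_DIP": 0.43},
)
```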
Human hand motion is highly articulated, because the hand consists of many connected parts that lead to complex kinematics. At the same time, hand motion is also highly constrained, which makes it difficult to model. Usually, the hand is modeled in terms of several aspects, such as shape (Kuch & Huang, 1995), kinematic structure (Lin, Wu & Huang, 2000), dynamics (Quek, 1996; Wilson & Bobick, 1998) and semantics.
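A minimal sketch of the kinematic-structure aspect is given below: a planar forward-kinematics chain for one finger, with the commonly used coupling constraint θ_DIP = 2/3 θ_PIP (one of the motion constraints studied in work such as Lin, Wu & Huang, 2000) reducing the degrees of freedom. The phalanx lengths are illustrative assumptions.

```python
import math

L_PROX, L_MID, L_DIST = 4.5, 2.5, 2.0  # phalanx lengths in cm (assumed)


def fingertip(theta_mcp, theta_pip):
    """Planar fingertip position for given MCP and PIP flexion (radians)."""
    theta_dip = 2.0 / 3.0 * theta_pip  # coupling constraint removes one DoF
    a1 = theta_mcp
    a2 = a1 + theta_pip
    a3 = a2 + theta_dip
    x = L_PROX * math.cos(a1) + L_MID * math.cos(a2) + L_DIST * math.cos(a3)
    y = L_PROX * math.sin(a1) + L_MID * math.sin(a2) + L_DIST * math.sin(a3)
    return x, y


print(fingertip(math.radians(20), math.radians(45)))
```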
Gesture analysis research follows two different approaches that work in parallel. The first approach treats a hand gesture as a two- or three-dimensional signal communicated via hand movement on the part of the user; as a result, the whole analysis process merely tries to locate and track that movement, so as to recreate it on an avatar or translate it to a specific, predefined input-interface action, e.g. raising a hand to draw attention or indicate presence in a virtual classroom.
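The signal-level approach can be sketched as follows: the hand is located by a crude skin-colour threshold and its centroid is tracked frame by frame, yielding the raw two-dimensional signal that an avatar or interface event can consume. The HSV skin range and the camera setup are illustrative assumptions.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # hypothetical live camera
trajectory = []            # the raw 2-D signal referred to above

while len(trajectory) < 100:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array((0, 40, 60)), np.array((20, 150, 255)))
    m = cv2.moments(mask)
    if m["m00"] > 0:  # centroid of the skin-coloured blob
        trajectory.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
```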
The low-level results of this approach can be extended by taking into account that hand gestures are a powerful means of expression. The expected result is to understand gestural interaction as a higher-level feature and encapsulate it into a distinct modality, complementing speech and image analysis in an affective MMI system (Wexelblat, 1995). This transformation of a gesture from a time-varying signal into a symbolic representation helps overcome problems such as the proliferation of available gesture representations or the failure to notice common features among them. In general, one can classify hand movements with respect to their function as:
• Semiotic: gestures used to communicate meaningful information or indications;
• Ergotic: manipulative gestures, usually associated with a particular instrument or job; and
• Epistemic: gestures again related to specific objects, but also to the reception of tactile feedback.
Semiotic hand gestures are considered to be connected, or even complementary, to speech as a means of conveying a concept or emotion. In particular, two major subcategories, namely deictic gestures and beats, i.e. gestures that consist of two discrete phases, are usually semantically related to the spoken content and are used to emphasize or clarify it. This relation is also taken into account in (Kendon, 1988), which positions gestures along a continuous space.
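As an illustration of the two-phase property of beats, the sketch below flags a window of tracked hand positions as beat-like when the dominant velocity component shows exactly one fast reversal, i.e. a stroke followed by a return. The threshold and window layout are assumptions for illustration only.

```python
import numpy as np


def looks_like_beat(ys, min_speed=5.0):
    """ys: vertical hand positions (pixels) over a short window."""
    v = np.diff(np.asarray(ys, dtype=float))   # per-frame velocity
    signs = np.sign(v[np.abs(v) > min_speed])  # keep only fast motion
    # exactly one down-then-up (or up-then-down) reversal -> two phases
    return len(signs) > 1 and np.count_nonzero(np.diff(signs)) == 1


print(looks_like_beat([200, 208, 218, 226, 220, 211, 202]))  # True
```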