possible use of vocal-affect biometrics, and the areas
where this technology could be deployed. Please
note that the words ‘affect’ and ‘emotion’ are
used interchangeably throughout this paper.
2 AFFECT RECOGNITION BIOMETRICS
Although ‘affect recognition’ systems, such as the
one used for surveillance, the ‘HAL 9000’ computer
from the movie “2001: A Space Odyssey” (“I’m
afraid, Dave…”) or the personal domestic-assistant
robots in “I, Robot” that have a good sense of their
masters’ emotional states, sound like science fiction,
there is an interest in making them a reality
(Bullington, 2005). For example, a recent proposal
invitation from DARPA’s Small Business
Innovation Research Center calls for the
development of a “non invasive emotion recognition
system…suitable for deployment in
military/operational environments or in
environments in which discrete observation of
potential enemy threats is desired” (DARPA, 2003).
In addition, SRI lists ‘Affective Computing’ as one
of its ‘Next Generation Technologies’:
“Affective-computing technology will reduce the intrusiveness
of human-machine interface technology and perhaps
make the technology more acceptable to people
because of its more natural interactions and its
seamless presence in the environment.” (SRI-BI,
N/A). Machine learning for recognizing and
adapting to a human’s affective state is important
for natural human-computer interaction, but in
biometrics, identifying a person’s affective state
could be aimed at determining his or her
psychophysiological state. So far, very few studies
have addressed emotion recognition for the purpose
of biometric identification, and the existing ones
focus on facial emotion recognition. For
example, Gray (Gray, 2003) has proposed a
surveillance system that attempts to read a person’s
involuntary facial muscle changes, termed
“microexpressions”, that correspond to his or
her emotional state. Nevertheless, building an affect
recognition system based specifically on voice for
identification and possible intervention has almost
never been attempted (Bullington, 2005).
2.1 Voice Affect Biometrics
In general, biometric analysis of speech aims at
identifying a person. However, analysis of the
emotion conveyed in the speech could reveal the
presence of a particular psychophysiological state of
the person via extralinguistic information (Ronzhin
et al., 2004, Huang, 2001). Simply put, voice-affect
identification could seek to predict an individual’s
‘state of mind’ and reach judgments about his or her
emotional states, impairments or behavioural
intentions (e.g., criminal intent) (SRI-BI, N/A),
even when, in some cases, an impostor deliberately
tries to deceive the system. Naturally, such a system
would also recommend a possible course of prevention
(Bullington, 2005). Physiology-based biometrics
is constrained in terms of time and cost, and in its
ability to detect certain emotional ‘colouring’ such
as boredom or fatigue. Additionally, these systems
depend on human observers, are invasive and raise
ethical issues to some degree, which makes them
unpopular (Bullington, 2005, Ronzhin et al., 2004).
In contrast, voice-affect recognition systems are
easy to deploy because they are individual-oriented
systems that do not need human observers, unlike
facial-based recognition systems. They use machine
learning algorithms to analyze and learn the
distinctive patterns in the emotional responses of
individual users. Therefore, speech-based affect
recognition is more natural, contact-free and offers
high processing speed (Ronzhin et al., 2004).
Though this kind of
technology is unsuitable for use in a crowd
setting (e.g., surveillance) on its own, it is well
suited to deployment in smaller-scale, specific
high-risk environments where operator error could
lead to serious problems such as injuries or
fatalities. Examples include the transportation
industry and nuclear power plants. Apart from that,
employment agencies or offices that recruit people
for high-level security jobs, such as those in the
financial industry or the military, can take
advantage of this technology. In what follows,
three possible areas for the implementation of
voice-affect recognition technology are discussed:
1. Recruitment: Emotion detection in voice can
estimate the psychophysiological state, which helps
determine the psychological compatibility and
readiness of a candidate to accept a high-security
or stressful job. Industries such as nuclear,
aerospace, transportation and finance require
workers who are fit for duty in this kind of work.
Such industries may be able to determine when
employees are not in the right state of mind to
complete their tasks, or determine the optimal
conditions and most productive situations for each
individual. Ronzhin et al. (Ronzhin et al., 2004)
proposed that the system take into consideration
lexical and grammatical accuracy, apart from the
phrase understandability of a conversation during