2005) or jointly with the vocal tract (Fu and
Murphy, 2006).
• Pitch-asynchronous approaches: These ap-
proaches do not necessarily require either
detection of specific time instants or pitch-period
calculation, though the performance of some
of them may improve when pitch synchronism is
incorporated. The best-known scheme in this
group is the Iterative Adaptive Inverse Filtering
(IAIF) algorithm (Alku, 1992). It assumes a
two-pole model for the glottal source and uses
this assumption to refine the all-pole vocal tract
estimate. A similar approach based on lattice
filters has been reported in (Gómez-Vilda et al.,
2008). An alternative asynchronous approach
uses the deconvolution capabilities of the
cepstrum to discriminate between the
glottal source and the vocal tract plus radiation. This
approach was first proposed in (Oppenheim and
Schafer, 1968) and was later refined with
the addition of pole-zero modelling (Kopec et al.,
1977). More recently, its use for the estimation of
vocal tract resonances has been reported (Rahman
and Shimamura, 2005).
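The cepstral separation mentioned above can be illustrated with a minimal sketch. It is not the method of any of the cited works: it only shows the generic idea that, in the real cepstrum, the slowly varying spectral envelope concentrates at low quefrencies while the pitch-periodic fine structure appears as a peak near the pitch period. The function names, the test signal (an impulse train passed through a single two-pole resonator), the sampling rate and the lifter cutoff are all assumptions chosen for illustration.

```python
import numpy as np

def real_cepstrum(frame, n_fft):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + 1e-12)
    return np.fft.ifft(log_mag).real

def lifter_split(cepstrum, cutoff):
    """Split a real cepstrum into low- and high-quefrency parts.

    Quefrencies 0..cutoff-1 and their mirrored (negative) counterparts
    go to `low`; everything else goes to `high`.
    """
    n = len(cepstrum)
    keep = np.zeros(n, dtype=bool)
    keep[:cutoff] = True
    keep[n - cutoff + 1:] = True
    low = np.where(keep, cepstrum, 0.0)
    return low, cepstrum - low

def synth_frame(fs=8000, f0=100, n=2048, frame_len=1024):
    """Hypothetical test signal: impulse train through one resonator."""
    period = fs // f0
    excitation = np.zeros(n)
    excitation[::period] = 1.0
    # Two-pole resonator (illustrative values: 700 Hz, pole radius 0.95).
    r, fc = 0.95, 700.0
    a1, a2 = 2.0 * r * np.cos(2.0 * np.pi * fc / fs), -r * r
    y = np.zeros(n)
    for i in range(n):
        y[i] = excitation[i]
        if i >= 1:
            y[i] += a1 * y[i - 1]
        if i >= 2:
            y[i] += a2 * y[i - 2]
    start = (n - frame_len) // 2  # skip the initial transient
    frame = y[start:start + frame_len] * np.hamming(frame_len)
    return frame, period

if __name__ == "__main__":
    frame, period = synth_frame()
    cep = real_cepstrum(frame, 1024)
    low, high = lifter_split(cep, cutoff=30)
    peak = 40 + int(np.argmax(high[40:120]))
    print("pitch period (samples):", period, "cepstral peak at:", peak)
```

In this sketch the low-quefrency part would be transformed back to a smooth log-spectral envelope, whereas the high-quefrency part carries the periodicity of the excitation. How the glottal source contribution is then isolated from the vocal tract depends on the specific algorithm and is not covered here.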
The interest of the glottal source waveform for the
assessment of laryngeal pathologies comes from the
close expected relationship between laryngeal func-
tion and the glottal waveform itself. Some results on
this application have been reported, for instance, in
(de Oliveira-Rosa et al., 2000), (Gómez-Vilda et al.,
2007) and (Gómez-Vilda et al., 2008). However, the
application of glottal inverse filtering techniques to
pathological voices involves a number of difficulties that
should not be disregarded. First, pathological
voices may not have a clear harmonic or quasi-harmonic
structure (see type 3 voice segments in
chap. 4 of (Sapienza and Hoffman-Ruddy, 2009)) and
some pathologies may prevent complete glottal closure
(chap. 5 of (Sapienza and Hoffman-Ruddy, 2009)).
Therefore, the implementation of pitch-synchronous
approaches may be problematic in such cases.
Second, assumptions about the spectral envelope
of the glottal waveform (e.g. a 12 dB/oct decay
(Walker and Murphy, 2007)) that are inherent to some
approaches, for instance IAIF, may not be valid for
pathological voices. In addition, other unresolved
issues of inverse filtering, regardless of its application,
have to be considered too. One of the most notable
of these issues is the evaluation of the inverse
filtering algorithms themselves. Although a set of
objective measures for this evaluation has been proposed
(Moore and Torres, 2008), these rely on the expected
characteristics of the glottal source waveform, not on
the measured characteristics, as the glottal source is
commonly unknown. One way to address this problem
is to use synthetic voices for the assessment of
the algorithms (Walker and Murphy, 2007), but the
validity of this approach depends on the realism of
the voice synthesisers used.
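The 12 dB/oct spectral decay mentioned above is what a two-pole source model produces when both poles lie close to z = 1, since each pole contributes roughly 6 dB/oct. The following sketch checks this numerically for a double real pole, G(z) = 1 / (1 - a z^-1)^2. The pole position a = 0.99 and the sampling rate of 16 kHz are illustrative assumptions, not values taken from the cited works, and the function names are hypothetical.

```python
import numpy as np

def two_pole_gain_db(freq_hz, fs, a):
    """Gain in dB of G(z) = 1 / (1 - a z^-1)^2 at the given frequencies."""
    w = 2.0 * np.pi * np.asarray(freq_hz, dtype=float) / fs
    one_pole = np.abs(1.0 - a * np.exp(-1j * w))
    # Two identical poles: 20*log10(1 / one_pole**2) = -40*log10(one_pole).
    return -40.0 * np.log10(one_pole)

def slope_db_per_octave(f_low, fs, a):
    """Gain change between f_low and 2*f_low (one octave)."""
    gains = two_pole_gain_db([f_low, 2.0 * f_low], fs, a)
    return gains[1] - gains[0]

if __name__ == "__main__":
    for f in (250.0, 500.0, 1000.0):
        slope = slope_db_per_octave(f, fs=16000.0, a=0.99)
        print(f, "Hz ->", round(float(slope), 2), "dB/oct")
```

The slope only approximates -12 dB/oct in the mid band: it flattens below the pole cutoff and towards the Nyquist frequency. A voice whose glottal spectrum departs from this shape would violate the assumption, which is the concern raised above for pathological voices.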
In this context, this article
reports on the evaluation of two inverse filtering approaches
for pathological voice signal analysis. Given
the above-mentioned potential characteristics of
pathological voices, pitch-asynchronous approaches
have been preferred. Among these, the performance
of IAIF (Alku, 1992) has been compared to that of
a variant of the homomorphic prediction (HoP) method
proposed in (Kopec et al., 1977). The performance has
been evaluated using synthetic voice signals produced
with a physical voice model (Kob et al., 1999) (Kob,
2002a). The use of synthetic voices has allowed
an objective and quantitative performance evaluation
in both the temporal and spectral
domains. The rest of the paper is organised as follows:
section 2 describes the voice simulator
and the voices produced with it, section 3
describes the analysed inverse filtering
algorithms, section 4 presents the results of applying
these algorithms to the synthetic voices and, finally,
section 5 presents the conclusions.
2 SIMULATED VOICE SIGNALS
2.1 Simulation Model
The materials used for the experiments reported
here are synthetic voice signals generated
with the VOX simulator (Kob, 2002b). An overview
of the simulation model can be found in (Kob et al.,
1999) and a more thorough description in (Kob,
2002a). As far as this paper is concerned, the simulation
model consists roughly of two blocks: a glottis model
and a vocal tract model. The glottis model is formed by
a set of vocalis-mucosa pairs connected to each other
and to the larynx walls by springs of tunable
stiffness. Within each pair, the vocalis and
the mucosa are each represented by one mass, with the
mucosa above the vocalis, and the two masses are
also connected by a spring. For the work reported in this paper,
each vocal fold has been modelled by a series of
15 vocalis-mucosa pairs. Two types of glottis have
been simulated: a normal glottis with the vocal folds
having straight edges and uniform stiffness and mass
distribution and a glottis with one pair of nodules sim-
ulated by a localised concentration of mass and irreg-
ular vocal-fold edges. The specific form of the vo-
cal folds has been chosen so as to mimic the move-
BIOSIGNALS 2010 - International Conference on Bio-inspired Systems and Signal Processing