norm is calculated as the integral, in the frequency domain, of the
ratio between the signal energy and the masking threshold. It defines an
inner product that facilitates the selection of the best-matching
dictionary element from a perceptual point of view.
However, PAMP does not define a psychoacoustic stopping criterion: the
inner product gives no information about whether a selected tone is
audible. The problem can be worse still, as stated in (R. Heusdens and
Kleijn, 2002): PAMP can select noise energy as a tone when zero-mean
Gaussian noise is present in the signal.
In our implementation, the main idea consists in
modifying standard matching pursuit so as to mini-
mize the perceptual distortion measure of the residue
at each iteration of the pursuit,
$$g_{m(i)} = \arg\min_{g_m \in D} \mathrm{PDM}(r^{i}) \qquad (8)$$
The optimum atom according to equation (8) can be computed by applying
the orthogonality property of matching pursuit. As in standard matching
pursuit, the atoms are weighted by the coefficients $\alpha_m^i$
obtained from the correlation, as indicated in equation (3). Therefore,
the residue at the i-th iteration can be decomposed as
$r^{i-1} = r^{i} + \alpha_m^i g_m$, where $r^{i}$ and $\alpha_m^i g_m$
are orthogonal. Owing to this orthogonality, the perceptual distortion
measure of the residue $r^{i-1}$ in a matching pursuit approach can be
written as
$$\mathrm{PDM}(r^{i-1}) = \mathrm{PDM}(r^{i}) + \mathrm{PDM}(\alpha_m^i g_m) \qquad (9)$$
As a consequence, minimizing the perceptual distortion measure of the
residue $r^{i}$ is equivalent to maximizing the perceptual distortion
measure of the weighted atom $\alpha_m^i g_m$, because in a matching
pursuit approach the perceptual distortion $\mathrm{PDM}(r^{i-1})$ is a
constant at the i-th iteration. The standard matching pursuit algorithm
chooses at the i-th iteration the atom most correlated with the residual
$r^{i-1}$ in order to minimize the residual energy. Expression (9) shows
that choosing the weighted atom with the highest perceptual distortion
measure as the optimum atom minimizes the perceptual distortion measure
of the residue $r^{i}$ at the i-th iteration. Perceptual matching
pursuit therefore computes the perceptual distortion measure associated
with each weighted atom and selects the atom with the highest measure as
the optimum atom at the i-th iteration,
$$g_{m(i)} = \arg\max_{g_m \in D} \mathrm{PDM}(\alpha_m^i g_m) \qquad (10)$$
Distortion signals to be measured at each iteration of perceptual
matching pursuit are the weighted atoms $\alpha_m^i g_m$. The perceptual
distortion measure of weighted dictionary elements at the i-th iteration
is expressed as,
$$\mathrm{PDM}(\alpha_m^i g_m) = C_s N \sum_b \frac{\|\alpha_m^i g_{m,b}\|^2}{\|x_b\|^2 + C_a} \qquad (11)$$
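A minimal sketch of how expression (11) could be evaluated numerically is given below. It is illustrative only and rests on several assumptions that do not come from the text: the auditory bands are represented by a hypothetical array `band_weights` of frequency-domain gains (here simple rectangular bands over FFT bins), N is taken to be the frame length in samples, and the values of the constants `c_s` and `c_a` are placeholders rather than calibrated values.

```python
import numpy as np

def band_energies(signal, band_weights):
    """Return the energy of `signal` in every band b (Parseval's theorem)."""
    spectrum = np.fft.fft(signal)
    return np.sum(np.abs(band_weights * spectrum) ** 2, axis=1) / len(signal)

def pdm(distortion, masker, band_weights, c_s, c_a):
    """Evaluate the band-wise ratio sum of (11) for `distortion` given `masker`."""
    n = len(masker)  # assumption: N is the frame length in samples
    numerators = band_energies(distortion, band_weights)
    denominators = band_energies(masker, band_weights) + c_a
    return c_s * n * np.sum(numerators / denominators)

# Hypothetical setup: 8 rectangular bands over a 256-sample frame.
rng = np.random.default_rng(0)
n_samples, n_bands = 256, 8
x = rng.standard_normal(n_samples)  # stand-in for the input frame
edges = np.linspace(0, n_samples, n_bands + 1).astype(int)
band_weights = np.zeros((n_bands, n_samples))
for b in range(n_bands):
    band_weights[b, edges[b]:edges[b + 1]] = 1.0

# A weighted atom: coefficient 0.1 times a unit-norm complex exponential.
atom = np.exp(2j * np.pi * 10 * np.arange(n_samples) / n_samples) / np.sqrt(n_samples)
weighted_atom = 0.1 * atom
value = pdm(weighted_atom, x, band_weights, c_s=1.0, c_a=1e-3)
```

Because the measure is quadratic in the distortion signal, doubling the coefficient of the weighted atom multiplies the result by four; this is a quick sanity check for any implementation of (11).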
The psychoacoustic stopping criterion can be di-
rectly defined in our approach. The pursuit should be
halted at the iteration in which all perceptual distor-
tions are below one. Under this condition, all remain-
ing tones are assured to be inaudible. This condition
can be expressed as:
$$\mathrm{PDM}(\alpha_m^i g_m) \le 1, \quad \forall g_m \in D \qquad (12)$$
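The selection rule (10) and the stopping criterion (12) can be combined into a single loop, sketched below under stated assumptions. If the band filtering in (11) is linear, the measure of a weighted atom factors as |α|² times a per-atom weight, so the sketch takes a hypothetical precomputed array `atom_weights` holding that weight for every atom. The function name, the `max_iter` safeguard, and the toy orthonormal dictionary with flat weights are illustrative choices, not part of the described method.

```python
import numpy as np

def perceptual_matching_pursuit(x, atoms, atom_weights, max_iter=50):
    """Greedy pursuit: pick the weighted atom with the largest PDM, stop at (12).

    atoms        : (M, N) array, rows are unit-norm atoms g_m
    atom_weights : (M,) hypothetical factors w_m with PDM(alpha * g_m) = |alpha|^2 * w_m
    """
    residue = np.asarray(x, dtype=complex)
    picks = []
    for _ in range(max_iter):
        alphas = atoms.conj() @ residue            # correlation of residue with each atom
        pdms = np.abs(alphas) ** 2 * atom_weights  # PDM of every weighted atom
        best = int(np.argmax(pdms))                # selection rule (10)
        if pdms[best] <= 1.0:                      # criterion (12): nothing audible remains
            break
        picks.append((best, alphas[best]))
        residue = residue - alphas[best] * atoms[best]
    return picks, residue

# Toy example: orthonormal DFT atoms, flat weights, one strong and one weak component.
n = 64
grid = np.arange(n)
atoms = np.exp(2j * np.pi * np.outer(grid, grid) / n) / np.sqrt(n)
x = 3.0 * atoms[5] + 0.01 * atoms[20]
picks, residue = perceptual_matching_pursuit(x, atoms, np.ones(n))
```

In this toy case only the strong component is extracted; the weak one stays in the residue because its measure does not exceed one, which is the behaviour the stopping criterion is meant to produce.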
The overcomplete dictionary $D = \{g_m[n]\}$ to be considered for
sinusoidal modelling is composed of unit-norm complex exponentials.
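A dictionary of this kind could be built as in the following sketch. The uniform frequency grid, the function name, and the small sizes are assumptions made for illustration; the text only states that the atoms are unit-norm complex exponentials.

```python
import numpy as np

def complex_exponential_dictionary(n_samples, n_atoms):
    """Return an (n_atoms, n_samples) array whose rows are unit-norm atoms g_m[n]."""
    n = np.arange(n_samples)
    m = np.arange(n_atoms)[:, None]
    # Assumption: frequencies uniformly spaced over [0, 1) cycles per sample.
    atoms = np.exp(2j * np.pi * m * n / n_atoms)
    return atoms / np.sqrt(n_samples)  # each row has Euclidean norm 1

# Small illustrative sizes; overcomplete because n_atoms > n_samples.
D = complex_exponential_dictionary(n_samples=128, n_atoms=512)
```

With more atoms than samples, the frequency grid is finer than that of a length-`n_samples` DFT, which is what makes the dictionary overcomplete.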
4 RESULTS
First, we illustrate the advantages of using the perceptual distortion
measure $\mathrm{PDM}(\alpha_m^i g_m)$ over the inner products
$|\alpha_m^i|_{\mathrm{PAMP}}$ defined in
(R. Heusdens and Kleijn, 2002).
Figure 2 shows the inner products $|\alpha_m^1|_{\mathrm{PAMP}}$ and
the perceptual distortion measures $\mathrm{PDM}(\alpha_m^1 g_m)$, both
at the initial iteration, for a windowed input signal composed of one
tone plus noise. The tone power is 19 dB above the density level of
noise, the tone frequency is 500 Hz, and the overcomplete dictionary is
composed of M = 4096 complex exponentials.
In this case, the PAMP approach does not select the tone correctly
because mid-frequency noise achieves more perceptual significance than
the tone itself. As can be seen in the same figure, the PDM approach
extracts the tone correctly.
The proposed psychoacoustic stopping criterion
performs correctly in this case, because after the first
iteration all perceptual distortions are below 0 dB.
Our approach also performs better when noise is added to a voiced
speech fragment.
Figure 3 illustrates the performance of the PAMP approach when
zero-mean white Gaussian noise is added to a 23-ms voiced speech
fragment at a signal-to-noise ratio of 0 dB. The magnitude of all
extracted tones at each iteration is drawn in circles. As can be seen,
the maximum of the perceptual inner products lies just below 2 kHz,
which corresponds to the most perceptually important sinusoid. This
sinusoid is the first one to be extracted, giving rise to the plots in
Figure 3(b). The left-hand plot in Figure 3(b) also shows the magnitude
and frequency of the tone extracted at the first iteration. It can be
observed in the right-hand plot in Figure 3(b) that the perceptual
distortions fall in the frequency region of the