spatial information. One method to evaluate the ITD
over time is to compute a running short-time cross
correlation between the signals collected at the ears
(Knapp and Carter, 1976). In this method, if the signals
are filtered beforehand, the actual spectrum of the
incoming signal can be taken into account in order to
reduce, to some extent, the effect of noise and
reverberation. However, the failure rate of
identification methods based on the Generalized Cross
Correlation is relatively high at practical values of
SNR, which in many cases prevents real-time applications.
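As a minimal sketch of the cross-correlation idea (plain time-domain correlation on a single frame, without the prefiltering used by the generalized variant), the lag that maximizes the correlation between the two ear signals can be taken as the ITD estimate in samples. The function name `estimate_itd_samples` and its parameters are illustrative assumptions, not part of the cited method.

```python
def estimate_itd_samples(left, right, max_lag):
    """Return the lag (in samples) maximizing sum(left[i] * right[i + lag]).

    `left` and `right` are equal-length frames. A positive result means
    the right signal is delayed with respect to the left one.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:  # ignore samples shifted outside the frame
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Running this function on successive frames gives a running ITD estimate; dividing the lag by the sampling rate converts it to seconds.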
Another method, proposed in (Viste, 2004) and
(Evangelista and Viste, 2004), is based on the joint
evaluation of the ITD and ILD cues obtained by means of a
running Short Time Fourier Transform (STFT). Here the
ILDs are used to resolve the ITD ambiguity that arises,
as is well known, from phase ambiguity: a coarse
identification is performed by means of the ILD, while
the ITD evaluation is used to refine the azimuth
estimate. Since this method performs the evaluation at
each frequency bin, it can readily accommodate
bin-by-bin weights, as in our proposal.
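The phase ambiguity mentioned above can be illustrated with a short sketch: a wrapped phase difference at a given frequency is consistent with several ITDs spaced one period apart, and a coarse estimate (for instance the ITD implied by the ILD-based azimuth) selects among them. The function names, the convention IPD = 2πf·ITD, and the 0.8 ms bound on the ITD are assumptions made for this example only.

```python
import math

def itd_candidates(ipd, freq_hz, max_itd):
    """All ITDs (seconds) compatible with the wrapped phase difference
    `ipd` (radians) at `freq_hz`, restricted to |ITD| <= max_itd."""
    period = 1.0 / freq_hz
    base = ipd / (2.0 * math.pi * freq_hz)
    p_min = math.ceil((-max_itd - base) / period)
    p_max = math.floor((max_itd - base) / period)
    return [base + p * period for p in range(p_min, p_max + 1)]

def resolve_itd(ipd, freq_hz, coarse_itd, max_itd=0.0008):
    """Pick the candidate closest to a coarse ITD estimate.

    `max_itd` (0.8 ms) is an assumed physical bound, not a value taken
    from the paper; it must be large enough to leave one candidate.
    """
    candidates = itd_candidates(ipd, freq_hz, max_itd)
    return min(candidates, key=lambda t: abs(t - coarse_itd))
```

At 2 kHz, for example, the candidates are 0.5 ms apart, so even a rough coarse estimate is enough to pick the right one.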
A third class consists of methods based on very tight
sensory fusion, as in (Nakadai, Okuno and Kitano, 2001).
These may also take advantage of an improved calculation
based on acoustical cues, as in our proposal.
The last class to be considered is based on the use of a
neural network trained with cues extracted from binaural
signals, as discussed e.g. in (Irie, 1995); in this case
too, the proposed method of weighting cues may prove
valuable.
Our proposed implementation may thus be embedded in
existing architectures, improving their performance and
robustness.
Figure 1: HRIR measured for both left and right sides.
1.2 Contribution
The method presented here belongs to the second class
of methods summarized in the previous section. Its
starting point, as in (Viste, 2004) and (Evangelista and
Viste, 2004), is the measurement of a set of HRTFs on a
grid of positions corresponding to several different
azimuth angles (fig. 1 shows their time domain
counterpart, the Head Related Impulse Response, HRIR).
From these transfer functions we obtain
$\mathrm{ILD}_h(\theta, \omega)$ and
$\mathrm{IPD}_h(\theta, \omega)$, functions of azimuth
$\theta$ and frequency $\omega$, evaluated at discrete
angular positions and frequencies.
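As a hedged illustration of how such cues can be derived from one left/right impulse response pair, the sketch below takes the DFT of each response and computes a per-bin level difference in dB and a wrapped phase difference. The naive DFT, the left-over-right sign convention, the small `eps` guard, and the function names are assumptions of this example.

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) DFT returning bins 0..N//2 (enough for real input)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]

def cues_from_pair(left, right, eps=1e-12):
    """Per-bin ILD (dB, left relative to right) and IPD (radians,
    wrapped to [-pi, pi]) for one pair of equal-length responses."""
    spec_l, spec_r = dft(left), dft(right)
    ild = [20.0 * math.log10((abs(a) + eps) / (abs(b) + eps))
           for a, b in zip(spec_l, spec_r)]
    # phase(L * conj(R)) is the phase of L minus the phase of R, wrapped
    ipd = [cmath.phase(a * b.conjugate()) for a, b in zip(spec_l, spec_r)]
    return ild, ipd
```

Applying `cues_from_pair` to the response pair measured at each azimuth of the grid would fill one row of the lookup table per angle.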
The cues $\mathrm{IPD}_m(\omega_k)$ and
$\mathrm{ILD}_m(\omega_k)$, measured¹ frame by frame from
the signal pair observed at the robot ears using the
Short Time Fourier Transform (STFT), are compared
against the HRTF data set in order to obtain an estimate
of the source azimuth: the position yielding the minimum
deviation from the stored table is selected as the true
azimuth. This is referred to as HRTF Data Lookup;
azimuth estimates obtained with this technique are based
on ILD and IPD separately.
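The lookup step can be sketched as a nearest-curve search: for each azimuth in the table, accumulate the squared deviation between the measured cue and the stored cue over the frequency bins, and return the azimuth with the smallest total. The squared-error metric, the dictionary layout of the table, and the names below are assumptions for illustration; for IPD the deviation is wrapped so that phases near +π and −π count as close.

```python
import math

def wrap_to_pi(x):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi

def lookup_azimuth(measured, table, is_phase=False):
    """Return the azimuth whose stored cue curve is closest to `measured`.

    `table` maps azimuth -> list of cue values, one per frequency bin;
    `measured` holds the corresponding measured values.
    """
    best_az, best_err = None, float("inf")
    for az, stored in table.items():
        err = 0.0
        for m, h in zip(measured, stored):
            d = m - h
            if is_phase:
                d = wrap_to_pi(d)  # compare phases on the circle
            err += d * d
        if err < best_err:
            best_az, best_err = az, err
    return best_az
```

Calling the function once with the ILD table and once with the IPD table (with `is_phase=True`) yields the two separate estimates described above.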
In our proposal, the estimate of the angle is obtained
by minimizing the error, suitably weighted in frequency
(as shown in section 3), between these cues and the
$\mathrm{ILD}_h(\theta, \omega)$ and
$\mathrm{IPD}_h(\theta, \omega)$ functions stored in the
Data Lookup Table. The effect of the weighting is clearly
seen in fig. 2: the upper part of the figure shows the
unweighted deviation from a curve in the table at a
specific azimuth, while in the lower part the same
deviation is weighted by an estimate of the signal
spectrum, so that only the bins where the signal is
larger, and which therefore exhibit higher SNR, are
taken into account.
The signal used in the figure has narrowband content
limited to a small part of the frequency axis.
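A minimal sketch of a spectrum-weighted lookup follows. The actual weighting is the one defined in section 3; here, purely as an assumption for illustration, each bin is weighted by the signal power in that bin, and bins whose power falls below a fraction `rel_threshold` of the peak are discarded. All names and the threshold value are hypothetical.

```python
def weighted_lookup(measured, table, power, rel_threshold=0.1):
    """Return the azimuth minimizing a power-weighted squared deviation.

    `measured`: measured cue per bin; `table`: azimuth -> stored cue per
    bin; `power`: estimated signal power per bin. Bins with power below
    rel_threshold * max(power) receive zero weight (assumed rule).
    """
    floor = rel_threshold * max(power)
    weights = [p if p >= floor else 0.0 for p in power]
    best_az, best_err = None, float("inf")
    for az, stored in table.items():
        err = sum(w * (m - h) ** 2
                  for w, m, h in zip(weights, measured, stored))
        if err < best_err:
            best_az, best_err = az, err
    return best_az
```

With a narrowband signal, only the few bins carrying energy contribute to the error, so noisy deviations elsewhere on the frequency axis no longer pull the minimum toward a wrong azimuth. Passing a flat `power` list reduces the function to the unweighted lookup.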
The comparison is made over the whole frequency axis
or, more successfully, in selected frequency bands: in
fact, according to the Duplex Theory (Blauert, 1997),
the ITD and ILD cues are significant in different and
complementary frequency ranges, mainly the low range
for ITD and the high range for ILD.
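The band selection can be sketched as a simple split of the STFT bins at a crossover frequency, with the low band assigned to the phase cue and the high band to the level cue. The 1.5 kHz default is a commonly quoted order of magnitude and is an assumption here, as are the function name and the exclusion of the DC bin; the bands actually used by the method are not specified in this section.

```python
def duplex_bins(sample_rate, n_fft, crossover_hz=1500.0):
    """Split the one-sided STFT bins (DC excluded) into a low band for
    the IPD comparison and a high band for the ILD comparison."""
    ipd_bins, ild_bins = [], []
    for k in range(1, n_fft // 2 + 1):
        freq = k * sample_rate / n_fft  # center frequency of bin k in Hz
        if freq < crossover_hz:
            ipd_bins.append(k)
        else:
            ild_bins.append(k)
    return ipd_bins, ild_bins
```

The two index lists can then be used to restrict the error sums of the IPD and ILD comparisons to their respective bands.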
The position in the azimuth grid corresponding to the
minimum of the weighted error function is chosen as the
best estimate of the sound source position (see figure
3, where the weighted method gives the correct estimate,
-24, while the unweighted method shows a clear error,
estimating the source azimuth as -6). Moreover, since
the error is evaluated on the whole grid, a measure of
the reliability of the estimate may be given in the form
of the error ratio between neighboring azimuth
positions.
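One possible reading of this reliability measure, given here only as an assumed sketch, is the ratio between the minimum error and the smaller of the errors at the grid positions adjacent to the minimum: a ratio close to 0 indicates a sharp, trustworthy minimum, while a ratio close to 1 indicates a flat, ambiguous one. The function name and the exact definition of the ratio are hypothetical.

```python
def reliability(azimuths, errors):
    """Return (best_azimuth, ratio) with ratio = E_min / E_neighbor.

    `azimuths` must be sorted along the grid and `errors` must hold the
    error at each azimuth (at least two positions). E_neighbor is the
    smaller error among the positions adjacent to the minimum.
    """
    i = min(range(len(errors)), key=lambda k: errors[k])
    neighbors = [errors[j] for j in (i - 1, i + 1) if 0 <= j < len(errors)]
    e_neighbor = min(neighbors)
    # If the neighbor error is zero the minimum is not distinguishable
    ratio = errors[i] / e_neighbor if e_neighbor > 0 else 1.0
    return azimuths[i], ratio
```

An estimate whose ratio exceeds some chosen threshold could, for instance, be discarded or given less weight when estimates are combined over time.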
¹ The subscript h refers to the HRTF cue, while the subscript m refers to the measured cue.
SIGMAP 2007 - International Conference on Signal Processing and Multimedia Applications