ing algorithms are optimized; such signals are rare in natural environments. The signals, and the interference, of most relevance to humans are transients and speech. These are rapidly time-varying, have considerable harmonic structure, and are relatively sparse in time-frequency. For example, even continuous speech has many short intervals of silence. Speech also has formants, often strong harmonic structure (voiced speech), and other distinct and relatively sparse structure in frequency. In the short-time frequency domain, the average number of interferers in any time-frequency bin is much less than the number of sources. Beamforming theory shows that, for narrowband or broadband noise sources, an array can cancel only fewer interferers than it has sensors (at most N - 1 with N sensors). With frequency decomposition and rapid adaptivity, however, the inherent time-frequency sparseness can be exploited to cancel most of several "simultaneous" interferers.
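The sparsity argument can be checked numerically. The following is a minimal sketch, not the authors' analysis. It makes several assumptions:

- toy "voiced" sources, each a harmonic complex with 1/k amplitudes that is randomly gated on and off;
- a Hann-windowed short-time FFT;
- a hypothetical activity criterion, under which a bin counts as active for a source if it lies within 20 dB of that source's peak.

The code reports the average number of sources active in each occupied time-frequency bin. For sparse sources, this average is expected to stay below the number of sources.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    """Magnitude of a Hann-windowed short-time FFT, shape (frames, bins)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def harmonic_bursts(f0, fs, dur, seg, rng):
    """Toy voiced source (an assumption): harmonics of f0 with 1/k amplitudes,
    gated on or off at random in segments of `seg` samples."""
    t = np.arange(int(fs * dur)) / fs
    n_harm = int((fs / 2) // f0)
    x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))
    n_seg = int(np.ceil(len(t) / seg))
    gate = np.repeat(rng.random(n_seg) < 0.5, seg)[:len(t)]
    return x * gate

def mean_active_sources(sources, floor_db=20.0):
    """Mean number of active sources per time-frequency bin, over bins where any source is active."""
    masks = []
    for s in sources:
        m = stft_mag(s)
        masks.append(m > m.max() * 10 ** (-floor_db / 20))  # within floor_db of this source's peak
    counts = np.sum(masks, axis=0)
    return float(counts[counts > 0].mean())

fs = 8000
rng = np.random.default_rng(0)
sources = [harmonic_bursts(f0, fs, 2.0, 1600, rng) for f0 in (110, 145, 190, 240)]
print(f"{len(sources)} sources, mean active per occupied bin: {mean_active_sources(sources):.2f}")
```

The threshold `floor_db`, the fundamentals, and the gating pattern are illustrative choices. Changing them shifts the number but not the qualitative point.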
With this biologically inspired insight, new approaches can be derived that are better matched to implementation on current DSP hardware. These approaches still demonstrate performance approaching that of the more biologically faithful algorithm of Liu et al. Two elements allow a binaural system to overcome multiple interferers in a cocktail-party environment: time-frequency decomposition, which exposes the sparsity of the sources and interferers, and rapid adaptation, which takes advantage of it. We have developed a particular frequency-domain MVDR beamformer implementation (FMV) that provides similar interference rejection. It is easily implementable in a low-power, fixed-point, real-time DSP system such as a digital hearing aid (Lockwood et al., 2003). Like the L/E algorithm described earlier, the algorithm begins with overlapped short-time FFTs of the individual input channels and then processes each frequency channel independently. This exposes the time-frequency sparsity of the interference. It has the added advantage that the beamformer in each frequency bin consists of scalar (complex) weights rather than filters. Running short-time cross-correlation matrices are computed at each frequency via an efficient recursive update. Most frequency-domain MVDR implementations use the GSC algorithm to adapt the beamformer slowly, because the matrix inverse has $O(N^3)$ complexity and poses stability challenges. For a binaural beamformer implemented in the frequency domain, however, N = 2 in each independent channel. Direct solution for the optimal Capon weights according to (1) then requires only a few operations after algebraic simplification. We also apply a multiplicative (energy-normalized) regularization to make the weights more robust to errors in the short-time correlation estimates (Cox et al., 1987). Just as in the first algorithm, the optimal beamforming weights are applied at each frequency, and the extracted signal of interest is recovered via an inverse FFT.
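The per-bin computation described above can be sketched in NumPy for a two-microphone array. This is an illustration under stated assumptions, not the FMV implementation of Lockwood et al. (2003). The assumptions are:

- The recursive update is an exponentially weighted average with a hypothetical forgetting factor `lam`.
- The "energy-normalized" regularization is diagonal loading proportional to the bin's average power, with a hypothetical loading factor `delta`.
- The target is at broadside, so its steering vector is d = [1, 1] in every bin.

The sketch operates on STFT frames that have already been computed. The 2 × 2 inverse is written out explicitly, which is what keeps the direct Capon solution cheap.

```python
import numpy as np

def update_corr(R, X, lam=0.9):
    """Recursive per-bin update of the 2x2 cross-correlation matrices (assumed exponential forgetting).
    R: (K, 2, 2) complex; X: (K, 2) complex STFT values of one frame (mic 1, mic 2)."""
    outer = X[:, :, None] * X[:, None, :].conj()  # x x^H in every bin
    return lam * R + (1 - lam) * outer

def capon_weights_2x2(R, d, delta=0.01):
    """Closed-form Capon/MVDR weights w = R^-1 d / (d^H R^-1 d) in every bin.
    Assumed regularization: diagonal loading of delta times the bin's average power."""
    load = delta * np.real(R[:, 0, 0] + R[:, 1, 1]) / 2
    a, b = R[:, 0, 0] + load, R[:, 0, 1]
    c, e = R[:, 1, 0], R[:, 1, 1] + load
    det = a * e - b * c
    # R^-1 d using the explicit 2x2 inverse [[e, -b], [-c, a]] / det
    u0 = (e * d[:, 0] - b * d[:, 1]) / det
    u1 = (a * d[:, 1] - c * d[:, 0]) / det
    denom = d[:, 0].conj() * u0 + d[:, 1].conj() * u1  # d^H R^-1 d
    return np.stack([u0, u1], axis=1) / denom[:, None]

def apply_weights(w, X):
    """Beamformer output y = w^H x in every bin."""
    return np.sum(w.conj() * X, axis=1)

# Usage sketch: broadside target plus one off-axis interferer over K bins.
rng = np.random.default_rng(1)
K, fs, n_fft = 129, 16000, 256
tau = 2e-4  # hypothetical inter-microphone delay of the interferer, in seconds
f = np.arange(K) * fs / n_fft
d = np.ones((K, 2), dtype=complex)
v = np.stack([np.ones(K), np.exp(-2j * np.pi * f * tau)], axis=1)
R = np.tile(np.eye(2, dtype=complex), (K, 1, 1))
for _ in range(50):
    s = rng.standard_normal(K) + 1j * rng.standard_normal(K)
    i = rng.standard_normal(K) + 1j * rng.standard_normal(K)
    X = s[:, None] * d + i[:, None] * v
    R = update_corr(R, X)
    w = capon_weights_2x2(R, d)
    y = apply_weights(w, X)
```

Because the weights are recomputed from the running correlation every frame, they can follow an interferer that appears and disappears within a few frames. The distortionless constraint w^H d = 1 holds for any correlation estimate, so the broadside target passes unchanged.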
Figure 3 shows the performance, in terms of SNR gain, of a 15 cm two-element free-field array in an anechoic environment with one through four interferers. The initial SNR for the desired source was about -3 dB. This represents a challenging cocktail-party situation, near the lower threshold at which people with normal hearing can follow conversational speech. Each condition summarizes many runs. These cover at least four different configurations of interferer positions and, for each configuration, at least eight combinations of different male and/or female talkers. The target was always positioned at broadside, that is, perpendicular to the line of the array.
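SNR gain is the beamformer's output SNR minus its input SNR. The sketch below shows one common way to set up and score such a test. It assumes that the target and interference components can be passed separately through the same weights. This is not necessarily how the Figure 3 numbers were computed. The helper names are hypothetical, and the input SNR is measured at one reference microphone. The usage example substitutes independent white noise at each microphone and a fixed summing beamformer for the speech talkers and adaptive beamformers of the actual tests.

```python
import numpy as np

def snr_db(target, interference):
    """Ratio of target energy to interference energy, in dB."""
    return 10 * np.log10(np.sum(np.abs(target) ** 2) / np.sum(np.abs(interference) ** 2))

def scale_to_snr(target, interference, snr_goal_db):
    """Gain g such that snr_db(target, g * interference) equals snr_goal_db."""
    return 10 ** ((snr_db(target, interference) - snr_goal_db) / 20)

def snr_gain_db(tgt_in, int_in, tgt_out, int_out):
    """SNR gain: output SNR minus input SNR."""
    return snr_db(tgt_out, int_out) - snr_db(tgt_in, int_in)

# Usage sketch: broadside target (identical at both mics), independent noise at each mic.
rng = np.random.default_rng(2)
t = rng.standard_normal(16000)
n1, n2 = rng.standard_normal(16000), rng.standard_normal(16000)
g = scale_to_snr(t, n1, -3.0)  # input SNR of -3 dB at microphone 1
n1, n2 = g * n1, g * n2
tgt_out = (t + t) / 2          # summing beamformer passes the broadside target unchanged
int_out = (n1 + n2) / 2        # and averages the uncorrelated noise
print(f"input SNR {snr_db(t, n1):.2f} dB, SNR gain {snr_gain_db(t, n1, tgt_out, int_out):.2f} dB")
```

Because the noise is uncorrelated between the microphones, averaging the two channels roughly halves its power while leaving the target unchanged. The printed gain should therefore be close to 10 log10 2, about 3 dB.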
For comparison, the figure also shows the performance of our best implementation of the conventional GSC beamformer. The biologically inspired FMV beamformer substantially outperforms the GSC, particularly (as expected) when there is more than one simultaneous interferer. The FMV algorithm's performance may be somewhat inferior to that of the L/E method. The L/E method is too expensive to run on the complete battery of tests, so a direct comparison was not possible. Nevertheless, FMV clearly captures some of the strengths of the biological system. The LMS-based iterative GSC adaptation converges slowly, which prevents it from reacting fast enough to exploit the time-frequency sparseness of the interference. (Each test is only 2.4 seconds long, and both beamformers are initialized to a conventional summing beamformer. The GSC's somewhat inferior performance even for one interferer therefore also reflects its slower convergence. For one interferer and after convergence, the two beamformers perform comparably.) The results strongly suggest that the FMV beamformer, like the L/E method, has captured at least one of the special "tricks" that the human hearing system uses to perform well with only two ears in the cocktail-party context.

Figure 3: SNR gains of the FMV (dark) and GSC (light) adaptive beamformers for one, two, three, and four simultaneous speech interferers.
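The convergence argument can be made concrete with a single-bin, two-microphone GSC. The sketch assumes a broadside target. Under that assumption, the fixed beamformer is the average y0 = (x1 + x2)/2 and the blocking output is b = x1 - x2. The sketch uses normalized LMS (NLMS) with a hypothetical step size `mu`. NLMS is a common LMS variant and may differ from the GSC implementation compared in Figure 3. The coefficient starts at a = 0, which is exactly the conventional summing beamformer. With a single interferer and no target, the coefficient error then shrinks by a factor of about (1 - mu) per update.

```python
import numpy as np

def gsc_nlms(X, mu=0.05, eps=1e-8):
    """Single-bin two-mic GSC with an NLMS-adapted scalar coefficient.
    X: (T, 2) complex STFT values of one frequency bin over T frames."""
    a = 0j                      # a = 0: output equals the conventional summing beamformer
    history = []
    for x1, x2 in X:
        y0 = (x1 + x2) / 2      # fixed beamformer aimed at broadside
        b = x1 - x2             # blocking output: a broadside target cancels here
        y = y0 - a * b          # GSC output (also the adaptation error)
        a = a + mu * y * np.conj(b) / (abs(b) ** 2 + eps)
        history.append(a)
    return np.array(history)

# Usage sketch: one interferer with an inter-mic phase shift phi and no target.
rng = np.random.default_rng(3)
phi = 1.0                                  # hypothetical phase shift in this bin (radians)
v = np.array([1.0, np.exp(1j * phi)])
i = rng.standard_normal(200) + 1j * rng.standard_normal(200)
X = i[:, None] * v[None, :]
a_opt = (1 + np.exp(1j * phi)) / (2 * (1 - np.exp(1j * phi)))  # coefficient that nulls the interferer
err = np.abs(gsc_nlms(X) - a_opt) / abs(a_opt)
print(f"relative error after 10 frames: {err[9]:.3f}, after 100 frames: {err[99]:.1e}")
```

With mu = 0.05, the error takes about 45 updates to fall to 10% even in this idealized case. In practice, the target is also present in y0 and therefore in the error signal y, which perturbs every update. Keeping those perturbations small requires a small mu, which slows adaptation further.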
BIOLOGICALLY INSPIRED BEAMFORMING WITH SMALL ACOUSTIC ARRAYS
131