The next section of this paper introduces a DTD
algorithm based on audio signal watermarking
techniques, developed by the authors at the
Multimedia Systems Department. A detailed
description of this algorithm is available in the
literature (Szwoch, Czyzewski and Ciarkowski,
2009; Szwoch and Czyzewski, 2008), so only a
brief description is provided here. The actual
motivation behind this paper is the objective and
subjective quality evaluation of this algorithm,
especially against the current state-of-the-art NCC
algorithm.
In accordance with this goal, the section after that
describes the evaluation procedures applied to the
DTD algorithms; the results obtained are then
presented and discussed.
Finally, conclusions regarding the practical
implications of the obtained results are drawn.
2 WATERMARKING-BASED DTD
OVERVIEW
While most DTD algorithms rely on a comparison of
the far-end and microphone signals, the proposed
algorithm takes a different approach, which is
related to the so-called “fragile” watermarking
techniques, typically used to protect multimedia
content against tampering. Fragile
watermarking has the property that the signature
embedded into the protected signal is destroyed and
becomes unreadable when the signal is modified. In
the double-talk detector, the signal protected from
“tampering” is the far-end speaker signal, and the
tampering is the addition of the near-end signal to
it. At the same time, linear modifications of the
signal resulting from convolution with the impulse
response of the audio path should not be treated as
tampering, so that the embedded signature remains
detectable in the “sole” echo signal arriving at the
microphone and is suppressed in the combined
echo-and-near-end signal.
The information content of the signature is not
important in this application, as only a binary
decision on whether the signature is present is
required. The signature embedding and detection
scheme should also be robust against A/D and D/A
conversions, which are inevitable in telephony
applications, while remaining transparent (i.e.
imperceptible) to the listener, affecting neither the
intelligibility of the speech nor the perceived
quality of the signal. Finally, minor additive noise
and non-linear distortions resulting from
imperfections of the analogue elements of the audio
path should not impair the ability of the algorithm to
detect the presence of the signature in the echo signal.
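As an informal summary of these requirements, the
microphone signal can be written as the watermarked
far-end signal convolved with the audio path
response, plus the near-end speech. The notation
follows Figure 1, with h_a(n) taken here to denote
the audio path impulse response and v(n) the
near-end signal; the hypothesis labels H_0 and H_1
are introduced for illustration only, and additive
noise and non-linear distortion are neglected.

```latex
\begin{align*}
u(n) &= (h_a * x_w)(n) + v(n) \\
\mathcal{H}_0:\ v(n) = 0 &\;\Rightarrow\; \text{signature detectable (echo only)} \\
\mathcal{H}_1:\ v(n) \neq 0 &\;\Rightarrow\; \text{signature suppressed (double-talk)}
\end{align*}
```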
The binary decision produced by the signature
detection block in the arrangement described above
is the inverse of the expected DTD output: correct
detection of the signature in the microphone signal
indicates that near-end speech is not present, which
makes it possible to control the adaptation process
of the adaptive filter. The described concept is presented
in Figure 1. An adaptive filter is used to obtain an
estimate of the audio path impulse response h_a(n)
based on the original far-end speaker signal x(n)
and the microphone signal u(n). The far-end speaker
signal is filtered with the estimated impulse
response, yielding the echo estimate h_f(n), which is
subtracted from the microphone signal u(n), yielding
in turn the echo-cancelled signal e(n). To enable
DTD operation, the far-end speaker signal x(n)
passes through the signature embedding block prior
to reproduction by the loudspeaker, producing the
signal x_w(n) with the embedded signature. This
signature is detected in the signature detection
block, yielding the detection statistic f_d(n), which
is compared to the detection threshold T_d, resulting
in the binary decision y(n) used to control the
adaptation process.
Figure 1: General concept of AEC algorithm with DTD
based on audio signal watermarking.
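The control loop described above can be illustrated
with a minimal sketch in which the binary decision
y(n) gates the adaptation of the echo canceller. The
source does not name the adaptation rule, so NLMS is
used here purely as an assumption; the function names
(nlms_with_dtd, decide), the step size, the filter
length, the direction of the threshold comparison and
the simulated audio path are all hypothetical. The
signature embedding and detection blocks are not
implemented: the detection statistic f_d(n) is
replaced by a stand-in sequence.

```python
import random


def decide(f_d, T_d):
    """Binary decision y(n). Assumption: the signature counts as
    detected when the statistic reaches the threshold T_d."""
    return f_d >= T_d


def nlms_with_dtd(x, u, adapt, taps=8, mu=0.5, eps=1e-8):
    """NLMS echo canceller whose weight update is gated per sample.

    x     : far-end samples fed to the adaptive filter
    u     : microphone samples (echo plus possible near-end speech)
    adapt : per-sample booleans, True when the signature was detected
    Returns (e, w): echo-cancelled signal and final filter weights.
    """
    w = [0.0] * taps
    buf = [0.0] * taps                # recent far-end samples, newest first
    e = []
    for xn, un, ok in zip(x, u, adapt):
        buf = [xn] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        en = un - echo_est            # e(n): microphone minus echo estimate
        e.append(en)
        if ok:                        # weights stay frozen during double-talk
            norm = eps + sum(b * b for b in buf)
            w = [wi + mu * en * bi / norm for wi, bi in zip(w, buf)]
    return e, w


# Simulated scenario (all values hypothetical): echo only in the first
# half, echo plus near-end speech in the second half.
random.seed(0)
N = 2000
h_true = [0.5, -0.3, 0.1]             # made-up audio path response
x = [random.uniform(-1, 1) for _ in range(N)]
echo = [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
        for n in range(N)]
v = [0.0] * (N // 2) + [random.uniform(-1, 1) for _ in range(N // 2)]
u = [a + b for a, b in zip(echo, v)]

# Stand-in detection statistic: high when only echo is present.
f_d = [1.0] * (N // 2) + [0.0] * (N // 2)
y = [decide(f, 0.5) for f in f_d]

e, w = nlms_with_dtd(x, u, y)
```

In this sketch the weights converge while only echo is
present and are then left untouched once the stand-in
detector stops reporting the signature, so the
near-end signal passes to e(n) without disturbing the
filter. For simplicity the filter is driven by x(n)
and the loudspeaker signal is assumed equal to it,
i.e. the effect of embedding on x_w(n) is ignored.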
The requirements listed above regarding the
signature embedding and detection process make the
choice of a suitable watermarking algorithm
problematic. The most commonly used audio
watermarking methods are either limited to the
digital domain or are too susceptible to noise and
reverberation added in the acoustic path. Research
on this subject led to the choice of the echo hiding
method, which adds single or multiple echoes with a
short delay (below 30 ms) to the signal, so that the
only effect perceived by the listener is a slight
“coloring” of the sound timbre (Gruhl, Lu and
Bender, 1996). In the case of watermarking systems the
[Figure 1 block diagram. Block labels: Signature embedding, Acoustic feedback, Adaptive filter, Signature detector, Decision, DTD. Signal and parameter labels: x(n), x_w(n), u(n), v(n), h_a(n), (a, d, δ), T_d, h_f(n), e(n), f_d(n), y(n).]
SIGMAP 2011 - International Conference on Signal Processing and Multimedia Applications