In Table 3, we compare the performance of the
proposed watermarking algorithm with that of several
recent speech watermarking strategies. In (Sagi and
Malah, 2007), the MOS of the narrowband (NB) speech
is 3.7 and the MOS of the NB speech with embedded
data is 3.625. The small difference between these
MOS values demonstrates the transparency of their
data-embedding scheme. In their simulations, the
embedding data rate is 600 information bits/second.
The method of (Celik et al., 2005) offers a
relatively low embedding capacity (about 3 bps),
which is suitable for metadata tagging and
authentication applications. However, it is robust
against low data-rate (5-8 kbps) speech coders. The
focus of (Gurijala and Deller, 2007) is on the
robustness of linear-prediction-based speech
watermarking. The technique is robust to a wide
range of attacks, including noise addition,
cropping, compression, and filtering, but the
achieved capacity is low.
Table 3: Comparison of different speech watermarking algorithms.

Algorithm                     SNR (dB)   PESQ-MOS   Payload (bps)
(Sagi and Malah, 2007)        35         3.6        600
(Celik et al., 2005)          –          –          3
(Girin and Marchand, 2004)    High       –          200
(Gurijala and Deller, 2007)   –          –          24
Proposed                      30–40      ~4         800–4000
4 CONCLUSIONS
Combining the wavelet transform with logarithmic
quantization yields an adaptive speech watermarking
scheme. Because the human auditory system requires
more precision at low amplitudes (soft sounds), a
logarithmic quantization algorithm is used to
quantize the approximation coefficients of the
wavelet transform (cA) and thereby embed the secret
bits. To improve robustness, the cA samples are
split into frames and each secret bit is embedded
into all the samples of the corresponding frame.
Increasing the frame size decreases the embedding
capacity and increases the robustness.
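The frame-based embedding summarized above can be illustrated with a minimal sketch. It is not the exact quantizer of the proposed scheme: it assumes a one-level Haar wavelet, a parity-based (QIM-style) quantizer applied to log|cA| with a hypothetical step delta, and majority voting at the detector. All function names and default values are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)
EPS = 1e-6  # assumed floor that keeps log() finite for near-zero coefficients


def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence; returns (cA, cD)."""
    cA = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    cD = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return cA, cD


def haar_idwt(cA, cD):
    """Inverse of haar_dwt."""
    x = []
    for a, d in zip(cA, cD):
        x.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return x


def embed_sample(c, bit, delta):
    """Move log|c| to the nearest lattice point (2k + bit) * delta."""
    sign = -1.0 if c < 0 else 1.0
    u = math.log(max(abs(c), EPS))
    k = round((u / delta - bit) / 2.0)
    return sign * math.exp((2 * k + bit) * delta)


def extract_sample(c, delta):
    """Read the bit back as the parity of the nearest lattice index."""
    if c == 0.0:
        return 0  # embedded coefficients are never exactly zero
    return round(math.log(abs(c)) / delta) % 2


def embed(signal, bits, frame_size, delta=0.1):
    """Embed each bit into every cA sample of its frame."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    cA, cD = haar_dwt(signal)
    if len(bits) * frame_size > len(cA):
        raise ValueError("payload does not fit in the signal")
    for j, bit in enumerate(bits):
        for i in range(j * frame_size, (j + 1) * frame_size):
            cA[i] = embed_sample(cA[i], bit, delta)
    return haar_idwt(cA, cD)


def extract(signal, n_bits, frame_size, delta=0.1):
    """Recover one bit per frame by majority vote over the frame."""
    cA, _ = haar_dwt(signal)
    bits = []
    for j in range(n_bits):
        frame = cA[j * frame_size:(j + 1) * frame_size]
        votes = sum(extract_sample(c, delta) for c in frame)
        bits.append(1 if 2 * votes > len(frame) else 0)
    return bits


# Usage: 4 bits, 8 cA samples per bit, on a synthetic 64-sample signal.
sig = [0.5 * math.sin(0.3 * n) + 0.2 * math.sin(0.05 * n) for n in range(64)]
marked = embed(sig, [1, 0, 1, 1], frame_size=8)
print(extract(marked, 4, frame_size=8))
```

Because the quantization is applied to the logarithm of the magnitude, the change to each coefficient is proportional to its amplitude, so soft sounds are altered less in absolute terms. A larger frame_size gives the majority vote more samples per bit at the cost of fewer bits per second. This sketch makes no claim to reproduce the robustness reported in the paper.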
The experimental results show that the distortion
caused by the embedding algorithm is adjustable and
lower than that introduced by the G.723 speech
codec. Therefore, the marked signal has high quality
(PESQ-MOS around 4), i.e., the proposed
watermarking scheme is transparent. The embedding
rate is adjustable, ranging from very low bit rates
up to 4000 bps, depending on the application. The
scheme is shown to be robust against attacks such as
ITU-T G.711 compression (A-law and μ-law
companding), amplification, and RC filtering.
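For quick experiments, the μ-law companding attack mentioned above can be approximated as in the following sketch. It is only an approximation under stated assumptions: it uses the continuous μ-law curve with μ = 255 and uniform quantization of the companded value, not the segmented 8-bit encoding that G.711 actually specifies. The function names are illustrative.

```python
import math

MU = 255.0  # standard mu-law parameter


def mu_compress(x):
    """Continuous mu-law compression of x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def mu_law_attack(signal, n_bits=8):
    """Compress, quantize uniformly to n_bits, and expand each sample."""
    levels = 2 ** (n_bits - 1) - 1  # 127 magnitude levels for 8 bits
    out = []
    for x in signal:
        x = max(-1.0, min(1.0, x))  # clip to the companding range
        q = round(mu_compress(x) * levels) / levels
        out.append(mu_expand(q))
    return out
```

Passing a marked signal (scaled to [-1, 1]) through mu_law_attack before extraction gives a rough indication of how a watermark detector behaves under companding.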
ACKNOWLEDGEMENTS
This work was partially funded by the Spanish
Government through projects TSI2007-65406-C03-03
E-AEGIS, TIN2011-27076-C03-02 CO-PRIVACY and
CONSOLIDER INGENIO 2010 CSD2007-0004 ARES.
REFERENCES
Celik, M., Sharma, G., Tekalp, A. M., 2005. Pitch and
duration modification for speech watermarking. In
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
(ICASSP), vol. 2, pp. 17–20.
Chen, S., Leung, H., 2007. Speech bandwidth extension
by data hiding and phonetic classification. In Proc.
IEEE Int. Conf. Acoust., Speech, Signal Process.
(ICASSP), vol. 4, pp. 593–596.
Girin, L., Marchand, S., 2004. Watermarking of speech
signals using the sinusoidal model and frequency
modulation of the partials. In Proc. IEEE Int. Conf.
Acoust., Speech, Signal Process. (ICASSP), vol. 1, pp.
633–636.
Gurijala, A., Deller, J., 2007. On the robustness of
parametric watermarking of speech. In Multimedia
Content Analysis and Mining, ser. Lecture Notes in
Computer Science, Springer, vol. 4577/2007,
pp. 501–510.
Hu, Y., Loizou, P., 2007. Subjective evaluation and
comparison of speech enhancement algorithms.
Speech Communication, vol. 49, pp. 588–601.
ITU-T, Recommendation P.861.
http://www.itu.int/rec/T-REC-P.861/en
(accessed on June 22nd, 2012).
ITU-T, Recommendation G.711.
http://www.itu.int/rec/T-REC-G.711/en
(accessed on June 22nd, 2012).
ITU-T, Recommendation G.723.
http://www.itu.int/rec/T-REC-G.723/en
(accessed on June 22nd, 2012).
Lang, A., Stirmark Benchmark for Audio.
http://wwwiti.cs.uni-magdeburg.de/~alang/smba.php
(accessed on June 22nd, 2012).
Sagi, A., Malah, D., 2007. Bandwidth extension of
telephone speech aided by data embedding. EURASIP
J. Adv. Signal Process., vol. 2007, article ID 64921.
Salomon, D., 2007. Data Compression: the Complete
Reference. Springer.