ROBUST VOICE ACTIVITY DETECTION BASED ON PITCH AND SUB-BAND ENERGY

Zhihao Zhang, Jinlong Lin

2009

Abstract

A new Voice Activity Detection (VAD) method is proposed that tracks various background noises and remains robust in both stationary and non-stationary noise environments. Many previous VAD methods assume that the background contains only certain kinds of noise, so they cannot efficiently handle the noise encountered in practical applications. The proposed approach defines three regions: determinate speech, determinate noise and potential speech. The first two regions are located using the extracted pitch contour. The ambiguous (potential speech) region is then resolved using sub-band energy thresholds that are updated from the spectrum of the determinate-noise frames. Experiments compare the approach exhaustively against three standard VAD methods: G.729B, ETSI AFE and AMR. The results show that our approach is more robust than the others in real-world conditions.
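The abstract describes a two-stage decision but gives no parameters or estimator details. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It uses a plain normalized-autocorrelation voicing check as a stand-in for the paper's pitch extraction. It treats unpitched frames within a hypothetical `GUARD` window of a pitched frame as "potential speech". It then judges those frames by comparing their log sub-band energies against per-band thresholds (mean plus `k` standard deviations) estimated from the determinate-noise frames, and nudges the noise mean toward each frame it rejects. All names and constants (`FS`, `FRAME`, `HOP`, `VOICING_THR`, `GUARD`, `N_BANDS`, `k`, `min_bands`, `alpha`) are illustrative assumptions.

```python
import numpy as np

# Illustrative constants (assumptions, not values from the paper).
FS = 8000                     # sample rate in Hz
FRAME, HOP = 256, 128         # 32 ms frames with a 16 ms hop at 8 kHz
F0_MIN, F0_MAX = 60.0, 400.0  # pitch search range in Hz
VOICING_THR = 0.5             # normalized autocorrelation peak needed to call a frame pitched
GUARD = 3                     # frames on each side of a pitched frame marked "potential speech"
N_BANDS = 8                   # number of sub-bands for the energy test


def frame_signal(x):
    """Split a 1-D signal (at least FRAME samples long) into overlapping frames."""
    if len(x) < FRAME:
        raise ValueError("signal shorter than one frame")
    n = 1 + (len(x) - FRAME) // HOP
    return np.stack([x[i * HOP:i * HOP + FRAME] for i in range(n)])


def is_pitched(frame):
    """Autocorrelation voicing check: a stand-in for the paper's pitch extraction."""
    f = frame - frame.mean()
    energy = np.dot(f, f)
    if energy <= 1e-12:
        return False
    ac = np.correlate(f, f, mode="full")[len(f) - 1:] / energy  # lags 0..len(f)-1
    lo, hi = int(FS / F0_MAX), int(FS / F0_MIN)                 # lag range for F0_MAX..F0_MIN
    return bool(ac[lo:hi + 1].max() >= VOICING_THR)


def label_regions(pitched):
    """0 = determinate noise, 1 = potential speech, 2 = determinate speech."""
    pitched = np.asarray(pitched, dtype=bool)
    labels = np.zeros(len(pitched), dtype=int)
    for i in np.flatnonzero(pitched):
        labels[max(0, i - GUARD):i + GUARD + 1] = 1  # unpitched neighbours are ambiguous
    labels[pitched] = 2
    return labels


def subband_log_energy(frame):
    """Log energy in N_BANDS equal-width bands of the Hann-windowed power spectrum."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    return np.log10(np.array([b.sum() for b in np.array_split(spec, N_BANDS)]) + 1e-12)


def vad(x, k=3.0, min_bands=2, alpha=0.95):
    """Return one speech (True) / non-speech (False) decision per frame."""
    fr = frame_signal(np.asarray(x, dtype=float))
    labels = label_regions([is_pitched(f) for f in fr])
    E = np.array([subband_log_energy(f) for f in fr])
    noise = E[labels == 0]
    if len(noise) == 0:
        # No determinate-noise reference: every frame is pitched or pitch-adjacent, keep all.
        return labels > 0
    mu, sd = noise.mean(axis=0), noise.std(axis=0) + 1e-6
    decision = labels == 2
    for i in np.flatnonzero(labels == 1):
        if np.sum(E[i] > mu + k * sd) >= min_bands:
            decision[i] = True
        else:
            # Frame judged noise: move the per-band noise mean toward it (threshold update).
            mu = alpha * mu + (1 - alpha) * E[i]
    return decision
```

Calling `vad(x)` on an 8 kHz signal yields one boolean per 16 ms hop. Confining the sub-band test to frames next to pitched runs is one way to read the abstract's split. Pitch settles the determinate regions, and only the ambiguous frames need the energy comparison against the noise-derived thresholds.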

References

  1. 3GPP, 2001. Speech codec speech processing functions; Adaptive Multi-Rate-Wideband (AMR-WB) speech codec; Voice Activity Detector (VAD).
  2. A. Davis, S. Nordholm, S.-Y. Low, R. Togneri, 2006. A multi-decision sub-band voice activity detector, Proceedings of EUSIPCO, Florence, Italy.
  3. D. Hermes, 1988. Measurement of pitch by subharmonic summation, The Journal of the Acoustical Society of America, pp. 257-264.
  4. D.-J. Liu, C.-T. Lin, 2001. Fundamental frequency estimation based on the joint time-frequency analysis of harmonic spectral structure, IEEE Transactions on Speech and Audio Processing, pp. 609-621.
  5. E. Fisher, J. Tabrikian, S. Dubnov, 2006. Generalized likelihood ratio test for voiced-unvoiced decision in noisy speech using the harmonic model, IEEE Transactions on Audio, Speech and Language Processing, pp. 502-510.
  6. ETSI, 2007. Speech processing, transmission and quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms, ETSI ES 202 050 Recommendation.
  7. ITU-T, 1997. A silence compression scheme for G.729 optimized for terminals conforming to recommendation V.70, ITU-T Rec. G.729, Annex B.
  8. K. Woo, T. Yang, K. Park, and C. Lee, 2000. Robust voice activity detection algorithm for estimating noise spectrum, Electronics Letters, pp. 180-181.
  9. L. Karray and A. Martin, 2003. Toward improving speech detection robustness for speech recognition in adverse environments, Speech Communication, pp. 261-276.
  10. W. Q. Syed, H.-C. Wu, 2007. Speech waveform compression using robust adaptive voice activity detection for nonstationary noise in multimedia communications, IEEE Global Telecommunications Conference, pp. 3096-3101.
  11. X. J. Yang, H. S. Chi, 1995. Speech signal digital processing, Electronic Industry Press, Beijing.


Paper Citation


in Harvard Style

Zhang Z. and Lin J. (2009). ROBUST VOICE ACTIVITY DETECTION BASED ON PITCH AND SUB-BAND ENERGY. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009), ISBN 978-989-674-007-8, pages 44-48. DOI: 10.5220/0002221000440048


in Bibtex Style

@conference{sigmap09,
author={Zhihao Zhang and Jinlong Lin},
title={ROBUST VOICE ACTIVITY DETECTION BASED ON PITCH AND SUB-BAND ENERGY},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009)},
year={2009},
pages={44-48},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002221000440048},
isbn={978-989-674-007-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009)
TI  - ROBUST VOICE ACTIVITY DETECTION BASED ON PITCH AND SUB-BAND ENERGY
SN  - 978-989-674-007-8
AU  - Zhang Z.
AU  - Lin J.
PY  - 2009
SP  - 44
EP  - 48
DO  - 10.5220/0002221000440048
ER  - 