SPEECH SEGMENTATION IN NOISY STREET ENVIRONMENT

Jaroslaw Baszun

Abstract

This paper compares two voice activity detectors for speaker verification systems. The first is a single-microphone system based on properties of the human speech modulation spectrum, i.e. the distribution of power in the modulation frequency domain. It relies on the fact that the power of the modulation components of speech is concentrated in the range from 1 to 16 Hz and depends on the rate at which a person utters syllables. The second is a two-microphone system whose algorithm is based on coherence computation. Experiments showed that the two-microphone system performs better when voiced sounds are present in the background.
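
The two decision statistics named above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation. An FFT filter bank stands in for the paper's subband analysis. The 8 kHz sampling rate, the frame sizes, the 1 s modulation window and the 0.5 coherence threshold are all hypothetical choices. `modulation_ratio` measures what share of envelope (modulation) power lies in the 1 to 16 Hz band. `msc` and `coherence_vad` estimate the magnitude-squared coherence between two microphone signals and threshold its average over frequency.

```python
import numpy as np

# Illustrative sketch only: all parameter values below are hypothetical,
# not taken from the paper.


def windowed_frames(x, frame_len, hop):
    """Split x into overlapping Hann-windowed frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])


def modulation_ratio(x, fs=8000, frame_len=256, hop=80, win_frames=100, band=(1.0, 16.0)):
    """Share of modulation (envelope) power falling in `band` Hz, one value per window.

    Subband envelopes are STFT magnitudes sampled at fs / hop frames per second;
    the spectrum of each envelope over `win_frames` frames is its modulation spectrum.
    """
    env = np.abs(np.fft.rfft(windowed_frames(x, frame_len, hop), axis=1))
    mod_freqs = np.fft.rfftfreq(win_frames, d=hop / fs)  # 1 Hz resolution by default
    in_band = (mod_freqs >= band[0]) & (mod_freqs <= band[1])
    ratios = []
    for start in range(0, env.shape[0] - win_frames + 1, win_frames):
        seg = env[start:start + win_frames]
        seg = seg - seg.mean(axis=0)                    # drop each subband's DC level
        power = np.abs(np.fft.rfft(seg, axis=0)) ** 2   # modulation power per subband
        ratios.append(power[in_band].sum() / power[1:].sum())  # DC bin excluded
    return np.array(ratios)


def msc(x, y, frame_len=256, hop=128):
    """Welch-averaged magnitude-squared coherence |Sxy|^2 / (Sxx * Syy) per bin."""
    X = np.fft.rfft(windowed_frames(x, frame_len, hop), axis=1)
    Y = np.fft.rfft(windowed_frames(y, frame_len, hop), axis=1)
    sxx = np.mean(np.abs(X) ** 2, axis=0)   # auto-spectrum of microphone 1
    syy = np.mean(np.abs(Y) ** 2, axis=0)   # auto-spectrum of microphone 2
    sxy = np.mean(X * np.conj(Y), axis=0)   # cross-spectrum
    return np.abs(sxy) ** 2 / (sxx * syy + 1e-12)


def coherence_vad(x, y, threshold=0.5):
    """Flag speech when mean coherence across bins exceeds a hypothetical threshold."""
    return bool(msc(x, y).mean() > threshold)
```

As a usage example, `modulation_ratio(x)` on a 3 s, 8 kHz signal returns one ratio per 1 s window. Noise amplitude-modulated at a syllable-like 4 Hz rate should score clearly higher than stationary noise. Two microphone signals dominated by a common source should give a mean coherence close to 1, while independent noises give a value close to 0. A background talker also produces speech-like modulation, so the single-microphone statistic by itself cannot tell it apart from the target speaker.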



Paper Citation


in Harvard Style

Baszun J. (2007). SPEECH SEGMENTATION IN NOISY STREET ENVIRONMENT. In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007) ISBN 978-989-8111-13-5, pages 422-427. DOI: 10.5220/0002139704220427


in BibTeX Style

@conference{sigmap07,
author={Jaroslaw Baszun},
title={SPEECH SEGMENTATION IN NOISY STREET ENVIRONMENT},
booktitle={Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)},
year={2007},
pages={422-427},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002139704220427},
isbn={978-989-8111-13-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)
TI - SPEECH SEGMENTATION IN NOISY STREET ENVIRONMENT
SN - 978-989-8111-13-5
AU - Baszun J.
PY - 2007
SP - 422
EP - 427
DO - 10.5220/0002139704220427
ER -