AN ONLINE SPEAKER TRACKING SYSTEM FOR AMBIENT INTELLIGENCE ENVIRONMENTS

Maider Zamalloa, Mikel Penagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel, Juan Pedro Uribe

Abstract

Ambient intelligence is an interdisciplinary paradigm that envisages smart spaces which provide services and adapt transparently to the user. As the most natural interface for human interaction, speech can be exploited for adaptation purposes in such scenarios. Because adaptation must be continuous, low latency is required. Most speaker tracking approaches found in the literature work offline, fully processing pre-recorded audio files in a two-stage procedure: (1) performing acoustic segmentation and (2) assigning a speaker label to each segment. This work presents a real-time, low-latency speaker tracking system that operates on continuous audio streams. Experimental results on the AMI Corpus of meeting conversations show the effectiveness of the proposed approach when compared with an offline speaker tracking system developed for reference.


Paper Citation


in Harvard Style

Zamalloa M., Penagarikano M., Rodríguez-Fuentes L. J., Bordel G. and Uribe J. P. (2010). AN ONLINE SPEAKER TRACKING SYSTEM FOR AMBIENT INTELLIGENCE ENVIRONMENTS. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-674-021-4, pages 343-349. DOI: 10.5220/0002734803430349


in Bibtex Style

@conference{icaart10,
author={Maider Zamalloa and Mikel Penagarikano and Luis Javier Rodríguez-Fuentes and Germán Bordel and Juan Pedro Uribe},
title={AN ONLINE SPEAKER TRACKING SYSTEM FOR AMBIENT INTELLIGENCE ENVIRONMENTS},
booktitle={Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2010},
pages={343-349},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002734803430349},
isbn={978-989-674-021-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - AN ONLINE SPEAKER TRACKING SYSTEM FOR AMBIENT INTELLIGENCE ENVIRONMENTS
SN - 978-989-674-021-4
AU - Zamalloa M.
AU - Penagarikano M.
AU - Rodríguez-Fuentes L. J.
AU - Bordel G.
AU - Uribe J. P.
PY - 2010
SP - 343
EP - 349
DO - 10.5220/0002734803430349