Summarization of Spontaneous Speech using Automatic Speech Recognition and a Speech Prosody based Tokenizer
György Szaszák, Máté Ákos Tündik, András Beke
2016
Abstract
This paper addresses the summarization of highly spontaneous speech. The audio signal is transcribed using an Automatic Speech Recognizer, which operates at relatively high word error rates due to the complexity of the recognition task and the high spontaneity of the speech. An analysis is carried out to assess how speech recognition errors propagate into syntactic parsing. We also propose an automatic, speech prosody based audio tokenization approach and compare it to human performance. The resulting sentence-like tokens are analysed by the syntactic parser to support ranking based on thematic terms and sentence position. The thematic term score is computed in two ways: with TF-IDF and with Latent Semantic Indexing. Sentence scores are calculated as a linear combination of the thematic term score and a positional score. The summary is generated from the top 10 candidates. Results show that prosody based tokenization reaches average human performance and that speech recognition errors propagate moderately into syntactic parsing (POS tagging and dependency parsing). Nouns prove to be quite error resistant. Audio summarization achieves 0.62 recall and 0.79 precision, with an F-measure of 0.68, compared to the human reference. A subjective test is also carried out on a Likert scale. All results apply to spontaneous Hungarian.
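The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weight `alpha`, the 1/(i+1) positional score, and the choice to treat each sentence-like token as its own document for the IDF statistics are all assumptions made for illustration. Only the TF-IDF variant is shown, and the input is assumed to be already tokenized (for example, reduced to the nouns found by the parser).

```python
import math
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Sum of TF-IDF weights per sentence (each sentence acts as one document)."""
    n = len(sentences)
    doc_freq = Counter()
    for sent in sentences:
        doc_freq.update(set(sent))
    scores = []
    for sent in sentences:
        score = 0.0
        if sent:
            for term, count in Counter(sent).items():
                tf = count / len(sent)
                idf = math.log(n / doc_freq[term])
                score += tf * idf
        scores.append(score)
    return scores


def positional_scores(n):
    """Assumed positional score: earlier sentences score higher."""
    return [1.0 / (i + 1) for i in range(n)]


def normalize(values):
    """Scale scores to [0, 1] so the two components are comparable."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def summarize(sentences, alpha=0.7, top_k=10):
    """Return indices of the top_k sentences, in original order.

    The combined score is alpha * thematic + (1 - alpha) * positional;
    alpha is a hypothetical weight, not a value taken from the paper.
    """
    thematic = normalize(tfidf_sentence_scores(sentences))
    position = normalize(positional_scores(len(sentences)))
    combined = [alpha * t + (1 - alpha) * p for t, p in zip(thematic, position)]
    ranked = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:top_k])


if __name__ == "__main__":
    toy = [["a", "b"], ["a", "c"], ["a", "d", "d"]]
    print(summarize(toy, alpha=0.7, top_k=2))
```

Setting `alpha=1.0` ranks by the thematic score alone, and `alpha=0.0` by position alone, which makes it easy to see how each component affects the selection.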
Paper Citation
in Harvard Style
Szaszák G., Tündik M. and Beke A. (2016). Summarization of Spontaneous Speech using Automatic Speech Recognition and a Speech Prosody based Tokenizer. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016), ISBN 978-989-758-203-5, pages 221-227. DOI: 10.5220/0006044802210227
in Bibtex Style
@conference{kdir16,
author={György Szaszák and Máté Ákos Tündik and András Beke},
title={Summarization of Spontaneous Speech using Automatic Speech Recognition and a Speech Prosody based Tokenizer},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={221-227},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006044802210227},
isbn={978-989-758-203-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - Summarization of Spontaneous Speech using Automatic Speech Recognition and a Speech Prosody based Tokenizer
SN - 978-989-758-203-5
AU - Szaszák G.
AU - Tündik M.
AU - Beke A.
PY - 2016
SP - 221
EP - 227
DO - 10.5220/0006044802210227