Tracker Text Segmentation Approach: Integrating Complex Lexical and Conversation Cue Features

C. Chibelushi, B. Sharp

Abstract

While text segmentation has received considerable attention since 9/11, most current research remains focused on expository texts, stories and broadcast news. Current segmentation methods are well suited to written, structured texts because they exploit those texts' distinctive macro-level structures. Text segmentation of transcribed multi-party conversation presents a different challenge, given the lack of linguistic features such as headings, paragraphs and well-formed sentences. This paper describes an algorithm for transcribed meeting conversations that combines semantically complex lexical relations with conversational cue phrases to build lexical chains and determine topic boundaries.
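The abstract only summarizes the approach, so the following is a simplified, hypothetical sketch of the general idea (lexical chains plus cue phrases as boundary evidence), not the authors' Tracker algorithm. It assumes chains are built from exact repetition of content words, whereas the paper uses semantically complex lexical relations that would need a resource such as WordNet. The cue-phrase list, stopword list, `max_gap`, `cue_weight` and `threshold` are illustrative assumptions.

```python
# Hypothetical cue phrases and stopwords; the paper's actual lists are not given here.
CUE_PHRASES = ("okay so", "moving on", "next item", "anyway")
STOPWORDS = {"the", "and", "for", "that", "this", "with", "will", "must", "need", "before"}


def content_words(utterance):
    """Lowercase, strip punctuation, keep words of 4+ letters that are not stopwords."""
    tokens = (w.strip(".,;:?!").lower() for w in utterance.split())
    return {t for t in tokens if len(t) >= 4 and t not in STOPWORDS}


def build_chains(utterances, max_gap=3):
    """Link repeated content words into chains (word, start, end).

    Simplification: a chain links identical words only; it is broken when the
    word is absent for more than max_gap utterances.
    """
    chains = []
    open_chains = {}  # word -> [start, last_seen]
    for i, utt in enumerate(utterances):
        for w in content_words(utt):
            if w in open_chains and i - open_chains[w][1] <= max_gap:
                open_chains[w][1] = i  # extend the existing chain
            else:
                if w in open_chains:  # gap too large: close the old chain
                    chains.append((w, *open_chains[w]))
                open_chains[w] = [i, i]
    chains.extend((w, s, e) for w, (s, e) in open_chains.items())
    # Keep only chains spanning at least two utterances.
    return [c for c in chains if c[2] > c[1]]


def boundary_scores(utterances, cue_weight=2.0, max_gap=3):
    """scores[i] is the evidence for a topic boundary just before utterance i."""
    scores = [0.0] * len(utterances)
    for _, start, end in build_chains(utterances, max_gap):
        if start > 0:
            scores[start] += 1  # a chain begins here
        if end + 1 < len(utterances):
            scores[end + 1] += 1  # a chain ended just before here
    for i, utt in enumerate(utterances):
        if i > 0 and utt.lower().startswith(CUE_PHRASES):
            scores[i] += cue_weight  # conversational cue phrase opens the utterance
    return scores


def segment(utterances, threshold=2.0, **kwargs):
    """Return indices of utterances that start a new topic segment."""
    scores = boundary_scores(utterances, **kwargs)
    return [i for i, s in enumerate(scores) if i > 0 and s >= threshold]


utts = [
    "The budget for the project is tight",
    "We need to cut the budget by ten percent",
    "Budget cuts will affect hiring",
    "Okay so moving on to the office move",
    "The office move is scheduled for June",
    "Staff must pack before the office move",
]
print(segment(utts))  # [3]
```

Here the "budget" chain ends and the "office"/"move" chains begin at utterance 3, and the cue phrase "okay so" adds further evidence, so a boundary is placed before it. Combining both signals lets a weak lexical shift still register as a boundary when a speaker explicitly signals a topic change.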



Paper Citation


in Harvard Style

Chibelushi C. and Sharp B. (2008). Tracker Text Segmentation Approach: Integrating Complex Lexical and Conversation Cue Features. In Proceedings of the 5th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2008) ISBN 978-989-8111-45-6, pages 104-113. DOI: 10.5220/0001740501040113


in BibTeX Style

@conference{nlpcs08,
author={C. Chibelushi and B. Sharp},
title={Tracker Text Segmentation Approach: Integrating Complex Lexical and Conversation Cue Features},
booktitle={Proceedings of the 5th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2008)},
year={2008},
pages={104-113},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001740501040113},
isbn={978-989-8111-45-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2008)
TI - Tracker Text Segmentation Approach: Integrating Complex Lexical and Conversation Cue Features
SN - 978-989-8111-45-6
AU - Chibelushi C.
AU - Sharp B.
PY - 2008
SP - 104
EP - 113
DO - 10.5220/0001740501040113