4 Conclusions
In this paper we have observed that topic change detection methods for text
segmentation generally rely on lexical information and tend to discard other types of
information present in texts, e.g., rhetorical, stylistic and syntactic information,
generally subsumed under the label of structural information. They also favor the
detection of default topical boundaries, whereas detection focused on intended
boundaries suggests other possible tracks for asserting topic change. Assuming that
structural information has a role to play in detecting intended boundaries, we built a
segmenter, called Transeg, based on spotting transition zones between topics in texts.
This paper has focused on the definition of transition zones and on the actions needed
to detect them, namely assigning a transition score and a breaking score to each
sentence of the text. The transition score indicates a sentence's ability to play the
role of the first sentence of a segment, and the breaking score its likelihood of being
the last one. When their values exceed a given threshold, the transition and breaking
scores become representative of an intended topical boundary. To determine the
efficiency of Transeg, we evaluated it by running it on the same corpus as c99, a
popular algorithm for detecting default boundaries. The results show that structural
information has an impact on segmentation efficiency.
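
As an illustration only, the thresholding step summarized above can be sketched as follows. The scores are taken as given inputs. The function name, the use of a single shared threshold, and the rule that a boundary requires both the breaking score of one sentence and the transition score of the next to exceed it are assumptions made for this sketch, not a description of how Transeg is implemented.

```python
from typing import List


def intended_boundaries(
    transition: List[float],
    breaking: List[float],
    threshold: float,
) -> List[int]:
    """Return each index i where a boundary falls between sentence i and i + 1.

    Assumption (not taken from the paper): a boundary is kept only when the
    breaking score of sentence i and the transition score of sentence i + 1
    are both strictly above the same threshold.
    """
    if len(transition) != len(breaking):
        raise ValueError("expected one transition and one breaking score per sentence")
    boundaries = []
    for i in range(len(breaking) - 1):
        if breaking[i] > threshold and transition[i + 1] > threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical scores for a five-sentence text (illustrative values only).
transition_scores = [0.9, 0.1, 0.2, 0.8, 0.3]
breaking_scores = [0.1, 0.2, 0.7, 0.1, 0.9]
print(intended_boundaries(transition_scores, breaking_scores, threshold=0.5))
```

With these made-up values, only the third sentence has a high breaking score followed by a sentence with a high transition score, so a single boundary is reported after it.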
References
1. Kaszkiel, M., Zobel, J.: Passage retrieval revisited. Proceedings of the Twentieth International
Conference on Research and Development in Information Access (ACM SIGIR) (1997) 178–185
2. Prince, V., Labadié, A.: Text segmentation based on document understanding for information
retrieval. In Proceedings of NLDB’07 (2007) 295–304
3. Kan, M., Klavans, J.L., McKeown, K.R.: Linear segmentation and segment significance.
Proceedings of WVLC-6 (1998) 197–205
4. Hearst, M.A.: TextTiling: segmenting text into multi-paragraph subtopic passages.
Computational Linguistics (1997) 59–66
5. Pevzner, L., Hearst, M.: A critique and improvement of an evaluation metric for text
segmentation. Computational Linguistics (2002) 113–125
6. Choi, F.Y.Y.: Advances in domain independent linear text segmentation. Proceedings of
NAACL-00 (2000) 26–33
7. Morris, J., Hirst, G.: Lexical cohesion computed by thesaural relations as an indicator of the
structure of text. Computational Linguistics 17 (1991) 20–48
8. Bestgen, Y., Piérard, S.: Comment évaluer les algorithmes de segmentation automatiques ?
Essai de construction d’un matériel de référence. Proceedings of TALN’06 (2006)
9. Choi, F.Y.Y., Wiemer-Hastings, P., Moore, J.: Latent semantic analysis for text segmentation.
Proceedings of EMNLP (2001) 109–117
10. Reynar, J.C.: Topic Segmentation: Algorithms and Applications. PhD thesis, University of
Pennsylvania (1998)
11. Passonneau, R.J., Litman, D.: Intention-based segmentation: Human reliability and
correlation with linguistic cues. Proceedings of the 31st Annual Meeting of the Association for
Computational Linguistics (1993) 148–155
12. Chauché, J.: Un outil multidimensionnel de l’analyse du discours. Proceedings of Coling’84
1 (1984) 11–15
13. Roget, P.: Thesaurus of English Words and Phrases. Longman, London (1852)