EMPIRICAL TEXT MINING FOR GENRE DETECTION

Vasiliki Simaki, Sofia Stamou, Nikos Kirtsis

2012

Abstract

In this paper, we report on a preliminary study we carried out to identify patterns that characterize the genre of Greek texts. We address four distinct genre types, record their observable stylistic elements, and indicate how these elements can be exploited for automatic genre-based document classification. Our findings demonstrate that texts contain lexical features with discriminative power as far as genre is concerned; however, modeling those features so that they can be used by computer-based applications is still in its early stages.



Paper Citation


in Harvard Style

Simaki V., Stamou S. and Kirtsis N. (2012). EMPIRICAL TEXT MINING FOR GENRE DETECTION. In Proceedings of the 8th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8565-08-2, pages 733-737. DOI: 10.5220/0003956207330737


in Bibtex Style

@conference{webist12,
author={Vasiliki Simaki and Sofia Stamou and Nikos Kirtsis},
title={EMPIRICAL TEXT MINING FOR GENRE DETECTION},
booktitle={Proceedings of the 8th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2012},
pages={733-737},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003956207330737},
isbn={978-989-8565-08-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - EMPIRICAL TEXT MINING FOR GENRE DETECTION
SN - 978-989-8565-08-2
AU - Simaki V.
AU - Stamou S.
AU - Kirtsis N.
PY - 2012
SP - 733
EP - 737
DO - 10.5220/0003956207330737