Discovering Collocations in Modern Greek Language

Kostas Fragos, Yannis Maistros, Christos Skourlas

2004

Abstract

This paper describes two statistical methods for extracting collocations from text corpora written in Modern Greek: the mean and variance method and a method based on the χ² test. The mean and variance method computes the distances (“offsets”) between words in a corpus and looks for characteristic patterns in those distances. The χ² test starts from a null hypothesis H0 for a sample of occurrences and checks whether the words are associated. Because the χ² test does not assume that the words in the corpus have normally distributed probabilities, it appears to be the more flexible of the two. Both methods extract interesting collocations that are useful in applications such as computational lexicography, language generation, and machine translation.
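As a rough illustration of the two statistics named in the abstract, the Python sketch below computes signed word offsets (with their mean and sample standard deviation) and a χ² score for a word pair. The function names, the window size, and the toy token list are assumptions made for illustration and are not taken from the paper. The χ² score uses the standard 2×2 contingency-table formula over adjacent word pairs, which may differ from the paper's exact formulation.

```python
from statistics import mean, stdev


def offsets(tokens, w1, w2, window=2):
    """Signed distances from each occurrence of w1 to every w2 within the window."""
    found = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == w2:
                found.append(j - i)
    return found


def chi_square(tokens, w1, w2):
    """Chi-square score for the adjacent pair (w1, w2) from a 2x2 table of bigram counts."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    o11 = sum(1 for a, b in bigrams if a == w1 and b == w2)  # w1 followed by w2
    o12 = sum(1 for a, b in bigrams if a == w1 and b != w2)  # w1 followed by another word
    o21 = sum(1 for a, b in bigrams if a != w1 and b == w2)  # another word followed by w2
    o22 = n - o11 - o12 - o21                                # neither
    denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


# Invented toy sample, far too small for real use; it only exercises the functions.
toy_tokens = "καλή μέρα σας λέω καλή μέρα και πάλι καλή σας μέρα".split()

pair_offsets = offsets(toy_tokens, "καλή", "μέρα")
offset_mean = mean(pair_offsets)
offset_sd = stdev(pair_offsets)  # a small spread suggests a fixed relative position
pair_chi2 = chi_square(toy_tokens, "καλή", "μέρα")
```

A pair whose offsets have a low standard deviation tends to occur at a fixed distance, which is the pattern the mean and variance method looks for. For the χ² score, the usual reading is to compare it against the critical value 3.841 (one degree of freedom, significance level 0.05) and to reject independence only when the score exceeds it.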



Paper Citation


in Harvard Style

Fragos K., Maistros Y. and Skourlas C. (2004). Discovering Collocations in Modern Greek Language. In Proceedings of the 1st International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2004) ISBN 972-8865-05-8, pages 151-158. DOI: 10.5220/0002667101510158


in Bibtex Style

@conference{nlucs04,
author={Kostas Fragos and Yannis Maistros and Christos Skourlas},
title={Discovering Collocations in Modern Greek Language},
booktitle={Proceedings of the 1st International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2004)},
year={2004},
pages={151-158},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002667101510158},
isbn={972-8865-05-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2004)
TI - Discovering Collocations in Modern Greek Language
SN - 972-8865-05-8
AU - Fragos K.
AU - Maistros Y.
AU - Skourlas C.
PY - 2004
SP - 151
EP - 158
DO - 10.5220/0002667101510158