Discovering Collocations in Modern Greek Language

Kostas Fragos, Yannis Maistros, Christos Skourlas


This paper describes two statistical methods for extracting collocations from text corpora written in Modern Greek: the mean and variance method, and a method based on the χ² test. The mean and variance method computes the distances (“offsets”) between words in a corpus and looks for characteristic patterns in those distances. The χ² method formulates a null hypothesis H0 that the words in a sample of occurrences are independent, and tests whether the observed counts indicate an association between them. Because the χ² test does not assume that word probabilities in the corpus are normally distributed, it appears to be the more flexible of the two. Both methods extract interesting collocations that are useful in applications such as computational lexicography, language generation, and machine translation.
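The two methods summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace-tokenized input, the window size, the restriction of the chi-square test to adjacent word pairs, and the function names offsets, offset_stats and chi_square_bigram are all assumptions made for illustration only.

```python
from statistics import mean, stdev


def offsets(tokens, w1, w2, window=5):
    """Signed distances from each occurrence of w1 to every w2 within the window.

    The window size is an illustrative assumption, not a value from the paper.
    """
    found = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == w2:
                found.append(j - i)
    return found


def offset_stats(tokens, w1, w2, window=5):
    """Mean and sample standard deviation of the offsets.

    Returns None when fewer than two offsets are observed, because the
    sample standard deviation is undefined in that case.
    """
    d = offsets(tokens, w1, w2, window)
    if len(d) < 2:
        return None
    return mean(d), stdev(d)


def chi_square_bigram(tokens, w1, w2):
    """Pearson chi-square statistic for the 2x2 table of adjacent word pairs.

    Rows: first word is / is not w1. Columns: second word is / is not w2.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    o11 = sum(1 for a, b in bigrams if a == w1 and b == w2)
    o12 = sum(1 for a, b in bigrams if a == w1 and b != w2)
    o21 = sum(1 for a, b in bigrams if a != w1 and b == w2)
    o22 = n - o11 - o12 - o21
    denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denom == 0:
        # A row or column of the table is empty; the statistic is undefined,
        # so this sketch reports no evidence of association.
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denom
```

A low standard deviation of the offsets suggests that the two words tend to occur at a fixed distance from each other, which is the pattern the mean and variance method looks for. For the chi-square statistic with one degree of freedom, a value above 3.841 rejects the independence hypothesis H0 at the 0.05 significance level.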


