compare the behaviour of the final stem consonant before the ending -y, which can occur in
Nom. Sing. and Nom. Plur. of nouns, e.g. aktor-0 : aktorz-y, senior-0 : seniorz-y (‘older
person’), amor-0 : amor-y (‘cupid’), gbur-0 : gbur-y (‘bumpkin’), traktor-0 : traktor-y,
and adjectives, e.g. któr-y : którz-y (‘which’) and star-y : starz-y (‘old’). It is clear that the
global phonological rule which says that a front vowel causes consonant palatalization
is not appropriate here. In aktorz-y and seniorz-y palatalization takes place before
-y in Nom. Plur., but in amor-y, gbur-y and traktor-y it does not, and the adjectives show
both któr-y, star-y (Nom. Sing.) and którz-y, starz-y (Nom. Plur.). This shows that
there is a need for a new approach to the stem alternation process. The data also show that
the belief that an efficient stemming algorithm can be developed for Polish seems
to be naive. We argue that if one wants to create an algorithm that recognizes a
particular word properly, one must store all inflected forms in the dictionary, word by
word.
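The contrast can be made concrete with a minimal sketch. It assumes a toy lexicon holding only the noun pairs quoted above; the names `NOM_PLUR`, `palatalizing_plural` and `recognize` are hypothetical and are not taken from the system described in this article.

```python
# Toy data: only the noun pairs quoted above (Nom. Sing. -> Nom. Plur.).
NOM_PLUR = {
    "aktor": "aktorzy",
    "senior": "seniorzy",
    "amor": "amory",
    "gbur": "gbury",
    "traktor": "traktory",
}

def palatalizing_plural(noun: str) -> str:
    """Hypothetical global rule: -y always palatalizes a final r to rz."""
    if noun.endswith("r"):
        return noun[:-1] + "rzy"
    return noun + "y"

# Reverse index for recognition: every stored form points to its base form.
FORM_TO_BASE = {plural: base for base, plural in NOM_PLUR.items()}
FORM_TO_BASE.update({base: base for base in NOM_PLUR})

def recognize(form: str):
    """Return the base form if this exact form is stored, else None."""
    return FORM_TO_BASE.get(form)

# Nouns for which the global rule disagrees with the stored plural.
mismatches = [n for n in NOM_PLUR if palatalizing_plural(n) != NOM_PLUR[n]]
print(mismatches)  # ['amor', 'gbur', 'traktor']
```

In this sketch the single rule is right for aktor and senior and wrong for the other three nouns, whereas the stored forms are recognized by exact lookup regardless of which alternation applies.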
5 The Polish Inflection Dictionary; its Lexical Grammar and
Generative Mechanism
The Polish Inflection Dictionary [3] is the base that has been used to create the stemmer
described in this article. The construction of the dictionary is based on over 420 identified
lexical categories. Each category is defined by the inflection patterns used to generate it. The first
element of the dictionary is the set of rules that assign a lexical category to
a word based on its ending.
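One way to picture ending-based category assignment is a longest-suffix lookup. The sketch below is only an illustration: the rule table and the category labels (`CAT_A`, `CAT_B`, `CAT_C`) are invented placeholders, not the actual rules or categories of the dictionary.

```python
# Hypothetical ending -> category table; the labels are placeholders only.
ENDING_RULES = {
    "or": "CAT_A",
    "el": "CAT_B",
    "y": "CAT_C",
}

def assign_category(word: str, rules=ENDING_RULES):
    """Return the category of the longest ending of `word` found in `rules`."""
    # Suffixes are tried from the longest (the whole word) to the shortest.
    for start in range(len(word)):
        suffix = word[start:]
        if suffix in rules:
            return rules[suffix]
    return None  # no rule matches this ending

print(assign_category("aktor"))  # CAT_A
print(assign_category("hotel"))  # CAT_B
```

Trying longer endings first lets a more specific rule take precedence over a more general one that shares the same final letters.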
Each inflection category pattern is represented by its specific local grammar, which
consists of two elements: a vector of inflection endings associated with the category,
and the local grammar rules proper, which mainly concern stem alternation.
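The two elements of such a local grammar can be sketched as a small data structure. This is a simplified illustration under stated assumptions: the class `InflectionPattern`, the slot names `nom_sg` and `nom_pl`, and the encoding of an alternation as an (old, new) pair of stem endings are all hypothetical, and only two paradigm slots are shown.

```python
from dataclasses import dataclass, field

@dataclass
class InflectionPattern:
    # Vector of endings, here keyed by a hypothetical slot name.
    endings: dict
    # Stem alternations per slot, as (old stem ending, new stem ending).
    alternations: dict = field(default_factory=dict)

    def generate(self, stem: str) -> dict:
        """Apply the alternation for each slot, then attach its ending."""
        forms = {}
        for slot, ending in self.endings.items():
            current = stem
            if slot in self.alternations:
                old, new = self.alternations[slot]
                if current.endswith(old):
                    current = current[: -len(old)] + new
            forms[slot] = current + ending
        return forms

# Same endings vector, with and without the r : rz alternation in Nom. Plur.
ALTERNATING = InflectionPattern({"nom_sg": "", "nom_pl": "y"},
                                {"nom_pl": ("r", "rz")})
PLAIN = InflectionPattern({"nom_sg": "", "nom_pl": "y"})

print(ALTERNATING.generate("aktor"))  # {'nom_sg': 'aktor', 'nom_pl': 'aktorzy'}
print(PLAIN.generate("traktor"))      # {'nom_sg': 'traktor', 'nom_pl': 'traktory'}
```

Keeping the alternation rules separate from the endings vector means that two categories may share the same endings and differ only in how the stem behaves.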
There are words in Polish that in general behave according to a specific inflection
pattern, but some of whose forms do not strictly match the pattern (for example, the words
handel ‘commerce’ and hotel ‘hotel’ have the genitive forms handlu and hotelu,
respectively). Such cases, over 11,000 in total, are described by additional exception
rules.
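A minimal sketch of how an exception rule might override a pattern is given below. It assumes, purely for illustration, that the pattern forms the genitive by adding -u to the base form (as in hotelu) and that handlu is stored as an explicit exception; the names `EXCEPTIONS` and `genitive` are hypothetical.

```python
# Assumed regular rule for this toy pattern: genitive = base form + "u".
GENITIVE_ENDING = "u"

# Hypothetical exception table: (base form, slot) -> stored irregular form.
EXCEPTIONS = {
    ("handel", "gen_sg"): "handlu",
}

def genitive(base: str) -> str:
    """Return the stored exception if there is one, else apply the pattern."""
    override = EXCEPTIONS.get((base, "gen_sg"))
    if override is not None:
        return override
    return base + GENITIVE_ENDING

print(genitive("hotel"))   # hotelu
print(genitive("handel"))  # handlu
```

Checking the exception table before the pattern keeps the pattern itself simple while still yielding the correct form for the words that deviate from it.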
Although the generative approach used to build the Inflection Dictionary comes directly
from the concept of two-level morphology [2], it cannot be used directly for word form
recognition. The reason is that the dictionary generating mechanism has been
augmented by additional filters, the last building block used for generating the
Inflection Dictionary. These filters reject forms that are formally correct but that
the language itself has rejected. The rejected forms range from illegal comparative forms
of adjectives (bardziej chory but not chorszy ‘more ill’) to the singular forms of pluralia
tantum (spodnie ‘trousers’), which for pragmatic reasons do not possess singular forms.
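The filtering step can be pictured as a pass over the generated candidates. The sketch below is an assumption about how such a filter could look, not the mechanism actually used; the sets `REJECTED_FORMS` and `NO_SINGULAR`, the slot names, and the function `filter_forms` are hypothetical.

```python
# Hypothetical filter data, limited to the two examples given above.
REJECTED_FORMS = {"chorszy"}   # formally derivable, but not used
NO_SINGULAR = {"spodnie"}      # pluralia tantum: singular slots are dropped

def filter_forms(base: str, candidates: dict) -> dict:
    """Keep only candidate forms (slot -> form) that pass both filters."""
    kept = {}
    for slot, form in candidates.items():
        if form in REJECTED_FORMS:
            continue  # explicitly rejected form
        if base in NO_SINGULAR and slot.endswith("_sg"):
            continue  # no singular forms for this word
        kept[slot] = form
    return kept

generated = {"positive": "chory", "comparative": "chorszy"}
print(filter_forms("chory", generated))  # {'positive': 'chory'}
```

Because the filter works on individual words and forms rather than on rules, its effect cannot be undone by simply running the generative rules in reverse during recognition.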
Morphological relations are another problem that cannot be described in rule form.
These relations join different words that share the same lexical meaning, e.g. the
imperfective, perfective and iterative forms of verbs, pisać : napisać : pisywać ‘to write’,
where one cannot specify the prefix that builds the proper perfective form, cf. od-pisać ‘to
answer’, prze-pisać ‘to copy’, nad-pisać ‘to overwrite’ and so on. In addition, the
presence of iterative forms depends on the meaning of a specific verb. It seems impossible