is imperative in the development of Amharic IR.
Although some efforts have been made to develop
Amharic IR systems using stems, the effectiveness
of stems relative to other word forms used for
content representation has not been systematically
analyzed thus far. Therefore, this research analyzes
the use of stems and roots for content representation
and investigates their effects on Amharic IR.
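To make the difference between stem-based and root-based content representation concrete, the toy Python sketch below maps a handful of the verb forms discussed in Section 2 to index terms at three levels: the surface word, the stem and the root. The lookup tables `STEM` and `ROOT` and the function `index_terms` are hypothetical and hand-built from the examples in this paper; they stand in for a morphological analyzer and are not the system evaluated here.

```python
from collections import Counter

# Hypothetical, hand-built lookup tables using examples from Section 2.
# A real system would obtain stems and roots from a morphological analyzer.
STEM = {
    "gədəlku": "gədəl",        # 'I killed'
    "gədəlkuh": "gədəl",       # 'I killed you'
    "gədəln": "gədəl",         # 'we killed'
    "təgədəlku": "təgədəl",    # 'I was killed'
    "gədələtʃ": "gədəl",       # 'she killed'
    "ʔəlsəbərənɨm": "səbər",   # 'he did not break us'
}
ROOT = {
    "gədəl": "g-d-l",
    "təgədəl": "g-d-l",  # derived from the stem gədəl-, hence from g-d-l
    "səbər": "s-b-r",
}


def index_terms(tokens, level):
    """Count index terms for `tokens` at the 'word', 'stem' or 'root' level."""
    if level == "word":
        terms = list(tokens)
    elif level == "stem":
        terms = [STEM[t] for t in tokens]
    elif level == "root":
        terms = [ROOT[STEM[t]] for t in tokens]
    else:
        raise ValueError(f"unknown level: {level!r}")
    return Counter(terms)


doc = ["gədəlku", "gədəlkuh", "gədəln", "təgədəlku", "gədələtʃ", "ʔəlsəbərənɨm"]
for level in ("word", "stem", "root"):
    print(level, dict(index_terms(doc, level)))
```

On this six-token toy document, the word level keeps six distinct index terms, the stem level reduces them to three and the root level to two, illustrating how roots conflate more word forms than stems do.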
The rest of this paper is organized as follows.
Section 2 describes the Amharic language and its
morphology. Section 3 discusses related work, and
Section 4 presents how documents and queries are
represented in an Amharic IR system. Experimental
results and their evaluation are discussed in
Section 5. Section 6 concludes the paper and
outlines future directions for Amharic IR.
2 AMHARIC LANGUAGE
Amharic is the official language of the government
of Ethiopia. Although several languages are spoken
in Ethiopia, Amharic is spoken as a mother tongue
by a sizeable proportion of the country's population,
currently estimated at over 110 million. Within the
Semitic language family, it is the second most widely
spoken language in the world, after Arabic. Owing to
its historical significance and official status,
Amharic has long served as the lingua franca of the
country. As a result, many literary works,
government documents, educational materials and
religious texts are predominantly produced in
Amharic. Amharic is written in the Ethiopic script,
which has 34 base characters (each carrying the
vowel ኧ /ə/). Each base character is modified to form
six other orders, representing the vowels ኡ /u/,
ኢ /i/, ኣ /a/, ኤ /e/, እ /ɨ/ and ኦ /o/, in that order.
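The regular structure of the seven orders described above makes them easy to manipulate programmatically. The Python sketch below assumes the Unicode encoding of the Ethiopic block, in which each basic consonant row occupies eight consecutive code points with the seven orders at offsets 0 to 6 in the sequence given above. The helper names `split_fidel` and `with_order` are illustrative and not taken from an existing library; the sketch does not handle the irregular labialized rows.

```python
ETHIOPIC_SYLLABLES = range(0x1200, 0x135B)  # syllable code points U+1200-U+135A
VOWELS = ["ə", "u", "i", "a", "e", "ɨ", "o"]  # vowel of orders 1 to 7


def split_fidel(ch):
    """Return (first-order character, order 1-7) for an Ethiopic syllable.

    Assumes the regular Unicode layout of the basic consonant rows: eight
    code points per consonant, with the seven orders at offsets 0-6.
    Labialized rows (e.g. the one starting at U+1248) do not follow it.
    """
    cp = ord(ch)
    if cp not in ETHIOPIC_SYLLABLES:
        raise ValueError(f"{ch!r} is not an Ethiopic syllable")
    offset = (cp - 0x1200) % 8
    if offset == 7:
        raise ValueError(f"{ch!r} is an eighth-slot form, not one of the seven orders")
    return chr(cp - offset), offset + 1


def with_order(ch, order):
    """Return the character of the same consonant row in the given order (1-7)."""
    if not 1 <= order <= 7:
        raise ValueError("order must be between 1 and 7")
    base, _ = split_fidel(ch)
    return chr(ord(base) + order - 1)


for ch in "ኡኢኣኤእኦ":
    base, order = split_fidel(ch)
    print(ch, base, order, VOWELS[order - 1])

# The sixth-order form is the one used to write radicals such as ግ-ድ-ል.
print("-".join(with_order(ch, 6) for ch in "ገደል"))
```

Under this layout, the vowel letters ኡ to ኦ come out as orders 2 to 7 of the row of አ, and the last line prints ግ-ድ-ል.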
As in other Semitic languages, Amharic word
classes such as verbs, nouns and adjectives undergo
complex morphological processes (Yimam, 2001).
Verbs are the most complex word class in Amharic
and are generated by attaching affixes to verbal
stems. Verbal stems, in turn, can be generated from
verbal roots by inserting vowels between the
radicals. For example, the verbal stem ገደል- /gədəl-/
is derived from the verbal root ግ-ድ-ል /g-d-l/.
Moreover, verbal stems (e.g. ተገደል- /təgədəl-/) can be
derived from other verbal stems (e.g. ገደል- /gədəl-/)
by affixing morphemes.
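Root-to-stem derivation of this kind is often modelled as interdigitation, in which the radicals are slotted into a vowel template. The sketch below works on the Latin transliteration used in this paper; the template notation (`C` for a radical slot) and the function names `interdigitate` and `derive_stem` are assumptions made for illustration, and the sketch does not attempt to model the full morphology.

```python
def interdigitate(root, template, slot="C"):
    """Insert a root's radicals into the consonant slots of a vowel template.

    `root` is written with hyphen-separated radicals (e.g. "g-d-l") and
    `template` marks each radical position with `slot` (e.g. "CəCəC").
    """
    radicals = root.split("-")
    if template.count(slot) != len(radicals):
        raise ValueError("number of slots does not match number of radicals")
    radical_iter = iter(radicals)
    return "".join(next(radical_iter) if ch == slot else ch for ch in template)


def derive_stem(prefix, stem):
    """Derive a new stem by prefixing a morpheme to an existing stem."""
    return prefix + stem


stem = interdigitate("g-d-l", "CəCəC")
print(stem)                       # gədəl
print(derive_stem("tə", stem))    # təgədəl
```

The same function covers the adjective pattern mentioned later in this section: `interdigitate("k-b-d", "CəCaC")` yields `kəbad`.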
Verb formation is usually completed by attaching
person, gender, number, case, tense/aspect and mood
markers to a verbal stem. For example, the following
verbs can be generated from the verbal stem ገደል-
/gədəl-/: ገደልኩ /gədəlku 'I killed'/, ገደልኩህ /gədəlkuh
'I killed you'/, ገደልን /gədəln 'we killed'/, ተገደልኩ
/təgədəlku 'I was killed'/, ገደለች /gədələtʃ 'she
killed'/, etc. Because verbs are marked for both
subject and object, a single verb can constitute a
complete sentence. For example, the word አልሰበረንም
/ʔəlsəbərənɨm 'he did not break us'/, which is
constructed from the morphemes ʔəl-səbər-ə-nɨ-m, is
a complete sentence carrying the following linguistic
information: ʔəl-…-m /not/, -səbər- /did break/, -ə-
/he/ and -nɨ- /us/. Accordingly, thousands of verb
forms can be derived from a single verbal root by
attaching combinations of person, case, gender,
number, tense, aspect, mood and other markers
(Abate and Assabie, 2014; Assabie, 2017).
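Because a word form of this kind is the concatenation of its morphemes, an analysis like the one above can be stored as a list of (morpheme, gloss) pairs and the surface form rebuilt from it; likewise, generating verbs from a stem reduces to attaching suffixes. The Python sketch below uses only the morphemes and glosses given in this section; the names `ANALYSIS`, `SUFFIXES`, `surface` and `inflect` are hypothetical, and the sketch ignores the phonological adjustments that real Amharic word formation can involve.

```python
# Morpheme-by-morpheme analysis of አልሰበረንም, taken from the text.
ANALYSIS = [
    ("ʔəl", "not (first part of the negation circumfix)"),
    ("səbər", "did break"),
    ("ə", "he"),
    ("nɨ", "us"),
    ("m", "not (second part of the negation circumfix)"),
]

# Suffixes attested with the stem gədəl- in the examples above.
SUFFIXES = {
    "ku": "I killed",
    "kuh": "I killed you",
    "n": "we killed",
    "ətʃ": "she killed",
}


def surface(analysis):
    """Rebuild the surface form by concatenating the morphemes in order."""
    return "".join(morpheme for morpheme, _ in analysis)


def inflect(stem, suffixes):
    """Attach each suffix to the stem; returns {word form: gloss}."""
    return {stem + suffix: gloss for suffix, gloss in suffixes.items()}


print(surface(ANALYSIS))  # ʔəlsəbərənɨm
for form, gloss in inflect("gədəl", SUFFIXES).items():
    print(form, gloss)
```

Representing analyses as explicit morpheme lists keeps the stem (here `səbər`) directly accessible, which is the unit a stem-based IR representation would index.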
In terms of morphological structure, Amharic nouns
and adjectives can be either derived or non-derived.
For example, the words መሬት /məret 'earth'/ and ዛፍ
/zaf 'tree'/ are non-derived nouns, whereas words like
ስብራት /sɨbɨrat 'the state of being broken'/ and ደግነት
/dəgɨnət 'generosity'/ are nouns derived from the
verbal root ስ-ብ-ር /s-b-r 'to break'/ and the adjective
ደግ /dəg 'generous'/, respectively. Derived nouns are
generated from other word classes through
morphological processes. In general, Amharic nouns
can be derived from verbal roots, adjectives and
other nouns by affixing vowels or bound morphemes.
Derived adjectives can be formed from verbal roots
by infixing vowels between consonants (e.g. ክ-ብ-ድ
/k-b-d 'to become heavy'/ → ከባድ /kəbad 'heavy'/),
from nouns by suffixing bound morphemes such as
-ኧኛ /-ʔəɲa/ (e.g. ጉልበት /gulbət 'power'/ → ጉልበተኛ
/gulbətəɲa 'powerful'/) and from verbal stems by
prefixing or suffixing bound morphemes (e.g. ደካም- /dəkam-/
→ ደካማ /dəkama 'weak'/). Although the derivation
of nouns and adjectives is complex in itself, their
inflection adds even more complexity. Amharic
nouns and adjectives are inflected for number by
suffixing -ኦች /-ʔotʃ/ or -ዎች /-wotʃ/, for definiteness
by suffixing -ኡ /-ʔu/ or -ዉ /-wu/, for objective case
by suffixing -ን /-n/, for possessive case by suffixing
different morphemes depending on the possessor,
and for gender by suffixing -ኢት /-ʔit/. These
inflections can appear alone or in combination,
together with prepositions and negation markers,
which leads to the generation of thousands of word
forms from a single noun or adjective. For example,
ያለባለቤቶቹ /jaləbaləbetotʃu 'without the owners of the
house'/ is generated from the morphemes jə-ʔələ-
balə-bet-otʃ-u (jə preposition 'of/with', ʔəl negation
marker 'not/without', balə possessive marker 'owner
of', bet noun 'house', otʃ plural marker, and u definite