Table 1: Precision and Recall results.

Frequency   Word inflections     POS Words            Parsed Words
rate        Precision  Recall    Precision  Recall    Precision  Recall
MWUs 1-2    52,55      64,39     63,38      84,75     65,62      84,94
MWUs 3-4    73,71      88,81     74,10      90,17     76,27      90,5
MWUs 5-6    71,39      86,34     70,60      86,27     69,52      86,25
MWUs 7-9    67,50      85,59     64,31      83,59     63,9       83,46
MWUs >10    67,35      85,19     63,90      83,06     61,24      82,83
SWUs 1-2    73,15      87,37     76,30      90,40     77,69      90,40
SWUs 3-4    82,01      89,91     88,02      99,07     88,39      99,23
SWUs 5-6    86,14      96,49     81,75      93,50     81,91      93,57
SWUs 7-9    86,90      96,30     84,12      95,12     84,17      95,07
SWUs >10    87,66      98,08     85,83      97,36     79,70      94,57

Frequency   Lemmas               POS Lemmas           Parsed Lemmas
rate        Precision  Recall    Precision  Recall    Precision  Recall
MWUs 1-2    47,51      68,02     58,12      74,02     56,52      72,79
MWUs 3-4    42,01      65,68     43,23      66,88     41,52      63,60
MWUs 5-6    53,29      66,93     40,35      56,83     39,36      52,09
MWUs 7-9    56,30      78,01     53,03      74,61     51,56      73,71
MWUs >10    54,72      72,30     52,36      71,28     51,99      70,31
SWUs 1-2    50,20      59,85     52,73      62,88     51,21      62,12
SWUs 3-4    60,52      74,44     55,31      72,22     54,64      72,22
SWUs 5-6    59,33      72,97     56,54      71,17     52,55      70,97
SWUs 7-9    58,65      72,81     57,54      70,18     53,93      67,54
SWUs >10    55,67      72,44     53,86      67,95     50,76      6,51
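
The text does not spell out how the precision and recall figures were computed. As a rough reading aid only, the sketch below assumes the standard set-based definitions: precision is the share of proposed links that appear in the reference links, and recall is the share of reference links that were proposed. The function name `precision_recall` and the toy link sets are hypothetical; the word pairs are borrowed from the examples discussed in the text and are not the study's data.

```python
def precision_recall(proposed, reference):
    """Set-based precision and recall of proposed alignment links.

    Assumption: a link is a (source, target) pair and counts as correct
    only if it matches a reference link exactly. The paper may have used
    a different scoring scheme (for instance, credit for partial links).
    """
    proposed, reference = set(proposed), set(reference)
    correct = proposed & reference
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Illustrative toy data only, not results from the study.
reference_links = {
    ("matstrupe", "oesophagus"),
    ("förmågan", "the ability"),
    ("fördom", "prejudice"),
}
proposed_links = {
    ("matstrupe", "oesophagus"),
    ("förmågan", "ability"),        # definiteness mismatch, not in reference
    ("fördom", "prejudice"),
    ("överkänslig", "oesophagus"),  # adjective linked to a noun
}

precision, recall = precision_recall(proposed_links, reference_links)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Under these assumptions, two of the four proposed links are correct and two of the three reference links are found, so the toy example yields a precision of 50% and a recall of about 67%.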
Lemmas with POS Tags VS Lemmas without POS/Parse Tags.
The statistical results for lemmas with and without POS tagging and
parsing, together with an examination of the frequency tables,
clarified some points of difference. POS-tagged lemmas were more
precise when aligning compound words (which were included among the
MWUs) with a low frequency rate (1 or 2).
Low-frequency POS-tagged and syntactically parsed MWUs had fewer
additions, i.e. words that occur in the alignments but are not present
in the reference links (e.g. “nedsatt minneskapacitet - memory deficit”
VS “nedsatt minneskapacitet - memory deficit failure”, see table 2),
and fewer incorrect links.
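
The notion of an addition can be made concrete with a small sketch. It assumes that a proposed link is an addition when it contains every word of the reference link on both sides plus at least one extra word; this is one reading of the definition above, not the evaluation procedure actually used in the study. The helper name `classify_link` is hypothetical.

```python
def classify_link(proposed, reference):
    """Label a proposed (source, target) link against its reference link.

    Hypothetical helper, not taken from the paper. Each link is a pair of
    whitespace-separated strings.
    """
    p_src, p_tgt = (side.split() for side in proposed)
    r_src, r_tgt = (side.split() for side in reference)

    if p_src == r_src and p_tgt == r_tgt:
        return "correct"

    # Assumption: an addition keeps all reference words on both sides
    # and introduces at least one word the reference does not contain.
    covers_reference = set(r_src) <= set(p_src) and set(r_tgt) <= set(p_tgt)
    extra_words = (set(p_src) - set(r_src)) | (set(p_tgt) - set(r_tgt))
    if covers_reference and extra_words:
        return "addition"
    return "incorrect"


reference = ("nedsatt minneskapacitet", "memory deficit")
label = classify_link(("nedsatt minneskapacitet", "memory deficit failure"), reference)
print(label)  # addition
```

Here the extra word “failure” makes the proposed link an addition, whereas a pair such as “överkänslig - oesophagus” checked against the reference “matstrupe - oesophagus” would be labelled incorrect.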
POS tagging and syntactic parsing proved useful in aligning words that
consist of dissimilar strings and have a low co-occurrence frequency
but share the same POS (e.g. two nouns: “matstrupe - oesophagus” 1-2
VS an adjective and a noun: “överkänslig - oesophagus”).
Alignments consisting of SWUs had fewer additions in the corpus
without POS tagging or parsing, particularly at higher frequency rates
(“fördom - prejudice” >10, “alkoholist - alcoholic” >10, “toalett -
toilet” 7-9, “tanke - thought” 7-9, see table 2). POS-tagged lemmas
produced slightly better results than syntactically parsed lemmas at
all frequency rates.
Inflected Words with POS, Syntactic Parsing VS Inflected Words.
As shown in table 1, alignments of inflected words with POS tagging
and syntactic parsing obtained better precision and recall, as well as
a higher number of correct links, than inflected words without POS
tagging at the lower frequency rates (1-2 and 3-4) for both MWUs and
SWUs. The morphological information helped to disambiguate the gender
of Swedish adjectives in noun phrases, including them in the alignment
when they agreed with the head noun and their inclusion was necessary
to build a conceptual unit (“dåligt uppförande - misbehaviour” VS
“uppförande - misbehaviour” 1-2, where “dåligt” means “bad” and
“uppförande” means “behaviour”, see table 3). It also helped to link
nouns with the same number (“barndomsupplevelser - childhood
experiences” VS “barndomsupplevelser - childhood”). For single word
units, the morphological information was helpful for aligning words
sharing the same definiteness (“förmågan - the ability” VS “förmågan -
ability”) or POS (e.g. two adjectives: “felaktiga - inappropriate”
instead of a noun and an adjective: “antaganden - inappropriate”).
CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Follow
up Study