quality of alignment of MWUs. Considering that
multi-word terms were more numerous in
the GS based on user queries and that the medical
domain is characterized by MWUs that are either unknown
to generic lexicons or carry meanings specific to this
domain (Rinaldi et al., 2004), it is advisable to use
corpora with POS-tagged inflections as the source for
the extraction of bilingual lexicons for CLQA.
Lemmatization should be applied to the frequency
tables, after the word alignments have been produced, in order
to group together words sharing the same base form
in the source language or target language and
to facilitate the extraction of synonym lists in both
languages.
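The grouping step recommended above can be illustrated with a minimal sketch. It assumes that the frequency table maps aligned (source form, target form) pairs to counts and that lemmas come from a simple lookup table. The word pairs, the counts, the lookup tables and all function names are hypothetical and only serve the illustration; a real setup would call a lemmatizer instead of a dictionary.

```python
from collections import Counter, defaultdict

# Hypothetical toy lemma lookups (assumption): stand-ins for a real lemmatizer.
SOURCE_LEMMAS = {"infections": "infection", "diseases": "disease"}
TARGET_LEMMAS = {"infektioner": "infektion", "infektionen": "infektion",
                 "sjukdomar": "sjukdom"}


def lemmatize(word, lookup):
    """Return the base form of a word, or the lowercased word if unknown."""
    word = word.lower()
    return lookup.get(word, word)


def group_by_lemma(freq_table):
    """Merge counts of aligned pairs whose members share the same base forms."""
    grouped = Counter()
    for (src, tgt), count in freq_table.items():
        key = (lemmatize(src, SOURCE_LEMMAS), lemmatize(tgt, TARGET_LEMMAS))
        grouped[key] += count
    return grouped


def synonym_lists(grouped):
    """Collect candidate synonym lists in both directions."""
    by_source = defaultdict(set)  # source lemma -> target lemmas aligned to it
    by_target = defaultdict(set)  # target lemma -> source lemmas aligned to it
    for src, tgt in grouped:
        by_source[src].add(tgt)
        by_target[tgt].add(src)
    return ({s: sorted(t) for s, t in by_source.items()},
            {t: sorted(s) for t, s in by_target.items()})


# Illustrative frequency table produced after word alignment (made-up counts).
freq_table = {
    ("infections", "infektioner"): 4,
    ("infection", "infektionen"): 3,
    ("infection", "sjukdomar"): 1,
    ("diseases", "sjukdomar"): 5,
}

grouped = group_by_lemma(freq_table)
source_to_target, target_to_source = synonym_lists(grouped)
```

In this sketch the inflected forms collapse into a single entry per lemma pair, so their counts add up, and target lemmas aligned to more than one source lemma surface as candidate synonyms on the source side.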
REFERENCES
Ahrenberg, L., Merkel, M., Sågvall Hein, A., &
Tiedemann, J., 2000. Evaluation of Word Alignment
Systems. In LREC'00, 2nd International Conference
on Linguistic Resources and Evaluation.
Andrenucci, A., 2006. Medical Information Portals: an
Empirical Study of Personalized Search Mechanisms
and Search Interfaces. In ICEIS'06, 8th International
Conference on Enterprise Information Systems.
INSTICC Press.
Aunimo, L., Kuuskoski, R., & Makkonen, J., 2004. Cross-
Language Question Answering at the University of
Helsinki. In CLEF’04, Cross Language Evaluation
Forum.
Brants, T., 2000. TnT – A statistical Part-of-Speech
Tagger. In ANLP-2000, 6th Conference on Applied
Natural Language Processing.
Brown, P., Della Pietra, S., Della Pietra, V., & Mercer, R.,
1993. The mathematics of statistical machine
translation: Parameter estimation [Electronic version].
Computational Linguistics, 19, 263-311.
Carlberger, J., Dalianis, H., Hassel, M., & Knutsson, O.,
2001. Improving Precision in Information Retrieval
for Swedish using stemming. In NoDaLiDa 2001, 13th
Nordic Conference on Computational Linguistics.
Ejerhed, E., & Ridings, D., 1995. Parole and SUC,
http://spraakbanken.gu.se/parole/sgml2suc.html.
Germann, U., 2003. Greedy decoding for statistical
machine translation in almost linear time. In HLT-
NAACL’03. ACL press.
Ide, N., & Priest-Dorman, G., 2000. Corpus encoding
standard – document CES 1. Technical report, Vassar
College, LORIA/CNRS. Vandoeuvre-les-Nancy,
France.
Jongejan, B., & Haltrup, D., 2005. The CST Lemmatiser.
Retrieved October 10, 2006, from Copenhagen
University:
http://cst.dk/download/cstlemma/current/doc/.
Lindberg, D., Humphreys, B., & McCray, A., 1993. The
Unified Medical Language System [Electronic
version]. Methods of Information in Medicine, 32,
281-291.
Loukachevitch, N., & Dobrov, B., 2004. Development of
Bilingual Domain-Specific Ontology for Automatic
Conceptual Indexing. In LREC’04.
Marcus, M., Santorini, B., & Marcinkiewicz, M., 1994.
Building a large annotated corpus of English: The
Penn Treebank [Electronic version]. Computational
Linguistics, 19.
Megyesi, B., 2000. Comparing Data-Driven learning
algorithms for PoS tagging of Swedish. In
NoDaLiDa 2001.
Megyesi, B., 2002. Data-Driven Syntactic Analysis:
Methods and Applications for Swedish. PhD thesis,
Kungliga Tekniska Högskolan, Sweden.
Melamed, D., 1995. Automatic evaluation of uniform
filter cascades for inducing N-best translation
lexicons. In 3rd Workshop on Very Large Corpora.
Merkel, M., 1999. Annotation Style guide for the PLUG
link annotator. Technical report, Linköping
University, Sweden.
Nyström, M., Merkel, M., Ahrenberg, L., et al., 2006.
Creating a medical English-Swedish dictionary using
interactive word alignment [Electronic version].
BMC Medical Informatics and Decision Making, 6.
Och, F. J., & Ney, H., 2003. A Systematic Comparison of
Various Statistical Alignment Models [Electronic
version]. Computational Linguistics, 29.
Rinaldi, F., Dowdall, J., Schneider, G., & Persidis, A.,
2004. Answering Questions in the Genomics Domain.
In ACL’04, Workshop on Question Answering in
Restricted Domains. ACL press.
Sneiders, E., 2002. Automated Question Answering:
Template-Based Approach. PhD thesis, Royal Institute
of Technology, Sweden.
SUC, 1997. SUC 1.0 Stockholm Umeå Corpus, Version
1.0. Umeå University and Stockholm University,
Sweden.
Tiedemann, J., 1999. Word alignment – step by step. In
NODALIDA’99, the 12th Nordic Conference on
Computational Linguistics.
Tiedemann, J., 2003a. Combining Clues for Word
Alignment. In EACL’03, 10th Conference of the
European Chapter of the ACL. ACL press.
Tiedemann, J., 2003b. Recycling translations. Extraction
of lexical data from parallel corpora and their
application in natural language processing. PhD
thesis, Uppsala University, Sweden.
Weijnitz, P., Forsbom, E., Gustavii, E., Pettersson, E., &
Tiedemann, J., 2004. MT goes farming: Comparing
two machine translation approaches on a new domain.
In LREC’04, 4th International Conference on
Language Resources and Evaluation.
ICEIS 2007 - International Conference on Enterprise Information Systems