CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Follow up Study

Andrea Andrenucci

doi:10.5220/0002803400950102

2010

Abstract

This paper presents a follow-up study on extracting word relations from a medical parallel corpus in the field of psychology. The word relations are extracted to create a bilingual lexicon for cross-lingual question answering between Swedish and English on a medical portal. Six variants of the corpus were used: word inflections with and without POS tagging, syntactically parsed word inflections, lemmas with and without POS tagging, and syntactically parsed lemmas. The purpose of the study was to analyze the quality of the word relations obtained from each variant and to determine which variant was most suitable for extracting a bilingual lexicon in the field of psychology. The word alignments were evaluated against reference data (a gold standard) using measures such as precision and recall.
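
The evaluation step mentioned above can be illustrated with a minimal sketch. It assumes that both the extracted word relations and the gold standard are available as sets of (Swedish, English) word pairs and that only exact matches count as correct. The abstract does not specify the exact measures or any handling of partially correct links, so the function name and the sample pairs below are purely hypothetical.

```python
def precision_recall(proposed, gold):
    """Plain set-based precision and recall for word alignment links.

    Assumption: a link is correct only if the exact (source, target)
    pair also appears in the gold standard; partial matches are ignored.
    """
    proposed, gold = set(proposed), set(gold)
    correct = proposed & gold
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical Swedish-English links, for illustration only.
proposed_links = [("ångest", "anxiety"), ("sömn", "sleep"), ("rädsla", "anger")]
gold_links = [
    ("ångest", "anxiety"),
    ("sömn", "sleep"),
    ("rädsla", "fear"),
    ("stress", "stress"),
]

p, r = precision_recall(proposed_links, gold_links)
print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same function on the output of each corpus variant would give one precision/recall pair per variant, which is one simple way to compare them.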

References

Ahrenberg, L., Merkel, M., Sågvall Hein, A., Tiedemann, J., 2000. Evaluation of Word Alignment Systems. In LREC'00, 2nd International Conference on Linguistic Resources and Evaluation.
Andrenucci A., 2007. Creating a Bilingual Psychology Lexicon for Cross Lingual Question Answering - A Pilot Study. In ICEIS'2007. INSTICC Press.
Aunimo L., Kuuskoski R., and Makkonen J., 2004. Cross-Language Question Answering at the University of Helsinki. In CLEF'04, Cross Language Evaluation Forum.
Baud R., Lovis C., Rassinoux AM., Michel PA., Scherrer JR., 1998. Automatic Extraction of Linguistic Knowledge from an International Classification. Studies in Health Technology and Informatics, 52.
Borin L. Pivot Alignment. In NoDaLiDa'99.
Brants, T., 2000. TnT - A statistical Part-of-Speech Tagger. In ANLP-2000, 6th Conference on Applied Natural Language Processing.
Brown, P., Della Pietra S., Della Pietra V., and Mercer R., 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19, 263-311.
Church, K., 1993. Char Alignment, a program for aligning texts at the character level. In ACL'93.
Ejerhed, E. and Ridings, D., 1995. Parole and SUC, http://spraakbanken.gu.se/parole/sgml2suc.html
Germann, U., 2003. Greedy decoding for statistical machine translation in almost linear time. In HLT-NAACL'03.
Ide, N. and Priest-Dorman, G., 2000. Corpus encoding standard - document CES 1. Technical report, Vassar College, LORIA/ CNRS. Vandoeuvre-les-Nancy, France.
Jongejan, B. and Haltrup, D., 2005. The CST Lemmatiser, Copenhagen University, Denmark.
Lindberg, D., Humphreys, B. and McCray, A., 1993. The Unified Medical Language System. Methods of Information in Medicine, 32, 281-291.
Marcus, M., Santorini, B., and Marcinkiewicz, M., 1994. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19.
Marko K., Baud R., Zweigenbaum P., Merkel M., Toporowska-Gronostaj M., Kokkinakis D., Schulz S., 2006. Cross-Lingual Alignment of Medical Lexicons. In LREC'06.
Megyesi, B., 2000. Comparing Data-Driven learning algorithms for PoS tagging of Swedish. In NoDaLiDa2001, 13th Nordic Conference on Computational Linguistics.
Megyesi, B., 2002. Data-Driven Syntactic Analysis - Methods and Applications for Swedish. PhD Thesis, Kungliga Tekniska Högskolan, Sweden.
Megyesi, B., 2002. Shallow parsing with POS taggers and linguistic features. Journal of Machine Learning Research, special issue on shallow parsing.
Melamed, D., 1995. Automatic evaluation of uniform filter cascades for inducing N-best translation lexicons. In 3rd Workshop on Very Large Corpora.
Merkel, M., 1999. Annotation Style guide for the PLUG link annotator. Technical Report, Linköping University, Linköping.
Nyström, M. Merkel M., Ahrenberg L., and Zweigenbaum P., 2006. Creating a medical English-Swedish dictionary using interactive word alignment. BMC Medical Informatics and Decision Making, 6(35).
Och FJ., Ney H., 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1), 19-51.
Rinaldi, F., Dowdall, J., Schneider, G., and Persidis, A., 2004. Answering Questions in the Genomics Domain. In ACL'04, Workshop on Question Answering in Restricted Domains. ACL press.
SUC, 1997. SUC 1.0 Stockholm Umeå Corpus, Version 1.0. Umeå University and Stockholm University, Sweden.
Tiedemann, J., 1999. Word alignment - step by step. In NODALIDA'99, the 12th Nordic Conference on Computational Linguistics.
Tiedemann, J., 2003a. Combining Clues for Word Alignment. In EACL'03, 10th Conference of the European Chapter of the ACL. ACL press.
Tiedemann, J., 2003b. Recycling translations. Extraction of lexical data from parallel corpora and their application in natural language processing. PhD thesis, Uppsala University, Sweden.

Paper Citation

in Harvard Style

Andrenucci A. (2010). CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Follow up Study. In Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST, ISBN 978-989-674-025-2, pages 95-102. DOI: 10.5220/0002803400950102

in Bibtex Style

@conference{webist10,
author={Andrea Andrenucci},
title={CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Follow up Study},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST},
year={2010},
pages={95-102},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002803400950102},
isbn={978-989-674-025-2},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 2: WEBIST
TI - CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Follow up Study
SN - 978-989-674-025-2
AU - Andrenucci A.
PY - 2010
SP - 95
EP - 102
DO - 10.5220/0002803400950102