CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Pilot Study

Andrea Andrenucci

Abstract

This paper presents a pilot study on the extraction of word relations from a sample of a medical parallel corpus in the field of psychology. The word relations are extracted in order to create a bilingual lexicon for cross-lingual question answering between Swedish and English. Four variants of the sample corpus were used: inflected word forms with and without POS tagging, and lemmas with and without POS tagging. The purpose of the study was to analyze the quality of the word relations obtained from each variant and to determine which variant was most suitable for extracting a bilingual lexicon in the field of psychology. The word alignments were evaluated against reference data (gold standards) constructed before the word alignment process.
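To illustrate what evaluating word alignments against a gold standard can look like, here is a minimal sketch in Python. It assumes plain set-based precision, recall, and F1 over (Swedish, English) word links; the abstract does not say which metrics the study used, so this is a generic stand-in and not the study's evaluation procedure. The function name `evaluate_alignment` and the word pairs are hypothetical toy data, not results from the paper.

```python
def evaluate_alignment(proposed, gold):
    """Return (precision, recall, f1) of proposed word links against gold links.

    Both arguments are iterables of (source_word, target_word) pairs.
    Assumption: a link counts as correct only if it matches a gold link exactly.
    """
    proposed, gold = set(proposed), set(gold)
    correct = len(proposed & gold)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Toy, made-up Swedish-English links (illustration only, not data from the study).
gold = {
    ("ångest", "anxiety"),
    ("sömn", "sleep"),
    ("rädsla", "fear"),
    ("stress", "stress"),
}
proposed = {
    ("ångest", "anxiety"),
    ("sömn", "sleep"),
    ("rädsla", "anxiety"),  # wrong link: counts against precision
}

precision, recall, f1 = evaluate_alignment(proposed, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function once per corpus variant (inflected forms or lemmas, each with or without POS tags) would give one score triple per variant, which is one simple way to compare them.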


Paper Citation


in Harvard Style

Andrenucci A. (2007). CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Pilot Study. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-972-8865-89-4, pages 129-136. DOI: 10.5220/0002366601290136


in Bibtex Style

@conference{iceis07,
author={Andrea Andrenucci},
title={CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Pilot Study},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2007},
pages={129-136},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002366601290136},
isbn={978-972-8865-89-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - CREATING A BILINGUAL PSYCHOLOGY LEXICON FOR CROSS LINGUAL QUESTION ANSWERING - A Pilot Study
SN - 978-972-8865-89-4
AU - Andrenucci A.
PY - 2007
SP - 129
EP - 136
DO - 10.5220/0002366601290136