AUTOMATIC MULTILINGUAL LEXICON GENERATION USING WIKIPEDIA AS A RESOURCE

Ahmad R. Shahid, Dimitar Kazakov

Abstract

This paper proposes a method for creating a multilingual dictionary by taking the titles of Wikipedia pages in English and then finding the titles of the corresponding articles in other languages. The creation of such multilingual dictionaries has become possible as a result of the exponential increase in the amount of multilingual information on the web. Wikipedia is a prime example of such a multilingual source of information: it covers a very wide range of topics and is edited by its readers. Here, a web crawler traverses Wikipedia by following the links on a given page. For each page, the crawler extracts the title along with the titles of the corresponding pages in the other targeted languages. The result is a set of words and phrases that are translations of each other. For efficiency, the URLs are organized using hash tables. The constructed lexicon contains 7-tuples corresponding to 7 different languages, namely English, German, French, Polish, Bulgarian, Greek and Chinese.
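
The crawl described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_lexicon`, the in-memory `TOY_PAGES` data that stands in for fetched Wikipedia pages, the two-letter language codes, and the choice to drop pages that lack a title in any of the seven languages are all assumptions made for illustration. What it does take from the abstract is the overall shape: follow links from a start page, record each page's title with its counterparts in the other languages, and use a hash-based structure so that each URL is processed only once.

```python
from collections import deque

# Language order follows the abstract; the two-letter codes are an assumption.
LANGS = ("en", "de", "fr", "pl", "bg", "el", "zh")

# Hypothetical toy data standing in for fetched pages (not real crawl output):
# url -> English title, titles of the corresponding pages, outgoing links.
TOY_PAGES = {
    "/wiki/Water": {
        "title": "Water",
        "langlinks": {"de": "Wasser", "fr": "Eau", "pl": "Woda",
                      "bg": "Вода", "el": "Νερό", "zh": "水"},
        "links": ["/wiki/Fire"],
    },
    "/wiki/Fire": {
        "title": "Fire",
        "langlinks": {"de": "Feuer", "fr": "Feu"},  # incomplete on purpose
        "links": ["/wiki/Water"],                   # link back to the start page
    },
}


def build_lexicon(pages, start_url):
    """Breadth-first traversal of `pages`, returning a list of 7-tuples."""
    visited = {start_url}   # hash set: average O(1) check for already-seen URLs
    queue = deque([start_url])
    lexicon = []
    while queue:
        url = queue.popleft()
        page = pages.get(url)
        if page is None:    # link to a page we do not have; skip it
            continue
        titles = {"en": page["title"], **page["langlinks"]}
        # Assumption: keep only entries with a title in all seven languages.
        if all(lang in titles for lang in LANGS):
            lexicon.append(tuple(titles[lang] for lang in LANGS))
        for link in page["links"]:
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return lexicon


if __name__ == "__main__":
    for entry in build_lexicon(TOY_PAGES, "/wiki/Water"):
        print(entry)
```

In this toy run the two pages link to each other, so the `visited` set is what stops the traversal from looping; a real crawler would replace the dictionary lookup with an HTTP fetch and a parser for the interlanguage links.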


Paper Citation


in Harvard Style

Shahid, A. R. and Kazakov, D. (2009). AUTOMATIC MULTILINGUAL LEXICON GENERATION USING WIKIPEDIA AS A RESOURCE. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 357-360. DOI: 10.5220/0001783003570360


in Bibtex Style

@conference{icaart09,
author={Ahmad R. Shahid and Dimitar Kazakov},
title={AUTOMATIC MULTILINGUAL LEXICON GENERATION USING WIKIPEDIA AS A RESOURCE},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={357-360},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001783003570360},
isbn={978-989-8111-66-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - AUTOMATIC MULTILINGUAL LEXICON GENERATION USING WIKIPEDIA AS A RESOURCE
SN - 978-989-8111-66-1
AU - Shahid, A. R.
AU - Kazakov, D.
PY - 2009
SP - 357
EP - 360
DO - 10.5220/0001783003570360