ACCURATE QUERY TRANSLATION FOR JAPANESE-ENGLISH CROSS-LANGUAGE INFORMATION RETRIEVAL

Vitaly Klyuev, Yannis Haralambous

2012

Abstract

This paper discusses a novel approach to translating queries from Japanese into English for cross-language information retrieval (CLIR). The online dictionary SPACEALC is used to obtain all possible English senses of every Japanese term. The EWC semantic relatedness measure then selects the most closely related meanings among the candidate translations. This measure combines the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure, and the mixed collocation index. Preliminary tests of the proposed technique are carried out on the NTCIR data collection, and retrieval performance is compared with that obtained using queries generated by Google Translate.
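
The sense-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how relatedness scores are aggregated, so the exhaustive search over sense combinations below is an assumption, `toy_relatedness` is a hypothetical lookup table standing in for the EWC measure, and the candidate senses are invented for illustration rather than taken from SPACEALC.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical stand-in for the EWC measure: a tiny symmetric lookup table.
# The actual EWC measure combines ESA, the WordNet path measure and a
# collocation index; none of that is reproduced here.
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"bank", "interest"}): 0.8,
    frozenset({"bank", "curiosity"}): 0.1,
    frozenset({"embankment", "interest"}): 0.1,
    frozenset({"embankment", "curiosity"}): 0.05,
}


def toy_relatedness(a: str, b: str) -> float:
    """Return an invented relatedness score in [0, 1] for two English words."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


def select_translations(
    candidates: Sequence[Sequence[str]],
    relatedness: Callable[[str, str], float],
) -> List[str]:
    """Pick one English sense per query term.

    Assumed strategy: choose the combination whose summed pairwise
    relatedness is highest. Terms with no candidate senses are skipped.
    """
    pools = [list(c) for c in candidates if c]
    best_combo: tuple = ()
    best_score = float("-inf")
    for combo in product(*pools):
        # Sum relatedness over every unordered pair in this combination.
        score = sum(
            relatedness(a, b)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        if score > best_score:
            best_combo, best_score = combo, score
    return list(best_combo)


if __name__ == "__main__":
    # Invented candidate senses for a two-term query.
    senses = [["embankment", "bank"], ["curiosity", "interest"]]
    print(select_translations(senses, toy_relatedness))
```

Exhaustive search grows exponentially with the number of query terms, so for longer queries a greedy or per-term selection would be a more practical choice.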

References

  1. Chen, A., Gey, F. C., Kishida, K., Jiang, H., and Liang, Q. (1999). Comparing Multiple Methods for Japanese and Japanese-English Text Retrieval. In Proc. The First NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition.
  2. Egozi, O., Markovitch, S., and Gabrilovich, E. (2011). Concept-Based Information Retrieval using Explicit Semantic Analysis. ACM Transactions on Information Systems, 29(2).
  3. Haralambous, Y. and Klyuev, V. (2011). A Semantic Relatedness Measure Based on Combined Encyclopedic, Ontological and Collocational Knowledge. In IJCNLP2011, Thailand.
  4. Klyuev, V. and Haralambous, Y. (2011). Query Expansion: Term Selection using the EWC Semantic Relatedness Measure. In FedCSIS 2011, Poland.
  5. MeCab: Yet Another Part-of-Speech and Morphological Analyzer. Retrieved November 18, 2011, from http://mecab.sourceforge.net/
  6. Mitamura, T., Shima, H., Sakai, T., Kando, N., Mori, T., Takeda, K., Lin, C., Song, R., Lin, Chuan, and Lee, C. (2010). Overview of the NTCIR-8 ACLIA Tasks: Advanced Cross-Lingual Information Access. In Proc. The 8th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access, Japan.
  7. Nie, J. (2011). Cross-Language Information Retrieval. Association for Computational Linguistics.
  8. Nguyen, D., Overwijk, A., Hauff, C., Trieschnigg, D., Hiemstra, D., and de Jong, F. M. G. (2009). WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia. CLEF 2008, LNCS 5706, 58-65.
  9. NTCIR-1 CLIR data collection. Retrieved November 18, 2011, from http://research.nii.ac.jp/ntcir/data/dataen.html
  10. Patwardhan, Banerjee, and Pedersen. (2007). UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness. In Proc. SemEval2007: 4th International Workshop on Semantic Evaluations, 390-393, Prague, Czech Republic.
  11. Pinto, D., Civera, J., Juan, A., Rosso, R., and BarronCedeno, A. (2009). A statistical approach to cross lingual natural language tasks. Journal of Algorithms, 64(1), 51-60.
  12. Robertson, S., Walker, S., Beaulieu, M., Gatford, M., and Payne, A. (1995). Okapi at TREC-4. In Proc. TREC 4.
  13. Sorg, P. and Cimiano, P. (2008). Cross-lingual Information Retrieval with Explicit Semantic Analysis. In CLEF 2008.
  14. Terrier. Retrieved November 18, 2011, from http://terrier.net
  15. TREC. Retrieved November 18, 2011, from http://trec.nist.gov/
  16. SPACEALC. Retrieved November 18, 2011, from http://www.alc.co.jp/
  17. Voorhees, E. and Harman, D. (eds.). (2005). TREC: experiment and evaluation in information retrieval. The MIT Press.
  18. Word frequency lists and dictionary. Retrieved November 18, 2011, from http://www.wordfrequency.info/
  19. Xiaoning, H., Peidong, W., Haoliang, Q., Muyun, Y., Guohua, L., and Yong, X. (2008). Using Google Translation in Cross-Lingual Information Retrieval. In Proc. NTCIR-7 Workshop Meeting, Tokyo, Japan.


Paper Citation


in Harvard Style

Klyuev V. and Haralambous Y. (2012). ACCURATE QUERY TRANSLATION FOR JAPANESE-ENGLISH CROSS-LANGUAGE INFORMATION RETRIEVAL. In Proceedings of the 2nd International Conference on Pervasive Embedded Computing and Communication Systems - Volume 1: PECCS, ISBN 978-989-8565-00-6, pages 214-219. DOI: 10.5220/0003905902140219


in Bibtex Style

@conference{peccs12,
author={Vitaly Klyuev and Yannis Haralambous},
title={ACCURATE QUERY TRANSLATION FOR JAPANESE-ENGLISH CROSS-LANGUAGE INFORMATION RETRIEVAL},
booktitle={Proceedings of the 2nd International Conference on Pervasive Embedded Computing and Communication Systems - Volume 1: PECCS},
year={2012},
pages={214-219},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003905902140219},
isbn={978-989-8565-00-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pervasive Embedded Computing and Communication Systems - Volume 1: PECCS
TI - ACCURATE QUERY TRANSLATION FOR JAPANESE-ENGLISH CROSS-LANGUAGE INFORMATION RETRIEVAL
SN - 978-989-8565-00-6
AU - Klyuev V.
AU - Haralambous Y.
PY - 2012
SP - 214
EP - 219
DO - 10.5220/0003905902140219

