QUERY PROCESSING FOR ENTERPRISE SEARCH WITH WIKIPEDIA LINK STRUCTURE

Nihar Sharma, Vasudeva Varma

Abstract

We present a phrase-based query expansion (QE) technique for enterprise search using a domain-independent concept thesaurus constructed from the Wikipedia link structure. Our approach analyzes article and category link information to derive sets of related concepts, from which the thesaurus is built. In addition, we build a vocabulary set containing natural word order and usage that semantically represent concepts. We extract query-representational concepts from the vocabulary set with a three-layered approach. The concept thesaurus then yields related concepts for expanding a query. Evaluation on TRECENT 2007 data shows a 9 percent increase in recall for fifty queries. We also observed that our implementation improves precision at top k results by 0.7, 1, 6 and 9 percent for the top 10, top 20, top 50 and top 100 search results respectively, demonstrating the promise that a Wikipedia-based thesaurus holds for domain-specific search.
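
The general idea of phrase-based expansion through a concept thesaurus can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `THESAURUS` entries, the function names `extract_concepts` and `expand_query`, and the greedy longest-match phrase lookup, which merely stands in for the three-layered concept extraction that the abstract mentions but does not specify. It is not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy thesaurus: concept -> related concepts.
# In the paper this would be derived from Wikipedia article and category links.
THESAURUS: Dict[str, List[str]] = {
    "information retrieval": ["search engine", "document ranking"],
    "query expansion": ["relevance feedback"],
}


def extract_concepts(query: str, vocabulary: Set[str], max_len: int = 3) -> List[str]:
    """Greedy longest-match of query word n-grams against the vocabulary.

    Assumption: a simple stand-in for the paper's three-layered extraction.
    """
    words = query.lower().split()
    concepts: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in vocabulary:
                concepts.append(phrase)
                i += n
                break
        else:
            # No phrase starting here is a known concept; skip one word.
            i += 1
    return concepts


def expand_query(
    query: str, thesaurus: Dict[str, List[str]], top_n: int = 2
) -> Tuple[List[str], List[str]]:
    """Return (matched concepts, expansion terms) for a query."""
    concepts = extract_concepts(query, set(thesaurus))
    expansions: List[str] = []
    for concept in concepts:
        # Take at most top_n related concepts, skipping duplicates.
        for related in thesaurus[concept][:top_n]:
            if related not in expansions:
                expansions.append(related)
    return concepts, expansions


if __name__ == "__main__":
    print(expand_query("Query expansion for information retrieval", THESAURUS))
```

Matching whole phrases before single words keeps a multi-word concept such as "information retrieval" from being split into unrelated single-word lookups, which is the motivation for working at the phrase level.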


Paper Citation


in Harvard Style

Sharma N. and Varma V. (2010). QUERY PROCESSING FOR ENTERPRISE SEARCH WITH WIKIPEDIA LINK STRUCTURE. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 243-249. DOI: 10.5220/0003093702430249


in Bibtex Style

@conference{kdir10,
author={Nihar Sharma and Vasudeva Varma},
title={QUERY PROCESSING FOR ENTERPRISE SEARCH WITH WIKIPEDIA LINK STRUCTURE},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={243-249},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003093702430249},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - QUERY PROCESSING FOR ENTERPRISE SEARCH WITH WIKIPEDIA LINK STRUCTURE
SN - 978-989-8425-28-7
AU - Sharma N.
AU - Varma V.
PY - 2010
SP - 243
EP - 249
DO - 10.5220/0003093702430249