DOES CAPITALIZATION MATTER IN WEB SEARCH?

Silviu Cucerzan

Abstract

We investigate the capitalization features of queries submitted to Web search engines and the relation between capitalization information, either as received from users or as hypothesized based on Web statistics, and search relevance. We observe that users tend to lowercase words in their queries significantly more often than Web data would predict. More importantly, we determine that document relevance is strongly correlated with how closely the capitalization of query-token instances in the target document matches the truecased form of the query, as obtained from Web n-gram data.
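
To make the notion of a capitalization match concrete, here is a minimal sketch. It is not the paper's method: the abstract does not specify the truecasing model or the matching measure. The sketch assumes a unigram truecaser driven by a hypothetical table of toy counts (`TOY_COUNTS`, standing in for Web n-gram statistics) and a simple match rate over query-token occurrences in a document. The function names `truecase` and `case_match_rate` are illustrative.

```python
import re
from collections import Counter

# Hypothetical toy counts standing in for Web n-gram statistics (assumption).
TOY_COUNTS = Counter({"Apple": 60, "apple": 40, "store": 90, "Store": 10})


def truecase(query, counts):
    """For each query token, pick the most frequent surface form in counts.

    Tokens with no known variant are returned unchanged.
    """
    result = []
    for token in query.split():
        variants = [form for form in counts if form.lower() == token.lower()]
        if variants:
            result.append(max(variants, key=lambda form: counts[form]))
        else:
            result.append(token)
    return result


def case_match_rate(truecased_tokens, document):
    """Fraction of query-token occurrences in the document whose casing
    equals the truecased form. Occurrences are found case-insensitively.
    """
    wanted = {tok.lower(): tok for tok in truecased_tokens}
    doc_tokens = re.findall(r"\w+", document)
    hits = [tok for tok in doc_tokens if tok.lower() in wanted]
    if not hits:
        return 0.0
    exact = sum(1 for tok in hits if tok == wanted[tok.lower()])
    return exact / len(hits)


if __name__ == "__main__":
    tokens = truecase("apple store", TOY_COUNTS)
    print(tokens)
    print(case_match_rate(tokens, "The Apple store sells an apple pie."))
```

This toy version ignores context (a real truecaser would use longer n-grams) and treats sentence-initial capitals like any other, so it only illustrates the kind of signal the abstract describes.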



Paper Citation


in Harvard Style

Cucerzan S. (2010). DOES CAPITALIZATION MATTER IN WEB SEARCH? In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 302-306. DOI: 10.5220/0003102503020306


in Bibtex Style

@conference{kdir10,
author={Silviu Cucerzan},
title={DOES CAPITALIZATION MATTER IN WEB SEARCH?},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={302-306},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003102503020306},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - DOES CAPITALIZATION MATTER IN WEB SEARCH?
SN - 978-989-8425-28-7
AU - Cucerzan S.
PY - 2010
SP - 302
EP - 306
DO - 10.5220/0003102503020306