A SEARCH ENGINE FOR WEB IMAGES USING DOCUMENT TEXT STEMMING

Ryan Hardt, Ethan V. Munson, Hien Nguyen

Abstract

A Web image search application was built on a previously developed image relevance model for text-based image retrieval. The application includes a text stemmer that converts a word to a canonical form, making it possible to match text despite changes in tense or plurality that have little effect on semantics. The usefulness of stemming in Web image retrieval was evaluated on ten queries, each submitted both with and without stemming. Relevance of the retrieved images was determined from ratings by three trained individuals. With stemming, the average unique relevance recall (a measure of the proportion of relevant images returned by one algorithm and not the other) was 27.7%; without stemming, it was only 0.5%. These results may apply most accurately to queries containing at least one plural noun, present tense verb, present participle, or past tense verb.
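The two ideas in the abstract, stemming and unique relevance recall, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_stem` is a deliberately crude suffix stripper and not the stemmer used in the application, and `unique_relevance_recall` assumes the measure is the number of relevant images returned only by one algorithm divided by the number of relevant images returned by either algorithm, which the abstract does not spell out.

```python
def toy_stem(word: str) -> str:
    """Strip a few common English suffixes.

    Hypothetical, illustrative stemmer only; it is NOT the stemmer used in
    the paper and it mishandles many words.
    """
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def matches(query: str, text: str, stem: bool) -> bool:
    """Return True if every query term occurs in the text, optionally after stemming."""
    normalize = toy_stem if stem else str.lower
    text_terms = {normalize(t) for t in text.split()}
    return all(normalize(q) in text_terms for q in query.split())


def unique_relevance_recall(relevant_a: set, relevant_b: set) -> float:
    """Share of relevant images returned by algorithm A but not by algorithm B.

    Assumed definition: |A - B| / |A union B|, where A and B are the sets of
    relevant images returned by each algorithm.
    """
    union = relevant_a | relevant_b
    if not union:
        return 0.0
    return len(relevant_a - relevant_b) / len(union)


# A plural query term matches singular page text only when stemming is on.
print(matches("jumping dogs", "a dog jumped over the fence", stem=True))
print(matches("jumping dogs", "a dog jumped over the fence", stem=False))

# Hypothetical relevant-image sets for a single query.
with_stemming = {"img1", "img2", "img3", "img4"}
without_stemming = {"img1", "img2"}
print(unique_relevance_recall(with_stemming, without_stemming))
print(unique_relevance_recall(without_stemming, with_stemming))
```

In this toy setup the stemmed run finds everything the unstemmed run finds plus two more images, so its unique relevance recall is higher, which is the kind of asymmetry the reported averages describe.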


Paper Citation


in Harvard Style

Hardt R., Munson E. V. and Nguyen H. (2008). A SEARCH ENGINE FOR WEB IMAGES USING DOCUMENT TEXT STEMMING. In Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-8111-27-2, pages 223-230. DOI: 10.5220/0001526502230230


in Bibtex Style

@conference{webist08,
author={Ryan Hardt and Ethan V. Munson and Hien Nguyen},
title={A SEARCH ENGINE FOR WEB IMAGES USING DOCUMENT TEXT STEMMING},
booktitle={Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2008},
pages={223-230},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001526502230230},
isbn={978-989-8111-27-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - A SEARCH ENGINE FOR WEB IMAGES USING DOCUMENT TEXT STEMMING
SN - 978-989-8111-27-2
AU - Hardt R.
AU - Munson E. V.
AU - Nguyen H.
PY - 2008
SP - 223
EP - 230
DO - 10.5220/0001526502230230