CLASSIFYING WEB PAGES WITH VISUAL FEATURES

Viktor de Boer, Maarten van Someren, Tiberiu Lupascu

Abstract

To classify and process web pages automatically, current systems rely on the textual content of those pages, including both the displayed content and the underlying HTML code. However, a very important feature of a web page is its visual appearance. In this paper, we show that generic visual features can be used to classify web pages for several different types of tasks. The features used are simple color and edge histograms, Gabor features, and texture features, extracted with an off-the-shelf visual feature extraction method. In three experiments, we classify web pages by their aesthetic value, their recency, and the type of website. Results show that these simple, global visual features already produce good classification results. We also introduce an online tool that uses the trained classifiers to assess new web pages.
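
To make the idea of a "simple, global" visual feature concrete, the following is a minimal sketch of a normalized color histogram computed over all pixels of a page screenshot. It is an illustration only, not the extraction method used in the paper (which the abstract describes only as off-the-shelf). The function name `color_histogram`, the `bins_per_channel` parameter, and the representation of a screenshot as a flat sequence of RGB tuples are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized global RGB histogram with bins_per_channel**3 bins.

    Hypothetical helper for illustration; the layout of the screenshot
    (a flat sequence of RGB tuples) is an assumption of this sketch.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel into bins_per_channel levels (integer math).
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        # Combine the three quantized channels into one bin index.
        counts[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    total = len(pixels)
    # Normalize so the feature vector does not depend on image size.
    return [c / total for c in counts]


# Toy "screenshot": three white pixels and one black pixel.
toy_pixels = [(255, 255, 255)] * 3 + [(0, 0, 0)]
features = color_histogram(toy_pixels)
```

A vector like `features` is global in the sense that it ignores where on the page each color occurs; it could be passed, together with other such vectors, to any standard classifier.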


Paper Citation


in Harvard Style

de Boer V., van Someren M. and Lupascu T. (2010). CLASSIFYING WEB PAGES WITH VISUAL FEATURES. In Proceedings of the 6th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-674-025-2, pages 245-252. DOI: 10.5220/0002804102450252


in Bibtex Style

@conference{webist10,
author={Viktor de Boer and Maarten van Someren and Tiberiu Lupascu},
title={CLASSIFYING WEB PAGES WITH VISUAL FEATURES},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2010},
pages={245-252},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002804102450252},
isbn={978-989-674-025-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - CLASSIFYING WEB PAGES WITH VISUAL FEATURES
SN - 978-989-674-025-2
AU - de Boer V.
AU - van Someren M.
AU - Lupascu T.
PY - 2010
SP - 245
EP - 252
DO - 10.5220/0002804102450252