Extracting Representative Image from Web Page

Najlah Gali, Andrei Tabarcea, Pasi Fränti

2015

Abstract

A web page typically contains a blend of information. For a particular user, only informative data such as the main content and representative images are considered useful, while non-informative data such as advertisements and navigational banners are not. In this work, we focus on selecting the image that best represents the content of a web page. Existing techniques rely on prior knowledge of website-specific templates and on the text body. We extract all images, then analyze and rank them according to their features and functionality in the web page, and select the highest-scoring image as the representative image. Our method is fully automated, template-independent, and not limited to a particular type of web page.
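
The extract, rank, and select pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the features or weights used, so the features below (declared size, presence of alt text, decoration-like URL keywords), the weights, and the names `ImageCollector`, `score`, and `representative_image` are all assumptions chosen only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical URL hints for decorative or advertising images (assumption).
NOISE_HINTS = ("logo", "banner", "icon", "sprite", "advert")


class ImageCollector(HTMLParser):
    """Collects the attributes of every <img> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Attributes without a value are stored as empty strings.
            self.images.append({k: (v or "") for k, v in attrs})


def to_int(value):
    """Parse a width/height attribute; non-numeric values count as 0."""
    return int(value) if value.isdigit() else 0


def score(img):
    """Toy score (assumed weights): larger images and images with alt text
    score higher; URLs that look decorative or ad-like are penalised."""
    area = to_int(img.get("width", "")) * to_int(img.get("height", ""))
    s = min(area, 500_000) / 500_000  # size term scaled to [0, 1]
    if img.get("alt", "").strip():
        s += 0.5
    src = img.get("src", "").lower()
    if any(hint in src for hint in NOISE_HINTS):
        s -= 1.0
    return s


def representative_image(html):
    """Return the src of the highest-scoring image, or None if there is none."""
    collector = ImageCollector()
    collector.feed(html)
    if not collector.images:
        return None
    return max(collector.images, key=score).get("src")
```

Given a page containing a small logo, a large photo with alt text, and a wide banner, this sketch returns the photo's `src`; a page without images yields `None`. A real system would presumably also use rendered dimensions and the image's position in the page, which a static attribute scan like this one cannot see.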

Paper Citation


in Harvard Style

Gali N., Tabarcea A. and Fränti P. (2015). Extracting Representative Image from Web Page. In Proceedings of the 11th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-106-9, pages 411-419. DOI: 10.5220/0005438704110419


in Bibtex Style

@conference{webist15,
author={Najlah Gali and Andrei Tabarcea and Pasi Fränti},
title={Extracting Representative Image from Web Page},
booktitle={Proceedings of the 11th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2015},
pages={411-419},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005438704110419},
isbn={978-989-758-106-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Extracting Representative Image from Web Page
SN - 978-989-758-106-9
AU - Gali N.
AU - Tabarcea A.
AU - Fränti P.
PY - 2015
SP - 411
EP - 419
DO - 10.5220/0005438704110419