Authors:
Najlah Gali, Andrei Tabarcea and Pasi Fränti
Affiliation:
University of Eastern Finland, Finland
Keyword(s):
Representative Image, Image Extraction, Web Page Information Extraction, Web Mining.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
A web page typically contains a mix of information. For a given user, only informative content, such as the main text and representative images, is useful, while non-informative content, such as advertisements and navigation banners, is not. In this work, we focus on selecting the image that best represents the content of a web page. Existing techniques rely on prior knowledge of website-specific templates and on the text body of the page. We extract all images from the page, then analyze and rank them according to their features and their function in the web page. The highest-scoring image is selected as the representative image. Our method is fully automated, template-independent, and not limited to a particular type of web page.