Authors: Andrea Horch, Holger Kett and Anette Weisbecker

Affiliation: Fraunhofer Institute for Industrial Engineering IAO, Germany

Keyword(s): Web Data Extraction, Product Record Extraction, Tag Path Clustering.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Internet Technology; Searching and Browsing; Sensor Networks; Signal Processing; Soft Computing; Web Information Systems and Technologies; Web Interfaces and Applications; Web Services and Web Engineering

Abstract: Gathering product records from the Web is important to both shoppers and on-line retailers for comparing products and prices. Consumers do this to find the best price for a product, whereas on-line retailers want to compare their offers with those of their competitors in order to remain competitive. Due to the huge number and vast array of product offers on the Web, an automated approach for collecting product data is needed. In this paper we propose a lightweight approach to automatically identify and extract product records from arbitrary e-shop websites. For this purpose we have adopted and extended the existing technique called Tag Path Clustering, which clusters similar HTML tag paths, and developed a novel filtering mechanism specifically for extracting product records from websites.
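
As a rough illustration of the idea behind clustering HTML tag paths, the sketch below collects the root-to-element tag path of every element on a page and keeps the paths that repeat. It is not the authors' algorithm: it groups only identical paths rather than similar ones, it omits the filtering mechanism, and the names TagPathCollector, repeated_tag_paths, VOID_TAGS and min_count are hypothetical. It uses only Python's standard html.parser module.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, assumed for this sketch).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Counts how often each root-to-element tag path occurs in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []                # currently open tags, outermost first
        self.paths = defaultdict(int)  # tag path -> number of occurrences

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        self.paths[path] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Unwind to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def repeated_tag_paths(html, min_count=2):
    """Return the tag paths that occur at least min_count times."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return {p: c for p, c in collector.paths.items() if c >= min_count}


SAMPLE = (
    "<html><body><ul>"
    "<li><span>A</span></li><li><span>B</span></li><li><span>C</span></li>"
    "</ul><p>x</p></body></html>"
)
result = repeated_tag_paths(SAMPLE)
```

On a product listing page, frequently repeated paths such as html/body/ul/li in the sample are candidates for the regions that hold individual records; deciding which candidates are actually product records is the part the paper's filtering mechanism addresses.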

CC BY-NC-ND 4.0

Paper citation in several formats:
Horch, A.; Kett, H. and Weisbecker, A. (2015). A Lightweight Approach for Extracting Product Records from the Web. In Proceedings of the 11th International Conference on Web Information Systems and Technologies - WEBIST; ISBN 978-989-758-106-9; ISSN 2184-3252, SciTePress, pages 420-430. DOI: 10.5220/0005441404200430

@conference{webist15,
author={Andrea Horch and Holger Kett and Anette Weisbecker},
title={A Lightweight Approach for Extracting Product Records from the Web},
booktitle={Proceedings of the 11th International Conference on Web Information Systems and Technologies - WEBIST},
year={2015},
pages={420-430},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005441404200430},
isbn={978-989-758-106-9},
issn={2184-3252},
}

TY - CONF
JO - Proceedings of the 11th International Conference on Web Information Systems and Technologies - WEBIST
TI - A Lightweight Approach for Extracting Product Records from the Web
SN - 978-989-758-106-9
IS - 2184-3252
AU - Horch, A.
AU - Kett, H.
AU - Weisbecker, A.
PY - 2015
SP - 420
EP - 430
DO - 10.5220/0005441404200430
PB - SciTePress