Authors: Estella Annoni 1 and C. I. Ezeife 2

Affiliations: 1 University of Toulouse, France ; 2 University of Windsor, Canada

Keyword(s): Web data model, Object-oriented mining, Automatic web data extraction.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Biomedical Engineering ; Business Analytics ; Data Engineering ; Data Mining ; Databases and Information Systems Integration ; Datamining ; Enterprise Information Systems ; Health Information Systems ; Information Systems Analysis and Specification ; Modeling Formalisms, Languages and Notations ; Object-Oriented Database Systems ; Sensor Networks ; Signal Processing ; Soft Computing ; Web Databases

Abstract: Traditionally, mining web page contents involves modeling those contents to discover the underlying knowledge. Data extraction proposals represent web data in a formal structure, such as database structures specific to an application domain. Those models fail to capture the full diversity of web data structures, which can combine different types of content and can also be unstructured. With these proposals, it is not possible to focus on a given type of content, to work on data of different structures, or to mine data from different application domains, all of which are required to mine a given content type or web documents from different domains efficiently. Moreover, since web pages are designed to be understood by users, this paper considers the modeling of web document presentation, expressed through HTML tag attributes, to be useful for efficient web content mining. Hence, this paper provides a general framework composed of an object-oriented web data model based on HTML tags and algorithms for extracting web content and web presentation objects from any given web document. From the HTML code of a web document, web objects are extracted for mining, regardless of the domain.
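The idea of turning HTML tags into objects that keep content and presentation apart can be illustrated with a minimal Python sketch. It is not the authors' model or algorithms, which the abstract does not detail. The class names `WebObject` and `WebObjectExtractor`, the function `extract_web_objects`, and the choice of which attributes count as presentation (`PRESENTATION_ATTRS`) are all assumptions made for illustration, using only the standard library `html.parser` module.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Assumption: which attributes describe presentation rather than content.
# The paper's actual classification is not given in the abstract.
PRESENTATION_ATTRS = {"style", "class", "align", "color", "bgcolor", "width", "height"}
# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


@dataclass
class WebObject:
    """One HTML element, with content and presentation attributes kept apart."""
    tag: str
    content_attrs: dict = field(default_factory=dict)
    presentation_attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


class WebObjectExtractor(HTMLParser):
    """Builds a tree of WebObject instances from HTML source."""

    def __init__(self):
        super().__init__()
        self.root = WebObject("document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        obj = WebObject(tag)
        for name, value in attrs:
            if name in PRESENTATION_ATTRS:
                obj.presentation_attrs[name] = value
            else:
                obj.content_attrs[name] = value
        self._stack[-1].children.append(obj)
        if tag not in VOID_TAGS:
            self._stack.append(obj)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            current = self._stack[-1]
            current.text = (current.text + " " + text).strip()


def extract_web_objects(html):
    """Parse an HTML string and return the root WebObject of the tree."""
    parser = WebObjectExtractor()
    parser.feed(html)
    parser.close()
    return parser.root
```

Calling `extract_web_objects` on a page's HTML returns a tree whose nodes can be filtered by tag or by presentation attributes without any domain-specific schema, which is the domain-independent property the abstract emphasizes.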

CC BY-NC-ND 4.0


Paper citation in several formats:
Annoni, E. and Ezeife, C. I. (2009). MODELING WEB DOCUMENTS AS OBJECTS FOR AUTOMATIC WEB CONTENT EXTRACTION - Object-oriented Web Data Model. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 3: ICEIS; ISBN 978-989-8111-84-5; ISSN 2184-4992, SciTePress, pages 91-100. DOI: 10.5220/0001967400910100

@conference{iceis09,
author={Estella Annoni and C. I. Ezeife},
title={MODELING WEB DOCUMENTS AS OBJECTS FOR AUTOMATIC WEB CONTENT EXTRACTION - Object-oriented Web Data Model},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 3: ICEIS},
year={2009},
pages={91-100},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001967400910100},
isbn={978-989-8111-84-5},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 3: ICEIS
TI - MODELING WEB DOCUMENTS AS OBJECTS FOR AUTOMATIC WEB CONTENT EXTRACTION - Object-oriented Web Data Model
SN - 978-989-8111-84-5
IS - 2184-4992
AU - Annoni, E.
AU - Ezeife, C. I.
PY - 2009
SP - 91
EP - 100
DO - 10.5220/0001967400910100
PB - SciTePress