Authors:
Marco Scotto (1); Tullio Vernazza (1); Alberto Sillitti (2) and Giancarlo Succi (2)
Affiliations:
(1) DIST – Università di Genova, Italy; (2) Libera Università di Bolzano, Italy
Keyword(s):
Web Mining, Information Retrieval
Related Ontology Subjects/Areas/Topics:
Coupling and Integrating Heterogeneous Data Sources; Databases and Information Systems Integration; Enterprise Information Systems
Abstract:
The heterogeneity and lack of structure of the World Wide Web make automated discovery, organization, and management of Web-based information a non-trivial task. Traditional search and indexing tools offer users some help, but they generally neither provide structured information nor categorize, filter, or interpret documents in an automated way. In recent years, these factors have prompted the development of data mining techniques applied to the Web, giving rise to the term “Web Mining”. This paper introduces the problem of Web data extraction and briefly analyzes the various techniques that address it. It then presents News Miner, a tool for Web Content Mining applied to news retrieval.