Marco Scotto, Tullio Vernazza, Alberto Sillitti, Giancarlo Succi


The heterogeneity and lack of structure of the World Wide Web make the automated discovery, organization, and management of Web-based information a non-trivial task. Traditional search and indexing tools offer users some help, but they generally neither provide structured information nor categorize, filter, or interpret documents automatically. In recent years, these limitations have prompted the development of data mining techniques for the Web, giving rise to the term “Web Mining”. This paper introduces the problem of Web data extraction and briefly analyzes the techniques that address it. It then presents News Miner, a tool for Web Content Mining applied to news retrieval.



Paper Citation

in Harvard Style

Scotto M., Vernazza T., Sillitti A. and Succi G. (2004). MANAGING WEB-BASED INFORMATION. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 972-8865-00-7, pages 575-578. DOI: 10.5220/0002633005750578
