SEMANTIC INDEXING OF WEB PAGES VIA PROBABILISTIC METHODS - In Search of Semantics Project

Fabio Clarizia, Francesco Colace, Massimo De Santo, Paolo Napoletano

Abstract

In this paper we address the problem of modeling large collections of data, namely web pages, by combining traditional information retrieval techniques with probabilistic ones in order to find semantic descriptions of the collections. This novel technique is embedded in a real Web search engine in order to provide semantic functionalities, such as the prediction of words related to a single-term query. Experiments on several small domains (web repositories) are presented and discussed.


Paper Citation


in Harvard Style

Clarizia F., Colace F., De Santo M. and Napoletano P. (2009). SEMANTIC INDEXING OF WEB PAGES VIA PROBABILISTIC METHODS - In Search of Semantics Project. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS, ISBN 978-989-8111-87-6, pages 134-140. DOI: 10.5220/0002010401340140


in Bibtex Style

@conference{iceis09,
author={Fabio Clarizia and Francesco Colace and Massimo De Santo and Paolo Napoletano},
title={SEMANTIC INDEXING OF WEB PAGES VIA PROBABILISTIC METHODS - In Search of Semantics Project},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS},
year={2009},
pages={134-140},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002010401340140},
isbn={978-989-8111-87-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS
TI - SEMANTIC INDEXING OF WEB PAGES VIA PROBABILISTIC METHODS - In Search of Semantics Project
SN - 978-989-8111-87-6
AU - Clarizia F.
AU - Colace F.
AU - De Santo M.
AU - Napoletano P.
PY - 2009
SP - 134
EP - 140
DO - 10.5220/0002010401340140