XHITS - Multiple Roles in a Hyperlinked Structure

Francisco Benjamim Filho, Raul Pierre Renteria, Ruy Luiz Milidiú

2009

Abstract

The WWW is a huge and rich environment. Web pages can be viewed as a large community of elements connected through links that are created for many different reasons. The HITS approach introduces two basic concepts, hubs and authorities, that reveal hidden semantic information in the links. In this paper, we present XHITS, a generalization of HITS that models multiple-class problems, together with a machine learning algorithm to calibrate it. We split classification influence into two sources: the first is due to link propagation, and the second is due to classification reinforcement. We derive a simple linear iterative equation to compute the classification values. We also provide an influence equation that shows how the two influence sources can be combined. Two special cases are explored: symmetric reinforcement and positive reinforcement. We show that the iterative scheme converges in both of these special cases. Some illustrative examples and empirical tests are also provided; they indicate that XHITS is a powerful and efficient modeling approach.
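The abstract describes XHITS only through its ingredients (several roles per page, influence propagated along links, and reinforcement between roles), so the sketch below is a hypothetical illustration, not the paper's iterative or influence equation. It assumes each page carries one score per role and that a coupling matrix `coupling[t][f]` (a name invented here) says how strongly a link from a role-f page to a role-t page reinforces both endpoints. The identity shift in the update, x ← (I + M)x with M the link-propagation operator, is this sketch's own choice: it avoids the oscillation that a simultaneous hub/authority update can show. With the single coupling "hub links to authority" the sketch reduces to classic HITS, whose hub and authority vectors are the principal singular vectors of the adjacency matrix.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def multi_role_scores(adj: Matrix, coupling: Matrix,
                      max_iter: int = 1000, tol: float = 1e-12) -> Matrix:
    """Hypothetical multi-role link analysis; NOT the paper's XHITS equation.

    adj[i][j]: weight of the link from page i to page j (0 if none).
    coupling[t][f]: hypothetical weight saying how strongly a link from a
        role-f page to a role-t page reinforces both endpoints.
    Returns an n x c score matrix; column r holds the role-r scores,
    scaled to unit Euclidean length.
    """
    n, c = len(adj), len(coupling)
    x = [[1.0] * c for _ in range(n)]  # uniform starting scores
    for _ in range(max_iter):
        # Identity shift: new = x + M x. Without it, a simultaneous
        # hub/authority update can oscillate between two states.
        new = [row[:] for row in x]
        for i in range(n):
            for j in range(n):
                w = adj[i][j]
                if not w:
                    continue
                for t in range(c):
                    for f in range(c):
                        r = coupling[t][f]
                        if r:
                            # Link propagation in both directions:
                            new[j][t] += w * r * x[i][f]  # target j, role t
                            new[i][f] += w * r * x[j][t]  # source i, role f
        # Rescale the whole matrix so that this stays a plain power iteration.
        norm = sqrt(sum(v * v for row in new for v in row))
        new = [[v / norm for v in row] for row in new]
        delta = max(abs(a - b) for ra, rb in zip(new, x) for a, b in zip(ra, rb))
        x = new
        if delta < tol:
            break
    # Report each role on its own scale.
    for r in range(c):
        size = sqrt(sum(x[i][r] ** 2 for i in range(n)))
        if size > 0:
            for i in range(n):
                x[i][r] /= size
    return x


# Classic HITS as the two-role special case: role 0 = hub, role 1 = authority,
# and the only coupling is "a hub (f=0) links to an authority (t=1)".
hits_coupling = [[0.0, 0.0],
                 [1.0, 0.0]]
adj = [[0, 0, 1, 1],  # page 0 links to pages 2 and 3
       [0, 0, 1, 0],  # page 1 links to page 2
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
scores = multi_role_scores(adj, hits_coupling)
hubs = [row[0] for row in scores]   # approx [0.851, 0.526, 0.0, 0.0]
auths = [row[1] for row in scores]  # approx [0.0, 0.0, 0.851, 0.526]
```

In this sketch the propagation operator is symmetric and nonnegative for any nonnegative `adj` and `coupling`, so the shifted power iteration converges from the all-ones start. That is a property of the sketch only; it does not reproduce the paper's convergence results for symmetric and positive reinforcement, nor its calibration algorithm.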



Paper Citation


in Harvard Style

Benjamim Filho F., Pierre Renteria R. and Luiz Milidiú R. (2009). XHITS - Multiple Roles in a Hyperlinked Structure. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009) ISBN 978-989-674-011-5, pages 189-195. DOI: 10.5220/0002305601890195


in Bibtex Style

@conference{kdir09,
author={Francisco Benjamim Filho and Raul Pierre Renteria and Ruy Luiz Milidiú},
title={XHITS - Multiple Roles in a Hyperlinked Structure},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={189-195},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002305601890195},
isbn={978-989-674-011-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI - XHITS - Multiple Roles in a Hyperlinked Structure
SN - 978-989-674-011-5
AU - Benjamim Filho F.
AU - Pierre Renteria R.
AU - Luiz Milidiú R.
PY - 2009
SP - 189
EP - 195
DO - 10.5220/0002305601890195