XHITS: LEARNING TO RANK IN A HYPERLINKED STRUCTURE

Francisco Benjamim Filho, Raúl Pierre Renteria, Ruy Luiz Milidiú

Abstract

The explosive growth and widespread accessibility of the Web have led to a surge of research activity in information retrieval on the WWW. The Web is a huge and rich environment in which pages can be viewed as a large community of elements connected through links created for a variety of reasons. The HITS approach introduces two basic concepts, hubs and authorities, which reveal some hidden semantic information from the links. In this paper, we review XHITS, a generalization of HITS that expands the model from two to several concepts, and present a new machine learning algorithm to calibrate an XHITS model. The new learning algorithm uses latent feature concepts. Furthermore, we provide some illustrative examples and empirical tests. Our findings indicate that the new learning approach provides a more accurate XHITS model.
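As a rough illustration of the hub and authority concepts mentioned in the abstract, the following is a minimal sketch of the classic HITS iteration only. It is not the XHITS model or the learning algorithm of this paper. The function names `hits` and `normalize`, the edge-list representation, and the toy graph are assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale a score vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in scores))
    return [x / norm for x in scores] if norm > 0 else scores


def hits(edges, num_nodes, iterations=50):
    """Classic HITS on a directed graph given as (source, target) index pairs.

    Returns (hub_scores, authority_scores), each a list of length num_nodes.
    """
    hubs = [1.0] * num_nodes
    auths = [1.0] * num_nodes
    for _ in range(iterations):
        # Authority update: a page is a good authority if good hubs link to it.
        new_auths = [0.0] * num_nodes
        for src, dst in edges:
            new_auths[dst] += hubs[src]
        # Hub update: a page is a good hub if it links to good authorities.
        new_hubs = [0.0] * num_nodes
        for src, dst in edges:
            new_hubs[src] += new_auths[dst]
        auths = normalize(new_auths)
        hubs = normalize(new_hubs)
    return hubs, auths


# Hypothetical toy graph: pages 0 and 1 link to page 2; page 0 also links to page 3.
toy_edges = [(0, 2), (1, 2), (0, 3)]
hub_scores, authority_scores = hits(toy_edges, num_nodes=4)
```

In this toy graph, page 2 receives the highest authority score because two pages link to it, and page 0 receives the highest hub score because it links to both authorities. XHITS, as described in the abstract, generalizes this two-concept scheme to several concepts.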



Paper Citation


in Harvard Style

Benjamim Filho F., Pierre Renteria R. and Luiz Milidiú R. (2011). XHITS: LEARNING TO RANK IN A HYPERLINKED STRUCTURE. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 377-381. DOI: 10.5220/0003632503850389


in Bibtex Style

@conference{kdir11,
author={Francisco Benjamim Filho and Raúl Pierre Renteria and Ruy Luiz Milidiú},
title={XHITS: LEARNING TO RANK IN A HYPERLINKED STRUCTURE},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={377-381},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003632503850389},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - XHITS: LEARNING TO RANK IN A HYPERLINKED STRUCTURE
SN - 978-989-8425-79-9
AU - Benjamim Filho F.
AU - Pierre Renteria R.
AU - Luiz Milidiú R.
PY - 2011
SP - 377
EP - 381
DO - 10.5220/0003632503850389