Table 1: Precision at 10 for HITS, XHITS (ALP) and XHITS (SVDLP).
Algorithm        Precision at Ten (P@10)
HITS             0.143542
XHITS (ALP)      0.372455
XHITS (SVDLP)    0.519875
cess (SVDLP) is approximately 400%, both with respect to HITS. Another important observation is the increased performance of XHITS with the new machine learning approach: we obtained a 40% improvement over ALP.
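The figures quoted here can be checked against the P@10 values in Table 1. The short Python sketch below does that arithmetic; the function names `ratio` and `relative_gain` are ours, introduced only for illustration. The gain of SVDLP over ALP comes out close to 40%. The ratio of SVDLP to HITS comes out near 3.6, which may be what the rounded figure of approximately 400% refers to, although that reading is an assumption on our part.

```python
# P@10 scores copied from Table 1.
p_at_10 = {
    "HITS": 0.143542,
    "XHITS (ALP)": 0.372455,
    "XHITS (SVDLP)": 0.519875,
}


def ratio(new, base):
    """Score of `new` expressed as a multiple of the score of `base`."""
    return p_at_10[new] / p_at_10[base]


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, as a fraction of `base`."""
    return (p_at_10[new] - p_at_10[base]) / p_at_10[base]


svdlp_vs_alp = relative_gain("XHITS (SVDLP)", "XHITS (ALP)")
svdlp_vs_hits = ratio("XHITS (SVDLP)", "HITS")
alp_vs_hits = ratio("XHITS (ALP)", "HITS")

print(f"SVDLP gain over ALP:  {svdlp_vs_alp:.1%}")
print(f"SVDLP / HITS ratio:   {svdlp_vs_hits:.2f}")
print(f"ALP / HITS ratio:     {alp_vs_hits:.2f}")
```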
Looking inside the rankings, the closest agreement between the ranks produced by XHITS SVDLP and Google was observed for the query oprah, and the weakest for the query michele bachmann. The corresponding P@10 values were 0.9 and 0.1. During the period in which we selected the queries, Oprah, the star, was about to reveal something involving her family, and nine of the first ten pages matched Google’s first ones. The result is shown in Table 2.
Table 2: The first ten links returned by the XHITS engine after the training.
Position  URL
1         http://www.oprah.com/
2         http://www.oprah.com/omagazine.html
3         http://www.imdb.com/name/nm0001856/
4         http://www.tmz.com/person/oprah-winfrey/
5         http://www.nydailynews.com/topics/Oprah+Winfrey
6         http://oprahsangelnetwork.org
7         http://www.livingoprah.com/
8         http://bossip.com/category/celeb-directory/oprah/
9         http://www.myspace.com/everything/oprah-winfrey
10        http://www.quotationspage.com/quotes/OprahWinfrey/
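The P@10 values reported for individual queries can be illustrated with a small Python sketch. It assumes that P@10 is the fraction of the ten pages returned by the engine that also appear among the first ten pages of the expert ranking (Google); the exact definition is not restated in this section, so this reading is an assumption. The function name `precision_at_k` and the page identifiers are placeholders, not data from the experiment.

```python
def precision_at_k(candidate, reference, k=10):
    """Fraction of the candidate's top-k items that also appear in the
    reference's top-k items (order within the top k is ignored)."""
    top_reference = set(reference[:k])
    hits = sum(1 for item in candidate[:k] if item in top_reference)
    return hits / k


# Placeholder identifiers standing in for URLs (hypothetical data).
expert_ranking = [f"page-{i}" for i in range(10)]

# A candidate ranking that shares nine of its ten pages with the expert.
close_ranking = expert_ranking[:9] + ["page-other"]

# A candidate ranking that shares only one page with the expert.
far_ranking = expert_ranking[:1] + [f"unrelated-{i}" for i in range(9)]

print(precision_at_k(close_ranking, expert_ranking))  # nine of ten match
print(precision_at_k(far_ranking, expert_ranking))    # one of ten matches
```

Under this assumed definition, nine matching pages out of ten give the 0.9 reported for the query oprah, and a single matching page gives the 0.1 reported for michele bachmann.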
5 CONCLUSIONS AND FUTURE WORK
We explored the fact that the XHITS model provides a powerful approach, and rebuilt the part of the model that remained an open problem: how to find the set of parameters that best fits a given data set (Filho et al., 2009). To improve the model, we presented a new learning process for XHITS based on SVD. The preceding analysis and empirical results show that SVDLP performs well in the XHITS model and learns a higher-precision XHITS model than ALP. This approach has its own benefits, as follows:
• the SVDLP approach no longer relies on approximate steps;
• the training function is fully differentiable.
To test the new approach, we chose Google as our ranking expert, in order to keep compatibility with the previous learning process, and compared the performance of HITS, XHITS ALP and XHITS SVDLP with one another. The gains of the XHITS SVDLP model over HITS are substantial, as shown in the experimental results: a gain of over 400% in quality, measured as proximity to Google’s ranking. We do not claim that this gain necessarily reflects the quality of the ranking, but it shows that the model can learn a judged set of pages well.
For future work, we are changing the benchmark to the ClueWeb09 collection and comparing the performance with other ranking algorithms already explored and reported in the literature.
REFERENCES
Agichtein, E., Brill, E., and Dumais, S. (2006). Improving web search ranking by incorporating user behavior information. In SIGIR ’06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 19–26, New York, NY, USA. ACM.
Agosti, M. and Pretto, L. (2005). A theoretical study of a generalized version of Kleinberg’s HITS algorithm. Inf. Retr., 8(2):219–243.
Borodin, A., Roberts, G. O., Rosenthal, J. S., and Tsaparas, P. (2001). Finding authorities and hubs from link structures on the world wide web. In Tenth International World Wide Web Conference.
Brand, M. (2002). Incremental singular value decomposition of uncertain data with missing values. In Proceedings of the 7th European Conference on Computer Vision-Part I, ECCV ’02, pages 707–720, London, UK. Springer-Verlag.
Chakrabarti, S., Joshi, M., and Tawde, V. (2001). Enhanced topic distillation using text, markup tags, and hyperlinks. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 208–216.
Cohn, D. and Chang, H. (2000). Learning to probabilistically identify authoritative documents. http://citeseer.ist.psu.edu/438414.html; http://www.andrew.cmu.edu/~huan/phits.ps.gz.
Craswell, N. and Szummer, M. (2007). Random walks on the click graph. In SIGIR ’07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 239–246, New York, NY, USA. ACM.
Ding, C., He, X., Husbands, P., Zha, H., and Simon, H. D.
(2002a). Pagerank, HITS and a unified framework for
KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval