are closely related to the paragraph's content and,
moreover, they relate to the subject of a paragraph as
well as to the discussion and opinions it raises, beyond
mere text overlap. This behavior stems from the
recursive nature of PageRank, in which the relationships
between comments are iteratively propagated. In contrast,
ranking comments by their (text) similarity to a given
paragraph would not retrieve related comments that use
a different vocabulary.
The plugin implementing our approach is publicly
available from http://goo.gl/To4Rd.[9] In the future, we
intend to evaluate our system by comparing it to
other state-of-the-art ranking techniques.[10]
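The contrast drawn above, between recursive PageRank scoring and direct text similarity, can be illustrated with a minimal sketch. It is not the plugin's implementation: the graph construction (bag-of-words cosine weights between all texts), the restart vector concentrated on the paragraph, the damping value, and the names `cosine` and `rank_comments` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_comments(paragraph, comments, damping=0.85, iterations=50):
    """Score comments with a PageRank biased toward the paragraph node.

    Assumed setup (hypothetical, for illustration only): node 0 is the
    paragraph, the remaining nodes are comments, edge weights are cosine
    similarities, and the random walk restarts at the paragraph.
    """
    texts = [paragraph] + list(comments)
    bags = [Counter(t.lower().split()) for t in texts]
    n = len(texts)
    weights = [[cosine(bags[i], bags[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    restart = [1.0] + [0.0] * (n - 1)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for j in range(n):
            # Isolated nodes (no outgoing weight) pass nothing on; their
            # mass is simply dropped in this simplified sketch.
            incoming = sum(scores[i] * weights[i][j] / out_weight[i]
                           for i in range(n) if out_weight[i] > 0)
            updated.append((1 - damping) * restart[j] + damping * incoming)
        scores = updated
    return scores[1:]  # scores of the comments only


paragraph = "election results announced today"
comments = [
    "the election results surprised many voters",  # overlaps the paragraph
    "voters deserve better candidates",            # overlaps only comment 0
    "great recipe for pasta",                      # unrelated
]
for text, score in zip(comments, rank_comments(paragraph, comments)):
    print(f"{score:.4f}  {text}")
```

In this toy input the second comment shares no word with the paragraph, so a direct similarity ranking would score it zero, yet it receives a positive PageRank score through its link to the first comment. The unrelated third comment stays at zero under both schemes.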
ACKNOWLEDGEMENTS
The authors thank the project students Maxim Magaziner,
Anatoly Shpilgerman, and Sergey Pinsky for implementing
the introduced approach as a Chrome Extension for the
Yahoo! News[11] website, and Igor Vinokur for technical
support of the software. Special thanks go to
Dr. Amin Mantrach from Yahoo! Labs, Barcelona,
for very constructive and helpful comments.
[9] Unzip the archive, press "Load unpacked extension"
in "Developer mode" of the Chrome "Extensions" tool, and
choose the unzipped plugin folder.
[10] Currently, we are performing an experiment aimed at
creating a Gold Standard collection of ranked comments.
Since this is a very time-, labor-, and budget-consuming
process, we expect to be able to run evaluations only in
several months.
[11] http://news.yahoo.com/
KDIR 2013 - International Conference on Knowledge Discovery and Information Retrieval