relevant pages. We are presently investigating the
issues presented in this paper more extensively. In
addition to new strategies, we consider how the search
engine used to rank the pages and the seed URL sets
affect crawling performance. One aim of this
study is to find an effective strategy for multilingual
focused crawling (Pirkola and Talvensaari, 2009).
ACKNOWLEDGEMENTS
This study was funded by the Academy of Finland
(research projects 125679 and 129835).
REFERENCES
Baeza-Yates, R., Castillo, C., Marin, M. and Rodriguez,
A., 2005. Crawling a country: better strategies than
breadth-first for web page ordering. Proc. of the 14th
International Conference on World Wide Web /
Industrial and Practical Experience Track, Chiba,
Japan, pp. 864-872.
Beitzel, S., Jensen, E., Chowdhury, A., Grossman, D.,
Frieder, O. and Goharian, N., 2004. Fusion of effective
retrieval strategies in the same information retrieval
system. Journal of the American Society for
Information Science and Technology, 55(10): 859-868.
Bergmark, D., Lagoze, C. and Sbityakov, A., 2002.
Focused crawls, tunneling, and digital libraries. Proc.
of the 6th European Conference on Research and
Advanced Technology for Digital Libraries, Rome,
Italy, September 16-18, pp. 91-106.
Braschler, M., 2004. Combination approaches for
multilingual text retrieval. Information Retrieval,
7(1-2): 183-204.
Brin, S. and Page, L., 1998. The anatomy of a large-scale
hypertextual Web search engine. Computer Networks
and ISDN Systems, 30(1-7): 107-117.
Castillo, C., 2004. Effective Web crawling. Ph.D. Thesis.
University of Chile, Department of Computer Science,
180 pages. http://www.chato.cl/534/article-63160.html
Chakrabarti, S., van den Berg, M. and Dom, B., 1999.
Focused crawling: a new approach to topic-specific
Web resource discovery. Proc. of the Eighth
International World Wide Web Conference, Toronto,
May 11 - 14.
Chakrabarti, S., Punera, K. and Subramanyam, M., 2002.
Accelerated focused crawling through online
relevance feedback. Proc. of the 11th International
Conference on World Wide Web, Honolulu, Hawaii,
May 7 - 11, pp. 148-159.
Diligenti, M., Coetzee, F.M., Lawrence, S., Giles, C.L.
and Gori, M., 2000. Focused crawling using context
graphs. Proc. of the 26th International Conference on
Very Large Databases (VLDB), pp. 527-534.
Hersh, W. R., Bhuptiraju, R. T., Ross, L., Johnson, P.,
Cohen, A. M. and Kraemer, D. F., 2005. TREC 2004
genomics track overview. Proceedings of the
Thirteenth Text REtrieval Conference (TREC-13)
(Gaithersburg, MD).
http://trec.nist.gov/pubs/trec13/t13_proceedings.html
Manmatha, R., Feng, F. and Rath, T., 2001. Using models
of score distributions in information retrieval. Proc. of
the 27th ACM SIGIR Conference on Research and
Development in Information Retrieval, New Orleans,
Louisiana.
Montague, M. and Aslam, J., 2002. Condorcet fusion for
improved retrieval. Proc. of the Eleventh International
Conference on Information and Knowledge
Management, McLean, VA, November 4-9,
pp. 538-548.
Novak, B., 2004. A Survey of focused Web crawling
algorithms. Proc. of SIKDD 2004 at Multiconference
IS, Ljubljana, Slovenia, October 12-15, pp. 55-58.
Pirkola, A. and Talvensaari, T., 2009. Developing a system
for multilingual focused crawling. Submitted to
WWW’2009 - 18th International World Wide Web
Conference, Madrid, Spain, April 20-24, 2009. Poster
manuscript.
Rennie, J. and McCallum, A., 1999. Using reinforcement
learning to spider the web efficiently. Proc. of the
Sixteenth International Conference on Machine
Learning (ICML).
Srinivasan, P., Menczer, F. and Pant, G., 2005. A general
evaluation framework for topical crawlers.
Information Retrieval, 8(3): 417-447.
Tang, T., Hawking, D., Craswell, N. and Griffiths, K.,
2005. Focused crawling for both topical relevance and
quality of medical information. Proc. of the 14th ACM
International Conference on Information and
Knowledge Management CIKM '05.
Zhuang, Z., Wagle, R. and Giles, C.L., 2005. What's there
and what's not?: focused crawling for missing
documents in digital libraries. Proc. of the 5th
ACM/IEEE-CS Joint Conference on Digital Libraries,
Denver, CO, pp. 301-310.
EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING