Authors: Ari Pirkola and Tuomas Talvensaari

Affiliation: University of Tampere, Finland

ISBN: 978-989-8111-81-4

ISSN: 2184-3252

Keyword(s): Focused crawling, Web crawling.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Data Engineering ; Digital Libraries ; Knowledge Management and Information Sharing ; Knowledge-Based Systems ; Ontologies and the Semantic Web ; Searching and Browsing ; Symbolic Systems ; Web Information Systems and Technologies ; Web Interfaces and Applications

Abstract: Focused crawlers are programs that selectively download Web documents (pages), restricting the scope of crawling to a specific domain or topic. We investigate different focused crawling strategies, including the use of data fusion in focused crawling. Documents in the domains of genomics and genetics were fetched by the Nalanda iVia Focused Crawler using three crawling strategies. In the first strategy, a text classifier was trained to identify relevant documents. In the other two strategies, the identification of relevant documents was based on query-document matching. In the experiments, the crawling results of the single strategies were combined to yield fused crawling results. The experiments showed, first, that the single strategies overlap only to a small extent, each identifying mainly different relevant documents. Second, a query-based strategy in which the words of the link context were weighted gave the best coverage (i.e., number of relevant documents) after 10 000 and 40 000 documents had been downloaded. The combination of the two query-based strategies was the best fused strategy, but it did not perform better than the best single strategy.
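
The measures named in the abstract (coverage, overlap between strategies, and a fused result) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the strategy names, the toy document identifiers, the Jaccard overlap measure, and the round-robin fusion rule are hypothetical and are not taken from the paper, whose actual fusion method is not described on this page.

```python
from itertools import combinations


def coverage(crawled, relevant):
    """Number of relevant documents among those a strategy downloaded."""
    return len(set(crawled) & relevant)


def pairwise_overlap(results, relevant):
    """Jaccard overlap of the relevant documents found by each pair of strategies."""
    found = {name: set(urls) & relevant for name, urls in results.items()}
    overlaps = {}
    for a, b in combinations(sorted(found), 2):
        union = found[a] | found[b]
        overlaps[(a, b)] = len(found[a] & found[b]) / len(union) if union else 0.0
    return overlaps


def fuse_round_robin(results, names, budget):
    """Interleave the ranked lists of the named strategies, skipping
    duplicates, until `budget` documents are collected (assumed fusion rule)."""
    if budget <= 0:
        return []
    fused, seen = [], set()
    lists = [results[n] for n in names]
    longest = max((len(ranked) for ranked in lists), default=0)
    for rank in range(longest):
        for ranked in lists:
            if rank < len(ranked) and ranked[rank] not in seen:
                seen.add(ranked[rank])
                fused.append(ranked[rank])
                if len(fused) == budget:
                    return fused
    return fused


# Toy data (hypothetical): ranked download lists per strategy and a relevance set.
results = {
    "classifier": ["d1", "d2", "d3", "d4"],
    "query": ["d5", "d2", "d6", "d7"],
    "query_link_context": ["d5", "d8", "d9", "d1"],
}
relevant = {"d1", "d2", "d5", "d8", "d9"}

single_coverage = {name: coverage(urls, relevant) for name, urls in results.items()}
overlaps = pairwise_overlap(results, relevant)
fused = fuse_round_robin(results, ["query", "query_link_context"], budget=4)
fused_coverage = coverage(fused, relevant)
```

Comparing `fused_coverage` against the values in `single_coverage` at the same download budget mirrors the kind of comparison the abstract reports, although the numbers produced by this toy data say nothing about the paper's actual results.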

License: CC BY-NC-ND 4.0

Paper citation in several formats:
Pirkola, A. and Talvensaari, T. (2009). EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 376-381. DOI: 10.5220/0002037603760381

@conference{webist09,
author={Ari Pirkola and Tuomas Talvensaari},
title={EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={376-381},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002037603760381},
isbn={978-989-8111-81-4},
}

TY - CONF

JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING
SN - 978-989-8111-81-4
AU - Pirkola, A.
AU - Talvensaari, T.
PY - 2009
SP - 376
EP - 381
DO - 10.5220/0002037603760381
