EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING

Ari Pirkola; Tuomas Talvensaari

Topics: Digital Libraries; Searching and Browsing; Web Information Filtering and Retrieval

In Proceedings of the Fifth International Conference on Web Information Systems and Technologies (WEBIST 2009), Volume 1, pages 376-381, Lisbon, Portugal, 2009.

Authors: Ari Pirkola and Tuomas Talvensaari

Affiliation: University of Tampere, Finland

Keyword(s): Focused crawling, Web crawling.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Data Engineering; Digital Libraries; Knowledge Management and Information Sharing; Knowledge-Based Systems; Ontologies and the Semantic Web; Searching and Browsing; Symbolic Systems; Web Information Systems and Technologies; Web Interfaces and Applications

Abstract: Focused crawlers are programs that selectively download Web documents (pages), restricting the scope of crawling to a specific domain or topic. We investigate different focused crawling strategies, including the use of data fusion in focused crawling. Documents in the domains of genomics and genetics were fetched by the Nalanda iVia Focused Crawler using three crawling strategies. In the first, a text classifier was trained to identify relevant documents. In the other two, relevant documents were identified by query-document matching. In the experiments, the crawling results of the single strategies were combined to yield fused crawling results. The experiments showed, first, that the single strategies overlap only to a small extent, each identifying mainly different relevant documents. Second, a query-based strategy in which the words of the link context were weighted gave the best coverage (i.e., the largest number of relevant documents) after 10 000 and 40 000 documents had been downloaded. The combination of the two query-based strategies was the best fused strategy, but it did not perform better than the best single strategy.
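The abstract names three measurable ideas: ranking outgoing links by query-document matching with extra weight on link-context words, the overlap between the relevant documents found by different strategies, and coverage as the number of relevant documents downloaded. The Python below is a minimal sketch of those ideas under stated assumptions. It is not the Nalanda iVia Focused Crawler's implementation. The scoring formula, the context weight of 3.0, and the names `link_score`, `Frontier`, `overlap`, and `coverage` are hypothetical illustrations.

```python
import heapq
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_score(query: Set[str], link_context: str, page_text: str,
               context_weight: float = 3.0) -> float:
    """Hypothetical query-document matching score for one outgoing link.

    Each occurrence of a query term in the link context (e.g. anchor text
    and nearby words) counts `context_weight` times; each occurrence in the
    source page text counts once. Formula and weight are assumptions.
    """
    ctx = Counter(tokenize(link_context))
    page = Counter(tokenize(page_text))
    return float(sum(context_weight * ctx[t] + page[t] for t in query))


class Frontier:
    """Best-first crawl frontier: the highest-scoring URL is fetched next."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []
        self._seen: Set[str] = set()

    def push(self, url: str, score: float) -> None:
        # Enqueue each URL at most once; later duplicates are ignored.
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))  # negate for max-heap

    def pop(self) -> Tuple[str, float]:
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)


def overlap(a: Set[str], b: Set[str]) -> Tuple[int, float]:
    """Number of shared relevant documents and their Jaccard similarity."""
    union = a | b
    shared = len(a & b)
    return shared, (shared / len(union) if union else 0.0)


def coverage(downloaded: Iterable[str], relevant: Set[str]) -> int:
    """Coverage as in the abstract: relevant documents among those downloaded."""
    return len(set(downloaded) & relevant)
```

For example, with the query `{"gene", "genome", "dna"}`, a link whose context is "Genome database" scores 3.0 while a link labelled "Latest news" scores 0.0, so the frontier fetches the genome link first. Two strategies that found `{"u1", "u2", "u3"}` and `{"u3", "u4"}` share one document (Jaccard 0.25), the kind of low overlap the abstract reports.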

CC BY-NC-ND 4.0

Paper citation in several formats:

Pirkola, A. and Talvensaari, T. (2009). EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST; ISBN 978-989-8111-81-4; ISSN 2184-3252, SciTePress, pages 376-381. DOI: 10.5220/0002037603760381

@conference{webist09,
author={Ari Pirkola and Tuomas Talvensaari},
title={EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST},
year={2009},
pages={376-381},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002037603760381},
isbn={978-989-8111-81-4},
issn={2184-3252},
}

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST
TI - EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING
SN - 978-989-8111-81-4
IS - 2184-3252
AU - Pirkola, A.
AU - Talvensaari, T.
PY - 2009
SP - 376
EP - 381
DO - 10.5220/0002037603760381
PB - SciTePress