Authors: Anshul Saxena¹, Keshav Dubey¹, Sanjay K. Dhurandher¹ and Issac Woungang²

Affiliations: ¹ University of Delhi, India; ² Ryerson University, Canada

Keyword(s): Crawler, Compressed Tries, Crawling Strategies, Distributed Processing.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Content Management & Digital Rights Management ; Data Engineering ; Digital Libraries ; Knowledge Management and Information Sharing ; Knowledge-Based Systems ; Ontologies and the Semantic Web ; Symbolic Systems ; Web Information Systems and Technologies ; Web Interfaces and Applications

Abstract: Web crawlers today suffer from poor navigation techniques, which reduce their scalability when crawling the World Wide Web (WWW). In this paper, we present Tarantula, a web crawler that is scalable, platform-independent, and fully configurable. Work on the Tarantula project began with the aim of building a simple, elegant, yet efficient web crawler that offers better crawling strategies for traversing the WWW. The paper also presents a comparison with the Heritrix crawler. The structure of the crawler facilitates new navigation techniques which, when used alongside existing techniques, give better crawl results. Tarantula has a pluggable, extensible architecture that further facilitates customization by the user.

CC BY-NC-ND 4.0

Paper citation in several formats:
Saxena, A.; Dubey, K.; Dhurandher, S. and Woungang, I. (2009). TARANTULA - A Scalable and Extensible Web Spider. In Proceedings of the International Conference on Knowledge Management and Information Sharing (IC3K 2009) - KMIS; ISBN 978-989-674-013-9; ISSN 2184-3228, SciTePress, pages 167-172. DOI: 10.5220/0002302001670172

@conference{kmis09,
author={Anshul Saxena and Keshav Dubey and Sanjay K. Dhurandher and Issac Woungang},
title={TARANTULA - A Scalable and Extensible Web Spider},
booktitle={Proceedings of the International Conference on Knowledge Management and Information Sharing (IC3K 2009) - KMIS},
year={2009},
pages={167-172},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002302001670172},
isbn={978-989-674-013-9},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the International Conference on Knowledge Management and Information Sharing (IC3K 2009) - KMIS
TI - TARANTULA - A Scalable and Extensible Web Spider
SN - 978-989-674-013-9
IS - 2184-3228
AU - Saxena, A.
AU - Dubey, K.
AU - Dhurandher, S.
AU - Woungang, I.
PY - 2009
SP - 167
EP - 172
DO - 10.5220/0002302001670172
PB - SciTePress