Authors: Pedro Lopes; Davide Pinto; David Campos and José Luís Oliveira

Affiliation: Universidade de Aveiro, Portugal

Keyword(s): Information retrieval, Web crawling, Crawler, Text processing, Multithread, Directed crawling.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems

Abstract: The Internet is becoming the primary source of knowledge. However, its disorganized evolution brought about an exponential increase in the amount of distributed, heterogeneous information. Web crawling engines were the first answer to ease the task of finding the desired information. Nevertheless, when one is searching for quality information related to a certain scientific domain, typical search engines like Google are not enough. This is the problem that directed crawlers try to solve. Arabella is a directed web crawler that navigates through a predefined set of domains searching for specific information. It includes text-processing capabilities that increase the system’s flexibility and the number of documents that can be crawled: any structured document or REST web service can be processed. These complex processes do not harm overall system performance due to the multithreaded engine that was implemented, resulting in an efficient and scalable web crawler.
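
To make the idea of a directed, multithreaded crawl concrete, here is a minimal Python sketch. It is not Arabella's implementation, which this page does not describe. It only illustrates the general pattern the abstract names: follow links inside a predefined set of domains, process each page's text, and fetch pages in parallel with a thread pool. The names `crawl`, `LinkParser`, `allowed_domains`, `keyword`, and the injected `fetch` callable are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href targets and text fragments from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def crawl(seeds, allowed_domains, keyword, fetch, max_pages=50, workers=4):
    """Level-by-level crawl that never leaves allowed_domains.

    fetch is a hypothetical callable mapping a URL to its HTML text.
    Returns the sorted URLs whose text contains keyword.
    """
    seen, frontier, matches = set(seeds), list(seeds), []
    fetched = 0

    def visit(url):
        # Runs in a worker thread: download, parse, and test one page.
        parser = LinkParser()
        parser.feed(fetch(url))
        parser.close()  # flush any buffered trailing text
        links = [urljoin(url, href) for href in parser.links]
        hit = keyword.lower() in " ".join(parser.text).lower()
        return url, hit, links

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and fetched < max_pages:
            batch = frontier[: max_pages - fetched]
            fetched += len(batch)
            next_frontier = []
            # Pages in one level are fetched concurrently.
            for url, hit, links in pool.map(visit, batch):
                if hit:
                    matches.append(url)
                for link in links:
                    # The "directed" part: skip links outside the allowlist.
                    in_scope = urlparse(link).hostname in allowed_domains
                    if in_scope and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return sorted(matches)
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP client and lets it run against an in-memory dictionary of pages. A real crawler would also need error handling, politeness delays, and robots.txt support, none of which are shown here.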

CC BY-NC-ND 4.0

Paper citation in several formats:
Lopes, P.; Pinto, D.; Campos, D. and Luís Oliveira, J. (2009). ARABELLA - A Directed Web Crawler. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2009) - KDIR; ISBN 978-989-674-011-5; ISSN 2184-3228, SciTePress, pages 270-273. DOI: 10.5220/0002291602700273

@conference{kdir09,
author={Pedro Lopes and Davide Pinto and David Campos and José {Luís Oliveira}},
title={ARABELLA - A Directed Web Crawler},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2009) - KDIR},
year={2009},
pages={270-273},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002291602700273},
isbn={978-989-674-011-5},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2009) - KDIR
TI - ARABELLA - A Directed Web Crawler
SN - 978-989-674-011-5
IS - 2184-3228
AU - Lopes, P.
AU - Pinto, D.
AU - Campos, D.
AU - Luís Oliveira, J.
PY - 2009
SP - 270
EP - 273
DO - 10.5220/0002291602700273
PB - SciTePress