Paper: GenCrawl: A Generative Multimedia Focused Crawler for Web Pages Classification

Authors: Domenico Benfenati; Antonio M. Rinaldi; Cristiano Russo and Cristian Tommasino

Affiliation: Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Naples, Italy

Keyword(s): Web Crawling, Web Pages Classification, Generative AI, Web Topic Analysis.

Abstract: The unprecedented expansion of the internet necessitates the development of increasingly efficient techniques for systematic data categorization and organization. However, contemporary state-of-the-art techniques often struggle with the complex nature of heterogeneous multimedia content within web pages. These challenges, which are becoming more pressing with the rapid growth of the internet, highlight the urgent need for advancements in information retrieval methods to improve classification accuracy and relevance in the context of varied and dynamic web content. In this work, we propose GenCrawl, a generative multimedia-focused crawler designed to enhance web document classification by integrating textual and visual content analysis. Our approach combines the most relevant topics extracted from textual and visual content, using innovative generative techniques to create a visual topic. The reported findings demonstrate significant improvements and a paradigm shift in classification efficiency and accuracy over traditional methods. GenCrawl represents a substantial advancement in web page classification, offering a promising solution for systematically organizing web content. Its practical benefits are immense, paving the way for more efficient and accurate information retrieval in the era of the expanding internet.

CC BY-NC-ND 4.0

Paper citation in several formats:
Benfenati, D.; Rinaldi, A. M.; Russo, C. and Tommasino, C. (2024). GenCrawl: A Generative Multimedia Focused Crawler for Web Pages Classification. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR; ISBN 978-989-758-716-0; ISSN 2184-3228, SciTePress, pages 91-101. DOI: 10.5220/0012998900003838

@conference{kdir24,
author={Domenico Benfenati and Antonio M. Rinaldi and Cristiano Russo and Cristian Tommasino},
title={GenCrawl: A Generative Multimedia Focused Crawler for Web Pages Classification},
booktitle={Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR},
year={2024},
pages={91-101},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012998900003838},
isbn={978-989-758-716-0},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR
TI - GenCrawl: A Generative Multimedia Focused Crawler for Web Pages Classification
SN - 978-989-758-716-0
IS - 2184-3228
AU - Benfenati, D.
AU - Rinaldi, A. M.
AU - Russo, C.
AU - Tommasino, C.
PY - 2024
SP - 91
EP - 101
DO - 10.5220/0012998900003838
PB - SciTePress