Authors: Giuseppe Cascavilla (1), Gemma Catolino (2) and Mirella Sangiovanni (2)

Affiliations: (1) Eindhoven University of Technology, Jheronimus Academy of Data Science, The Netherlands; (2) Tilburg University, Jheronimus Academy of Data Science, The Netherlands

Keyword(s): Natural-language Processing, Dark Web, BERT, RoBERTa, Machine Learning, ULMFiT, LSTM, AI.

Abstract: This work extends previous research on the classification of illegal activities in three steps. First, we created a heterogeneous dataset of 113,995 onion sites and dark marketplaces. Then, we compared pre-trained transferable models, i.e., ULMFiT (Universal Language Model Fine-tuning), BERT (Bidirectional Encoder Representations from Transformers), and RoBERTa (Robustly Optimized BERT Approach), with a traditional text classification approach such as LSTM (Long Short-Term Memory) neural networks. Finally, we developed two illegal-activity classification approaches, one for illicit content on the Dark Web and one for identifying specific types of drugs. Results show that BERT performed best, classifying the Dark Web's general content and the types of drugs with 96.08% and 91.98% accuracy, respectively.

CC BY-NC-ND 4.0

Paper citation in several formats:
Cascavilla, G., Catolino, G. and Sangiovanni, M. (2022). Illicit Darkweb Classification via Natural-language Processing: Classifying Illicit Content of Webpages based on Textual Information. In Proceedings of the 19th International Conference on Security and Cryptography - SECRYPT; ISBN 978-989-758-590-6; ISSN 2184-7711, SciTePress, pages 620-626. DOI: 10.5220/0011298600003283

@conference{secrypt22,
author={Giuseppe Cascavilla and Gemma Catolino and Mirella Sangiovanni},
title={Illicit Darkweb Classification via Natural-language Processing: Classifying Illicit Content of Webpages based on Textual Information},
booktitle={Proceedings of the 19th International Conference on Security and Cryptography - SECRYPT},
year={2022},
pages={620-626},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011298600003283},
isbn={978-989-758-590-6},
issn={2184-7711},
}

TY - CONF

JO - Proceedings of the 19th International Conference on Security and Cryptography - SECRYPT
TI - Illicit Darkweb Classification via Natural-language Processing: Classifying Illicit Content of Webpages based on Textual Information
SN - 978-989-758-590-6
IS - 2184-7711
AU - Cascavilla, G.
AU - Catolino, G.
AU - Sangiovanni, M.
PY - 2022
SP - 620
EP - 626
DO - 10.5220/0011298600003283
PB - SciTePress