Identifying Suspects on Social Networks: An Approach based on Non-structured and Non-labeled Data

Érick Florentino, Ronaldo Goldschmidt, Maria Cavalcanti

2021

Abstract

Identifying suspects of virtual crimes (e.g., pedophilia, terrorism, and bullying) has become a highly relevant task in social network analysis. Most analysis methods rely on supervised machine learning (SML), which requires a previously labeled data set, i.e., one in which the users who are and are not suspects have already been identified in the network. From such labeled network data, an SML algorithm generates a model capable of identifying new suspects. In practice, however, when analyzing a social network one does not know in advance who the suspects are (i.e., labeled data are rare and difficult to obtain in this context). Furthermore, social networks are highly dynamic and change significantly over time, which requires the model to be frequently updated with recent data. This work therefore presents a method for identifying suspects based on messages and a controlled vocabulary composed of suspicious terms and their categories for a given domain. Unlike SML algorithms, the proposed method does not require labeled data. Instead, it analyzes the messages exchanged on a given social network and scores people according to the occurrence of the vocabulary terms. The method is also durable, since a controlled vocabulary is quite stable and evolves slowly. The method was implemented for Portuguese texts and applied to the “PAN-2012-BR” data set, showing promising results in the pedophilia domain.
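The vocabulary-based scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary entries, category names, per-term weights, and the function names `tokenize` and `score_users` are all assumptions made for illustration, and the example uses English text although the paper's method targets Portuguese.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: term -> (category, weight).
# Terms, categories, and weights are illustrative assumptions only.
VOCABULARY = {
    "secret": ("concealment", 1.0),
    "webcam": ("media", 2.0),
    "alone": ("isolation", 1.5),
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score_users(messages, vocabulary):
    """Sum the weights of vocabulary terms found in each user's messages.

    `messages` is an iterable of (user, text) pairs; no labels are needed.
    """
    scores = defaultdict(float)
    for user, text in messages:
        scores[user] += 0.0  # make sure every user appears in the result
        for token in tokenize(text):
            if token in vocabulary:
                _category, weight = vocabulary[token]
                scores[user] += weight
    return dict(scores)


messages = [
    ("user_a", "Are you alone? Keep this a secret."),
    ("user_b", "See you at school tomorrow."),
    ("user_a", "Turn on your webcam."),
]

# Rank users from highest to lowest score.
ranking = sorted(
    score_users(messages, VOCABULARY).items(),
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

Because the score depends only on the vocabulary and the messages, updating the approach for a new domain would, under these assumptions, mean editing the vocabulary rather than retraining a model.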


Paper Citation


in Harvard Style

Florentino É., Goldschmidt R. and Cavalcanti M. (2021). Identifying Suspects on Social Networks: An Approach based on Non-structured and Non-labeled Data. In Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-509-8, pages 51-62. DOI: 10.5220/0010440300510062


in Bibtex Style

@conference{iceis21,
author={Érick Florentino and Ronaldo Goldschmidt and Maria Cavalcanti},
title={Identifying Suspects on Social Networks: An Approach based on Non-structured and Non-labeled Data},
booktitle={Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2021},
pages={51-62},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010440300510062},
isbn={978-989-758-509-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Identifying Suspects on Social Networks: An Approach based on Non-structured and Non-labeled Data
SN - 978-989-758-509-8
AU - Florentino É.
AU - Goldschmidt R.
AU - Cavalcanti M.
PY - 2021
SP - 51
EP - 62
DO - 10.5220/0010440300510062