Authors:
Giacomo Iadarola (1); Fabio Martinelli (1); Francesco Mercaldo (1, 2); Luca Petrillo (1) and Antonella Santone (2)
Affiliations:
(1) Institute for Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy
(2) Department of Medicine and Health Sciences “Vincenzo Tiberio”, University of Molise, Campobasso, Italy
Keyword(s):
Unsupervised Classification, X, CVE, Clustering, Neural Networks, Deep Learning.
Abstract:
The use of computing devices such as computers, smartphones, and IoT systems has increased exponentially over the past decade. Given this expansion, it is important to identify and fix the vulnerabilities these systems contain in order to ensure the safety of systems and people. Over time, many official entities have emerged that publish information about these vulnerabilities; beyond these sources, however, social media platforms such as X (commonly referred to by its former name, Twitter) can reveal vulnerabilities even before they are made public. The goal of this work is to cluster tweets according to the vulnerability described in their text. To do so, a combination of two Doc2Vec models and a variant of a BERT model is used to convert each text document into a numerical representation. K-means, an unsupervised clustering algorithm, is then applied to these numerical representations to group tweets by their text content.
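The clustering step can be illustrated with a minimal sketch of Lloyd's K-means algorithm in plain Python. This is not the authors' implementation: it assumes the Doc2Vec/BERT embedding step has already produced one vector per tweet, and the two-dimensional `tweet_vectors` below are hypothetical toy values chosen to form two well-separated groups. The function names `kmeans`, `euclidean`, and `mean` and the parameters `k`, `iterations`, and `seed` are illustrative choices.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=100, seed=0):
    """Lloyd's K-means (illustrative sketch): returns (labels, centroids)."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    rng = random.Random(seed)
    # Initialise centroids with k input vectors sampled without replacement.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels, centroids


# Hypothetical embeddings: in the described pipeline these would come from
# the Doc2Vec/BERT models, one vector per tweet.
tweet_vectors = [
    [0.1, 0.2], [0.0, 0.1], [0.2, 0.0],  # e.g. tweets about one vulnerability
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],  # e.g. tweets about another one
]

labels, centroids = kmeans(tweet_vectors, k=2)
print(labels)
```

On real embeddings, a library implementation (for example scikit-learn's `KMeans`) would typically replace this hand-written loop, and the number of clusters `k` has to be chosen for the data at hand.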