Authors:
Fabio Martinelli (1); Francesco Mercaldo (1, 2) and Antonella Santone (2)
Affiliations:
(1) Institute for Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy
(2) Department of Medicine and Health Sciences “Vincenzo Tiberio”, University of Molise, Campobasso, Italy
Keyword(s):
Malware, Deep Learning, GAN, Android, Security.
Abstract:
The recent development of Generative Adversarial Networks (GANs) has demonstrated a great ability to generate images indistinguishable from real ones, leading the academic and industrial communities to pose the problem of telling a fake image from a real one. This problem is crucial because images are used in many fields, from video surveillance to cybersecurity, and in particular in malware detection, where the scientific community has recently proposed a plethora of approaches that identify malicious applications previously converted into images. In the context of malware detection, a Generative Adversarial Network could be used to generate examples of malicious applications capable of evading antimalware detection (and also to generate new malware variants). In this paper, we propose a method to evaluate whether the images produced by a Generative Adversarial Network, generated from a dataset of malicious Android applications, can be distinguished from images obtained from real malware applications. Once the images are generated, we train several supervised machine learning models to determine whether the classifiers are able to discriminate between real and generated malicious applications. We perform experiments with the Deep Convolutional Generative Adversarial Network (DCGAN), a type of Generative Adversarial Network, showing that the images currently generated, although indistinguishable to the human eye, are correctly identified by a classifier with an F-Measure greater than 0.8. Although most of the generated images are correctly identified as fake, some are not recognized as such and are therefore considered images obtained from real applications.
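The two ingredients of the evaluation described above, turning an application into an image and scoring a real-versus-generated classifier with the F-Measure, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `bytes_to_grayscale` is a hypothetical helper showing one common way of mapping an application's raw bytes to grayscale pixels (the abstract does not specify which conversion is used), and `f_measure` computes the harmonic mean of precision and recall, assuming generated (fake) images are labeled as the positive class `1`.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Hypothetical helper: reshape a raw byte stream into rows of `width`
    grayscale pixels (each byte is one pixel, 0-255). Assumed conversion,
    not necessarily the one used in the paper."""
    height = math.ceil(len(data) / width)
    # Zero-pad so the last row is complete.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def f_measure(y_true, y_pred, positive=1):
    """F-Measure (F1) of the `positive` class, i.e. 2PR / (P + R).
    Assumption: label 1 = generated image, label 0 = real application."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with `y_true = [1, 1, 1, 1, 0, 0, 0, 0]` and `y_pred = [1, 1, 1, 0, 0, 0, 1, 0]` there are three true positives, one false positive and one false negative, so precision and recall are both 0.75 and `f_measure` returns 0.75. Scoring the generated class as positive matches the question the abstract asks, namely how reliably fakes are caught.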