
duce more parameters, noise, or redundancy to the
models, or may require more data or fine-tuning to
adapt to the task.
• For sequence-based malware analysis, BERT is an
effective model for encoding and classifying the
API call sequences extracted from Android APKs.
Initializing the API embeddings with Skip-gram,
an API2vec technique, significantly improves the
performance of BERT. These results suggest that
the Skip-gram embeddings capture more seman-
tic and syntactic information about the API se-
quences than the linear projection, and that they
help BERT to learn better representations and
classifications of the malware families. However,
the Skip-gram embeddings may fail to capture the
diversity of the API sequences across different
malware and benign families. Moreover, BERT
may need a higher volume of data or more sophisticated
methods to achieve higher performance
on datasets containing a large number of malware
families.
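As an illustration of the Skip-gram initialization discussed above, the following is a minimal sketch and not the implementation used in this study. It shows how (center, context) training pairs can be drawn from an API call sequence, and how an embedding table can be seeded from pretrained vectors before being handed to a sequence model. The function names, the window size, the API names, and the placeholder vectors are all assumptions made for illustration; in practice the vectors would come from a trained Skip-gram model.

```python
import random


def skipgram_pairs(sequence, window=2):
    """Return (center, context) pairs for one API call sequence."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def init_embedding_table(vocab, pretrained, dim, seed=0):
    """Build one embedding row per vocabulary token.

    Tokens present in `pretrained` copy their vector; all other tokens
    (for example special tokens) receive small random values.
    """
    rng = random.Random(seed)
    table = []
    for token in vocab:
        if token in pretrained:
            table.append(list(pretrained[token]))
        else:
            table.append([rng.uniform(-0.02, 0.02) for _ in range(dim)])
    return table


# Hypothetical API call sequence; the names are illustrative only.
calls = ["getDeviceId", "openConnection", "sendTextMessage"]
pairs = skipgram_pairs(calls, window=1)

# Placeholder vectors standing in for trained Skip-gram output (assumption).
pretrained = {"getDeviceId": [0.1, 0.2], "openConnection": [0.3, 0.4]}
vocab = ["[PAD]", "getDeviceId", "openConnection", "sendTextMessage"]
table = init_embedding_table(vocab, pretrained, dim=2)
```

The resulting table has one row per vocabulary entry and could be copied into the input embedding layer of a model such as BERT in place of a randomly initialized projection. Tokens without a pretrained vector fall back to small random values.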
Our study provides a comprehensive and systematic
comparison of different transformer-based models
for malware analysis on image and sequence inputs,
and reveals the strengths and weaknesses
of these models. It also highlights the
challenges and opportunities for future research on
applying transformer-based models to accurate malware
analysis and related tasks.
ACKNOWLEDGEMENT
The authors gratefully acknowledge the computing
time provided on the high performance computing fa-
cility, Sharanga, at the Birla Institute of Technology
and Science - Pilani, Hyderabad Campus.
REFERENCES
Allix, K., Bissyandé, T. F., Klein, J., and Le Traon, Y.
(2016). Androzoo: Collecting millions of android
apps for the research community. In 13th Int. Conf.
on Mining Software Repositories, pages 468–471.
Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., and
Rieck, K. (2014). Drebin: Effective and explain-
able detection of android malware in your pocket. In
NDSS. The Internet Society.
Cannarile, A., Carrera, F., Galantucci, S., Iannacone, A.,
and Pirlo, G. (2022). A study on malware detection
and classification using the analysis of api calls se-
quences through shallow learning and recurrent neural
networks. Italian Conference on Cybersecurity, 3260.
Conti, G., Dean, E., Sinda, M., and Sangster, B. (2008).
Visual reverse engineering of binary and data files. In
Int. Workshop on Visualization for Computer Security,
pages 1–17.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018).
BERT: pre-training of deep bidirectional transformers
for language understanding. CoRR, abs/1810.04805.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., Dehghani, M., Min-
derer, M., Heigold, G., Gelly, S., Uszkoreit, J., and
Houlsby, N. (2020). An image is worth 16x16 words:
Transformers for image recognition at scale. ArXiv,
abs/2010.11929.
Freitas, S., Duggal, R., and Chau, D. H. (2022). Malnet:
A large-scale image database of malicious software.
In 31st ACM Int. Conf. on Information & Knowledge
Management, pages 3948–3952.
Gennissen, J. and Blasco, J. (2017). Gamut: Sifting
through images to detect android malware.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep resid-
ual learning for image recognition.
Jo, J., Cho, J., and Moon, J. (2023). A malware detection
and extraction method for the related information us-
ing the vit attention mechanism on android operating
system. Applied Sciences, 13(11).
Lashkari, A. H., Kadir, A. F. A., Taheri, L., and Ghor-
bani, A. A. (2018). Toward developing a system-
atic approach to generate benchmark android malware
datasets and classification. In 2018 Int. Carnahan
Conf. on Security Technology, pages 1–7.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space.
Nataraj, L., Karthikeyan, S., Jacob, G., and Manjunath, B.
(2011). Malware images: Visualization and automatic
classification. In 8th Int. Symposium on Visualization
for Cyber Security.
Ravi, A., Chaturvedi, V., and Shafique, M. (2023). Vit4mal:
Lightweight vision transformer for malware detection
on edge devices. ACM Transactions on Embedded
Computing Systems, 22(117).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2018). Mobilenetv2: Inverted residuals
and linear bottlenecks. In 2018 IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 4510–4520.
Seneviratne, S., Shariffdeen, R., Rasnayaka, S., and
Kasthuriarachchi, N. (2022). Self-supervised vision
transformers for malware detection. IEEE Access,
10:103121–103135.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I.
(2023). Attention is all you need. In 31st Conf. on
Neural Information Processing Systems.
Wang, W., Zhu, M., Zeng, X., Ye, X., and Sheng, Y. (2017).
Malware traffic classification using convolutional neu-
ral network for representation learning. In 2017 Int.
Conf. on Information Networking, pages 712–717.
Yuan, B., Wang, J., Wu, P., and Qing, X. (2022). Iot mal-
ware classification based on lightweight convolutional
neural networks. IEEE Internet of Things Journal,
9(5):3770–3783.