models in the malware realm, as detecting zero-day
malware is the holy grail in the AV field. However,
the simplicity and ease of training k-NN models could
be a major advantage in some situations.
5 CONCLUSION
In this paper, we treated malware binaries as images
and classified samples based on pre-trained deep learning
image recognition models. We compared these
image-based deep learning (DL) results to a simpler
k-nearest neighbor (k-NN) approach based on a more
typical set of static features. We carried out a wide
variety of experiments, each representing a different
combination of dataset, classification level, and learning
technique. The multiclass experiments were particularly
impressive, with high accuracy attained over
a large number of malware families.
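To make the "binaries as images" idea concrete, the following is a minimal sketch of one common way to lay out raw bytes as a grayscale image. The function name `bytes_to_image`, the default row width of 256, and the zero padding of the final row are assumptions made for illustration; they are not taken from the experiments reported here.

```python
from typing import List


def bytes_to_image(data: bytes, width: int = 256) -> List[List[int]]:
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255).

    Hypothetical helper: the width and padding scheme are illustrative
    assumptions, not the settings used in the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the final row with zero bytes so every row has exactly `width` pixels.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Each byte value becomes one pixel intensity; rows are consecutive slices.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Ten bytes laid out four per row: the last row is padded with two zeros.
image = bytes_to_image(bytes(range(10)), width=4)
empty = bytes_to_image(b"", width=4)
```

The resulting rows could then be resized and handed to a pre-trained image model. That step depends on the chosen framework and is omitted here.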
Overall, our DL method delivered results comparable
to previous work, yet it was outperformed by the
much simpler k-NN learning technique in some cases.
The image-based DL models did outperform k-NN in
simulated zero-day experiments, which indicates that
this DL implementation generalizes from the training
data better than k-NN does. This is a significant point,
since zero-day malware arguably represents the ultimate
challenge in malware detection.
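For readers unfamiliar with the k-NN baseline, the sketch below shows the basic technique: a sample is labeled by majority vote among its k nearest training samples. The feature vectors, labels, Euclidean distance, and the name `knn_predict` are hypothetical choices for illustration. The static features and library implementation used in the experiments are not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training samples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-dimensional feature vectors (assumed, not real static features).
train_set: List[Sample] = [
    ((0.0, 0.0), "benign"),
    ((0.1, 0.2), "benign"),
    ((1.0, 1.0), "malware"),
    ((0.9, 1.1), "malware"),
    ((1.2, 0.8), "malware"),
]
near_malware = knn_predict(train_set, (1.0, 0.9), k=3)
near_benign = knn_predict(train_set, (0.05, 0.1), k=3)
```

There is no training phase beyond storing the labeled samples, which is the simplicity referred to above.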
There are many promising avenues for future
work related to image-based malware analysis. For
example, it seems likely that a major strength of any
image-based strategy is its robustness. Consequently,
additional experiments along these lines would be
helpful to better quantify this effect.
ForSE 2019 - 3rd International Workshop on FORmal methods for Security Engineering