various tasks, strengthening our approach’s robust-
ness.
Although our approach underperforms overall compared
to Parker et al. (2021), who report scores of
approximately 0.99, it achieves a comparable accuracy
of ≈ 0.99 in identifying whether code is obfuscated.
Despite lower scores for identifying the particular
obfuscation method, we highlight our model's superior
generalizability and nuanced classification. Using
synthetic code introduces bias, and the inability of
CNNs to handle arbitrary binary lengths implies that
such models, while enhancing certain features, do not
generalize well to real-world applications.
We can identify which features are crucial at each
classification level and interpret these features. For
example, different SVD metrics reveal the informa-
tion density and compressibility of the underlying bi-
nary. This not only allows us to discern that ob-
fuscated and non-obfuscated code differ primarily in
their SVD-energy but also provides insights for fu-
ture obfuscation techniques to avoid these character-
istics. Additionally, we observed that different com-
pilers produce signature binary densities, which are
identifiable in the classification process.
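To illustrate how SVD-based metrics can reflect the information density and compressibility of a binary, the following is a minimal sketch and not the exact feature extraction used in this work. It assumes that the raw bytes are reshaped into a matrix of a hypothetical fixed width, that "SVD energy" means the fraction of squared singular values captured by the top k components, and that "SVD entropy" is the Shannon entropy of the normalized singular values. The names bytes_to_matrix and svd_metrics and the parameters width and k are illustrative assumptions.

```python
import numpy as np


def bytes_to_matrix(data: bytes, width: int = 64) -> np.ndarray:
    """Reshape raw bytes into a 2-D float matrix, zero-padding the last row."""
    if not data:
        raise ValueError("data must not be empty")
    arr = np.frombuffer(data, dtype=np.uint8).astype(np.float64)
    rows = -(-arr.size // width)  # ceiling division
    padded = np.zeros(rows * width)
    padded[: arr.size] = arr
    return padded.reshape(rows, width)


def svd_metrics(data: bytes, width: int = 64, k: int = 4) -> dict:
    """Compute two assumed SVD-based descriptors of a byte sequence."""
    # Singular values only; the singular vectors are not needed here.
    s = np.linalg.svd(bytes_to_matrix(data, width), compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0:  # all-zero input carries no spectral information
        return {"energy_top_k": 0.0, "svd_entropy": 0.0}
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so that log2 is defined
    return {
        # Share of total energy in the k largest singular values:
        # values near 1 indicate a highly compressible, low-rank layout.
        "energy_top_k": float(energy[:k].sum() / total),
        # Entropy of the singular spectrum: higher means energy is spread
        # over many components, i.e. denser, less compressible content.
        "svd_entropy": float(-(p * np.log2(p)).sum()),
    }


# Toy inputs (synthetic, for illustration only).
structured = bytes(range(64)) * 64  # identical rows, rank-1 matrix
rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=64 * 64, dtype=np.uint8).tobytes()

print(svd_metrics(structured))
print(svd_metrics(noisy))
```

In this toy setting, the repetitive input concentrates almost all energy in one component, whereas the random input spreads it across the spectrum. Features of this kind could be passed to a tree-based classifier, whose feature importances then indicate which metrics drive each decision.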
Ultimately, our approach demonstrates that the
generalizable, interpretable detection of obfuscation
techniques in real-life scenarios remains a challenge.
However, the ability of researchers to use these results
to circumvent traits that distinguish obfuscated from
non-obfuscated code suggests that this will be an ac-
tive area of ongoing research. Developments in ob-
fuscation techniques are likely to continue challeng-
ing older identification models and vice versa.
We encourage future research to focus on tree-based
classifiers combined with complexity metrics, as they
offer interpretability and generalizability, in contrast
to overly specific, non-interpretable neural network
solutions that require significant expertise to build
and analyze and do not allow for subsequent research on
their inner workings.
ACKNOWLEDGEMENTS
The financial support by the Austrian Federal Min-
istry of Labour and Economy, the National Founda-
tion for Research, Technology and Development and
the Christian Doppler Research Association is grate-
fully acknowledged.
REFERENCES
Alt, J., Erdős, L., and Krüger, T. (2021). Spectral radius of
random matrices with independent entries. Probability
and Mathematical Physics, 2(2):221–280.
Antoulas, A., Sorensen, D., and Zhou, Y. (2002). On the
decay rate of Hankel singular values and related issues.
Systems & Control Letters, 46(5):323–342.
Banescu, S., Ochoa, M., and Pretschner, A. (2015).
A framework for measuring software obfuscation
resilience against automated attacks. In 2015
IEEE/ACM 1st International Workshop on Software
Protection, pages 45–51. IEEE.
Conti, M., Khandhar, S., and Vinod, P. (2022). A few-shot
malware classification approach for unknown fam-
ily recognition using malware feature visualization.
Computers & Security, 122:102887.
Deng, H., Guo, C., Shen, G., Cui, Y., and Ping, Y. (2023).
Mctvd: A malware classification method based on
three-channel visualization and deep learning. Com-
puters & Security, 126:103084.
Edelman, A. (1988). Eigenvalues and condition numbers of
random matrices. SIAM Journal on Matrix Analysis
and Applications, 9(4):543–560.
Fisher, R. A. (1922). On the mathematical foundations
of theoretical statistics. Philosophical Transactions
of the Royal Society of London. Series A, Contain-
ing Papers of a Mathematical or Physical Character,
222:309–368.
Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely
randomized trees. Machine Learning, 63(1):3–42.
Guo, J., Xu, Y., Xu, W., Zhan, Y., Sun, Y., and Guo, S.
(2023). Mdenet: Multi-modal dual-embedding net-
works for malware open-set recognition.
He, H., Bai, Y., Garcia, E. A., and Li, S. (2008). Adasyn:
Adaptive synthetic sampling approach for imbalanced
learning. 2008 IEEE International Joint Conference
on Neural Networks (IEEE World Congress on Com-
putational Intelligence), pages 1322–1328.
Kalash, M., Rochan, M., Mohammed, N., Bruce, N. D.,
Wang, Y., and Iqbal, F. (2018). Malware classifica-
tion with deep convolutional neural networks. In 2018
9th IFIP international conference on new technolo-
gies, mobility and security (NTMS), pages 1–5. IEEE.
Kumar, S., Janet, B., and Neelakantan, S. (2024). IMCNN:
Intelligent malware classification using deep convolution
neural networks as transfer learning and ensemble
learning in honeypot enabled organizational
network. Computer Communications, 216:16–33.
Makowski, D., Pham, T., Lau, Z. J., Brammer, J. C.,
Lespinasse, F., Pham, H., Schölzel, C., and Chen, S.
H. A. (2021). NeuroKit2: A Python toolbox for
neurophysiological signal processing. Behavior Research
Methods, 53(4):1689–1696.
Nataraj, L., Karthikeyan, S., Jacob, G., and Manjunath,
B. S. (2011). Malware images: visualization and auto-
matic classification. In Proceedings of the 8th interna-
tional symposium on visualization for cyber security,
pages 1–7.
SECRYPT 2024 - 21st International Conference on Security and Cryptography