various tasks, strengthening our approach’s robustness.
Although our approach underperforms the results of
Parker et al. (2021), who report scores of approximately
0.99, it still achieves an accuracy of ≈ 0.99 in identifying
whether code is obfuscated. Despite lower scores for
identifying the particular obfuscation method, we highlight
our model’s superior generalizability and nuanced
classification. Using synthetic code introduces bias, and
the inability of CNNs to handle arbitrary binary lengths
implies that such models, while enhancing certain features,
do not generalize well to real-world applications.
We can identify which features are crucial at each
classification level and interpret these features. For
example, different SVD metrics reveal the informa-
tion density and compressibility of the underlying bi-
nary. This not only allows us to discern that ob-
fuscated and non-obfuscated code differ primarily in
their SVD-energy but also provides insights for fu-
ture obfuscation techniques to avoid these character-
istics. Additionally, we observed that different com-
pilers produce signature binary densities, which are
identifiable in the classification process.
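To make the notion of SVD-energy concrete, the following is a minimal sketch of one plausible such metric: the share of total spectral energy (the sum of squared singular values) captured by the largest singular value of a binary reshaped into a byte matrix. It is an illustration under stated assumptions, not the feature extraction used in this work: the names `byte_matrix` and `top_energy_fraction` and the row width of 16 are hypothetical. Power iteration is used only to keep the sketch dependency-free; a numerical library's SVD routine would normally take its place.

```python
import math


def byte_matrix(data: bytes, width: int = 16) -> list:
    """Reshape bytes into rows of `width` values, zero-padding the last row.

    The width of 16 is an arbitrary choice for this sketch.
    """
    padded = data + bytes(-len(data) % width)
    return [[float(b) for b in padded[i:i + width]]
            for i in range(0, len(padded), width)]


def top_energy_fraction(matrix: list, iterations: int = 100) -> float:
    """Estimate sigma_1^2 / sum_i sigma_i^2 for a non-negative matrix.

    The denominator equals the squared Frobenius norm, so no full SVD is
    needed; sigma_1^2 is estimated by power iteration on A^T A.
    """
    total = sum(x * x for row in matrix for x in row)
    if total == 0:
        return 0.0  # empty or all-zero input carries no energy
    cols = len(matrix[0])
    # A uniform start vector is never orthogonal to the leading singular
    # vector of a non-negative matrix such as a byte matrix.
    v = [1.0 / math.sqrt(cols)] * cols
    top = 0.0
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # u = A v
        top = sum(x * x for x in u)  # Rayleigh quotient ||A v||^2, v is unit
        w = [sum(row[j] * u_i for row, u_i in zip(matrix, u))
             for j in range(cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return top / total


if __name__ == "__main__":
    sample = bytes(range(256))  # placeholder input, not a real binary
    print(top_energy_fraction(byte_matrix(sample)))
```

A value close to 1 indicates that a single rank-one component explains almost all of the byte matrix, i.e. highly redundant and compressible content; lower values indicate that the energy is spread over many components.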
Ultimately, our approach demonstrates that the
generalizable, interpretable detection of obfuscation
techniques in real-life scenarios remains a challenge.
However, the ability of researchers to use these results
to circumvent traits that distinguish obfuscated from
non-obfuscated code suggests that this will be an ac-
tive area of ongoing research. Developments in ob-
fuscation techniques are likely to continue challeng-
ing older identification models and vice versa.
We encourage future research to focus on tree-based
classifiers combined with complexity metrics, as they
offer interpretability and generalizability, in contrast to
overly specific, non-interpretable neural network solutions
that require significant expertise to build and analyze and
do not allow for subsequent research on their inner workings.
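The interpretability argument can be illustrated with the simplest possible tree, a one-level decision stump over a single complexity metric. The sketch below is not the classifier used in this work: the function names are hypothetical, and the feature values and labels are toy placeholders rather than measured results. It shows only that the learned model is a threshold rule a reader can inspect directly.

```python
def fit_stump(values, labels):
    """Find the split `value > threshold` with the fewest training errors.

    Returns (threshold, label_above, errors), or None if all values are equal.
    Labels are expected to be 0 or 1.
    """
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for threshold in candidates:
        for label_above in (0, 1):
            errors = sum(
                (label_above if v > threshold else 1 - label_above) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[2]:
                best = (threshold, label_above, errors)
    return best


def predict(stump, value):
    """Apply the learned threshold rule to a single metric value."""
    threshold, label_above, _ = stump
    return label_above if value > threshold else 1 - label_above


if __name__ == "__main__":
    # Toy placeholder data: one metric per sample, label 1 = "obfuscated".
    metric = [0.2, 0.3, 0.35, 0.7, 0.8, 0.9]
    label = [0, 0, 0, 1, 1, 1]
    threshold, label_above, errors = fit_stump(metric, label)
    print(f"predict {label_above} if metric > {threshold:.3f} ({errors} errors)")
```

Deeper trees repeat this split recursively, so every prediction can still be traced to a short chain of threshold comparisons on named metrics.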
The financial support by the Austrian Federal Min-
istry of Labour and Economy, the National Founda-
tion for Research, Technology and Development and
the Christian Doppler Research Association is grate-
fully acknowledged.
