Machine Learning Classiﬁcation of Obfuscation using Image

Visualization

Colby B. Parker, J. Todd McDonald

and Dimitrios Damopoulos

Department of Computer Science, University of South Alabama, Mobile, AL, U.S.A.

Keywords:

Software Protection, MATE Attacks, Code Visualization, Neural Networks, Obfuscation.

Abstract:

As the need for new techniques to analyze obfuscated software has grown, recent work has shown the ability

to analyze programs via machine learning in order to perform automated metadata recovery. Often these tech-

niques really on disassembly or other means of direct code analysis. We showcase an approach combining

code visualization and image analysis via convolutional neural networks capable of statically classifying ob-

fuscation transformations. By ﬁrst turning samples into gray scale images, we are able to analyze the structure

and side effects of transformations used in the software with no heavy code analysis or feature preparation.

With experimental results samples produced with the Tigress and OLLVM obfuscators, our models are ca-

pable of labeling transformations with F1-scores between 90% and 100% across all tests. We showcase our

approach via models designed as both a binary classiﬁcation problem as well as a multi label and multi output

problem. We retain high performance even in the presence of multiple transformations in a ﬁle.

1 INTRODUCTION

Obfuscation involves the process of transforming a

given program into one that is syntactically differ-

ent but semantically equivalent (Collberg and Nagra,

2009). This new obfuscated program now has its in-

ner workings, code and/or data, changed so that they

are hidden and difﬁcult to understand to attackers,

both human and machine. Obfuscation is an impor-

tant security tool for intellectural property protection,

but is also employed by malware authors. Deobfusca-

tion exists to remove transformations so that further

analysis can proceed . From a protection viewpoint,

an adversary could employ metadata recovery attacks

which reveals the type of obfuscation used (Salem

and Banescu, 2016). This information then fuels other

techniques, meaning speedy and effective metadata

recovery methods are of high importance (Coogan

et al., 2011).

When applied to a program, different transforma-

tions exhibit varying degrees of complexity (Toﬁghi-

Shirazi et al., 2019). By studying these side effects,

one can associate patterns with different transforma-

tions (Jones et al., 2018) (Banescu et al., 2016). By

proﬁling these unique side effects, it is possible to

https://orcid.org/0000-0001-5266-7470

https://orcid.org/0000-0002-7193-0710

create a classiﬁer using machine learning which can

identify what transformations were used (Toﬁghi-

Shirazi et al., 2019). This kind of analysis can be used

to expedite and enhance software analysis and meta-

data recovery attacks (Salem and Banescu, 2016).

It has been shown that a piece of software can

be transformed into a gray scale image (Nataraj

et al., 2011). Prior work using binary visualization

has mostly focused on determining if a given binary

is malware or not, with extended work on labeling

malware families. We extend this further by using

binary visualization and Convolutional Neural Net-

works (CNN) to label transformations implemented

in a binary to hinder reverse engineering and analy-

sis. Our work showcases not only a more ﬁne-grained

analysis, but also can be used to label protections in

non-malware binaries (legitimate software). To the

best of our knowledge, no one has performed this kind

of analysis using binary visualization.

In this paper, we show how images of obfuscated

binary code can be analyzed by machine learning al-

gorithms to derive features based on spatial relation-

ships. Our tests frame this as both a binary problem

through the use of many individual models as well as

a multi-label problem via a single model. Both ap-

proaches are shown to be highly effective at classify-

ing a transformation in a program, even in the pres-

ence of layered transformations.

854

Parker, C., McDonald, J. and Damopoulos, D.

Machine Learning Classiﬁcation of Obfuscation using Image Visualization.

DOI: 10.5220/0010607408540859

In Proceedings of the 18th International Conference on Security and Cryptography (SECRYPT 2021), pages 854-859

ISBN: 978-989-758-524-1

Contributions. We make the following contribu-

tions:

• We show how visualization of binary code and the

use of CNN enable metadata recovery attacks on

obfuscated programs, with no reverse engineering

required.

• We show effectiveness of supervised learning

models trained on images of obfuscated programs

at classifying transformations, with F1-scores >

90% for both binary and multi-label classiﬁcation

across a range of transformations.

• We demonstrate binary image analysis provides

higher granularity and F1-scores > 90% for sam-

ples with layers of obfuscating transformations.

• We show our image analysis technique is effective

against well known obfuscators such as Tigress

and OLLVM (Junod et al., 2015).

2 BACKGROUND

In this section we brieﬂy discuss the basics of soft-

ware obfuscation and introduce the obfuscating trans-

formations used in our work. We then explain con-

cepts related to supervised machine learning and the

area of code visualization.

2.1 Obfuscating Transformations

Obfuscation transformations are traditionally divided

into three categories: 1) layout, focuses on mak-

ing source-code unreadable; 2) data, focuses on re-

placing data structures; and 3) control, manipulates

branch structures. It is possible for transformations

to be classiﬁed as dynamic, which implies that code

changes at run-time (Schrittwieser et al., 2016). Typ-

ical transformations (Collberg and Nagra, 2009) in-

clude:

• Virtualization: creates a process-level virtual en-

vironment by constructing an interpreter in the

form of a large switch statement with a unique in-

struction set.

• Control-ﬂow Flattening: combines branch state-

ments into one large seemingly inﬁnite loop con-

trolled by a dispatcher

• Just-in-Time Compilation: adds dynamic com-

pilation to the transformed program that replaces

target statements in the original program with

method calls unique to that statement.

http://tigress.cs.arizona.edu/

• Encoding: obscures static values within a pro-

gram via parameterized functions instead of clear

text to help avoid pattern matching.

• Opaque Predicates: An if-then-else statement

that always evaluates to the same true or false con-

dition, crafted in order to confuse static analysis.

2.2 Supervised Machine Learning

Supervised Learning is the category of machine learn-

ing that deals with the analysis of labeled data, with

the goal of correctly labeling previously unseen sam-

ples (Sutskever and others., 2014). This is accom-

plished using a mapping function f (X) = Y where X

is an unlabeled input of the kind previously analyzed

and Y is the correct label for X (Murphy, 2012). The

function is found by using a machine learning algo-

rithm to train a model on the data. Proper learning

will approximate f to allow the correct labeling of

any new sample. When the labels being predicted

represent classes this task is known as classiﬁcation.

The number of classes will further specify a task as

binary classiﬁcation (classes = 2) or multi-class clas-

siﬁcation (classes > 2). It is also possible in certain

tasks for inputs to belong to more than one class, re-

sulting in multi-label classiﬁcation (Bishop, 2006).

Machine learning classiﬁcation of images involves

capturing spatial relationships of the pixels (Albawi

et al., 2017). While there are many ways to achieve

this, Convolutional Neural Networks (CNN) have be-

come a popular choice. CNNs were designed for im-

age analysis in mind and create information from im-

age pieces, called convolutional layers, in order to un-

derstand the whole (Sainath et al., 2013). CNN-based

image analysis has also been used widely for malware

detection (Kabanga and Kim, 2017) (Kalash et al.,

2018).

2.3 Code Visualization

Since the pixel of a gray-scale image is a value be-

tween 0 and 255, it is possible to translate a ﬁle into

an image by having the bytes of the ﬁle become the

pixels of the new image. From this newly formed

image, it is possible to visually see certain features,

patterns, and structures that are present in the source

ﬁle (Seok and Kim, 2016). For instance, when used

on a program such as a Windows binary, the sections

of the program (.text, .data. etc.) are distinct from

each other. Previous work has shown that code im-

ages can be used for malware classiﬁcation (Vasan

et al., 2020). Fig. 1 is an example of what a typical

image made from an executable ﬁle will look like.

Machine Learning Classiﬁcation of Obfuscation using Image Visualization

855

Figure 1: A windows executable converted to gray-scale.

3 METHODOLOGY

In this section we will walk through the steps of our

methodology. To train and test our models, we ﬁrst

create a data set of images produced from both clean

and obfuscated programs. The obfuscation of our

samples is accomplished with the Tigress and OL-

LVM obfuscators. We apply one or more transfor-

mations to each program, grouping the into samples

with a single transformation and those with two or

more. This is to allow us to perform multi-label clas-

siﬁcation. We have our own python code to then take

in the programs and produce the needed images. With

our two data sets created, we move to the design and

training of our CNN. Multiple version of our CNN

our trained to to be used in different experiments that

will be explained in Section 4.

3.1 Obfuscation Images

Using existing data sets of C programs, we use ob-

fuscators to apply various transformations in order

to create a large set of obfuscated programs. The

application of obfuscation is done in two sets. For

the ﬁrst, we apply only single transformations to the

clean ﬁles. Every ﬁle is obfuscated with each trans-

formation to produce a number of variants equal to

the number of transformations. Then, for the sec-

ond set, we once again obfuscate the clean samples,

but this time performing multiple transformation lay-

ered on top of each other. This produces a number

of variants equal to the number of chosen permuta-

tions. Once all the obfuscated samples are produced,

we compile the samples and use a Python script in or-

der to create our obfuscation images. As mentioned

before, this process is done by reading in a ﬁle byte by

byte and creating a .png gray-scale image, whose pix-

els are formed from each consecutive byte.The size of

a generated image is based on the size of the binary

in question. Any pixels in an image that do not have

a corresponding byte in the binary will be left a black

pixel (a zero). For our images, the width of the image

is based on the data length of the input ﬁle, while the

height is obtained from the ﬁle size and width. Since

we tested and trained on small samples, this produced

images of a similar size. It would be possible to ﬁx

one of these values, resulting in images of a constant

height or width. Our width ranges are based on the

ones given in (Seok and Kim, 2016). Once all images

have been created, we form two data sets. One con-

tains images with either none or only a single layer of

obfuscation. The second contains images with either

none or a permutation of obfuscations.

3.2 Convolutional Neural Network

For our supervised learning model, we choose to use a

CNN, as high accuracy has been observed from CNN

when classifying images made from malware samples

(Nataraj et al., 2011). For our model architecture, we

choose to use a small model consisting of two con-

volutional layers which feed into two dense layers.

This is because models with this architecture and oth-

ers similar to it have been shown to be proﬁcient at

classifying the Malimg (Bensaoud et al., 2020; Mal-

let, 2020) data set without requiring high degree of

resources. The speciﬁcs of our model can be seen in

Fig. 2.

Figure 2: Architecture of our CNN.

Our input layer is shaped to take in the pixel values of

our images as opposed to extracting some feature or

aspect of the whole image. This is to see how much

information can be obtained without a high degree of

preproccesing or prior analysis. Since images from

various binaries can vary in size, all images are re-

sized to 64*64 before being given into the model.

SECRYPT 2021 - 18th International Conference on Security and Cryptography

856

3.3 Multi-label Classiﬁcation

We frame our classiﬁcation problem multiple ways

using the types described in Section 2. We do this

to show the overall robustness of code image analy-

sis, as well as to see which method achieves the best

results

Binary-classiﬁcation. Classiﬁcation of obfuscating

transforms becomes a binary problem by having a

classiﬁer only detect a speciﬁc transformation. This

requires a unique classiﬁer for each transform, but al-

lows the potential for the model to better learn speciﬁc

features of the transformation without requiring large

amounts of data or a larger network. This enhanced

understanding could result in high accuracy even on

multi-layered samples. Our CNN can become a bi-

nary classiﬁer by reducing the number of nodes in the

output layer to 2. Then, by grouping the data sets by

type, our binary CNN can be used across both data

sets (Toﬁghi-Shirazi et al., 2019).

Multi-class and Multi-label. Both Multi-Class and

Multi-label classiﬁcation is accomplished via training

one model to detect multiple classes of transforma-

tions (Tsoumakas and Katakis, 2009). The difference

lies in the number of transformations present on the

input ﬁles. If samples are only obfuscated with one

transform, then the model would only have to label

the one transform. This is a Multi-Class model. How-

ever, when given the Multi-Layered samples which

each have more than one transformation, the model

must now identify all transformations in use. This is

a Multi-Label problem (Murphy, 2012). Multi-Class

CNN can be easily used for Multi-Label problems by

making use of the sigmoid activation function in the

output layer (Sutskever and others., 2014). This al-

lows us to test our model on both the single and multi

layer data sets.

4 EXPERIMENTS

To evaluate the effectiveness of image analysis for ob-

fuscation detection, we perform a number of experi-

ments on both our single and multi layer data sets.

These experiments are:

1. Binary classiﬁcation of obfuscating transforms on

single and multi layer samples.

2. Multi class classiﬁcation of obfuscating trans-

forms on single layer samples.

3. Multi label classiﬁcation of transforms on multi

layer samples.

For training and evaluating the models in our exper-

iments, we make use of traditional k-fold cross vali-

dation with 10 folds as well as the functionality-based

validation approach from in Salem et al. related work.

Models were evaluated using the F1-Score to take into

account false positives and negatives produced by the

models.

4.1 Data Sets and Environment

The C programs used to create our obfuscated variants

are the same as the ones used in (Banescu et al., 2017)

and consist of:

1. A set of 48 programs with few lines of code con-

structed by varying code characteristics such as:

symbolic inputs, depth of control ﬂow, amount of

loops, etc.

2. Programs automatically generated by the Ran-

domFuns transformation of the Tigress C Diver-

siﬁer/Obfuscator.

3. Non-cryptographic hash functions

4. Algorithms taught in Bachelor level computer sci-

ence and programming courses.

This brings us to a data set of 5,136 source c ﬁles. Af-

ter producing the obfuscated variants with Tigress and

OLLVM, our data set consists of over 100,000 images

split between both data sets. A list of the commands

for our obfuscators and the permutations used for the

multi layered samples will be made available, along

with the code used for our experiments. Transforma-

tions from both obfuscators have parameters which

when altered change how the transformation is ap-

plied. These were varied during sample generation

to ensure that there is variance for speciﬁc transfor-

mations within the data set.

All experiments were performed on a desktop

computer running Windows 10 with an AMD Ryzen 7

3700X processor and an Nvidia 2070 Super GPU. The

Keras python package with a Tensorﬂow back-end

was used for the creation and training of our CNN.

An Ubuntu VM was used for running Tigress.

4.2 Transformation Classiﬁcation

We ﬁrst focus on transformations from the Tigress ob-

fuscator, followed by OLLVM. We then evaluate the

ability of our models to classify all transformations

present in our data set. We begin each phase ﬁrst

with samples containing a single transformation, be-

fore moving to multiple transforms. All models were

evaluated using our two approaches and can be seen

in the relevant tables. Kfold results are in black while

Machine Learning Classiﬁcation of Obfuscation using Image Visualization

857

the functional results are in red. Clean samples are

those with no obfuscations present

Table 1: F1-Scores for all Tigress transformations.

Tigress. A set of binary models and a multi-class

model were created and evaluated for this portion.

Both types of models performed comparably, with

both achieving f1-scores above 99% for all transfor-

mations. Both evaluation methods produced similar

results. Samples with multiple transforms were eval-

uated next, with both types of models again achiev-

ing similar scores. The added complexity of layered

transformations only slightly impacted classiﬁcation,

with only some transformations seeing a 1 - 2% de-

crease to score. Table 1 contains the results for these

tests.

OLLVM. Similar to previous, our models were able

to classify the OLLVM transforms accurately across

single and multi layer samples and both evaluation

methods, with only marginal differences. The excep-

tion is ’sub’ transform, which achieved a low f1-score

in the multi layered samples due to a hugh number of

false positives. Table 2 shows the F1-scores for our

models.

All Transformations. Table 3 shows the results for

the ﬁnal test. Training the models across all trans-

formations had very limited to no impact on the per-

formance of the models. We chose to combine the

Bcf and Fla transformations from OLLVM with the

Flattening and Opaque Predicate transforms from Ti-

gress to view the impact on classiﬁcation. Both the bi-

nary and multi model performed extremely well, both

maintiaining high scores. This implies that, at least

for our data set, transformations can be classiﬁed ac-

curately even if the model is expected to identify a

broad range.

4.3 Limitations

We acknowledge limitations in the experiments per-

formed. Since we make use of the same data used in

other papers, we inherit the ﬂaws of the data, namely

that it consists only of small programs and may not

Table 2: F1-Scores for all OLLVM transformations.

be a true reﬂection of the larger scope of software.

However, the features of these programs would be

present in larger samples, allowing our image anal-

ysis to still be applicable and accurate. Another po-

tential limitation is that as programs become larger,

so do the images created from them. Larger images

would become more computationally expensive to an-

alyze and train on. However, this can be compen-

sated for by extracting parts of the image for analy-

sis. Dynamic obfuscation methods could potentially

evade our method, however it is possible image anal-

ysis could be used to detect the portions of the sample

responsible for dynamically altering the code. This

has been shown to be possible via other methods.

5 RELATED WORK

(Salem and Banescu, 2016) demonstrated that an ML

model could be effectively trained to perform meta-

data recovery on an obfuscated program, instead of

using manual reverse engineering to identify transfor-

mations used. They proposed that since transforma-

tions leave uniquely identiﬁable side effects in pro-

grams, it would be possible to use ML to detect these

changes. (Toﬁghi-Shirazi et al., 2019) furthered meta-

data recovery to account for the layering of multi-

ple obfuscations in a single program, allowing de-

tection of multiple transformations types versus one.

Their ﬁne-grained approach was achieved via seman-

tic analysis of code segments. Both of these ap-

proaches have the same end goal as our method of

image analysis; however, our method does not rely on

any direct analysis of the program.

6 CONCLUSIONS AND FUTURE

WORK

In this paper, we demonstrated effectiveness of a su-

pervised learning approach based on code visualiza-

tion for classifying obfuscating transformations. The

experiments performed in this work represent an ini-

tial study and proof of concept to determine if visual-

ization is viable for this extended analysis and to serve

as the basis for more in depth work. Even on samples

SECRYPT 2021 - 18th International Conference on Security and Cryptography

858

Table 3: F1-Scores for all transformations.

with multiple transformations present, a CNN trained

on such data was able to properly label the transfor-

mations. Across all testing, in only one instance did

any of our models perform below a 90% F1-score and

most tests performed at > 95 F1-score. We conclude

that code visualization offers a potentially powerful

adversarial means for classifying program protection

types, without the need for reverse engineering or

symbolic execution. Use and expansion of this avenue

of analysis could greatly enhance applicability of the

technique to other avenues of metadata recovery at-

tacks and software analysis. In showing that image

analysis can be used to classify obfuscating transfor-

mations, we believe that there many directions that

this work can be taken in. One potential avenue is to

explore the granularity of our classiﬁcation by test-

ing if image analysis can detect features of obfuscat-

ing transform deeper than simply the type. It is also

worth exploring if image analysis be used to label the

portions of the image that correspond to transformed

code. That capability would assist even further in re-

verse engineering and analysis. The perceived limi-

tation of ﬁle size could also be explored in this way,

as if sections of an image could be labeled as obfus-

cated instead of a whole image, this would allow the

use of sub image searching. This would allow large

programs to be broken into many smaller images, re-

quiring less intensive analysis.

ACKNOWLEDGMENTS

This work is supported in part by the National Sci-

ence Foundation under the CyberCorps Scholarship

for Service program, grant DGE-1564518.

REFERENCES

Albawi, S., Mohammed, T. A., and Al-Zawi, S. (2017).

Understanding of a convolutional neural network. In

ICET’17.

Banescu, S., Collberg, C., et al. (2016). Code obfuscation

against symbolic execution attacks. In ACSAC ’16.

Banescu, S. et al. (2017). Predicting the resilience of ob-

fuscated code against symbolic execution attacks via

machine learning. In USENIX SEC’17.

Bensaoud, A., Abudawaood, N., and Kalita, J. (2020). Clas-

sifying malware images with convolutional neural net-

work models. CoRR, abs/2010.16108.

Bishop, C. M. (2006). Pattern Recognition and Ma-

chine Learning (Information Science and Statistics).

Springer-Verlag, Berlin, Heidelberg.

Collberg, C. and Nagra, J. (2009). Surreptitious Software:

Obfuscation, Watermarking, and Tamperprooﬁng for

Software Protection. Addison-Wesley Professional.

Coogan, K., Lu, G., and Debray, S. (2011). Deobfuscation

of virtualization-obfuscated software: a semantics-

based approach. In ACM CCS ’11.

Jones, L., Christman, D., Banescu, S., and Carlisle, M.

(2018). Bytewise: A case study in neural network

obfuscation identiﬁcation. In CCWC ’18.

Junod, P., Rinaldini, J., Wehrli, J., and Michielin, J. (2015).

Obfuscator-llvm – software protection for the masses.

In SPRO ’15.

Kabanga, E. K. and Kim, C. H. (2017). Malware images

classiﬁcation using convolutional neural network. J.

of Comp. and Com., 6(1).

Kalash, M. et al. (2018). Malware classiﬁcation with deep

convolutional neural networks. In NTMS ’18.

Mallet, H. (2020). Malware classiﬁcation using convolu-

tional neural networks.

Murphy, K. P. (2012). Machine Learning: A Probabilistic

Perspective. The MIT Press.

Nataraj, L., Karthikeyan, S., Jacob, G., and Manjunath,

B. S. (2011). Malware images: Visualization and au-

tomatic classiﬁcation. In VizSec ’11.

Sainath, T. N., Mohamed, A.-r., Kingsbury, B., and Ram-

abhadran, B. (2013). Deep convolutional neural net-

works for lvcsr. In ICASSP ’13.

Salem, A. and Banescu, S. (2016). Metadata recovery

from obfuscated programs using machine learning. In

SSPREW ’16.

Schrittwieser, S., Katzenbeisser, S., et al. (2016). Protect-

ing software through obfuscation: Can it keep pace

with progress in code analysis? ACM Comput. Surv.,

49(1):4:1–4:37.

Seok, S. and Kim, H. (2016). Visualized malware classi-

ﬁcation based-on convolutional neural network. J. of

The Korea Inst. of Info. Sec. & Crypt., 26(1).

Sutskever, I. and others. (2014). Sequence to sequence

learning with neural networks. In NIPS ’14.

Toﬁghi-Shirazi, R., Asavoae, I. M., and Elbaz-Vincent, P.

(2019). Fine-grained static detection of obfuscation

transforms using ensemble-learning and semantic rea-

soning. In SSPREW ’19.

Tsoumakas, G. and Katakis, I. (2009). Multi-label classiﬁ-

cation: An overview. Int. J. Data WH & Mining, 3.

Vasan, D. et al. (2020). Imcfn: Image-based malware classi-

ﬁcation using ﬁne-tuned convolutional neural network

architecture. Computer Networks, 171.

Machine Learning Classiﬁcation of Obfuscation using Image Visualization

859