computational aspects of neural networks less daunting (Oh and Jung, 2004). In particular, with recent
advances in GPU technology and software, it is now
practical to train neural networks on GPUs (Oh and
Jung, 2004; TensorFlow, 2017).
TensorFlow™ is an open source software library
by Google that is designed for numerical computation
on data flow graphs. TensorFlow is the successor of
the earlier closed source DistBelief (also from Google) which was used for training and deploying neural
networks for pattern recognition (TensorFlow, 2017).
The unit of data in TensorFlow is a set of primitive values in the form of an n-dimensional array. A TensorFlow program builds a “computational graph,” which is defined as a series of TensorFlow operations arranged into a graph form. A node may or may not have a tensor as its input, but it usually produces a tensor as output. Once the computational graph is created, it can be evaluated by executing it, which consists of creating a session to encapsulate the control and state of the TensorFlow runtime, and executing the graph within it (TensorFlow, 2017).
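This build-then-run pattern can be illustrated with a toy dataflow graph in plain Python; the `Node` and `Session` classes below are illustrative stand-ins for the idea, not the actual TensorFlow API:

```python
# Toy illustration of TensorFlow's deferred-execution model:
# first build a graph of operations, then evaluate it in a session.
# These classes are stand-ins, not the real TensorFlow API.

class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # function that computes this node's value
        self.inputs = inputs  # upstream nodes feeding this one

def constant(value):
    # A source node: takes no tensor input, produces a value as output.
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, (a, b))

def mul(a, b):
    return Node(lambda x, y: x * y, (a, b))

class Session:
    """Encapsulates control and state, and executes the graph."""
    def run(self, node):
        args = [self.run(n) for n in node.inputs]
        return node.op(*args)

# Build the computational graph (nothing is computed yet)...
graph = add(mul(constant(3), constant(4)), constant(5))
# ...then evaluate it by running it within a session.
result = Session().run(graph)
print(result)  # 3 * 4 + 5 = 17
```

The key point the sketch captures is the separation of graph construction from graph execution: no arithmetic happens until `Session.run` walks the graph.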
TensorFlow was used to conduct the deep learning experiments in this section. The model used here relies on transfer learning, which involves starting with a model pre-trained on another problem, and then retraining this existing model on a similar problem. The motivation for this approach comes from the fact that training a deep learning model from scratch is generally extremely computationally intensive, while simply modifying an existing model can be orders of magnitude faster, depending on the dataset.
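As a minimal numpy sketch of this idea (synthetic data stands in for real bottleneck features; this is not the retraining code used in our experiments), the pre-trained layers act as a frozen feature extractor and only a new final layer is trained:

```python
import numpy as np

# Sketch of transfer learning: the "pre-trained" layers are frozen
# and reused as a feature extractor; only a new final logistic layer
# is trained. Synthetic data stands in for real images/features.
rng = np.random.default_rng(0)

# Synthetic stand-in for samples from two classes.
X = rng.standard_normal((200, 16))
y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0).astype(float)

# "Pre-trained" layers: a fixed projection that is never updated.
W_frozen = rng.standard_normal((16, 8))
features = np.tanh(X @ W_frozen)

# New final layer, trained from scratch on the frozen features.
w, b = np.zeros(8), 0.0

def loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

loss_before = loss(w, b)
for _ in range(200):                     # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    grad = p - y
    w -= 0.5 * features.T @ grad / len(y)
    b -= 0.5 * grad.mean()
loss_after = loss(w, b)
print(loss_after < loss_before)  # True: only the new head was trained
```

Because only the small final layer is updated, each training step touches far fewer parameters than full training, which is the source of the speedup described above.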
For the experiments reported here, the model was pre-trained on the ImageNet Large Scale Visual Recognition Challenge dataset, and is capable of differentiating between 1000 different image classes, such as Dalmatian, helmet, motorcycle, person, etc. (Google Codelabs, 2017). We then retrained this model on the raw malware images in the Malimg dataset.
In addition to the software listed in Table 1, for the deep learning experiments discussed here, we employ the TensorFlow architecture on an NVIDIA DIGITS DevBox provided by Ford Motor Company. All experiments described in this section use 4 GPUs on the NVIDIA DIGITS DevBox. As previously mentioned, the training and test data for these experiments is from the Malimg dataset, which contains more than 9000 grayscale images representing malware from 25 malware families, as summarized in Table 2, above.
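For context, Malimg-style images are produced by interpreting each byte of a malware binary as a grayscale pixel value (Nataraj et al., 2011). A rough sketch of that conversion, where the row width of 64 and the handling of the final partial row are illustrative assumptions rather than the dataset's exact parameters:

```python
import numpy as np

# Sketch of a Malimg-style conversion: each byte of a malware binary
# becomes one grayscale pixel (0-255), laid out in rows of fixed width.
# The width and the drop-the-ragged-row choice here are illustrative.

def bytes_to_grayscale(blob: bytes, width: int = 64) -> np.ndarray:
    pixels = np.frombuffer(blob, dtype=np.uint8)
    rows = len(pixels) // width            # discard any partial final row
    return pixels[: rows * width].reshape(rows, width)

# A 1024-byte stand-in "binary" yields a 16 x 64 grayscale image.
img = bytes_to_grayscale(bytes(range(256)) * 4, width=64)
print(img.shape)  # (16, 64)
print(img.dtype)  # uint8
```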
3.9 TensorFlow Results
As a first experiment with our TensorFlow model, we classified two very different families, Allaple.A and Yuner.A, using a 90-10 split (i.e., 90% of images were used for training and 10% for testing). When training, we use 4000 iterations to retrain the model. These results are given in Table 6, and we note that all samples were classified correctly.
Table 6: Accuracy results on Allaple.A and Yuner.A.

             Images   Accuracy
  Training     3374       100%
  Testing       375       100%
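As a quick sanity check on the counts in Table 6, the 90-10 split of the 3,749 available Allaple.A and Yuner.A images works out as:

```python
# Sanity check of the 90-10 split in Table 6.
total = 3374 + 375          # training + testing images from Table 6
train = int(0.9 * total)    # 90% of the images for training
test = total - train        # remaining 10% for testing
print(train, test)          # 3374 375, matching Table 6
```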
Next, as a more challenging test for our TensorFlow model, we classified two closely related families, Allaple.A and Allaple.L, again using a 90-10 split and 4000 iterations to retrain the model. These results are given in Table 7, and we again see that we have achieved perfect classification. These are certainly impressive initial results for this deep learning approach.
Table 7: Accuracy results on Allaple.A and Allaple.L.

             Images   Accuracy
  Training     4086       100%
  Testing       454       100%
Finally, we attempt to classify samples from all 25 Malimg malware families. For these experiments, we test various splits of the training and test data, ranging from a 30-70 split to a 90-10 split. The resulting accuracies are given in Figure 7.
[Figure 7 here: line plot of training and testing accuracy (percentage, roughly 86 to 100) versus training data percentage (30 to 90).]
Figure 7: Accuracy vs training/testing split for TensorFlow
experiments with all 25 Malimg families.
From the results in Figure 7, we see that by applying deep learning directly to raw image files, we can attain a testing accuracy in excess of 98%, which is as good as the accuracy obtained based on gist descriptors as reported in (Nataraj et al., 2011), and as