VizMal: A Visualization Tool for Analyzing the Behavior of Android Malware

Alessandro Bacci (1), Fabio Martinelli (2), Eric Medvet (1) and Francesco Mercaldo (2)

(1) Dipartimento di Ingegneria e Architettura, Università degli Studi di Trieste, Trieste, Italy
(2) Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
Keywords: Malware Analysis, Android, Machine Learning, Multiple Instance Learning.
Abstract: Malware signature extraction is currently a manual and time-consuming process. As a matter of fact, security analysts have to manually inspect the samples under analysis in order to find the malicious behavior. On the research side, the current literature lacks methods focused on malicious behavior localization: existing approaches basically mark an entire application as malware or non-malware (i.e., take a binary decision), without any knowledge of where the malicious behavior is located inside the analyzed sample. In this paper, with the twofold aim of assisting the malware analyst in the inspection process and of pushing the research community towards malicious behavior localization, we propose VizMal, a tool for visualizing the dynamic trace of an Android application which highlights the portions of the application which look potentially malicious. VizMal performs a detailed analysis of the application activities, showing for each second of the execution whether the exhibited behavior is legitimate or malicious. The analyst may hence visualize at a glance when and to which degree an application execution looks malicious.
1 INTRODUCTION
In recent years, mobile phones have become among the favorite devices for running programs, browsing websites, and communicating. Thanks to their ever-increasing capabilities, these devices often represent the preferred gateways for accessing sensitive assets, like private data, files, and applications, and connectivity services. On the other hand, this trend has also stimulated malware writers to target the mobile platform.
Effective mobile malware detection methods and tools are needed in order to protect privacy and to enable secure usage of mobile phones and tablets. Many detection methods rely on static analysis, i.e., they are based on the investigation of static features that are observed before running the application (e.g., occurrences of opcodes in disassembled code (Medvet and Mercaldo, 2016), API usage in code (Aafer et al., 2013)): these techniques are effective under controlled conditions, but may often be easily circumvented by means of malware obfuscation techniques (Egele et al., 2012; Moser et al., 2007). In order to evade current malware detection techniques and in the context of an ongoing adversarial game, malware writers in fact implement increasingly sophisticated techniques (Zhou and Jiang, 2012; Canfora et al., 2014). During its propagation, malware code changes its structure (Canfora et al., 2015d), through a set of transformations, in order to elude signature-based detection strategies (Maiorca et al., 2017; Dalla Preda and Maggi, 2017; Cimitile et al., 2017). Indeed, polymorphism and metamorphism are rapidly spreading among malware targeting mobile applications (Rastogi et al., 2014).
Beyond the limitations related to code obfuscation, static analysis detection also faces some limitations peculiar to the Android platform. Indeed, one of the main problems is given by how Android manages application permissions for observing the file system. Common antimalware technologies derived from the desktop platform exploit the possibility of monitoring file system operations: this way, it is possible to check whether some application assumes a suspicious behavior; for example, if an application starts to download malicious code, it will be detected immediately by the antimalware responsible for scanning the disk drive. On the other hand, Android does not allow an application to monitor the file system: every application can only access its own disk space. Resource sharing is allowed only if expressly provided by the developer of the application. Therefore, Android antimalware cannot monitor the file system: this allows applications to download updates and run new code without
any control by the operating system. This behavior
will not be detected by antimalware software in any
way; as a matter of fact, a series of attacks are based
on this principle (the so-called “update-attack” (Zhou
and Jiang, 2012)).
Despite its limitations, a rather simple form of static detection, namely signature-based detection, is the most common technique adopted by commercial antimalware for mobile platforms. Beyond its low effectiveness, this method of detection is costly, as the process for obtaining and classifying a malware signature is laborious and time-consuming.
In order to address the limitations of static analysis, and hence to be able to cope with the variety of malware that exists in the wild, detection based on dynamic analysis is needed: in this case, the detection is based on the investigation of dynamic features, i.e., features which can be observed while the application is running (e.g., device resource consumption (Canfora et al., 2016), frequencies of system calls (Martinelli et al., 2017a)).
Most of the currently proposed dynamic detection methods provide the ability to classify applications as malicious or benign as a whole, i.e., without providing any insight on which parts of the application execution are actually malicious. Moreover, trying to manually inspect the features on the basis of which those tools take their decision is hard, mainly because of the size of the data. For example, in the many dynamic detection approaches based on Machine Learning techniques applied to execution traces, the raw data consists of thousands of numbers which are very hard even to visualize, leaving aside the possibility of being comprehended; this is, indeed, a common issue and an interesting line of research in the field of big data visualization (Fiaz et al., 2016). It follows that an analyst who aims at gaining a better understanding of malware behavior obtains little help from these methods.
Starting from these considerations, in this paper we propose VizMal, a tool for assisting the malware analyst and helping him to comprehend the nature of the malware application under analysis. VizMal operates on an execution trace of an Android application and visualizes it as a sequence of colored boxes, one box for each second of the execution. Each box delivers two kinds of information to the analyst: first, an indication, by means of the box fill color, of the degree to which the behavior performed by the application during the corresponding second looks malicious; second, an indication, by means of the shape of the box, of how active the application was in the corresponding second.
VizMal may be a valuable tool for Android malware analysts, researchers, and practitioners in many tasks. For example, it can be used to easily spot similarities of behavior between the analyzed application and well-known samples. Or, it can be used together with a tool for executing applications in a controlled environment to find and understand the relation between injected events (e.g., an incoming SMS) and observed behavior, with the aim of, e.g., looking for the payload activation mechanism. Or, finally, it can be used to debug a malware detection method by performing a fine-grained analysis of misclassified applications.
Internally, VizMal analyzes the execution trace using a Multiple Instance Learning (MIL) framework. Multiple Instance Learning (Carbonneau et al., 2016; Zhou, 2004) is a form of weakly-supervised learning in which the instances in the learning set are grouped and the label is associated with a group, instead of with a single instance. The MIL framework fits our scenario because labeling execution traces of Android applications at the granularity of one or few seconds (subtraces) is very costly: hence, data for training a classifier capable of classifying subtraces would be hard to collect and even harder to keep updated. MIL addresses this issue by considering each subtrace as an instance and an entire trace as a group of instances for which a label exists and is easily obtainable, e.g., by running several known Android malware applications for a long enough time. To the best of our knowledge, this is the first use of MIL in the context of Android malware classification.
The remainder of the paper is organized as follows. In Section 2 we briefly review the state of the art in Android malware detection; in Section 3 we describe how the tool (VizMal) works; in Section 4 we discuss the results of the experimental validation of VizMal, including a comparison of some alternatives for its main components; finally, in Section 5, we draw some conclusions and propose future lines of research.
2 RELATED WORK
In this section we review the current literature related to Android malware detection, with particular regard to methods focused on, or possibly able to perform, the localization of malicious behaviors (overcoming the classic binary output of malware detectors, i.e., malware/non-malware).
Amandroid (Wei et al., 2014) performs an inter-component communication (ICC) analysis to detect leaks. Amandroid needs to build an Inter-component Data Flow Graph and a Data Dependence Graph to perform the ICC analysis. It is basically a general framework enabling analysts to build customized analyses of Android apps.
FlowDroid (Arzt et al., 2014) adequately models Android-specific challenges like the application lifecycle or callback methods. It helps reduce missed leaks or false positives: the proposed on-demand algorithms allow FlowDroid to maintain efficiency despite its strong context and object sensitivity.
Epicc (Octeau et al., 2013) identifies a specification for every ICC source and sink. This includes the location of the ICC entry point or exit point; the ICC Intent action, data type, and category; as well as the ICC Intent key/value types and the target component name.
Mercaldo et al. (2016) and Canfora et al. (2015f,e) evaluate the effectiveness of the occurrences of a subset of opcodes (i.e., move, if, jump, switch, and goto) in order to discriminate mobile malware applications from non-malware ones. They apply six classification algorithms (J48, LADTree, NBTree, RandomForest, RandomTree, and RepTree), obtaining a precision equal to 0.949 in malware identification.
These methods rely on static analysis in order to identify threats: as discussed in the introduction, using static analysis it is possible to identify malicious payloads without infecting the device under analysis, but these techniques exhibit a strong weakness with respect to the code obfuscation techniques currently employed by malware writers (Canfora et al., 2015b).
The approach presented by Ferrante et al. (2016) exploits supervised and unsupervised classification in order to identify the moment in which an application exhibits malicious behavior. Although the general idea and aim are similar to those of the present work, the cited paper lacks the visualization component and hence can hardly be used directly by the analyst.
The Andromaly framework (Shabtai et al., 2012) is based on a Host-based Malware Detection System able to continuously monitor features (in terms of CPU consumption, number of packets sent through the Wi-Fi, number of running processes, and battery level) and events obtained from the mobile device, and relies on machine learning to classify the collected data as normal (benign) or abnormal (malicious). The proposed solution is evaluated on four applications developed by the authors.
BRIDEMAID (Martinelli et al., 2017b) is a tool combining static and dynamic analysis in order to detect Android malware. The analysis is based on multi-level monitoring of device, app, and user behavior with the aim of detecting and preventing malicious behaviors at runtime.
AndroDialysis (Feizollah et al., 2017) considers Android Intents (explicit and implicit) as a distinguishing feature for malware identification. The results show that the use of Android Intents achieves a better detection ratio compared with permission analysis.
TaintDroid (Enck et al., 2014) is an extension to the Android operating system that tracks the flow of privacy-sensitive data through third-party applications. TaintDroid assumes that downloaded, third-party applications are malware, and monitors in real time how these applications access and manipulate users’ personal data.
Tam et al. (2015) consider system call extraction with the aim of generating behavioral profiles of Android applications. The developed framework, CopperDroid, is able to automatically reconstruct system call semantics, including IPC, RPC, and Android objects.
Lindorfer et al. (2015) discuss MARVIN, an analysis tool able to assess the maliciousness of Android applications. MARVIN relies on machine learning techniques to classify Android mobile applications using an extended feature set extracted from the static and dynamic analysis of a set of known malicious and benign applications.
These techniques rely on dynamic analysis in order to label Android samples as malware or trusted: in detail, the methods presented by Tam et al. (2015) and Ferrante et al. (2016) exploit, like VizMal, syscall traces as the discriminant between malware and legitimate samples. The main difference between VizMal and the methods designed by Tam et al. (2015) and Ferrante et al. (2016) is the visualization component, which makes our method useful to the malware analyst for automatically and quickly finding the malicious behavior.
3 THE PROPOSED TOOL: VizMal
We consider the visual evaluation of Android malware apps for the fast and precise identification of malicious behaviors during the execution. To reach this goal, we built a visualization tool that takes as input an execution trace $t$ of an app and outputs an image consisting of a sequence of colored boxes.
Each box corresponds to a period lasting $T$ seconds of the execution trace $t$; the value of $T$ is a parameter of the tool: in this work we focused on the case of $T = 1\,\mathrm{s}$, which is a good trade-off between informativeness and ease of comprehension. The color of the box is related to the degree to which the behavior performed by the application during the corresponding second looks malicious. The shape of the box (in particular, its height) is related to how active the application was in the corresponding second.

Figure 1: Examples of the images obtained by VizMal on 4 execution traces of malware apps: only the first 18 s are depicted here.
VizMal is composed of two components: an image builder, which builds the image, and a trace classifier, which processes the trace $t$ and decorates it with maliciousness and activity levels. As briefly introduced in Section 1, the trace classifier is based on MIL: before being able to process execution traces, it has to be trained on a set of labeled execution traces (one label in $\{\text{malware}, \text{non-malware}\}$ for each trace). In the following sections, we describe the two components.
3.1 Image Builder
The input of the image builder is a sequence $L = \{(m_1, a_1), (m_2, a_2), \dots\}$ of pairs of values. The $i$-th pair refers to the subtrace of the trace $t$ starting at second $(i-1)T$ and ending at second $iT$: $m_i \in [0, 1]$ is the maliciousness level of that subtrace (0 means no maliciousness) and $a_i \in [0, 1]$ represents the activity level of that portion (0 means no activity).
The image is composed of a horizontal sequence (i.e., a row) of boxes, one for each element in the input sequence $L$. Boxes have the same width $w$ and, for the sake of clarity, are separated by a small empty gap. The height of the $i$-th box is $a_i w$, where $w$ is the box width and $a_i$ is the activity level of the corresponding subtrace. The fill color of the box is solid and determined based on $m_i$: for $m_i = 0$ it is green, for $m_i = 1$ it is red, and for intermediate values it is given by the point on a line connecting green and red in the HSL color space whose distance from green is $m_i$, assuming that the length of the line is exactly 1.
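As an illustration of how such an image builder can be realized, the following is a minimal Python sketch. It assumes that the green-red line in HSL is traversed by linearly interpolating the hue, and it renders the boxes as SVG; the function names and the SVG rendering are our illustrative choices, not part of the actual tool.

```python
import colorsys

def maliciousness_color(m):
    """Map a maliciousness level m in [0, 1] to a hex RGB color by moving
    from green (m = 0) to red (m = 1) along the hue axis of HSL."""
    hue = (1.0 - m) * (120.0 / 360.0)             # 120 deg = green, 0 deg = red
    r, g, b = colorsys.hls_to_rgb(hue, 0.5, 1.0)  # colorsys uses HLS ordering
    return "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))

def build_image(levels, w=20, gap=4):
    """Render a sequence [(m_1, a_1), (m_2, a_2), ...] as a row of SVG boxes:
    the fill color encodes maliciousness, the height a_i * w encodes activity."""
    rects = []
    for i, (m, a) in enumerate(levels):
        h = a * w                                 # box height is a_i * w
        rects.append('<rect x="{}" y="{}" width="{}" height="{}" fill="{}"/>'
                     .format(i * (w + gap), w - h, w, h, maliciousness_color(m)))
    return ('<svg xmlns="http://www.w3.org/2000/svg" height="{}">{}</svg>'
            .format(w, "".join(rects)))

# One inactive benign second, one busy uncertain second, one busy malicious one.
print(build_image([(0.0, 0.2), (0.5, 0.9), (1.0, 1.0)]))
```

The $y$ coordinate $w - a_i w$ bottom-aligns the boxes, so that less active (shorter) boxes still share a common baseline; this layout detail is our choice.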
Figure 1 shows 4 images obtained with VizMal applied to execution traces of malware apps: different maliciousness and activity levels can be seen in the color and height of the boxes.
3.2 Trace Classifier
The trace classifier operates in two phases. First, it must be trained in a learning phase which takes as input a set of labeled execution traces; then, it can be used in the classification phase for actually generating a sequence $L$ of maliciousness and activity levels out of an execution trace $t$. In both phases, each execution trace is preprocessed in order to extract some features: in this work, we were inspired by the approach proposed by Canfora et al. (2015c), where the features are the frequencies of the n-grams of system calls occurring in the trace. We remark, however, that any other approach able to provide a sequence $L$ of maliciousness and activity levels out of an execution trace $t$ could also apply.
In detail, the preprocessing of a trace $t$ is as follows. The trace classifier (i) splits the trace $t$ into a sequence $\{t_1, t_2, \dots\}$ of subtraces, with each subtrace lasting exactly $T$ seconds; (ii) considers subsequences (n-grams) of at most $N$ consecutive system calls (discarding the arguments), where $N$ is a parameter of the trace classifier; (iii) counts the number $o(t_i, g)$ of occurrences of each n-gram $g$ in each subtrace $t_i$.
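A minimal sketch of this preprocessing step is given below. It assumes that a trace is available as a non-empty list of (timestamp, syscall-name) pairs sorted by timestamp; this data layout, and the function name, are illustrative assumptions.

```python
from collections import Counter

def preprocess_trace(trace, T=1.0, N=1):
    """Split a system-call trace into subtraces of T seconds each and count,
    for every subtrace t_i, the occurrences o(t_i, g) of each n-gram g of
    at most N consecutive system-call names (arguments are discarded).

    `trace` is assumed to be a non-empty list of (timestamp_in_seconds,
    syscall_name) pairs sorted by timestamp."""
    t0 = trace[0][0]
    n_subtraces = int((trace[-1][0] - t0) // T) + 1
    subtraces = [[] for _ in range(n_subtraces)]
    for ts, call in trace:
        subtraces[int((ts - t0) // T)].append(call)
    counts = []
    for calls in subtraces:
        o = Counter()
        for n in range(1, N + 1):                 # n-grams of length 1..N
            for i in range(len(calls) - n + 1):
                o[tuple(calls[i:i + n])] += 1
        counts.append(o)
    return counts  # one Counter of n-gram occurrences per subtrace
```

With the defaults used in this work ($T = 1\,\mathrm{s}$, $N = 1$, see below), this reduces to counting how many times each system call occurs in each second of the trace.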
In the learning phase, the trace classifier trains a MIL-based binary classifier using the subtraces as instances, the label (non-malware or malware) of their enclosing trace as the group label, and the counts $o(t_i, g)$ of the n-grams as features.
In the classification phase, the trace classifier first preprocesses the input trace $t$, obtaining the subtraces and the corresponding feature values. Then, it classifies each subtrace using the trained MIL-based classifier, obtaining a label with a confidence value. Finally, it sets the value of the maliciousness level $m_i$ for each subtrace $t_i$ according to the assigned label and corresponding confidence value; and it sets the value of the activity level to $a_i = \frac{|t_i|}{\max_j |t_j|}$, i.e., to the ratio between the number of system calls in the subtrace and the maximum number of system calls in a subtrace of $t$.
According to the findings of Canfora et al. (2015c), in this work we set $N = 1$: in other words, we considered the absolute frequencies of unigrams; we remark, however, that more sophisticated features could be used. Concerning the MIL-based classifier, we used miSVM (Andrews et al., 2003) with a linear kernel and the parameter $C$ set to 1.
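To make the MIL machinery concrete, the following is a minimal sketch of the miSVM alternating optimization of Andrews et al. (2003), written with scikit-learn; it is our simplified reading of the algorithm, not the actual VizMal implementation. Bags correspond to traces and instances to subtrace feature vectors.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_misvm(bags, bag_labels, C=1.0, max_iters=20):
    """Sketch of miSVM: instance labels in positive (malware) bags are latent
    and re-estimated at each iteration; negative-bag instances stay negative.

    bags:       list of (n_instances, n_features) arrays, one per trace
    bag_labels: list of +1 (malware) / -1 (non-malware), one per bag"""
    X = np.vstack(bags)
    bag_of = np.concatenate([[i] * len(b) for i, b in enumerate(bags)])
    # Initialization: every instance inherits the label of its bag.
    y = np.concatenate([[l] * len(b) for b, l in zip(bags, bag_labels)])
    svm = LinearSVC(C=C)
    for _ in range(max_iters):
        svm.fit(X, y)
        y_new = y.copy()
        for i, label in enumerate(bag_labels):
            if label != 1:
                continue                        # negative bags are fixed
            idx = np.where(bag_of == i)[0]
            scores = svm.decision_function(X[idx])
            y_new[idx] = np.where(scores > 0, 1, -1)
            if not (y_new[idx] == 1).any():     # a positive bag needs >= 1 positive
                y_new[idx[np.argmax(scores)]] = 1
        if (y_new == y).all():
            break                               # imputed labels are stable
        y = y_new
    return svm
```

One possible mapping from the classifier output to the maliciousness level $m_i$ is to squash the signed per-subtrace decision score with a logistic function, so that the label and its confidence translate into a value in $[0, 1]$.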
4 VALIDATION
We performed a set of experiments to validate our proposal, i.e., to verify that VizMal may actually help the analyst in better understanding the behavior of malware (and non-malware) apps.

To this end, we considered a dataset of 200 Android apps (a subset of those used in (Canfora et al., 2015a)), including 100 non-malware apps automatically downloaded from the Google Play Store and 100 malware apps from the Drebin dataset (Arp et al., 2014). For each app in the dataset, we obtained 3 execution traces by executing the app for (at most) 60 s on a real device, with the same procedure followed by Canfora et al. (2015c).
In order to simulate the usage of VizMal to analyze new, unseen malware, we divided the dataset into a set of 180 and a set of 20 apps. We first trained the tool on the 180 × 3 traces corresponding to the former set and then applied it to the 20 × 3 traces of the latter, obtaining several images. We repeated the procedure several times by varying the dataset division and obtained consistent results: we here report a subset of the images obtained in one repetition.
Figures 2 and 3 show the images obtained by VizMal applied to the 3 traces of 3 malware and 3 non-malware apps, respectively. Several interesting observations may be made.
Concerning the malware traces in Figure 2, it can be seen that the images present several red boxes, representing seconds classified as malware with high confidence, but also some green ones, which indicate seconds with no recognized malware behavior. Furthermore, some yellow and orange boxes show uncertain seconds of execution: these seconds are classified as non-malware (in yellow) and as malware (in orange) with a lower confidence. The height of the boxes shows a variable activity during the execution, going from seconds with a very high number of system calls to seconds with almost no activity. Finally, the different number of boxes in the images in Figure 2 indicates that in many cases malware apps stopped their execution before the 60 s time limit: we verified that this finding is due to the machinery used to collect the execution traces, in which random user interaction events were simulated by means of an ad hoc tool (Canfora et al., 2015c).
It can be seen that, with the proposed tool, it is immediately observable when an app exhibits a general malicious behavior (e.g., the last 2 of the 3 traces for the second malware app in Figure 2b), when it behaves “normally” (e.g., the second trace of the third malware app in Figure 2c), or “borderline” (e.g., the first trace of the first malware app in Figure 2a). Moreover, the exact moments during which the malware behavior occurs and its intensity can be easily identified. This information allows for a detailed analysis of the app.
Similar considerations can be made for the images obtained for the non-malware apps of Figure 3. A row of green boxes indicates that the app behavior was normal during the entire 60 s period of execution. It can also be seen, from the second row of Figure 3b, that it may happen that a non-malware app behavior looks suspicious from the point of view of the sequences of system calls. This can be an opportunity, for the analyst, to gain more insight into the execution trace or into the classification machinery.
We remark that VizMal applies to a single execution trace: hence, issues related to how representative such a single trace is of the behavior of an app in general (e.g., code coverage) are orthogonal to the goal of VizMal. However, since VizMal allows the analyst to quickly analyze a single execution trace, it may also enable a faster analysis of several traces collected from the same app, perhaps in order to maximize the generality of findings, e.g., w.r.t. code coverage.
4.1 Alternative Image Builders
Before converging on the final proposed visualization (visible in Figure 1), we explored many design options for the image builder, progressively adding elements to enrich the information displayed while keeping the image easily readable. We here present the most significant variants.
We started with the simplest configuration, i.e., one where the boxes delivered a binary indication (green or red) of the maliciousness level and no indication of the activity level. The result is shown in Figure 4 for 3 execution traces of malware apps (top) and one trace of a non-malware app (bottom). The first three apps can be correctly recognized as malware. The last one might instead look like a malware app (even though the malware activity is much shorter compared with the other apps), but it is actually a non-malware app. This example shows that using only two colors for conveying all the information makes the analysis very limited.
To display more information, we decided to include in the visualization also an indication of the activity level, as the number of system calls executed during each subtrace. An example of this visualization is shown in Figure 5, for the same traces as in Figure 4. With this visualization, it is easy to see that in the non-malware app the malware activity is in fact lower, i.e., fewer system calls were executed during the seconds classified as malware with respect to the actual malware apps. This might suggest an erroneous evaluation by the underlying classifier, but the claim would be very weak.
We tried another approach and encoded the confidence of the classification using the fill color of the boxes, i.e., based on $m_i$ as described in Section 3.1. Figure 6 shows the result for the same traces as in Figure 4. The fill color shows that the confidences of maliciousness are lower in the non-malware app, for which there are no red boxes. Malware apps instead contain many boxes with a high confidence of maliciousness. This consideration can lead to the hypothesis of a false positive of the trace classifier or, from another point of view, of borderline behavior.
Figure 2: Images obtained from the traces of 3 malware apps: (a) malware app 1; (b) malware app 2; (c) malware app 3.

Figure 3: Images obtained from the traces of 3 non-malware apps: (a) non-malware app 1; (b) non-malware app 2; (c) non-malware app 3.

Figure 4: Examples of the images obtained by the VizMal variant with binary maliciousness level and no activity level, for 3 malware traces and one non-malware trace.

Figure 5: Examples of the images obtained by the VizMal variant with binary maliciousness level and activity level related to the number of system calls, for the same execution traces as in Figure 4.
Eventually, we converged on the proposed solution for the image builder component of VizMal. The fill color is related to the maliciousness level $m_i$ in a continuous way and the box shape is related to the activity level $a_i$. Figure 7 shows the image obtained with this (final) variant for the same traces as in Figure 4. The non-malware app representation is different enough from those of the malware apps to indicate a probable wrong classification and the need to perform a deeper analysis to clarify the nature of the app.
Figure 6: Examples of the images obtained by the VizMal variant with continuous maliciousness level and no activity level, for the same execution traces as in Figure 4.

Figure 7: Examples of the images obtained by the VizMal variant with continuous maliciousness level and activity level related to the number of system calls, for the same execution traces as in Figure 4.
4.2 Alternative Trace Classifiers
In order to explore different options for the MIL-based classifier on which the trace classifier is built, we considered two other algorithms able to work in the considered scenario. We remark that the reason why VizMal internally relies on a MIL framework is that obtaining a dataset of traces annotated with a malware/non-malware label at the granularity of one or few seconds is costly. Instead, using MIL, VizMal can be trained on traces obtained from malware and non-malware apps without particular constraints, under the assumption that the malware behavior eventually occurs if the execution is long and varied enough.
We experimented with sMIL (Bunescu and Mooney, 2007), miSVM (Andrews et al., 2003), and an ad hoc “single instance” variant of SVM (SIL-SVM), which we modified in order to act as a MIL framework. For the latter, we built a learning set in which we applied a malware label to each subtrace of a trace corresponding to a malware app and a non-malware label to each subtrace of a trace corresponding to a non-malware app. For sMIL and miSVM, instead, a label is associated with an entire trace, with the semantics that a malware label means that at least one subtrace is malware, whereas a non-malware label means that all the subtraces are non-malware.
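Reusing the notation of the miSVM sketch in Section 3.2, the SIL-SVM construction described above can be illustrated as follows; this is a hedged sketch, not the exact implementation used in the experiments.

```python
import numpy as np
from sklearn.svm import LinearSVC

# SIL-SVM sketch: every subtrace inherits the label of its enclosing trace,
# and a standard single-instance SVM is trained on the flattened dataset.
# `bags` and `bag_labels` are as in the earlier train_misvm sketch.
X = np.vstack(bags)                                           # all subtraces
y = np.concatenate([[l] * len(b) for b, l in zip(bags, bag_labels)])
sil_svm = LinearSVC(C=1.0).fit(X, y)
```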
In order to assess the 3 variants, we performed the following procedure:

1. we divided the dataset of 3 × 100 + 3 × 100 execution traces (see Section 4) into a balanced learning set composed of 90% of the traces and a testing set composed of the remaining traces;

2. we trained the three classifiers on the learning set;

3. we applied the trained classifiers to the traces in the testing set.
We repeated the above procedure 5 times by varying the learning and testing set compositions and measured the performance of the classifiers as the False Positive Rate (FPR), i.e., the ratio between the number of subtraces of non-malware apps classified as malware and the number (30 × 60) of all the non-malware subtraces, and the False Negative Rate (FNR), i.e., the ratio between the number of subtraces of malware apps classified as non-malware and the number (30 × 60) of all the malware subtraces. Since all the considered MIL variants are based on SVM, for a fair comparison we used the linear kernel and $C = 1$ for all of them.
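For clarity, given instance-level ground truth and predictions encoded as ±1 (with +1 meaning malware), the FPR and FNR defined above can be computed as in this sketch:

```python
import numpy as np

def fpr_fnr(y_true, y_pred):
    """FPR: fraction of non-malware (-1) subtraces predicted as malware (+1);
    FNR: fraction of malware (+1) subtraces predicted as non-malware (-1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fpr = ((y_pred == 1) & (y_true == -1)).sum() / (y_true == -1).sum()
    fnr = ((y_pred == -1) & (y_true == 1)).sum() / (y_true == 1).sum()
    return fpr, fnr
```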
Table 1 presents the results, averaged across the 5 repetitions. We remark that (a) our experimentation was not aimed at performing a comparison among MIL frameworks (the interested reader may refer to (Ray and Craven, 2005)) and (b) the shown figures should not be intended as representative of the accuracy of malware detection for the considered approaches. In fact, while it is fair to consider a false positive (i.e., a subtrace of a non-malware app classified as malware) as an error, the same cannot be done for a false negative (i.e., a subtrace of a malware app classified as non-malware): it may actually occur, possibly with high probability, that even a malware app does not exhibit a malicious behavior for some seconds during its execution.
Table 1: FPR and FNR (in percentage) for the considered variants of MIL classifiers.

Classifier            FPR    FNR
SIL-SVM              75.80  11.28
SIL-SVM (near EER)   40.24  38.48
miSVM                26.44  42.44
sMIL                  9.80  69.28
sMIL (near EER)      30.84  36.36
It can be seen from Table 1 that miSVM (the variant which we used in VizMal) outperforms both SIL-SVM and sMIL. The values of FPR and FNR for the latter two suggest that their output is biased towards the malware (SIL-SVM) and non-malware (sMIL) labels, respectively. To mitigate this effect, we tuned the threshold of the two classifiers in order to obtain their effectiveness indexes at a working point close to the Equal Error Rate (EER), also reported in Table 1. However, it can be seen that miSVM still appears to be the most effective variant.
5 CONCLUDING REMARKS
In this paper we introduced VizMal, a tool that presents in a graphical way the results of the dynamic malware analysis of Android applications. VizMal takes an execution trace of an Android application and shows a row of colored boxes, one box for each second of the execution: the color of the box represents the maliciousness level of the app during the corresponding second, whereas the box shape represents the app activity level during the corresponding second. VizMal may be a valuable tool in the Android malware analysts’ and researchers’ toolboxes, allowing them to better comprehend the nature of malware applications and to debug other, maybe more sophisticated, detection systems. Along this line, we intend to explore the use of VizMal together with tools for the controlled execution of Android apps in order to investigate the possibility of relating the injected user and system events to the maliciousness and activity levels measured by VizMal, possibly resulting in an interactive tool for malware analysis.
ACKNOWLEDGMENTS
This work has been partially supported by the H2020 EU-funded projects NeCS and C3ISP and by the EIT-Digital Project HII.
REFERENCES
Aafer, Y., Du, W., and Yin, H. (2013). Droidapiminer: Mining api-level features for robust malware detection in android. In International Conference on Security and Privacy in Communication Systems, pages 86–103. Springer.
Andrews, S., Tsochantaridis, I., and Hofmann, T. (2003).
Support vector machines for multiple-instance learning.
In Becker, S., Thrun, S., and Obermayer, K., editors,
Advances in Neural Information Processing Systems
15, pages 577–584. MIT Press.
Arp, D., Spreitzenbarth, M., Huebner, M., Gascon, H., and Rieck, K. (2014). Drebin: Efficient and explainable detection of android malware in your pocket. In Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS).
Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y., Octeau, D., and McDaniel, P. (2014). Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. ACM SIGPLAN Notices, 49(6):259–269.
Bunescu, R. C. and Mooney, R. J. (2007). Multiple instance
learning for sparse positive bags. In Proceedings of
the 24th Annual International Conference on Machine
Learning (ICML-2007), Corvallis, OR.
Canfora, G., De Lorenzo, A., Medvet, E., Mercaldo, F., and Visaggio, C. A. (2015a). Effectiveness of opcode ngrams for detection of multi family android malware. In Availability, Reliability and Security (ARES), 2015 10th International Conference on, pages 333–340. IEEE.
Canfora, G., Di Sorbo, A., Mercaldo, F., and Visaggio, C. A.
(2015b). Obfuscation techniques against signature-
based detection: a case study. In 2015 Mobile Systems
Technologies Workshop (MST), pages 21–26. IEEE.
Canfora, G., Medvet, E., Mercaldo, F., and Visaggio, C. A. (2015c). Detecting android malware using sequences of system calls. In Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, pages 13–20. ACM.
Canfora, G., Medvet, E., Mercaldo, F., and Visaggio, C. A. (2016). Acquiring and analyzing app metrics for effective mobile malware detection. In Proceedings of the 2016 ACM International Workshop on Security and Privacy Analytics. ACM.
Canfora, G., Mercaldo, F., Moriano, G., and Visaggio, C. A. (2015d). Composition-malware: building android malware at run time. In Availability, Reliability and Security (ARES), 2015 10th International Conference on, pages 318–326. IEEE.
Canfora, G., Mercaldo, F., and Visaggio, C. A. (2014). Malicious javascript detection by features extraction. e-Informatica Software Engineering Journal, 8(1).
Canfora, G., Mercaldo, F., and Visaggio, C. A. (2015e).
Evaluating op-code frequency histograms in malware
and third-party mobile applications. In E-Business and
Telecommunications, pages 201–222. Springer.
Canfora, G., Mercaldo, F., and Visaggio, C. A. (2015f). Mobile malware detection using op-code frequency histograms. In Proceedings of the International Conference on Security and Cryptography (SECRYPT).
Carbonneau, M.-A., Cheplygina, V., Granger, E., and Gagnon, G. (2016). Multiple instance learning: A survey of problem characteristics and applications. arXiv preprint arXiv:1612.03365.
Cimitile, A., Martinelli, F., Mercaldo, F., Nardone, V., and Santone, A. (2017). Formal methods meet mobile code obfuscation: identification of code reordering technique. In Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), 2017 IEEE 26th International Conference on, pages 263–268. IEEE.
Dalla Preda, M. and Maggi, F. (2017). Testing android malware detectors against code obfuscation: a systematization of knowledge and unified methodology. Journal of Computer Virology and Hacking Techniques, 13(3):209–232.
Egele, M., Scholte, T., Kirda, E., and Kruegel, C. (2012). A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys (CSUR), 44(2):6.
Enck, W., Gilbert, P., Han, S., Tendulkar, V., Chun, B.-G., Cox, L. P., Jung, J., McDaniel, P., and Sheth, A. N. (2014). Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS), 32(2):5.
Feizollah, A., Anuar, N. B., Salleh, R., Suarez-Tangil, G., and Furnell, S. (2017). Androdialysis: analysis of android intent effectiveness in malware detection. Computers & Security, 65:121–134.
Ferrante, A., Medvet, E., Mercaldo, F., Milosevic, J., and Visaggio, C. A. (2016). Spotting the malicious moment: Characterizing malware behavior using dynamic features. In Availability, Reliability and Security (ARES), 2016 11th International Conference on, pages 372–381. IEEE.
Fiaz, A. S., Asha, N., Sumathi, D., and Navaz, A. S. (2016). Data visualization: Enhancing big data more adaptable and valuable. International Journal of Applied Engineering Research, 11(4):2801–2804.
Lindorfer, M., Neugschwandtner, M., and Platzer, C. (2015). Marvin: Efficient and comprehensive mobile app classification through static and dynamic analysis. In Computer Software and Applications Conference (COMPSAC), 2015 IEEE 39th Annual, volume 2, pages 422–433. IEEE.
Maiorca, D., Mercaldo, F., Giacinto, G., Visaggio, C. A., and Martinelli, F. (2017). R-packdroid: Api package-based characterization and detection of mobile ransomware. In Proceedings of the Symposium on Applied Computing, pages 1718–1723. ACM.
Martinelli, F., Marulli, F., and Mercaldo, F. (2017a). Evaluating convolutional neural network for effective mobile malware detection. Procedia Computer Science, 112:2372–2381.
Martinelli, F., Mercaldo, F., and Saracino, A. (2017b). Bridemaid: An hybrid tool for accurate detection of android malware. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 899–901. ACM.
Medvet, E. and Mercaldo, F. (2016). Exploring the usage of
topic modeling for android malware static analysis. In
Availability, Reliability and Security (ARES), 2016 11th
International Conference on, pages 609–617. IEEE.
Mercaldo, F., Visaggio, C. A., Canfora, G., and Cimitile, A.
(2016). Mobile malware detection in the real world.
In Proceedings of the 38th International Conference
on Software Engineering Companion, pages 744–746.
ACM.
Moser, A., Kruegel, C., and Kirda, E. (2007). Limits of
static analysis for malware detection. In Computer
security applications conference, 2007. ACSAC 2007.
Twenty-third annual, pages 421–430. IEEE.
Octeau, D., McDaniel, P., Jha, S., Bartel, A., Bodden, E., Klein, J., and Le Traon, Y. (2013). Effective inter-component communication mapping in android with epicc: An essential step towards holistic security analysis. In Proceedings of the 22nd USENIX Security Symposium.
Rastogi, V., Chen, Y., and Jiang, X. (2014). Catch me if you can: Evaluating android anti-malware against transformation attacks. IEEE Transactions on Information Forensics and Security, 9(1):99–108.
Ray, S. and Craven, M. (2005). Supervised versus multiple instance learning: An empirical comparison. In Proceedings of the 22nd International Conference on Machine Learning, ICML ’05, pages 697–704, New York, NY, USA. ACM.
Shabtai, A., Kanonov, U., Elovici, Y., Glezer, C., and Weiss, Y. (2012). “Andromaly”: a behavioral malware detection framework for android devices. Journal of Intelligent Information Systems, 38(1):161–190.
Tam, K., Khan, S. J., Fattori, A., and Cavallaro, L. (2015). Copperdroid: Automatic reconstruction of android malware behaviors. In NDSS.
Wei, F., Roy, S., Ou, X., et al. (2014). Amandroid: A
precise and general inter-component data flow analysis
framework for security vetting of android apps. In
Proceedings of the 2014 ACM SIGSAC Conference on
Computer and Communications Security, pages 1329–
1341. ACM.
Zhou, Y. and Jiang, X. (2012). Dissecting android malware: Characterization and evolution. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 95–109. IEEE.
Zhou, Z.-H. (2004). Multi-instance learning: A survey. Department of Computer Science & Technology, Nanjing University, Tech. Rep.