Identifying Mobile Repackaged Applications through Formal Methods

Fabio Martinelli, Francesco Mercaldo, Vittoria Nardone, Antonella Santone and Corrado Aaron Visaggio

Institute for Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy

Department of Engineering, University of Sannio, Benevento, Italy

Keywords:

Security, Malware, Model Checking, Android, Testing.

Abstract:

Smartphones and tablets are rapidly becoming indispensable in everyday activities. Android has become the most popular operating system for mobile environments in the world. Owing to the open nature of Android, these devices are continuously exposed to attacks, mostly aimed at data exfiltration and monetary fraud. There are many techniques to embed bad code, i.e., instructions able to perform a malicious behaviour, into a legitimate application: the most widespread one is so-called repackaging, which consists of reverse engineering the application in order to embed the malicious code and then (re)distributing it in official and/or third-party markets. In this paper we propose a technique to localize the malicious payload of the GinMaster family, one of the most representative repackaged trojans in the Android environment. We obtain encouraging results, achieving an accuracy of 0.98.

1 INTRODUCTION

In 2015, the volume of mobile malware continued to grow. From 2004 to 2013, security experts at SecureList detected nearly 200,000 samples of malicious mobile code. In 2014 there were 295,539 new programs, while in 2015 the number was 884,774. Each malware sample has several installation packages: in 2015, they detected 2,961,727 malicious mobile installation packages (SecureList, 2015).

Signature-based malware detection, which is the most common technique adopted by commercial mobile antimalware, is often ineffective. Moreover, it is costly, as the process of obtaining and classifying a malware signature is laborious and time-consuming.

In February 2011, in order to mitigate this trend, Google introduced Bouncer (GoogleMobile, 2014) to screen submitted apps for malicious behaviours, but this has not eliminated the problem, as discussed in (Oberheide and Miller, 2012).

The Fraunhofer Research Institution for Applied and Integrated Security has evaluated antivirus products for Android (Fedler et al., 2014), concluding that there are many techniques for evading detection by most antivirus products and for installing a malicious payload.

The most widely used installation technique is so-called repackaging (Zhou and Jiang, 2012): the attacker decompiles a trusted application to get the source code, adds the malicious payload, and recompiles the application with the payload to make it available on various alternative markets, and sometimes also on the official market. Users are often encouraged to download such malicious applications because they are free versions of trusted applications sold on the official market.

In recent years the scientific community has proposed many static (Canfora et al., 2016; Liang and Du, 2014; Yerima et al., 2013; Arp et al., 2014; Spreitzenbarth et al., 2013) and dynamic (Isohara et al., 2011; Tchakount and Dayang, 2013; Reina et al., 2013) techniques to solve the problem, mostly based on machine learning methods; their main limitation is the false positive rate.

Indeed, existing solutions for protecting privacy and security on smartphones are still ineffective in many respects (Marforio et al., 2011), and many analysts warn that Android malware families and their variants are rapidly increasing. This scenario calls for new security models and tools to limit the spread of malware on smartphones.

For these reasons, in this paper we evaluate the effectiveness of a model-checking approach to identify Android malware. We evaluate our method using GinMaster, one of the most widespread families in the mobile malware landscape, with over 6,000 known variants.

Martinelli, F., Mercaldo, F., Nardone, V., Santone, A. and Visaggio, C.
Identifying Mobile Repackaged Applications through Formal Methods.
DOI: 10.5220/0006287906730682
In Proceedings of the 3rd International Conference on Information Systems Security and Privacy (ICISSP 2017), pages 673-682
ISBN: 978-989-758-209-7


We explore the effectiveness of our approach on the GinMaster family because it is one of the most widespread trojan malware families in the mobile environment. The payload of this family is embedded into legitimate applications using the repackaging technique. GinMaster has gone through three significant generations since it was first found by researchers from North Carolina State University on 17 August 2011.

GinMaster is distributed in third-party app markets in China. The attackers injected GinMaster code into thousands of legitimate game, ringtone and picture applications, which are more likely to lure mobile users into installing the malicious payload.

The trojan contains a malicious service able to root specific devices in order to escalate privileges. It can also modify and delete contents of the device's SD card, steal confidential information and send it to a remote website, execute command-and-control services from the remote website, and download and install applications without user interaction.

The approach can easily be applied to other repackaged families: as reported in (Zhou and Jiang, 2012), repackaged malware already contains the malicious payload at installation time. Because the payload is embedded in the application, the logic rules that verify its presence can be checked without running the sample.

The salient characteristics of our methodology are:
• the use of formal methods;
• the inspection of Java Bytecode rather than source code;
• the use of static analysis;
• the capture of malicious behaviours at a finer granularity.

In practice, from the Java Bytecode application files we generate CCS processes, which are subsequently used to check properties expressing the most common behaviours exhibited by GinMaster family samples.

Performing automatic analysis on the Bytecode rather than directly on the source code has several advantages:
• independence from the source programming language;
• identification of GinMaster without decompilation, even when the source code is not available;
• ease of parsing lower-level code;
• robustness to trivial obfuscation techniques applied to the source code.

The paper proceeds as follows: Section 2 discusses related work; Section 3 describes and motivates our approach; Section 4 illustrates the results of the experiments; finally, conclusions are drawn in Section 5.

2 RELATED WORK

In this section we review the current literature related to malware identification, with particular regard to malware family identification.

A very thorough dissection of the GinMaster family is provided in (Yu, 2013). The paper gives an overview of the three generations of the GinMaster family, examines the core malicious functionality, tracks their evolution from source code, and presents notable techniques used by specific variants. Basically, GinMaster sets up a mobile botnet through malicious code hidden in the affected app. However, instead of directly exploiting these zombie devices to make profit from end users, the malware controller uses the botnet to generate millions of installations and large volumes of advertising traffic towards legitimate developers and advertising services.

In (Canfora et al., 2016) the authors experimentally evaluate two techniques for detecting Android malware: the first is based on Hidden Markov Models (HMM), while the second exploits Structural Entropy. They show that these methods obtain a precision of 0.96 in discriminating malware applications. They also analyse ransomware samples, obtaining a precision of 0.961 with the LADTree classification algorithm using the Structural Entropy method, and a precision of 0.824 with the J48 algorithm using the HMM method.

Song et al. (Song et al., 2016) propose a framework to statically detect Android malware, consisting of four layers of filtering mechanisms: message digest values, combinations of malicious permissions, dangerous permissions, and dangerous intentions. As an additional contribution, they propose a novel threat-degree threshold model of dangerous permissions for malware detection. They evaluate the method on 83 real mobile devices running Android versions from 2.3 to 5.1, achieving a 98.8% pass rate.

ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering

Neuhaus et al. (Neuhaus and Zimmermann, 2010) crawl the vulnerability reports in the Common Vulnerabilities and Exposures (CVE) database and use topic models to find prevalent vulnerability types and new trends semi-automatically. They analyze 39,393 unique reports up to the end of 2009, with the aim of characterizing many vulnerability trends: SQL injection (PHP), buffer overflows, format strings, cross-site request forgery and so on.

Zhao et al. (Zhao et al., 2014) propose an LDA-based method to analyze network security trends, which consists of three steps: collecting data from web sites, extracting topics from the collected data, and plotting trend curves over time. They select 620 documents sorted by time and extract 10 topics from each document. According to the authors, the LDA model discovers six interesting topics: dns-ddos, vulnerability, mobile-malware, mac-malware, browser-malware and java-vulnerability.

Formal methods have been applied to the study of malware in some recent papers. In (Kinder et al., 2005) the authors introduce the specification language CTPL (Computation Tree Predicate Logic), confirming the malicious behaviour of thirteen Windows malware variants on a dataset of worms dating from 2002 to 2004.

Song et al. present an approach that models Microsoft Windows XP binary programs as a PushDown System (PDS) (Song and Touili, 2001). They evaluate 200 malware variants (generated by the NGVCK and VCL32 engines) and 8 benign programs.

The tool PoMMaDe (Song and Touili, 2013) models a Microsoft Windows binary program as a PDS, which makes it possible to track the program's stack. It is able to detect 600 real malware samples and 200 samples generated by two malware generators (NGVCK and VCL32), and it also proves the reliability of benign programs.

Song et al. model mobile applications as a PDS in order to discover private data leaks, working at the Smali code level (Song and Touili, 2014). Illegal information flow in Java bytecode has also been studied in (Bernardeschi et al., 2004) using a static analysis approach.

Jacob and colleagues provide the basis for a malware model founded on the Join-Calculus: they consider system call sequences to build the model (Jacob et al., 2010).

Recently, the possibility of identifying the malicious payload in Android malware with a model-checking-based approach has been explored in (Battista et al., 2016; Mercaldo et al., 2016a; Mercaldo et al., 2016b). Starting from a definition of the payload behaviour, the authors formulate logic rules and then test them on a real-world dataset composed of ransomware, DroidKungFu, Opfake and update-attack samples.

As emerges from the literature, in recent years formal methods have been applied to detect mobile malware but, to the best of the authors' knowledge, they have never been applied specifically to identify the repackaging attack carried out by the GinMaster family on Android.

3 THE METHOD

In this section we present a model-checking-based approach for the detection of GinMaster apps. While model checking (Barbuti et al., 2005) was originally developed to verify the correctness of systems against specifications, it has recently been applied in a variety of disciplines such as biology (De Ruvo et al., 2015), clone detection (Santone, 2011) and secure information flow (Barbuti et al., 2002), among others. In this paper we apply model checking to the security field. Fig. 1 describes all the phases of our approach.

During the first phase, we generate a formal model from the Java Bytecode of the .class files derived from the app under analysis. As formal specification language we use Milner's Calculus of Communicating Systems (CCS) (Milner, 1989), one of the best-known process algebras. CCS contains basic operators to build finite processes, communication operators to express concurrency, and some notion of recursion to capture infinite behaviour. The formal model is obtained by transforming each Java Bytecode instruction into CCS processes. More precisely, from the CCS we generate an automaton whose nodes represent instruction addresses and whose edges (labelled with opcodes) represent the control-flow transitions from one instruction to its successor(s).
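To make the first phase more concrete, the Python sketch below builds a labelled automaton of the kind just described from one already-disassembled method: nodes are instruction addresses and each control-flow edge carries the label of its source instruction. It is a minimal sketch under our own assumptions: the `Instruction` record, the labelling scheme (opcode followed by the operand with separators removed, loosely mirroring labels such as “invokeexec” in Table 1) and the function names are hypothetical, and the step that renders each node as a CCS process is not shown.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Instruction:
    """Hypothetical, simplified view of one disassembled bytecode instruction."""
    address: int
    opcode: str                 # e.g. "new", "push", "invoke", "return"
    operand: str = ""           # e.g. "java/lang/StringBuilder", "chmod", "exec"
    successors: List[int] = field(default_factory=list)  # control-flow successors


# Automaton: instruction address -> outgoing (label, successor address) edges.
LTS = Dict[int, List[Tuple[str, int]]]


def make_label(ins: Instruction) -> str:
    """Assumed labelling scheme: opcode followed by the operand without separators."""
    return ins.opcode + re.sub(r"[^A-Za-z0-9]", "", ins.operand)


def build_lts(method: List[Instruction]) -> LTS:
    """One node per instruction; each control-flow edge carries the source label."""
    lts: LTS = {ins.address: [] for ins in method}
    for ins in method:
        label = make_label(ins)
        for succ in ins.successors:
            lts[ins.address].append((label, succ))
    return lts


if __name__ == "__main__":
    method = [
        Instruction(0, "new", "java/lang/StringBuilder", [1]),
        Instruction(1, "push", "chmod", [2]),
        Instruction(2, "invoke", "exec", [3]),
        Instruction(3, "return"),
    ]
    for address, edges in build_lts(method).items():
        print(address, edges)
```

Each entry of the resulting dictionary corresponds to one process whose alternatives are its outgoing labelled edges; a branching instruction simply yields several edges with the same label.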

In the second phase, we check whether the app belongs to the GinMaster family. The behaviour of the GinMaster family is encoded into a property ϕ expressed in a branching temporal logic, the mu-calculus (Stirling, 1989). Temporal logics are logical formalisms for expressing properties such as liveness and safety. The syntax of the mu-calculus is the following, where K ranges over sets of actions and Z ranges over variables:

$$
\varphi ::= \mathit{tt} \mid \mathit{ff} \mid Z \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid [K]\varphi \mid \langle K \rangle \varphi \mid \nu Z.\varphi \mid \mu Z.\varphi
$$


Figure 1: The Workflow of the Approach.

A fixpoint formula has the form µZ.φ (resp. νZ.φ), where µZ (resp. νZ) binds free occurrences of Z in φ. An occurrence of Z is free if it is not within the scope of a binder µZ (resp. νZ). A formula is closed if it contains no free variables. µZ.φ is the least fixpoint of the recursive equation Z = φ, while νZ.φ is the greatest one. From now on we consider only closed formulae.
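On a finite transition system both fixpoints can be computed by iteration: µZ.φ by starting from the empty set of states and νZ.φ by starting from the set of all states, applying the (monotone) function denoted by φ until the set no longer changes. The following Python sketch, with function names of our own choosing, illustrates this on sets of integer states; φ is passed in as an ordinary Python function from state sets to state sets.

```python
from typing import Callable, FrozenSet

StateSet = FrozenSet[int]


def least_fixpoint(f: Callable[[StateSet], StateSet]) -> StateSet:
    """µZ.f(Z) on a finite set of states: iterate upwards from the empty set (f monotone)."""
    current: StateSet = frozenset()
    while (nxt := f(current)) != current:
        current = nxt
    return current


def greatest_fixpoint(f: Callable[[StateSet], StateSet], universe: StateSet) -> StateSet:
    """νZ.f(Z) on a finite set of states: iterate downwards from all states (f monotone)."""
    current = universe
    while (nxt := f(current)) != current:
        current = nxt
    return current


if __name__ == "__main__":
    universe = frozenset({0, 1, 2})
    identity = lambda z: z  # Z = Z: every set is a fixpoint, so least and greatest differ
    print(least_fixpoint(identity), greatest_fixpoint(identity, universe))
    # Z = {0} ∪ {s | s - 1 ∈ Z}: the states reachable from 0 by +1 steps
    step = lambda z: frozenset({0}) | frozenset(s for s in universe if s - 1 in z)
    print(least_fixpoint(step))
```

The equation Z = Z shows why the distinction matters: its least solution is the empty set, its greatest is the whole state space.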

The satisfaction of a formula φ by a state s of a transition system is defined as follows:
• each state satisfies tt and no state satisfies ff;
• a state satisfies φ₁ ∨ φ₂ (φ₁ ∧ φ₂) if it satisfies φ₁ or (and) φ₂;
• [K]φ is satisfied by a state which, for every performance of an action in K, evolves to a state obeying φ;
• ⟨K⟩φ is satisfied by a state which can evolve to a state obeying φ by performing an action in K.

For the precise definition of the satisfaction of a closed formula ϕ by a state s (written s ⊨ ϕ), the reader can refer to (Stirling, 1989).

For example, µY.⟨a⟩tt ∨ ⟨−a,b⟩Y, where −a,b denotes every action other than a and b, means that “it is possible to perform the action a not preceded by the action b”.
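The satisfaction rules above can be turned directly into a naive model checker. The Python sketch below is illustrative only: formulae are encoded as nested tuples of our own design (unrelated to CWB-NC's input syntax), an action set such as −a,b is encoded as `("not", {"a", "b"})`, and fixpoints are computed by plain iteration, which is fine for toy examples but far less efficient than a real model checker. It evaluates the example formula above on two small transition systems.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

LTS = Dict[int, List[Tuple[str, int]]]  # state -> [(action, successor)]

# Our own encoding: ("tt",), ("ff",), ("var", Z), ("and", f, g), ("or", f, g),
# ("box", K, f), ("dia", K, f), ("mu", Z, f), ("nu", Z, f).
# K is a set of actions, or ("not", set) for the complemented set written -K.


def in_actions(action: str, k) -> bool:
    if isinstance(k, tuple) and k[0] == "not":
        return action not in k[1]
    return action in k


def sat(phi, lts: LTS, env: Optional[Dict[str, FrozenSet[int]]] = None) -> FrozenSet[int]:
    """Set of states of `lts` satisfying `phi` (naive fixpoint iteration)."""
    env = {} if env is None else env
    states = frozenset(lts)
    tag = phi[0]
    if tag == "tt":
        return states
    if tag == "ff":
        return frozenset()
    if tag == "var":
        return env[phi[1]]
    if tag == "and":
        return sat(phi[1], lts, env) & sat(phi[2], lts, env)
    if tag == "or":
        return sat(phi[1], lts, env) | sat(phi[2], lts, env)
    if tag == "dia":  # <K>phi: some K-move leads to a state satisfying phi
        target = sat(phi[2], lts, env)
        return frozenset(s for s in states
                         if any(in_actions(a, phi[1]) and t in target for a, t in lts[s]))
    if tag == "box":  # [K]phi: every K-move leads to a state satisfying phi
        target = sat(phi[2], lts, env)
        return frozenset(s for s in states
                         if all(t in target for a, t in lts[s] if in_actions(a, phi[1])))
    if tag in ("mu", "nu"):  # least fixpoint from below, greatest from above
        current = frozenset() if tag == "mu" else states
        while True:
            nxt = sat(phi[2], lts, {**env, phi[1]: current})
            if nxt == current:
                return current
            current = nxt
    raise ValueError(f"unknown formula {phi!r}")


# µY.<a>tt ∨ <-a,b>Y: "a can be performed without being preceded by b"
EXAMPLE = ("mu", "Y", ("or", ("dia", {"a"}, ("tt",)),
                       ("dia", ("not", {"a", "b"}), ("var", "Y"))))

if __name__ == "__main__":
    lts_ok = {0: [("c", 1)], 1: [("a", 2)], 2: []}   # c then a
    lts_bad = {0: [("b", 1)], 1: [("a", 2)], 2: []}  # b then a
    print(0 in sat(EXAMPLE, lts_ok), 0 in sat(EXAMPLE, lts_bad))
```

On the first system the initial state 0 satisfies the formula; on the second it does not, because the only a is preceded by b.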

The CCS formal model, generated from the Java Bytecode during the first phase, is then used to prove the property ϕ: model checking determines whether the app is a GinMaster malware app.

To give the reader the flavour of the approach, Table 1 shows the logic formulae that catch the GinMaster malicious payload: the first one, ϕ₁, catches the root ability of the GinMaster malicious payload; the second one, ϕ₇, identifies the gathering of user information, one of the most representative mobile trojan behaviours; finally, ϕ₁₅ identifies the commands used to communicate with the C&C server. Figure 2 shows two code snippets belonging to a GinMaster sample, related to the two malicious behaviours caught by the logic formulae ϕ₁ and ϕ₇.

More precisely, formula ϕ₁ catches the following actions:
• “new javalangStringBuilder”: a StringBuilder represents a mutable sequence of characters, typically used to build the path where resources, i.e., files, are located. In the case of GinMaster, the built string is the path of the script able to root the device;
• “pushchmod”: to successfully run the script, the malware needs to set the appropriate permissions on the root script; this is done with the “chmod” command;
• “invokeappend”: this represents a method of the StringBuilder class used to concatenate different Strings;
• “invokeexec”: this instruction runs the specified command and arguments in a separate process with the specified environment and working directory. The “exec” method, belonging to the “Runtime” class, runs the script that roots the phone once the script has obtained the required permissions;

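• “invokewaitFor”: this instruction causes the current thread to wait, if necessary, until the process represented by this Process object has terminated. The method returns immediately if the subprocess has already terminated; otherwise, the calling thread is blocked until the subprocess exits. In this case the subprocess is the execution of the script that roots the device;
• “pushfile”: it represents the retrieval of a file from an external source and its storage on the device; in this case the file stored on the device is the script that will be run in a separate process.

Table 1: The formulae for GinMaster payload detection.

$$
\begin{aligned}
\varphi_1 &= \mu X.\langle \mathit{new\ javalangStringBuilder} \rangle\, \varphi_2 \vee \langle -\mathit{new\ javalangStringBuilder} \rangle X\\
\varphi_2 &= \mu X.\langle \mathit{pushchmod} \rangle\, \varphi_3 \vee \langle -\mathit{pushchmod} \rangle X\\
\varphi_3 &= \mu X.\langle \mathit{invokeappend} \rangle\, \varphi_4 \vee \langle -\mathit{invokeappend} \rangle X\\
\varphi_4 &= \mu X.\langle \mathit{invokeexec} \rangle\, \varphi_5 \vee \langle -\mathit{invokeexec} \rangle X\\
\varphi_5 &= \mu X.\langle \mathit{invokewaitFor} \rangle\, \varphi_6 \vee \langle -\mathit{invokewaitFor} \rangle X\\
\varphi_6 &= \mu X.\langle \mathit{pushfile} \rangle\, \mathit{tt} \vee \langle -\mathit{pushfile} \rangle X\\
\varphi_7 &= \mu X.\langle \mathit{pushphone} \rangle\, \varphi_8 \vee \langle -\mathit{pushphone} \rangle X\\
\varphi_8 &= \mu X.\langle \mathit{invokegetSystemService} \rangle\, \varphi_9 \vee \langle -\mathit{invokegetSystemService} \rangle X\\
\varphi_9 &= \mu X.\langle \mathit{checkcastandroidtelephonyTelephonyManager} \rangle\, \varphi_{10} \vee \langle -\mathit{checkcastandroidtelephonyTelephonyManager} \rangle X\\
\varphi_{10} &= \mu X.\langle \mathit{invokegetDeviceId} \rangle\, \varphi_{11} \vee \langle -\mathit{invokegetDeviceId} \rangle X\\
\varphi_{11} &= \mu X.\langle \mathit{invokegetSubscriberId} \rangle\, \varphi_{12} \vee \langle -\mathit{invokegetSubscriberId} \rangle X\\
\varphi_{12} &= \mu X.\langle \mathit{invokegetSimSerialNumber} \rangle\, \varphi_{13} \vee \langle -\mathit{invokegetSimSerialNumber} \rangle X\\
\varphi_{13} &= \mu X.\langle \mathit{invokegetLine1Number} \rangle\, \varphi_{14} \vee \langle -\mathit{invokegetLine1Number} \rangle X\\
\varphi_{14} &= \mu X.\langle \mathit{new\ javaioBufferedReader} \rangle\, \mathit{tt} \vee \langle -\mathit{new\ javaioBufferedReader} \rangle X\\
\varphi_{15} &= \mu X.\langle A,B,C,D,E,F,G,H,I,J,K,L,M,N \rangle\, \mathit{tt} \vee \langle -A,B,C,D,E,F,G,H,I,J,K,L,M,N \rangle X
\end{aligned}
$$

Figure 2: Code Snippet Related to Logic Formulae ϕ₁ and ϕ₇.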
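Apart from the last one, which uses a set of actions, each formula in Table 1 has the shape µX.⟨a⟩ϕ' ∨ ⟨−a⟩X, i.e., “after some actions different from a, perform a and then satisfy ϕ'”, and the formulae are nested into chains ending in tt. Such a chain therefore holds in a state exactly when its actions can occur in the given order along some finite path, possibly interleaved with other actions. The Python sketch below checks this directly with a breadth-first search over pairs (state, position in the chain); it is our own reformulation for illustration, not the algorithm used by CWB-NC, and the dictionary encoding of the automaton is an assumption.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

LTS = Dict[int, List[Tuple[str, int]]]  # state -> [(action, successor)]


def satisfies_chain(lts: LTS, start: int, chain: Sequence[str]) -> bool:
    """Decide start |= phi_0, with phi_i = µX.<a_i>phi_(i+1) ∨ <-a_i>X and phi_n = tt.

    While waiting for a_i, any other action keeps the chain position unchanged,
    whereas an a_i move must be taken as the match (the formula only skips
    actions different from a_i); on a single path this greedy matching finds
    the chain whenever it occurs there as an ordered subsequence.
    """
    if not chain:
        return True  # phi_0 = tt
    seen = {(start, 0)}
    queue = deque(seen)
    while queue:
        state, i = queue.popleft()
        for action, succ in lts.get(state, []):
            j = i + 1 if action == chain[i] else i
            if j == len(chain):
                return True
            if (succ, j) not in seen:
                seen.add((succ, j))
                queue.append((succ, j))
    return False  # least fixpoint: cycles alone never satisfy the formula


if __name__ == "__main__":
    lts = {0: [("a", 1), ("b", 2)], 1: [("c", 3)], 2: [("d", 3)], 3: []}
    print(satisfies_chain(lts, 0, ["a", "c"]), satisfies_chain(lts, 0, ["b", "c"]))
```

The visited set plays the role of the least fixpoint: a loop that never performs the awaited action is explored once and then abandoned.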

Formula ϕ₇, instead, identifies the personal information gathering capability of the malicious payload, which performs the following actions:
• “invokegetSystemService”: getSystemService is a method of the Context class, the interface to global information about the application environment. Context is an abstract class whose implementation is provided by the Android system; it allows access to application-specific resources and classes, as well as up-calls for application-level operations such as launching activities and broadcasting and receiving intents. getSystemService returns the handle to a system-level service; here it is used to obtain the TelephonyManager;
• “checkcastandroidtelephonyTelephonyManager”: TelephonyManager provides access to information about the telephony services on the device. Applications can use the methods in this class to determine telephony services and states, as well as to access some types of subscriber information; they can also register a listener to receive notification of telephony state changes;
• “invokegetDeviceId”: a method of the “TelephonyManager” class provided by the Android environment that returns the unique device ID, for instance the IMEI for GSM phones and the MEID or ESN for CDMA phones;
• “invokegetSubscriberId”: another method provided by the “TelephonyManager” class; it returns the unique subscriber ID, for example the IMSI for a GSM phone;
• “invokegetSimSerialNumber”: this method of the “TelephonyManager” class returns the serial number of the SIM inserted into the device;
• “invokegetLine1Number”: this method returns the phone number string for line 1;
• “new javaioBufferedReader”: a BufferedReader object reads text from a character-input stream, buffering characters. In this case the buffer contains the personal information previously retrieved by the malicious payload.

Formula ϕ₁₅ identifies the Command and Control (C&C) list of instructions. Table 2 shows all the command strings, with a brief description of each. In some samples these strings are encrypted, and the algorithm used for decryption is shown in Figure 3: the decryption module decodes the string from Base64 and then XORs it byte by byte with the key 0x18. The first generation of the GinMaster family contains the C&C instructions unencrypted, while the second generation introduces the encryption of the commands. To evade detection by antimalware software, the second generation obfuscates class names and encrypts URLs as well as C&C instructions, so it is impossible to catch this variant by detecting class names or URLs. In both generations, the malware is able to report information about packages installed/uninstalled in the system, search and list package information from remote websites, and download additional applications to the device without the user's consent.
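The decoding step described above can be sketched in Python as follows: Base64-decode the stored string, then XOR every resulting byte with the key 0x18. This is our own reconstruction from the prose, not the code shown in Figure 3; in particular, we assume the decrypted bytes are UTF-8 text, and the `encrypt_command` helper exists only to produce test vectors.

```python
import base64

XOR_KEY = 0x18  # single-byte key reported for GinMaster's encrypted C&C strings


def decrypt_command(encoded: str) -> str:
    """Base64-decode, then XOR each byte with XOR_KEY (assumes UTF-8 plaintext)."""
    raw = base64.b64decode(encoded)
    return bytes(b ^ XOR_KEY for b in raw).decode("utf-8")


def encrypt_command(plain: str) -> str:
    """Inverse transformation: XOR each byte with XOR_KEY, then Base64-encode."""
    xored = bytes(b ^ XOR_KEY for b in plain.encode("utf-8"))
    return base64.b64encode(xored).decode("ascii")


if __name__ == "__main__":
    token = encrypt_command("http://client.go360days.com/request/config.do")
    print(token)
    print(decrypt_command(token))
```

Because XOR with a fixed key is its own inverse, the same byte operation serves for both directions; only the order of the Base64 step changes.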

The model checker accepts two inputs: the formal model of the app and the property expressing the malware characteristics of the GinMaster family. If the model checker returns true, we consider the app as belonging to the GinMaster family; if it returns false, the app is either trusted or belongs to another malware family. As formal verification environment we use the Concurrency Workbench of the New Century (CWB-NC) (Cleaveland and Sims, 1996), which supports several specification languages, including CCS.

Since CWB-NC is no longer in active development, as future work we want to replace it with CAAL (Concurrency Workbench, Aalborg Edition) (Andersen et al., 2015), which, like CWB-NC, supports CCS as input specification language, but uses a more efficient model-checking algorithm. There also exist mature tools with modern designs, such as CADP (Garavel et al., 2013), with expressive input languages and efficient analysis methods. However, our aim is to develop an initial rapid research prototype to evaluate how well our approach identifies GinMaster apps. For the same reason, the logic properties are formulated manually, but as future work we plan to build rules automatically using, for instance, machine learning or clone detection.

An important feature of our approach is that an app can be dissected automatically, with the advantage of localizing in the code the instructions that implement the malicious behaviour. No manual inspection is needed to localize the payload: our approach identifies the exact position of the instructions characterizing the malicious behaviour, with method-level precision.
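Method-level localization can be illustrated by running the check on one automaton per method and, when it succeeds, walking back along the witness path to report where each action of the formula was matched. The Python sketch below does this for chain-shaped formulae; the per-method dictionary layout, the method names and the labels in the example are hypothetical, and the sketch is not a description of how the authors' tool reports positions.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple

LTS = Dict[int, List[Tuple[str, int]]]  # instruction address -> [(label, successor)]
Hit = Tuple[int, str]                   # (address of the matching instruction, its label)


def witness(lts: LTS, entry: int, chain: Sequence[str]) -> Optional[List[Hit]]:
    """Return where each chain action was matched on one witness path, or None."""
    if not chain:
        return []
    start = (entry, 0)
    parent = {start: None}  # node -> (previous node, address, label) of the move taken
    queue = deque([start])
    while queue:
        node = queue.popleft()
        address, i = node
        for label, succ in lts.get(address, []):
            j = i + 1 if label == chain[i] else i
            nxt = (succ, j)
            if nxt in parent:
                continue
            parent[nxt] = (node, address, label)
            if j == len(chain):
                # Walk back to the entry, keeping only the moves that advanced the chain.
                hits: List[Hit] = []
                cur = nxt
                while parent[cur] is not None:
                    prev, addr, lab = parent[cur]
                    if prev[1] != cur[1]:
                        hits.append((addr, lab))
                    cur = prev
                return hits[::-1]
            queue.append(nxt)
    return None


def localize(methods: Dict[str, Tuple[LTS, int]], chain: Sequence[str]) -> Dict[str, List[Hit]]:
    """Check the chain on each method's automaton separately; report matching methods."""
    report: Dict[str, List[Hit]] = {}
    for name, (lts, entry) in methods.items():
        hits = witness(lts, entry, chain)
        if hits is not None:
            report[name] = hits
    return report


if __name__ == "__main__":
    methods = {
        "Game.onCreate": ({0: [("invokesetContentView", 1)], 1: []}, 0),
        "Svc.run": ({0: [("pushchmod", 1)], 1: [("invokeappend", 2)],
                     2: [("invokeexec", 3)], 3: []}, 0),
    }
    print(localize(methods, ["pushchmod", "invokeexec"]))
```

Checking each method on its own keeps the automata small and makes the reported position meaningful: only methods that contain the whole ordered pattern appear in the report.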

This is a novel result in malware analysis. In addition, since we analyze Java bytecode, our methodology correctly identifies the malicious payload even when trivial obfuscation techniques (e.g., nop insertion, junk code, call reordering) are applied to the Java source code (Mercaldo et al., 2016b).

4 EXPERIMENT

In this section we discuss the experiment we performed to evaluate the effectiveness of our approach in recognizing the GinMaster payload and discriminating it from samples belonging to other Android malware families.

4.1 Dataset

The real-world samples examined in the experiment were gathered from the Drebin project's dataset (Arp et al., 2014; Spreitzenbarth et al., 2013): a very well-known collection of malware used in many scientific works, which includes the most widespread Android families.

The malware dataset is also partitioned according to malware family: each family contains samples that share several characteristics, such as the payload installation method, the kind of attack and the events that trigger the malicious payload (Zhou and Jiang, 2012).

Table 3 shows the 10 malware families with the largest number of applications in our malware dataset, with the installation type, the kind of attack and the event activating the malicious payload.


Table 2: The list of C&C commands.

ACTION | STRING | DESCRIPTION
A | "http://client.go360days.com/report/first_run.do" | Report the starting of GinMaster.
B | "http://client.go360days.com/request/tableclass.do" | Show information stored in the SQLite database.
C | "http://client.go360days.com/request/config.do" | Change the frequency configuration for checking into the server.
D | "http://client.go360days.com/request/alert.do" | Alert last id.
E | "http://client.go360days.com/request/push.do" | Soft last id.
F | "http://client.go360days.com/report/return_config.do" | Show configuration.
G | "http://client.go360days.com/report/return_alert.do" | Send alert.
H | "http://client.go360days.com/report/return_push.do" | Push file into the device.
I | "http://client.go360days.com/report/install_list.do" | Report information when installing a list of packages.
J | "http://client.go360days.com/report/listener.do" | Check the communication between server and device.
K | "http://client.go360days.com/client.php?action=soft&soft_id=" | Get a link to a specified software.
L | "http://client.go360days.com/client.php?action=softlist&type=search&word=" | Search a list of software with a specified word.
M | "http://client.go360days.com/client.php?action=softlist" | Get the list of the available software.
N | "http://client.go360days.com/client.php?action=list&list_id=9" | Get the software with the specified id.

Figure 3: Decryption algorithm used to decode the C&C list of commands.

The malware was retrieved from the Drebin project (Arp et al., 2014; Spreitzenbarth et al., 2013), taking into account the 10 most populous families.

We briefly describe the malicious payload of the 10 most populous families in our dataset.

1. The samples of the FakeInstaller family share the main payload but have different code implementations, and some of them also have an extra payload. FakeInstaller malware is server-side polymorphic, which means that the server can provide different .apk files for the same URL request. Some variants of FakeInstaller not only send SMS messages to premium-rate numbers, but also include a backdoor to receive commands from a remote server. There is a large number of variants of this family, and it has been distributed on hundreds of websites and alternative markets. The members of this family hide their malicious code inside repackaged versions of popular applications. During the installation process the malware sends expensive SMS messages to premium services owned by the malware authors.

2. DroidKungFu installs a backdoor that allows attackers to access the smartphone whenever they want and use it as they please; they could even turn it into a bot. This malware encrypts two known root exploits, exploid and RageAgainstTheCage, to break out of the Android security container. When it runs, it decrypts these exploits and then contacts a remote server without the user's knowledge.

3. Plankton uses an available native functionality (i.e., class loading) to forward details such as the IMEI and browser history to a remote server. It is present in a wide number of versions as harmful adware that downloads unwanted advertisements, changes the browser homepage or adds unwanted bookmarks to it.

4. The Opfake malware demands payment for the application content through premium text messages. This family is an example of polymorphic malware in the Android environment: it is written with an algorithm that can change shape over time so as to evade detection by signature-based antimalware.

5. The GinMaster family contains a malicious service with the ability to root devices to escalate privileges, steal confidential information and send it to a remote website, as well as install applications without user interaction. It is also a trojan application and, similarly to the DroidKungFu family, the malware starts its malicious services as soon as it receives a BOOT_COMPLETED or USER_PRESENT intent. The malware can successfully avoid detection by mobile anti-virus software by using polymorphic techniques to hide malicious code, obfuscating class names for each infected object, and randomizing package names and self-signed certificates for applications.

6. BaseBridge malware runs one or more malicious services in the background and sends information, such as the IMEI, the IMSI and other files, to a remote server and to premium-rate numbers. BaseBridge malware is able to obtain the permission to use the Internet and to kill antimalware processes running in the background.

7. Kmin malware is similar to BaseBridge, but does not kill antimalware processes.

8. Geinimi is the first Android malware in the wild that displays botnet-like capabilities. Once the malware is installed, it can receive commands from a remote server that allow the owner of that server to control the phone. Geinimi makes use of a bytecode obfuscator. Malware belonging to this family is able to read, collect and delete SMS messages, send contact information to a remote server, make phone calls silently, and launch a web browser on a specific URL to start file downloads.

9. The Adrd family is very close to Geinimi but has fewer server-side commands; it also compromises personal data such as the IMEI and IMSI of the infected device. In addition to what Geinimi does, it is able to modify device settings.

10. DroidDream is another example of a botnet: it gains root access to the device in order to access unique identification information. This malware can also download additional malicious programs without the user's knowledge, as well as open the phone up to control by attackers. The name derives from the fact that it was set up to run between 11pm and 8am, when users were most likely to be sleeping and their phones less likely to be in use.

Table 3: Families in the Drebin dataset with details of the installation method (s = standalone, r = repackaging, u = update), the kind of attack (t = trojan, b = botnet), the events that trigger the malicious payload and a brief family description.

Family | Installation | Attack | Activation | Description
FakeInstaller | s | t,b | - | server-side polymorphic family
Plankton | s,u | t,b | - | it uses class loading to forward details
DroidKungFu | r | t | boot,batt,sys | it installs a backdoor
GinMaster | r | t | boot | malicious service to root devices
BaseBridge | r,u | t | boot,sms,net,batt | it sends information to a remote server
Adrd | r | t | net,call | it compromises personal data
Kmin | s | t | boot | it sends info to premium-rate numbers
Geinimi | r | t | boot,sms | first Android botnet
DroidDream | r | b | main | botnet, it gained root access
Opfake | r | t | - | first Android polymorphic malware

Table 4: Performance Evaluation.

GinMaster | Malware | TP | FP | FN | TN | PR | RC | Fm | Acc
100 | 761 | 81 | 2 | 19 | 759 | 0.98 | 0.81 | 0.89 | 0.98

4.2 Evaluation

To estimate the detection performance of our methodology, we compute precision (PR), recall (RC), F-measure (Fm) and accuracy (Acc), defined as follows:

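$$\mathit{PR} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FP}}; \qquad \mathit{RC} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FN}};$$

$$\mathit{Fm} = \frac{2\,\mathit{PR}\,\mathit{RC}}{\mathit{PR} + \mathit{RC}}; \qquad \mathit{Acc} = \frac{\mathit{TP} + \mathit{TN}}{\mathit{TP} + \mathit{FN} + \mathit{FP} + \mathit{TN}}$$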

where TP (True Positives) is the number of GinMaster samples correctly identified as belonging to the GinMaster family, TN (True Negatives) is the number of samples from other families correctly identified as not belonging to the GinMaster family, FP (False Positives) is the number of samples from other families incorrectly identified as belonging to the GinMaster family, and FN (False Negatives) is the number of GinMaster samples not identified as belonging to the GinMaster family.
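As a quick consistency check, the definitions above can be applied to the counts reported in Table 4. The following Python snippet is a minimal sketch of our own, not part of the authors' tooling; the `Confusion` class and the `metrics` function are hypothetical names chosen for illustration, and the code assumes that no denominator is zero (which holds for Table 4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for the binary task 'GinMaster vs. other families'."""
    tp: int  # GinMaster samples flagged as GinMaster
    fp: int  # other-family samples wrongly flagged as GinMaster
    fn: int  # GinMaster samples that were missed
    tn: int  # other-family samples correctly not flagged


def metrics(c: Confusion) -> dict:
    """Compute PR, RC, Fm and Acc as defined above (assumes non-zero denominators)."""
    pr = c.tp / (c.tp + c.fp)
    rc = c.tp / (c.tp + c.fn)
    fm = 2 * pr * rc / (pr + rc)
    acc = (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)
    return {"PR": pr, "RC": rc, "Fm": fm, "Acc": acc}


# Counts taken from Table 4.
table4 = Confusion(tp=81, fp=2, fn=19, tn=759)
for name, value in metrics(table4).items():
    print(f"{name} = {value:.2f}")
```

Running it prints PR = 0.98, RC = 0.81, Fm = 0.89 and Acc = 0.98, one value per line, which matches the last four columns of Table 4.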

Table 4 shows the results obtained using our method. The GinMaster column reports the samples belonging to the GinMaster family, while the Malware column reports the malware samples belonging to the other families considered in the study; details about the malicious payload of these families are shown in Table 3. We evaluate the effectiveness of our approach on 100 malware samples belonging to the GinMaster family and 761 malware samples belonging to other families.

ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering


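The results in Table 4 are promising: we obtain an accuracy of 0.98, with a precision of 0.98, a recall of 0.81 and an F-measure of 0.89. Concerning the GinMaster samples, we are not able to identify the malicious payload of 19 samples out of 100 (false negatives), while only 2 of the 761 samples belonging to other families are wrongly attributed to GinMaster (false positives). It is worth noting that these values, and the accuracy in particular, are also influenced by the unbalanced dataset, i.e., 100 malware samples belonging to the GinMaster family versus 761 belonging to other families.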
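To make the effect of the imbalance concrete, the sketch below compares the counts in Table 4 with a hypothetical detector that labels every sample as "not GinMaster". It is a minimal Python illustration under that assumption; the function names are ours, and balanced accuracy (the mean of recall and specificity) is a metric we add for comparison, not one used in this paper.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Acc = (TP + TN) / (TP + FN + FP + TN), as defined in Section 4.2."""
    return (tp + tn) / (tp + fn + fp + tn)


def recall(tp: int, fn: int) -> float:
    """RC = TP / (TP + FN)."""
    return tp / (tp + fn)


N_GINMASTER, N_OTHER = 100, 761  # class sizes of the dataset (Table 4)

# Hypothetical trivial detector that never flags anything as GinMaster:
# every GinMaster sample is a false negative, every other sample a true negative.
baseline = dict(tp=0, fp=0, fn=N_GINMASTER, tn=N_OTHER)
# Counts actually reported in Table 4.
reported = dict(tp=81, fp=2, fn=19, tn=759)

for label, c in (("baseline", baseline), ("reported", reported)):
    specificity = c["tn"] / (c["tn"] + c["fp"])
    balanced = (recall(c["tp"], c["fn"]) + specificity) / 2
    print(f"{label}: Acc = {accuracy(**c):.3f}, balanced accuracy = {balanced:.3f}")
```

Under these assumptions the trivial detector already reaches Acc = 0.884 while recognizing no GinMaster sample at all (balanced accuracy 0.500), whereas the reported counts give Acc = 0.976 and balanced accuracy 0.904. On this dataset, accuracy is therefore best read together with recall and F-measure.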

5 CONCLUSION AND FUTURE WORK

The most common way to inject a malicious payload in the Android environment is the repackaging attack, which basically consists of distributing legitimate, well-known applications with embedded malicious behaviour in order to lure users. In this paper we propose an approach, based on formal methods, able to catch the malicious payload of the GinMaster family, one of the most populous repackaged trojans embedded in legitimate Android applications. The GinMaster family is able to root Android devices in order to execute shell scripts with administrator privileges; in addition, it is able to send personal user information to the attacker through a C&C server. We identified a set of rules specific to the GinMaster payload behaviour and evaluated the effectiveness of our approach on a dataset of real-world malware, obtaining an accuracy of 0.98. As future work, we plan to test our approach on mobile malware belonging to other families that exhibit trojan behaviour, in order to evaluate the rule set on families with similar payloads.

ACKNOWLEDGEMENTS

This work has been partially supported by the H2020 EU-funded projects NeCS and C3ISP and by the EIT-Digital Project HII.

REFERENCES

Andersen, J. R., Andersen, N., Enevoldsen, S., Hansen, M. M., Larsen, K. G., Olesen, S. R., Srba, J., and Wortmann, J. K. (2015). CAAL: concurrency workbench, Aalborg edition. In Theoretical Aspects of Computing - ICTAC 2015 - 12th International Colloquium, Cali, Colombia, October 29-31, 2015, Proceedings, pages 573–582.

Arp, D., Spreitzenbarth, M., Huebner, M., Gascon, H., and Rieck, K. (2014). Drebin: Efficient and explainable detection of Android malware in your pocket. In Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS). IEEE.

Barbuti, R., De Francesco, N., Santone, A., and Tesei, L. (2002). A notion of non-interference for timed automata. Fundamenta Informaticae, 51(1-2):1–11.

Barbuti, R., De Francesco, N., Santone, A., and Vaglini, G. (2005). Reduced models for efficient CCS verification. Formal Methods in System Design, 26(3):319–350.

Battista, P., Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A. (2016). Identification of Android malware families with model checking. In International Conference on Information Systems Security and Privacy. SCITEPRESS.

Bernardeschi, C., De Francesco, N., Lettieri, G., and Martini, L. (2004). Checking secure information flow in Java bytecode by code transformation and standard bytecode verification. Software - Practice and Experience, 34(13):1225–1255.

Canfora, G., Mercaldo, F., and Visaggio, C. A. (2016). An HMM and structural entropy based detector for Android malware: An empirical study. Computers & Security, 61:1–18.

Cleaveland, R. and Sims, S. (1996). The NCSU Concurrency Workbench. In CAV. Springer.

De Ruvo, G., Nardone, V., Santone, A., Ceccarelli, M., and Cerulo, L. (2015). Infer gene regulatory networks from time series data with probabilistic model checking. pages 26–32.

Fedler, R., Schütte, J., and Kulicke, M. (2014). On the effectiveness of malware protection on Android: An evaluation of Android antivirus apps, http://www.aisec.fraunhofer.de/.

Garavel, H., Lang, F., Mateescu, R., and Serwe, W. (2013). CADP 2011: a toolbox for the construction and analysis of distributed processes. STTT, 15(2):89–107.

GoogleMobile (2014). http://googlemobile.blogspot.it/2012/02/android-and-security.html.

Isohara, T., Takemori, K., and Kubota, A. (2011). Kernel-based behavior analysis for Android malware detection. In Proceedings of the Seventh International Conference on Computational Intelligence and Security, pages 1011–1015.

Jacob, G., Filiol, E., and Debar, H. (2010). Formalization of viruses and malware through process algebras. In International Conference on Availability, Reliability and Security (ARES 2010). IEEE.

Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H. (2005). Detecting malicious code by model checking. Springer.

Liang, S. and Du, X. (2014). Permission-combination-based scheme for Android mobile malware detection. In International Conference on Communications, pages 2301–2306.

Marforio, C., Francillon, A., and Capkun, S. (2011). Application collusion attack on the permission-based security model and its implications for modern smartphone systems, ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/724.pdf.

Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A. (2016a). Download malware? No, thanks. How formal methods can block update attacks. In Formal Methods in Software Engineering (FormaliSE), 2016 IEEE/ACM 4th FME Workshop on. IEEE.

Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A. (2016b). Ransomware steals your phone. Formal methods rescue it. In International Conference on Formal Techniques for Distributed Objects, Components, and Systems, pages 212–221. Springer.

Milner, R. (1989). Communication and concurrency. PHI Series in computer science. Prentice Hall.

Neuhaus, S. and Zimmermann, T. (2010). Security trend analysis with CVE topic models. In Software Reliability Engineering (ISSRE), 2010 IEEE 21st International Symposium on, pages 111–120. IEEE.

Oberheide, J. and Miller, C. (2012). Dissecting the Android bouncer. In SummerCon, https://jon.oberheide.org/files/summercon12-bouncer.pdf.

Reina, A., Fattori, A., and Cavallaro, L. (2013). A system call-centric analysis and stimulation technique to automatically reconstruct Android malware behaviors. In Proceedings of EuroSec.

Santone, A. (2011). Clone detection through process algebras and Java bytecode. pages 73–74.

SecureList (2015). https://securelist.com/analysis/kaspersky-security-bulletin/73839/mobile-malware-evolution-2015/.

Song, F. and Touili, T. (2001). Efficient malware detection using model-checking. Springer.

Song, F. and Touili, T. (2013). Pommade: Pushdown model-checking for malware detection. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. ACM.

Song, F. and Touili, T. (2014). Model-checking for Android malware detection. Springer.

Song, J., Han, C., Wang, K., Zhao, J., Ranjan, R., and Wang, L. (2016). An integrated static detection and analysis framework for Android. Pervasive and Mobile Computing.

Spreitzenbarth, M., Echtler, F., Schreck, T., Freiling, F. C., and Hoffmann, J. (2013). MobileSandbox: Looking deeper into Android applications. In 28th International ACM Symposium on Applied Computing (SAC). ACM.

Stirling, C. (1989). An introduction to modal and temporal logics for CCS. In Concurrency: Theory, Language, And Architecture, pages 2–20.

Tchakounté, F. and Dayang, P. (2013). System calls analysis of malwares on Android. International Journal of Science and Technology (IJST), Volume 2, No. 9.

Yerima, S. Y., Sezer, S., McWilliams, G., and Muttik, I. (2013). A new Android malware detection approach using Bayesian classification. In International Conference on Advanced Information Networking and Applications, pages 121–128.

Yu, R. (2013). GinMaster: a case study in Android malware. In Virus Bulletin Conference, pages 92–104.

Zhao, Y.-B., Liu, S.-M., and Guo, S.-Q. (2014). Extraction and prediction of hot topics in network security. In Computer Science and Network Security, 2014 International Conference on, pages 347–353.

Zhou, Y. and Jiang, X. (2012). Dissecting Android malware: Characterization and evolution. In 2012 IEEE Symposium on Security and Privacy, pages 95–109. IEEE.
