Identifying Mobile Repackaged Applications through Formal Methods
Fabio Martinelli¹, Francesco Mercaldo¹, Vittoria Nardone², Antonella Santone² and Corrado Aaron Visaggio²
¹ Institute for Informatics and Telematics, National Research Council of Italy (CNR), Pisa, Italy
² Department of Engineering, University of Sannio, Benevento, Italy
Keywords:
Security, Malware, Model Checking, Android, Testing.
Abstract:
Smartphones and tablets are rapidly becoming indispensable in everyday activities. Android has become the
most popular operating system for mobile environments in the world. These devices, owing to the open nature
of Android, are continuously exposed to attacks, mostly aimed at data exfiltration and monetary fraud. There are
many techniques to embed the bad code, i.e. the instructions able to perform a malicious behaviour, into a
legitimate application: the most widespread one is the so-called repackaging, which consists of reverse engineering the
application in order to embed the malicious code and then (re)distributing it in the official and/or third-party
markets. In this paper we propose a technique to localize the malicious payload of the GinMaster family, one of the
most representative repackaged trojans in the Android environment. We obtain encouraging results, achieving an
accuracy equal to 0.98.
1 INTRODUCTION
In 2015, the volume of mobile malware continued
to grow. From 2004 to 2013, security experts at SecureList detected nearly 200,000 samples of malicious
mobile code. In 2014 there were 295,539 new pro-
grams, while the number was 884,774 in 2015. Each
malware sample has several installation packages: in
2015, they detected 2,961,727 malicious mobile in-
stallation packages (SecureList, 2015).
Signature-based malware detection, which is the
most common technique adopted by commercial an-
timalware for mobile, is often ineffective. Moreover it
is costly, as the process for obtaining and classifying a
malware signature is laborious and time-consuming.
In order to mitigate the malware trend, in February 2011 Google introduced Bouncer (GoogleMobile, 2014) to screen submitted apps for malicious behaviors, but this has not eliminated the problem, as discussed in (Oberheide and Miller, 2012).
The Fraunhofer Research Institution for Applied and Integrated Security has evaluated antivirus software for Android (Fedler et al., 2014): the conclusion is that there are many techniques for evading detection by most antivirus products and for installing a malicious payload.
The most employed installation technique is the
so-called repackaging (Zhou and Jiang, 2012): the
attacker decompiles a trusted application to get the
source code, then adds the malicious payload and re-
compiles the application with the payload to make it
available on various market alternatives, and some-
times also on the official market. The user is often
encouraged to download such malicious applications
because they are free versions of trusted applications
sold on the official market.
In recent years, the scientific community has proposed many static (Canfora et al., 2016; Liang and Du, 2014; Yerima et al., 2013; Arp et al., 2014; Spreitzenbarth et al., 2013) and dynamic (Isohara et al., 2011; Tchakount and Dayang, 2013; Reina et al., 2013) techniques, mostly based on machine learning methods, to solve the problem: their main limitation is the false positive ratio.
Indeed existing solutions for protecting privacy
and security on smartphones are still ineffective in
many facets (Marforio et al., 2011), and many ana-
lysts warn that the malware families and their vari-
ants for Android are rapidly increasing. This scenario
calls for new security models and tools to limit the
spreading of malware for smartphones.
For these reasons, in this paper we evaluate the
effectiveness of a model-checking approach to iden-
tify Android malware. We evaluate our method us-
ing GinMaster malware, one of the most widespread
Martinelli, F., Mercaldo, F., Nardone, V., Santone, A. and Visaggio, C.
Identifying Mobile Repackaged Applications through Formal Methods.
DOI: 10.5220/0006287906730682
In Proceedings of the 3rd International Conference on Information Systems Security and Privacy (ICISSP 2017), pages 673-682
ISBN: 978-989-758-209-7
Copyright © 2017 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
families in the mobile malware landscape, with over 6,000 known variants.
We explore the effectiveness of our approach on the GinMaster family because it is one of the most widespread trojan families in the mobile environment. The payload of this family is embedded into legitimate applications using the repackaging technique. GinMaster has gone through three significant generations since it was first found by researchers from North Carolina State University on 17 August 2011.
GinMaster is distributed in third-party app mar-
kets in China. The attackers injected GinMaster
code into thousands of legitimate game, ringtone and
picture applications. These applications have more
chance to lure mobile users into installing the mali-
cious payload.
The trojan contains a malicious service able to
root specific devices in order to escalate privileges.
It also has the ability to modify and delete contents in
the SD card of device, steal confidential information
and send it to a remote website, execute command-
and-control services from the remote website, as well
as download and install applications regardless of user
interaction.
The approach can be easily applied to other repackaged families: as reported in (Zhou and Jiang, 2012), repackaged malware contains the malicious payload at installation time. Since the payload is embedded into the application, the logic rules that verify the existence of the malicious payload can be checked without running the sample.
The salient characteristics of our methodology are:
• the use of formal methods;
• the inspection of Java Bytecode rather than source code;
• the use of static analysis;
• the capture of malicious behaviours at a finer granularity.
In practice, from the Java Bytecode application files we generate CCS processes, which are subsequently used for checking properties expressing the most common behaviours exhibited by GinMaster family samples.
Performing automatic analysis on the Bytecode and not directly on the source code has several advantages:
• independence of the source programming language;
• identification of GinMaster without decompilation even when source code is lacking;
• ease of parsing a lower-level code;
• independence from obfuscation.
The paper proceeds as follows: Section 2 dis-
cusses related work; Section 3 describes and moti-
vates our approach; Section 4 illustrates the results
of experiments; finally, conclusions are drawn in
Section 5.
2 RELATED WORK
In this section we review the current literature related to malware identification, with particular regard to malicious family identification.
A very thorough dissection of the GinMaster family is provided in (Yu, 2013). The paper gives an overview of three generations of the GinMaster family, examines the core malicious functionality, tracks their evolution from source code, and presents notable techniques utilized by the specific variants. Basically, GinMaster sets up a mobile botnet through malicious code hidden in the affected app. However, instead of directly taking advantage of these zombie devices to make profit from end-users, the malware controller employs the botnet to generate millions of installations and large volumes of advertising traffic to legitimate developers and advertising services.
In (Canfora et al., 2016) the authors experimentally evaluate two techniques for detecting Android malware: the first one is based on the Hidden Markov Model (HMM), while the second one exploits Structural Entropy. They demonstrate that these methods obtain a precision of 0.96 in discriminating malware applications. In addition, they analyse ransomware samples, obtaining a precision of 0.961 with the LADTree classification algorithm using the Structural Entropy method, and a precision of 0.824 with the J48 algorithm using the HMM-based one.
Song et al. (Song et al., 2016) propose a frame-
work to statically detect Android malware, consisting
of four layers of filtering mechanisms: the message
digest values, the combination of malicious permis-
sions, the dangerous permissions, and the dangerous
intention. As an additional contribution, they propose a novel threat degree threshold model of dangerous permissions for malware detection. They evaluate the
ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering
method on 83 real mobile devices, achieving a 98.8% pass rate, with Android versions ranging from 2.3 to 5.1.
Neuhaus and Zimmermann (Neuhaus and Zimmermann, 2010) crawl the vulnerability reports in the Common Vulnerabilities and Exposures database, using topic models to find prevalent vulnerability types and new trends semi-automatically. They analyze 39,393 unique reports up to the end of 2009, with the aim of
characterizing many vulnerability trends: SQL injec-
tion (PHP), buffer overflows, format strings, cross-
site request forgery and so on.
Zhao et al. (Zhao et al., 2014) propose an LDA-based method to analyze the trends of network security, which consists of three steps: collecting data from web sites, extracting topics from the collected data, and plotting the curves of trends over time. They select 620 documents sorted by time and extract 10 topics from each document. According to the authors, six interesting topics are discovered by the LDA model: dns-ddos, vulnerability, mobile-malware, mac-malware, browser malware and Java-vulnerability.
Formal methods have been applied for studying
malware in some recent papers. In (Kinder et al.,
2005) the authors introduce the specification language
CTPL (Computation Tree Predicate Logic) confirm-
ing the malicious behavior of thirteen Windows mal-
ware variants using as dataset a set of worms dating
from 2002 to 2004.
Song et al. present an approach to model Mi-
crosoft Windows XP binary programs as a PushDown
System (PDS) (Song and Touili, 2001). They evalu-
ate 200 malware variants (generated by NGVCK and
VCL32 engines) and 8 benign programs.
The tool PoMMaDe (Song and Touili, 2013) is
able to detect 600 real malware, 200 malware gen-
erated by two malware generators (NGVCK and
VCL32), and proves the reliability of benign pro-
grams: a Microsoft Windows binary program is mod-
eled as a PDS which allows to track the stack of the
program.
Song et al. model mobile applications using a PDS in order to discover private data leaks, working at the Smali code level (Song and Touili, 2014). Illegal flow of information in Java bytecode has also been studied in (Bernardeschi et al., 2004), using a static analysis approach.
Jacob and colleagues provide a basis for a mal-
ware model, founded on the Join-Calculus: they con-
sider the system call sequences to build the model (Ja-
cob et al., 2010).
Recently, the possibility to identify the malicious
payload in Android malware using a model checking
based approach has been explored in (Battista et al.,
2016; Mercaldo et al., 2016a; Mercaldo et al., 2016b).
Starting from the definition of the payload behavior, they formulate logic rules and then test them using a real-world dataset composed of samples from the Ransomware, DroidKungFu and Opfake families and update attack samples.
As emerges from the literature of recent years, formal methods have been applied to detect mobile malware, but to the best of the authors' knowledge they have never been applied specifically to identify the repackaging attack performed by the GinMaster family on Android.
3 THE METHOD
In this section a model checking-based approach for the detection of GinMaster apps is presented. While model checking (Barbuti et al., 2005) was originally developed to verify the correctness of systems against specifications, it has recently been applied in a variety of disciplines such as biology (De Ruvo et al., 2015), clone detection (Santone, 2011) and secure information flow (Barbuti et al., 2002), among others. In this paper we present the use of model checking in the security field. Fig. 1 describes all the phases of our approach.
During the first phase, we generate a formal model from the Java Bytecode of the .class files derived from the app under analysis. As formal specification language, we use Milner’s Calculus of Communicating Systems (CCS) (Milner, 1989), one of the best-known process algebras. CCS contains basic operators to build finite processes, communication operators to express concurrency, and some notion of recursion to capture infinite behaviour. Thus, the formal model is obtained by transforming each Java Bytecode instruction into CCS processes. More precisely, from the CCS we generate an automaton in which the nodes represent instruction addresses, while the edges (labeled with opcodes) represent the control-flow transitions from one instruction to its successor(s).
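As a minimal sketch of this first phase (not the authors' actual translator), the following Python fragment maps a hand-written list of instructions to CCS-style process definitions, one process per instruction address. The `Instr` record, the process names `X<addr>`, the sample program and the `proc ... = ...` textual rendering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    addr: int          # instruction address (a node of the automaton)
    action: str        # opcode-derived action label (an edge label)
    succs: List[int]   # addresses of the control-flow successors


def to_ccs(instrs: List[Instr]) -> List[str]:
    """Emit one CCS-style process definition per instruction."""
    defs = []
    for ins in instrs:
        if ins.succs:
            # one summand per control-flow successor: action.X<succ>
            body = " + ".join(f"{ins.action}.X{s}" for s in ins.succs)
        else:
            # no successor (e.g. a return): perform the action, then stop
            body = f"{ins.action}.nil"
        defs.append(f"proc X{ins.addr} = {body}")
    return defs


# Hypothetical three-instruction method with a conditional branch.
program = [
    Instr(0, "pushchmod", [1]),
    Instr(1, "ifeq", [2, 0]),
    Instr(2, "return", []),
]
ccs_defs = to_ccs(program)
for line in ccs_defs:
    print(line)
```

A branch instruction yields a choice between its successors, which is how the control-flow graph becomes the automaton described above.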
In the second phase, we try to discover Android
malware GinMaster apps. The behavior of the Gin-
Master family is encoded into a property ϕ expressed
in a branching temporal logic: the mu-calculus logic
(Stirling, 1989). Temporal logics are logical for-
malisms for expressing properties such as liveness
and safety properties. The syntax of the mu-calculus
is the following, where K ranges over sets of actions
and Z ranges over variables:
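\[
\varphi ::= \mathit{tt} \mid \mathit{ff} \mid Z \mid \varphi \lor \varphi \mid \varphi \land \varphi \mid [K]\varphi \mid \langle K \rangle \varphi \mid \nu Z.\varphi \mid \mu Z.\varphi
\]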
Figure 1: The Workflow of The Approach.
A fixpoint formula has the form µZ.φ (resp. νZ.φ)
where µZ (resp. νZ) binds free occurrences of Z in φ.
An occurrence of Z is free if it is not within the scope
of a binder µZ (resp. νZ). A formula is closed if it
contains no free variables. µZ.φ is the least fixpoint of
the recursive equation Z = φ, while νZ.φ is the great-
est one. From now on we consider only closed formu-
lae.
The satisfaction of a formula φ by a state s of a
transition system is defined as follows:
• each state satisfies tt and no state satisfies ff;
• a state satisfies $\varphi_1 \lor \varphi_2$ ($\varphi_1 \land \varphi_2$) if it satisfies $\varphi_1$ or (and) $\varphi_2$;
• $[K]\varphi$ is satisfied by a state which, for every performance of an action in K, evolves to a state obeying $\varphi$;
• $\langle K \rangle \varphi$ is satisfied by a state which can evolve to a state obeying $\varphi$ by performing an action in K.
For the precise definition of the satisfaction of a
closed formula ϕ by a state s (written s |= ϕ) the reader
can refer to (Stirling, 1989).
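For example, $\mu Y.\langle a \rangle \mathit{tt} \lor \langle -a,b \rangle Y$ means that “it is possible to perform the action a not preceded by the action b”, where $\langle -a,b \rangle$ stands for any action other than a and b.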
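To make the least-fixpoint reading of this example concrete, the following sketch evaluates the formula on a small hand-written transition system by iterating from the empty set until nothing changes. The transition system, the state names and the helper `sat_example` are hypothetical and unrelated to how CWB-NC is implemented.

```python
from typing import Dict, List, Set, Tuple

# Labelled transition system: state -> list of (action, next_state).
LTS = Dict[str, List[Tuple[str, str]]]


def sat_example(lts: LTS) -> Set[str]:
    """States satisfying  mu Y. <a>tt or <-a,b>Y  (least fixpoint)."""
    current: Set[str] = set()  # a least fixpoint is approximated from below
    while True:
        nxt: Set[str] = set()
        for state, moves in lts.items():
            for action, target in moves:
                if action == "a":
                    # <a>tt: the state can perform a
                    nxt.add(state)
                elif action != "b" and target in current:
                    # <-a,b>Y: a step other than a or b leads into Y
                    nxt.add(state)
        if nxt == current:  # fixpoint reached
            return current
        current = nxt


lts: LTS = {
    "s0": [("c", "s1"), ("b", "s3")],
    "s1": [("a", "s2")],
    "s2": [],
    "s3": [("a", "s2")],
    "s4": [("b", "s3")],
}
result = sat_example(lts)
print(sorted(result))
```

In this toy system `s4` is excluded because its only way to reach an `a` goes through `b`, while `s0` is included thanks to the path through `c`.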
The CCS formal model, generated from the Java Bytecode during the first phase, is now used to prove the property ϕ: using model checking we determine whether an app contains the GinMaster payload.
Table 1 shows the logic formulae used to catch the GinMaster malicious payload, to give the reader the flavour of the approach followed: the first one, i.e., $\phi_1$, catches the root ability of the GinMaster malicious payload; the second one, i.e., $\phi_2$, identifies the gathering of user information, one of the most representative mobile trojan behaviors, while $\phi_3$ identifies the commands to communicate with the C&C server. Figure 2 shows two snippets of code belonging to a GinMaster sample. The reported snippets are related to the two malicious behaviours caught by the logic formulae $\phi_1$ and $\phi_2$.
More precisely, formula $\phi_1$ is able to catch the following actions:
• “new javalangStringBuilder”: it represents a mutable sequence of characters, typically used to build the path where resources, i.e. files, are located. In the case of GinMaster the built string represents the path of the script able to root the device;
• “pushchmod”: to successfully run the script the malware needs to grant admin privileges to the root script; this action is performed with the “chmod” command;
• “invokeappend”: this represents a method of the StringBuilder class used to concatenate different strings;
• “invokeexec”: this instruction runs the specified command and arguments in a separate process with the specified environment and working directory. The “exec” method, belonging to the “Runtime” class, represents the command able to run the script to root the phone, which has already obtained the admin privileges;
• “invokewaitFor”: this instruction causes the current thread to wait, if necessary, until the process
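Table 1: The formulae for GinMaster payload detection.
\[
\begin{aligned}
\phi_1 &= \mu X.\langle \mathit{new\ javalangStringBuilder} \rangle\, \phi_{1_1} \lor \langle -\mathit{new\ javalangStringBuilder} \rangle\, X \\
\phi_{1_1} &= \mu X.\langle \mathit{pushchmod} \rangle\, \phi_{1_2} \lor \langle -\mathit{pushchmod} \rangle\, X \\
\phi_{1_2} &= \mu X.\langle \mathit{invokeappend} \rangle\, \phi_{1_3} \lor \langle -\mathit{invokeappend} \rangle\, X \\
\phi_{1_3} &= \mu X.\langle \mathit{invokeexec} \rangle\, \phi_{1_4} \lor \langle -\mathit{invokeexec} \rangle\, X \\
\phi_{1_4} &= \mu X.\langle \mathit{invokewaitFor} \rangle\, \phi_{1_5} \lor \langle -\mathit{invokewaitFor} \rangle\, X \\
\phi_{1_5} &= \mu X.\langle \mathit{pushfile} \rangle\, \mathit{tt} \lor \langle -\mathit{pushfile} \rangle\, X \\
\phi_2 &= \mu X.\langle \mathit{pushphone} \rangle\, \phi_{2_1} \lor \langle -\mathit{pushphone} \rangle\, X \\
\phi_{2_1} &= \mu X.\langle \mathit{invokegetSystemService} \rangle\, \phi_{2_2} \lor \langle -\mathit{invokegetSystemService} \rangle\, X \\
\phi_{2_2} &= \mu X.\langle \mathit{checkcastandroidtelephonyTelephonyManager} \rangle\, \phi_{2_3} \lor \langle -\mathit{checkcastandroidtelephonyTelephonyManager} \rangle\, X \\
\phi_{2_3} &= \mu X.\langle \mathit{invokegetDeviceId} \rangle\, \phi_{2_4} \lor \langle -\mathit{invokegetDeviceId} \rangle\, X \\
\phi_{2_4} &= \mu X.\langle \mathit{invokegetSubscriberId} \rangle\, \phi_{2_5} \lor \langle -\mathit{invokegetSubscriberId} \rangle\, X \\
\phi_{2_5} &= \mu X.\langle \mathit{invokegetSimSerialNumber} \rangle\, \phi_{2_6} \lor \langle -\mathit{invokegetSimSerialNumber} \rangle\, X \\
\phi_{2_6} &= \mu X.\langle \mathit{invokegetLineUunoNumber} \rangle\, \phi_{2_7} \lor \langle -\mathit{invokegetLineUunoNumber} \rangle\, X \\
\phi_{2_7} &= \mu X.\langle \mathit{new\ javaioBufferedReader} \rangle\, \mathit{tt} \lor \langle -\mathit{new\ javaioBufferedReader} \rangle\, X \\
\phi_3 &= \mu X.\langle A,B,C,D,E,F,G,H,I,J,K,L,M,N \rangle\, \mathit{tt} \lor \langle -A,B,C,D,E,F,G,H,I,J,K,L,M,N \rangle\, X
\end{aligned}
\]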
Figure 2: Code Snippet Related to Logic Formulae $\phi_1$ and $\phi_2$.
represented by this Process object has terminated. This method returns immediately if the subprocess has already terminated. If the subprocess has not yet terminated, the calling thread will be blocked until the subprocess exits. In this case the subprocess is represented by the execution of the script to root the device;
• “pushfile”: it represents the retrieval of a file from an external source and its storage on the device; in this case the file stored on the device is the script that will be run in a separate thread.
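The formulae in Table 1 all follow the same nesting pattern, so the chain for the first formula can be produced mechanically from the ordered list of actions just described. The sketch below builds a textual rendering of such a chain and, as a sanity check of its meaning, tests it on a single linear trace, where the chain holds exactly when the actions occur in the given order (possibly interleaved with other instructions). The ASCII notation (`mu X.`, `or`, `<-a>`), the function names and the sample traces are assumptions for illustration, not CWB-NC input syntax.

```python
from typing import Iterable, List


def chain_formula(actions: List[str]) -> str:
    """Nest 'mu X.(<a>(rest) or <-a>X)' over the ordered action list."""
    formula = "tt"  # innermost continuation, as in the last line of a chain
    for action in reversed(actions):
        formula = f"mu X.(<{action}>({formula}) or <-{action}>X)"
    return formula


def holds_on_trace(actions: List[str], trace: Iterable[str]) -> bool:
    """On one linear trace, the chain holds iff the actions occur in order."""
    steps = iter(trace)
    # each any() consumes the trace up to the first match of the next action
    return all(any(step == action for step in steps) for action in actions)


# Ordered actions of the root-ability formula, taken from Table 1.
PHI_1_ACTIONS = [
    "new javalangStringBuilder",
    "pushchmod",
    "invokeappend",
    "invokeexec",
    "invokewaitFor",
    "pushfile",
]
phi_1 = chain_formula(PHI_1_ACTIONS)
print(phi_1)
```

A real app model is a branching automaton rather than one trace, which is why the approach relies on a model checker instead of a simple scan like `holds_on_trace`.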
Instead, formula $\phi_2$ identifies the personal information gathering capability of the malicious payload, which performs the following actions:
• “invokegetSystemService”: it is a method of the Context class, which represents an interface to global information about the application environment. This is an abstract class whose implementation is provided by the Android system. It allows access to application-specific resources and classes, as well as up-calls for application-level operations such as launching activities, broadcasting and receiving intents;
• “checkcastandroidtelephonyTelephonyManager”: the TelephonyManager class provides access to information about the telephony services on the device. Applications can use the methods in this class to determine telephony services and states, as well as to access some types of subscriber information. Applications can also register a listener to receive notification of telephony state changes;
• “invokegetDeviceId”: it is a method of the “TelephonyManager” class provided by the Android environment and it returns the unique device ID, for instance, the IMEI for GSM
and the MEID or ESN for CDMA phones;
• “invokegetSubscriberId”: another method provided by the “TelephonyManager” class; it returns the unique subscriber ID, for example, the IMSI for a GSM phone;
• “invokegetSimSerialNumber”: this method, belonging to the “TelephonyManager” class, returns the serial number of the SIM inserted into the device;
• “invokegetLineUunoNumber”: this method returns the phone number string for line 1;
• “new javaioBufferedReader”: an object belonging to the BufferedReader class is able to read text from a character-input stream, buffering characters. In this case the buffer contains the personal information retrieved previously by the malicious payload.
The formula $\phi_3$ identifies the Command and Control (C&C) list of instructions. Table 2 shows all the command strings with a brief description of each. In some samples these strings are encrypted, and the algorithm used for decryption is shown in Figure 3: the decryption module decodes the string from Base64 and then applies a byte-by-byte XOR with key 0x18. The first generation of the GinMaster family exhibits unencrypted C&C instructions, while the second one introduces the encryption of the commands. In order to evade detection by antimalware software, the second generation obfuscates class names and encrypts URLs as well as C&C instructions, so it is impossible to catch this variant by detecting class names or URLs. In both generations, the malware has the capability of reporting information about packages installed/uninstalled in the system, searching and listing package information from remote websites, and downloading additional applications to the device without the consent of the user.
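The decryption step described above (Base64 decoding followed by a byte-by-byte XOR with key 0x18) can be sketched as follows. This is an illustrative reconstruction from the textual description only, not the code of Figure 3; the function names, the UTF-8 text encoding and the sample URL are assumptions, and `encrypt_command` exists only to build test input.

```python
import base64

XOR_KEY = 0x18  # single-byte key reported in the text


def decrypt_command(encoded: str) -> str:
    """Base64-decode, then XOR every byte with the key."""
    raw = base64.b64decode(encoded)
    return bytes(b ^ XOR_KEY for b in raw).decode("utf-8")


def encrypt_command(plain: str) -> str:
    """Inverse transformation, used here only to produce sample input."""
    xored = bytes(b ^ XOR_KEY for b in plain.encode("utf-8"))
    return base64.b64encode(xored).decode("ascii")


# Hypothetical command string, not taken from a real sample.
sample = encrypt_command("http://example.invalid/report.do")
print(sample)
print(decrypt_command(sample))
```

Because XOR with a fixed key is its own inverse, an analyst who knows the key can recover the plain command strings statically and then match them against the actions of Table 2.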
The model checker accepts two inputs: the formal model of the app and the property expressing the malware characteristics of the GinMaster family. If the model checker returns true, we consider the app as belonging to the GinMaster family, while if it returns false the app can be either trusted or belong to another malware family. As formal verification environment, in this paper we use the Concurrency Workbench of the New Century (CWB-NC) (Cleaveland and Sims, 1996), which supports several different specification languages, among which CCS.
Since CWB-NC is no longer in active development, as future work we want to replace CWB-NC with CAAL (the Concurrency Workbench, Aalborg Edition, developed at Aalborg University) (Andersen et al., 2015), which supports CCS as input specification language (as CWB-NC does), but uses a more efficient algorithm to perform model checking. There also exist mature tools with modern designs like CADP (Garavel et al., 2013), with expressive input languages and efficient analysis methods. However, our aim is to develop an initial rapid research prototype to evaluate how well our approach is able to identify GinMaster apps.
For the same reason, logic properties are formulated
manually, but as future work we plan to build rules
automatically using, for instance, machine learning or
clone detection.
An important feature of our approach is that an au-
tomatic dissection of an app can be achieved, with the
advantage of the localization into the code of the in-
structions that implement the malicious behavior. To
localize the payload, no manual inspection is needed.
In fact, our approach is able to identify the exact po-
sition of the instructions characterizing the malicious
behaviour, with a precision at method level.
This is a novel result in malware analysis. In addition, since we analyze Java bytecode, our methodology is able to correctly identify the malicious payload also when trivial obfuscation techniques (i.e., nop insertion, junk code, call reordering) are applied to the Java source code (Mercaldo et al., 2016b).
4 EXPERIMENT
In this section we discuss the experiment we performed to evaluate the effectiveness of our approach in recognizing the GinMaster payload and discriminating it from samples belonging to other Android malware families.
4.1 Dataset
The real world samples examined in the experiment
were gathered from the Drebin project’s dataset (Arp
et al., 2014; Spreitzenbarth et al., 2013): a very well
known collection of malware used in many scien-
tific works, which includes the most diffused Android
families.
Malware dataset is also partitioned according to
the malware family: each family contains samples
which have in common several characteristics, like
payload installation, the kind of attack and events that
trigger malicious payload (Zhou and Jiang, 2012).
Table 3 shows the 10 malware families with the
largest number of applications in our malware dataset
with installation type, kind of attack and event acti-
vating malicious payload.
Table 2: The list of C&C commands.
ACTION | STRING | DESCRIPTION
A | http://client.go360days.com/report/first_run.do | Report the starting of GinMaster.
B | http://client.go360days.com/request/tableclass.do | Show information stored in SQLite database.
C | http://client.go360days.com/request/config.do | Change the frequency configuration for checking into the server.
D | http://client.go360days.com/request/alert.do | Alert last id.
E | http://client.go360days.com/request/push.do | Soft last id.
F | http://client.go360days.com/report/return_config.do | Show configuration.
G | http://client.go360days.com/report/return_alert.do | Send alert.
H | http://client.go360days.com/report/return_push.do | Push file into the device.
I | http://client.go360days.com/report/install_list.do | Report information when installing a list of packages.
J | http://client.go360days.com/report/listener.do | Check the communication between server and device.
K | http://client.go360days.com/client.php?action=soft&soft_id= | Get a link to a specified software.
L | http://client.go360days.com/client.php?action=softlist&type=search&word= | Search a list of software with a specified word.
M | http://client.go360days.com/client.php?action=softlist | Get the list of the available software.
N | http://client.go360days.com/client.php?action=list&list_id=9 | Get the software with the specified id.
Figure 3: Decryption algorithm used to decode the C&C list of commands.
The malware was retrieved from the Drebin
project (Arp et al., 2014; Spreitzenbarth et al., 2013)
taking into account the top 10 most populous families.
We briefly describe the malicious payload action
for the top 10 populous families in our dataset.
1. The samples of FakeInstaller family have the
main payload in common but have different code
implementations, and some of them also have an
extra payload. FakeInstaller malware is server-
side polymorphic, which means the server could
provide different .apk files for the same URL request. There are variants of FakeInstaller that not only send SMS messages to premium rate numbers, but also include a backdoor to receive commands from a remote server. There is a large number of variants for this family, and it has been distributed through hundreds of websites and alternative markets. The members of this family hide their malicious code inside repackaged versions of popular applications. During the installation process
the malware sends expensive SMS messages to
premium services owned by the malware authors.
2. DroidKungFu installs a backdoor that allows at-
tackers to access the smartphone when they want
and use it as they please. They could even turn
it into a bot. This malware encrypts two known
root exploits, exploid and rageagainstthecage, to
break out of the Android security container. When
it runs, it decrypts these exploits and then contacts
a remote server without the user knowing.
3. Plankton uses an available native functionality
(i.e., class loading) to forward details like IMEI
and browser history to a remote server. It is
present in a large number of versions as harmful adware that downloads unwanted advertisements and changes the browser homepage or adds unwanted bookmarks to it.
4. The Opfake samples demand payment for the application content through premium text messages. This family represents an example of polymorphic malware in the Android environment: it is written with an algorithm that can change shape over time so as to evade any detection by signature-based antimalware.
5. The GinMaster family contains a malicious service with the ability to root devices to escalate privileges, steal confidential information and send it to a remote website, as well as install applications without user interaction. It is also a trojan application and, similarly to the DroidKungFu family, the malware starts its malicious services as soon as it receives a BOOT_COMPLETED or USER_PRESENT intent. The malware can successfully avoid detection by mobile anti-virus
Table 3: Families in Drebin dataset with details of the installation method (standalone, repackaging, update), the kind of attack (trojan, botnet), the events that trigger the malicious payload and a brief family description.
Family | Installation | Attack | Activation | Description
FakeInstaller | s | t,b | | server-side polymorphic family
Plankton | s,u | t,b | | it uses class loading to forward details
DroidKungFu | r | t | boot,batt,sys | it installs a backdoor
GinMaster | r | t | boot | malicious service to root devices
BaseBridge | r,u | t | boot,sms,net,batt | it sends information to a remote server
Adrd | r | t | net,call | it compromises personal data
Kmin | s | t | boot | it sends info to premium-rate numbers
Geinimi | r | t | boot,sms | first Android botnet
DroidDream | r | b | main | botnet, it gained root access
Opfake | r | t | | first Android polymorphic malware
Table 4: Performance Evaluation.
GinMaster | Malware | TP | FP | FN | TN | PR | RC | Fm | Acc
100 | 761 | 81 | 2 | 19 | 759 | 0.98 | 0.81 | 0.89 | 0.98
software by using polymorphic techniques to hide
malicious code, obfuscating class names for each
infected object, and randomizing package names
and self-signed certificates for applications.
6. BaseBridge malware sends information, like IMEI, IMSI and other files, to a remote server and to premium-rate numbers, running one or more malicious services in the background. BaseBridge malware is able to obtain the permissions to use the Internet and to kill the processes of antimalware applications in the background.
7. Kmin malware is similar to BaseBridge, but does
not kill antimalware processes.
8. Geinimi is the first Android malware in the wild
that displays botnet-like capabilities. Once the
malware is installed, it has the potential to receive
commands from a remote server that allows the
owner of that server to control the phone. Geinimi makes use of a bytecode obfuscator. The malware belonging to this family is able to read, collect and delete SMS messages, send contact information to a remote server, make phone calls silently and also launch a web browser to a specific URL to start file downloads.
9. The Adrd family is very close to Geinimi but with fewer server-side commands; it also compromises personal data such as the IMEI and IMSI of the infected device. In addition to Geinimi's capabilities, this one is able to modify device settings.
10. DroidDream is another example of botnet: it gains root access to the device to access unique identification information. This malware can also download additional malicious programs without the user’s knowledge as well as open the phone up to control by hackers. The name derives from the
fact that it was set up to run between the hours of
11pm and 8am when users were most likely to be
sleeping and their phones less likely to be in use.
4.2 Evaluation
To estimate the detection performance of our method-
ology we compute the metrics of precision and recall,
F-measure (Fm) and Accuracy (Acc), defined as fol-
lows:
\[
PR = \frac{TP}{TP + FP}; \quad RC = \frac{TP}{TP + FN};
\]
\[
Fm = \frac{2\,PR\,RC}{PR + RC}; \quad Acc = \frac{TP + TN}{TP + FN + FP + TN}
\]
where TP is the number of malware samples correctly
identified as belonging to the GinMaster family (True Positives),
TN is the number of malware samples correctly identified as
not belonging to the GinMaster family (True Negatives),
FP is the number of malware samples incorrectly
identified as belonging to the GinMaster family (False Positives),
and FN is the number of GinMaster samples that were not
identified as belonging to the GinMaster family (False
Negatives).
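As a minimal sketch, the four metrics can be computed directly from the confusion counts. The class and function names below are illustrative, not part of our tool chain. The example counts are an assumption: they are read from the reported results row (100 GinMaster and 761 other samples) as TP = 81, FP = 2, FN = 19, TN = 759, a column order that is not confirmed by the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion counts for the binary task 'GinMaster vs. other families'."""
    tp: int  # GinMaster samples identified as GinMaster
    fp: int  # other-family samples identified as GinMaster
    fn: int  # GinMaster samples not identified as GinMaster
    tn: int  # other-family samples correctly rejected


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a metric is undefined.
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def f_measure(c: ConfusionCounts) -> float:
    pr, rc = precision(c), recall(c)
    return _ratio(2 * pr * rc, pr + rc)


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fn + c.fp + c.tn)


# Assumed reading of the results row: TP, FP, FN, TN.
counts = ConfusionCounts(tp=81, fp=2, fn=19, tn=759)
print(f"PR={precision(counts):.2f} RC={recall(counts):.2f} "
      f"Fm={f_measure(counts):.2f} Acc={accuracy(counts):.2f}")
# PR=0.98 RC=0.81 Fm=0.89 Acc=0.98
```

Note that Fm and Acc are symmetric in FP and FN, so exchanging the two error counts would only swap PR and RC.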
Table 4 shows the results obtained using our
method.
The column GinMaster reports the samples
belonging to the GinMaster family, while the column
Malware reports the malware samples belonging to the
other families considered in the study: details about
the malicious payloads of the families we considered are
shown in Table 3. We demonstrate the effectiveness
of our approach by evaluating 100 malware samples belonging
to the GinMaster family and 761 malware samples belonging to
other families.
ForSE 2017 - 1st International Workshop on FORmal methods for Security Engineering
Results in Table 4 seem to be very promising: we
obtain an Accuracy equal to 0.9. Concerning the GinMaster
results, we are not able to identify the malicious
payloads of just 2 samples out of 100. It is worth
noting that the above values are also due to the fact that the
dataset is unbalanced, i.e., 100 malware samples belonging to the
GinMaster family and 761 belonging to other families.
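To make the effect of the imbalance concrete, the following sketch computes the accuracy of a hypothetical baseline that always predicts the larger class. The function name is illustrative; only the class sizes (100 GinMaster, 761 other samples) come from our dataset.

```python
def majority_baseline(n_positive: int, n_negative: int) -> tuple[float, float]:
    """Accuracy and recall of a detector that always predicts the larger class.

    'Positive' means GinMaster; 'negative' means any other family.
    """
    total = n_positive + n_negative
    predicts_positive = n_positive > n_negative
    tp = n_positive if predicts_positive else 0
    tn = 0 if predicts_positive else n_negative
    acc = (tp + tn) / total
    rc = tp / n_positive if n_positive else 0.0
    return acc, rc


acc, rc = majority_baseline(n_positive=100, n_negative=761)
print(f"Acc={acc:.2f} RC={rc:.2f}")  # Acc=0.88 RC=0.00
```

Such a baseline never reports GinMaster, yet it is right on all 761 other-family samples. This is why accuracy should be read together with precision, recall and F-measure on this dataset.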
5 CONCLUSION AND FUTURE WORK
The most common way to inject a malicious payload
in the Android environment is the repackaging attack,
which basically consists of distributing legitimate,
well-known applications with embedded malicious
behaviour in order to lure users. In this paper we
proposed an approach, based on formal methods, able
to catch the malicious payload related to the GinMaster
family, one of the most populous repackaged trojans
embedded in legitimate Android applications. The GinMaster
family is able to root Android devices in order to
execute shell scripts with admin privileges; in addition,
it is able to send personal user information to the
attacker using a C&C server. We identified a set of rules
specific to the GinMaster payload behaviour and we
evaluated the effectiveness of our approach using a dataset
of real-world malware, obtaining an accuracy equal to
0.9. As future work, we plan to test our approach on
mobile malware belonging to other families that exhibit
trojan behaviour, in order to evaluate the rule set on
families with similar payloads.
ACKNOWLEDGEMENTS
This work has been partially supported by H2020
EU-funded projects NeCS and C3ISP and EIT-Digital
Project HII.
REFERENCES
Andersen, J. R., Andersen, N., Enevoldsen, S., Hansen,
M. M., Larsen, K. G., Olesen, S. R., Srba, J., and
Wortmann, J. K. (2015). CAAL: concurrency work-
bench, aalborg edition. In Theoretical Aspects of
Computing - ICTAC 2015 - 12th International Col-
loquium Cali, Colombia, October 29-31, 2015, Pro-
ceedings, pages 573–582.
Arp, D., Spreitzenbarth, M., Huebner, M., Gascon, H., and
Rieck, K. (2014). Drebin: Efficient and explainable
detection of android malware in your pocket. In
Proceedings of 21st Annual Network and Distributed System
Security Symposium (NDSS). IEEE.
Barbuti, R., De Francesco, N., Santone, A., and Tesei,
L. (2002). A notion of non-interference for timed
automata. Fundamenta Informaticae, 51(1-2):1–11.
Barbuti, R., Francesco, N. D., Santone, A., and Vaglini, G.
(2005). Reduced models for efficient CCS verifica-
tion. Formal Methods in System Design, 26(3):319–
350.
Battista, P., Mercaldo, F., Nardone, V., Santone, A., and Vis-
aggio, C. A. (2016). Identification of android malware
families with model checking. In International Con-
ference on Information Systems Security and Privacy.
SCITEPRESS.
Bernardeschi, C., De Francesco, N., Lettieri, G., and Mar-
tini, L. (2004). Checking secure information flow in
java bytecode by code transformation and standard
bytecode verification. Software - Practice and Expe-
rience, 34(13):1225–1255.
Canfora, G., Mercaldo, F., and Visaggio, C. A. (2016). An
hmm and structural entropy based detector for android
malware: An empirical study. Computers & Security,
61:1–18.
Cleaveland, R. and Sims, S. (1996). The ncsu concurrency
workbench. In CAV. Springer.
De Ruvo, G., Nardone, V., Santone, A., Ceccarelli, M.,
and Cerulo, L. (2015). Infer gene regulatory networks
from time series data with probabilistic model
checking. pages 26–32.
Fedler, R., Schütte, J., and Kulicke, M. (2014). On
the effectiveness of malware protection on
android: An evaluation of android antivirus apps,
http://www.aisec.fraunhofer.de/.
Garavel, H., Lang, F., Mateescu, R., and Serwe, W. (2013).
CADP 2011: a toolbox for the construction and anal-
ysis of distributed processes. STTT, 15(2):89–107.
GoogleMobile (2014). http://googlemobile.blogspot.it/2012/
02/android-and-security.html.
Isohara, T., Takemori, K., and Kubota, A. (2011). Kernel-
based behavior analysis for android malware detec-
tion. In Proceedings of Seventh International Confer-
ence on Computational Intelligence and Security, pp.
1011-1015.
Jacob, G., Filiol, E., and Debar, H. (2010). Formalization of
viruses and malware through process algebras. In In-
ternational Conference on Availability, Reliability and
Security (ARES 2010). IEEE.
Kinder, J., Katzenbeisser, S., Schallhart, C., and Veith, H.
(2005). Detecting malicious code by model checking.
Springer.
Liang, S. and Du, X. (2014). Permission-combination-
based scheme for android mobile malware detec-
tion. In International Conference on Communica-
tions, pages 2301–2306.
Marforio, C., Aurelien, F., and Srdjan, C. (2011).
Application collusion attack on the permission-
based security model and its implications for mod-
ern smartphone systems, ftp://ftp.inf.ethz.ch/doc/tech-
reports/7xx/724.pdf.
Mercaldo, F., Nardone, V., Santone, A., and Visaggio, C. A.
(2016a). Download malware? No, thanks. How for-
mal methods can block update attacks. In Formal
Methods in Software Engineering (FormaliSE), 2016
IEEE/ACM 4th FME Workshop on. IEEE.
Mercaldo, F., Nardone, V., Santone, A., and Visaggio,
C. A. (2016b). Ransomware steals your phone. for-
mal methods rescue it. In International Conference
on Formal Techniques for Distributed Objects, Com-
ponents, and Systems, pages 212–221. Springer.
Milner, R. (1989). Communication and concurrency. PHI
Series in computer science. Prentice Hall.
Neuhaus, S. and Zimmermann, T. (2010). Security trend
analysis with cve topic models. In Software reliabil-
ity engineering (ISSRE), 2010 IEEE 21st international
symposium on, pages 111–120. IEEE.
Oberheide, J. and Miller, C. (2012). Dissect-
ing the android bouncer. In SummerCon,
https://jon.oberheide.org/files/summercon12-
bouncer.pdf.
Reina, A., Fattori, A., and Cavallaro, L. (2013). A system
call-centric analysis and stimulation technique to au-
tomatically reconstruct android malware behaviors. In
Proceedings of EuroSec.
Santone, A. (2011). Clone detection through process
algebras and java bytecode. pages 73–74.
SecureList (2015). https://securelist.com/analysis/kaspersky-
security-bulletin/73839/mobile-malware-evolution-
2015/.
Song, F. and Touili, T. (2001). Efficient malware detection
using model-checking. Springer.
Song, F. and Touili, T. (2013). Pommade: Pushdown
model-checking for malware detection. In Proceed-
ings of the 2013 9th Joint Meeting on Foundations of
Software Engineering. ACM.
Song, F. and Touili, T. (2014). Model-checking for android
malware detection. Springer.
Song, J., Han, C., Wang, K., Zhao, J., Ranjan, R., and
Wang, L. (2016). An integrated static detection and
analysis framework for android. Pervasive and Mo-
bile Computing.
Spreitzenbarth, M., Echtler, F., Schreck, T., Freiling, F. C.,
and Hoffmann, J. (2013). Mobilesandbox: Looking
deeper into android applications. In 28th International
ACM Symposium on Applied Computing (SAC). ACM.
Stirling, C. (1989). An introduction to modal and temporal
logics for ccs. In Concurrency: Theory, Language,
And Architecture, pages 2–20.
Tchakounté, F. and Dayang, P. (2013). System calls analysis
of malwares on android. In International Journal of
Science and Technology (IJST), Volume 2, No. 9.
Yerima, S. Y., Sezer, S., McWilliams, G., and Muttik, I.
(2013). A new android malware detection approach
using bayesian classification. In International Confer-
ence on Advanced Information Networking and Appli-
cations, pages 121–128.
Yu, R. (2013). Ginmaster: a case study in android malware.
In Virus bulletin conference, pages 92–104.
Zhao, Y.-B., Liu, S.-M., and Guo, S.-Q. (2014). Extraction
and prediction of hot topics in network security. In
Computer Science and Network Security, 2014 Inter-
national Conference on, pages 347–353.
Zhou, Y. and Jiang, X. (2012). Dissecting android malware:
Characterization and evolution. In 2012 IEEE Sympo-
sium on Security and Privacy, pages 95–109. IEEE.