• Accuracy, which is the total number of hits divided by the number of
instances in the dataset: Accuracy = (TP + TN)/(P + N).
• Area Under ROC Curve (Singh et al., 2009), which establishes the relation
between false negatives and false positives.
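As an illustration of the two measures above, the following is a minimal sketch, not the evaluation code used in the experiments. The function names and the toy scores are hypothetical. The AUC is computed here through its rank-statistic equivalence (the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one), which is an assumption about the computation and not something the text specifies.

```python
def accuracy(tp, tn, p, n):
    """Accuracy = (TP + TN) / (P + N), as defined in the text."""
    return (tp + tn) / (p + n)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class, 0 for the negative class.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 50 positives, 50 negatives, 85 hits in total.
print(accuracy(tp=40, tn=45, p=50, n=50))
# Hypothetical scores: one positive is ranked below one negative.
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise formulation is quadratic in the number of instances, which is acceptable for a small illustration but would normally be replaced by a sort-based computation on larger datasets.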
Table 2 shows the obtained results. With Manhattan distance, the best AUC
value (0.88) was obtained using average as the combination rule; the accuracy
for this combination was around 85%. With Euclidean distance, we obtained more
than 0.90 of AUC and 87.57% accuracy. Finally, cosine distance yielded the
best overall results: 0.91 of AUC and nearly 90% accuracy.
In general, the results surpassed 0.8 of AUC and 80% accuracy for all three
distances when the average was used as the combination rule.
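The distance-based scoring discussed above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the binary feature vectors, the function names, and the handling of all-zero vectors in the cosine distance are hypothetical choices. Each sample is compared against every benign application, and the resulting distances are merged by a combination rule (here, the average).

```python
import math


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def average(values):
    return sum(values) / len(values)


def anomaly_score(sample, benign_set, distance, combine=average):
    """Deviation of `sample` from the benign set: combine the distances
    from the sample to every benign application into a single value."""
    return combine([distance(sample, benign) for benign in benign_set])


# Hypothetical binary vectors (1 = feature present in the manifest).
benign = [[1, 0, 1], [1, 1, 0]]
sample = [1, 0, 1]
print(anomaly_score(sample, benign, manhattan))
print(anomaly_score(sample, benign, euclidean))
print(anomaly_score(sample, benign, cosine_distance))
```

A sample would then be flagged as anomalous when its score exceeds a chosen threshold; sweeping that threshold is what produces the ROC curve from which the AUC is measured.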
6 RELATED WORK
In order to tackle the problem of growing malware in
Android, researchers have begun to explore this area
using the experience acquired in other platforms. We
can distinguish two different approaches. Dynamic approaches execute the
sample in an isolated environment and collect data about its execution. These
approaches require high computational effort and are not suitable for
deployment on smartphones. Static approaches, in contrast, analyse the samples
without executing them. Some attempts are based on signature scanning, that
is, detecting known patterns present in malicious applications, while others
try to implement generic approaches to distinguish patterns in benign or
malicious applications.
Shabtai and Elovici (Shabtai et al., 2012) presented “Andromaly”, a framework
for detecting malware on Android mobile devices. This framework collected 88
features and events and then applied machine-learning algorithms to detect
abnormal behaviours. Their dataset was composed of four self-written pieces of
malware, as well as goodware samples, both separated into two categories
(games and tools). Their approach achieved a 0.99 area under the ROC curve and
99% accuracy.
Despite these results, their framework required the acquisition of a huge
number of features and events, overloading the device and, consequently,
draining the battery. Our approach, in contrast, extracts the data from the
AndroidManifest.xml file, which is a trivial process. Although our results are
not as sound as theirs, our approach requires much less computational effort.
In addition, our dataset is larger and sparser in malware samples than theirs.
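To illustrate why extracting data from the manifest is inexpensive, the sketch below reads the requested permissions and declared features from a manifest. It is an assumption-laden example, not the extraction code used in this work: the function name and the sample manifest are hypothetical, and the input is assumed to be an already decoded, plain-text AndroidManifest.xml (inside a packaged APK the manifest is stored in a compiled binary XML form and must be decoded first).

```python
import xml.etree.ElementTree as ET

# XML namespace used for android:* attributes in manifest files.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml):
    """Return (permissions, features) declared in a decoded manifest."""
    root = ET.fromstring(manifest_xml)

    def names(tag):
        found = {elem.get(ANDROID_NS + "name") for elem in root.iter(tag)}
        found.discard(None)  # skip elements without an android:name
        return sorted(found)

    return names("uses-permission"), names("uses-feature")


# Hypothetical manifest, for illustration only.
SAMPLE = (
    '<manifest xmlns:android="http://schemas.android.com/apk/res/android" '
    'package="com.example.app">'
    '<uses-permission android:name="android.permission.INTERNET"/>'
    '<uses-permission android:name="android.permission.SEND_SMS"/>'
    '<uses-feature android:name="android.hardware.camera"/>'
    '</manifest>'
)

permissions, features = manifest_features(SAMPLE)
print(permissions)
print(features)
```

The extracted names can then be mapped onto a fixed vocabulary to build the feature vector of each application, without executing the application at any point.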
Regarding the signature-based approach, Schmidt, Camtepe, and Albayrak
(Schmidt et al., 2010) focused on a static and lightweight analysis of the
samples. They used system calls as features and simple classifiers to detect
malicious behaviours. Neither of these two approaches prevents the
installation of malware on the devices. Our system evaluates each application
before its installation, considering several features extracted from the
manifest file, and obtains results similar to those of previous work.
Peng et al. (Peng et al., 2012) proposed an approach to rank the risk of
Android applications using probabilistic generative models. They selected the
permissions of the applications as the key feature. Specifically, they chose
the top 20 most frequently requested permissions in their dataset, composed of
2 benign software collections obtained from the Android application store
Google Play (157,856 and 324,658 samples, respectively) and 378 unique samples
of malware. They obtained a 0.94 AUC as their best result. Nevertheless, the
unbalanced nature of their dataset makes it difficult to compare the results
directly with our approach. In fact, our approach is based on anomaly
detection, as it measures the deviation of any sample from a set of benign
applications. In addition, we complemented the information provided by the
permissions with the uses-features, enhancing the results and bringing them
closer to those obtained by previous methods. In summary, our approach
prevents the installation of malware on the devices, instead of monitoring the
execution of the applications, thus saving device resources and preventing
undesirable consequences.
7 CONCLUSIONS AND FUTURE WORK
Smartphones and tablets are flooding both consumer and business markets and,
therefore, these devices manage a large amount of information. Thus, malware
writers have found in these devices a new source of income, and the number of
malware samples on these platforms has grown exponentially.
In this paper, we presented a new malicious software detection approach that
is inspired by anomaly detection systems. In contrast to other approaches,
this method only needs previously labelled goodware and measures the deviation
of a new sample with respect to normality (applications without malicious
intentions). Although anomaly detection systems tend to produce high error
rates (especially false positives), our exper-
SECRYPT 2013 - International Conference on Security and Cryptography