than the highest rate reported in related work that uses
exclusively the same dataset. In addition, we extended
our working dataset with more recent data extracted
from AndroZoo APKs, and we improved the accuracy
by using deep learning techniques and by extracting
multiple additional features from the bytecode and
the AndroidManifest.xml file. Our methodology proved
effective, reaching an accuracy of nearly 97.7% in
detecting recent Android malware by binary classification.
A dataset of features extracted from nearly 80,000 recent
applications, including about 30,000 malware samples,
will be made available on the Internet, together with
the script that extracts these features from a raw
AndroZoo dataset.
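To illustrate what a manifest-based feature can look like, the following minimal sketch turns the permissions requested in an AndroidManifest.xml file into a binary feature vector. It is not the extraction script mentioned above. It assumes the manifest has already been decoded to plain-text XML (manifests inside an APK are stored in a binary format), and the permission vocabulary, the sample manifest, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the android:* attributes in a decoded manifest.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical vocabulary; a real one would be built from the training set.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]

# Hypothetical decoded manifest used only for this illustration.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
</manifest>"""


def manifest_permissions(manifest_xml):
    """Return the set of permission names requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{" + ANDROID_NS + "}name"
    return {
        elem.attrib[name_attr]
        for elem in root.iter("uses-permission")
        if name_attr in elem.attrib
    }


def to_feature_vector(permissions, vocab):
    """Encode the permissions as 0/1 values, one per vocabulary entry."""
    return [1 if name in permissions else 0 for name in vocab]


features = to_feature_vector(manifest_permissions(SAMPLE_MANIFEST), PERMISSION_VOCAB)
print(features)  # [1, 1, 0]
```

Vectors of this kind could be concatenated with bytecode-derived features before being passed to a binary classifier.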
Several areas of improvement remain to be studied,
such as optimizing hyperparameters, exploiting a
larger number of applications from the AndroZoo
dataset, and extending the features extracted from
bytecode to improve the model. Work on multi-class
classification, to better categorize Android malware
families, is currently under way. It would also be
interesting to study the APKs reported as false
positives, to understand why they were tagged as
malware. Manual reverse engineering might reveal
unknown attacks that were not detected by conventional
antivirus software.
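As a small illustration of how false positives could be shortlisted for manual review, the sketch below selects the applications labeled benign that a classifier predicted as malware. The label encoding (0 for benign, 1 for malware), the identifiers, and the function name are assumptions made for this example only.

```python
def false_positives(apk_ids, y_true, y_pred):
    """Return IDs of apps labeled benign (0) but predicted as malware (1)."""
    return [
        apk
        for apk, truth, pred in zip(apk_ids, y_true, y_pred)
        if truth == 0 and pred == 1
    ]


# Hypothetical identifiers, ground-truth labels, and model predictions.
apk_ids = ["app_a", "app_b", "app_c", "app_d"]
y_true = [0, 1, 0, 0]
y_pred = [1, 1, 0, 1]

to_review = false_positives(apk_ids, y_true, y_pred)
print(to_review)  # ['app_a', 'app_d']
```

The resulting list is the set of candidates for the manual reverse engineering discussed above.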
ICISSP 2022 - 8th International Conference on Information Systems Security and Privacy