ing classification models for this task. First, it does
not require a large number of samples in each malware
family to train the model. Second, by using the concept
of discriminative power to select discriminative
function clusters, our approach can handle datasets
with uneven class distribution. Third, unlike ordinary
machine learning models, our approach provides
interpretable evidence to justify its classification results.
Together, these properties make it a practical solution
for malware classification.
ACKNOWLEDGMENT
This research is supported by Defence Research
and Development Canada (contract no.
W7701-176483/001/QCL), NSERC Discovery Grants
(RGPIN-2018-03872), and the Canada Research Chairs
Program (950-230623).
A Novel and Dedicated Machine Learning Model for Malware Classification