Malware Detection based on Graph Classification

Khanh-Huu-The Dam, Tayssir Touili

Abstract

Malware detection remains a major challenge. Existing detection techniques require considerable engineering effort to extract malicious behaviors manually. To avoid this tedious task, we propose in this paper to apply learning to malware detection. Given a set of malware samples and a set of benign programs, we show how learning techniques can be applied to detect malware. For that, we use abstract API graphs to represent programs. Abstract API graphs are graphs whose nodes are API functions (i.e., functions supported by the operating system) and whose edges represent the order of execution of the different calls to these functions. To learn malware, we apply well-known learning techniques based on the Random Walk Graph Kernel combined with Support Vector Machines. We achieve a high detection rate with only a few false alarms (a detection rate of 98.93% with 1.24% false alarms). Moreover, we show that our techniques are able to detect several malware samples that could not be detected by well-known and widely used antiviruses such as Avira, Kaspersky, Avast, Qihoo-360, McAfee, AVG, BitDefender, ESET-NOD32, F-Secure, Symantec, or Panda.
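To make the idea of comparing programs through their API graphs more concrete, the following is a minimal sketch of a truncated random walk kernel in Python. It is not the authors' implementation: the helper names (`build_api_graph`, `random_walk_kernel`), the parameters `max_len` and `decay`, and the sample API call orders are all assumptions made for illustration. Since each API function appears as at most one node per graph, the direct product of two graphs reduces here to the nodes and edges the two graphs share, and the kernel sums the walks in that product graph, down-weighting longer walks.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: API name -> set of API names that may follow it.
Graph = Dict[str, Set[str]]


def build_api_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build a directed graph from (caller_api, next_api) pairs."""
    graph: Graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure every node is a key
    return graph


def random_walk_kernel(g1: Graph, g2: Graph,
                       max_len: int = 3, decay: float = 0.5) -> float:
    """Weighted count of common walks of length 0..max_len (illustrative)."""
    # Nodes of the product graph: API functions present in both graphs.
    common = set(g1) & set(g2)
    # Edges of the product graph: edges present in both graphs.
    product = {u: g1[u] & g2[u] for u in common}

    walks = {u: 1 for u in common}   # walks of the current length ending at u
    total = float(len(common))       # walks of length 0
    weight = 1.0
    for _ in range(max_len):
        weight *= decay
        extended = {u: 0 for u in common}
        for u, count in walks.items():
            for v in product[u]:
                extended[v] += count
        walks = extended
        total += weight * sum(walks.values())
    return total


def gram_matrix(graphs: List[Graph], **kw) -> List[List[float]]:
    """Pairwise kernel values, as a precomputed-kernel classifier would need."""
    return [[random_walk_kernel(a, b, **kw) for b in graphs] for a in graphs]


# Made-up call orders, for illustration only.
sample_a = build_api_graph([("GetModuleFileName", "CopyFile"),
                            ("CopyFile", "RegSetValue")])
sample_b = build_api_graph([("GetModuleFileName", "CopyFile"),
                            ("CopyFile", "RegSetValue"),
                            ("RegSetValue", "ExitProcess")])
benign = build_api_graph([("CreateFile", "ReadFile"),
                          ("ReadFile", "CloseHandle")])

print(random_walk_kernel(sample_a, sample_b, max_len=2, decay=0.5))
print(random_walk_kernel(sample_a, benign, max_len=2, decay=0.5))
```

In a full pipeline, the matrix returned by `gram_matrix` over the training graphs would be passed, together with the malware/benign labels, to a Support Vector Machine that accepts a precomputed kernel. Truncating walks at `max_len` is a simplification chosen to keep the sketch short; the abstract does not specify how the kernel is parameterized.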


Paper Citation


in Harvard Style

Dam K. and Touili T. (2017). Malware Detection based on Graph Classification. In Proceedings of the 3rd International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, ISBN 978-989-758-209-7, pages 455-463. DOI: 10.5220/0006209504550463


in Bibtex Style

@conference{icissp17,
author={Khanh-Huu-The Dam and Tayssir Touili},
title={Malware Detection based on Graph Classification},
booktitle={Proceedings of the 3rd International Conference on Information Systems Security and Privacy - Volume 1: ICISSP},
year={2017},
pages={455-463},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006209504550463},
isbn={978-989-758-209-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Information Systems Security and Privacy - Volume 1: ICISSP
TI - Malware Detection based on Graph Classification
SN - 978-989-758-209-7
AU - Dam K.
AU - Touili T.
PY - 2017
SP - 455
EP - 463
DO - 10.5220/0006209504550463