N-GRAMS-BASED FILE SIGNATURES FOR MALWARE DETECTION

Igor Santos, Yoseba K. Penya, Jaime Devesa, Pablo G. Bringas

Abstract

Malware is any malicious code that can harm a computer or network. The amount of malware grows faster every year and poses a serious security threat, so malware detection is a critical topic in computer security. Signature-based detection is currently the most widespread method for detecting malware. Although most popular commercial antivirus software still uses this method, it can only detect a virus after it has already caused damage and been registered. Therefore, it fails to detect new malware. Applying a methodology proven successful in similar problem domains, we propose the use of n-grams (every substring of a larger string, of a fixed length n) as file signatures in order to detect unknown malware while keeping a low false positive ratio. We show that n-gram signatures provide an effective way to detect unknown malware.
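
The notion of an n-gram file signature can be illustrated with a minimal sketch. It assumes that a file is treated as a raw byte string and that its signature is simply the frequency count of every overlapping n-gram; the abstract does not specify the exact representation, so the function names `byte_ngrams` and `ngram_signature`, the choice of n = 2, and the sample bytes are hypothetical and not taken from the paper.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int):
    """Yield every overlapping substring of `data` with fixed length n."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def ngram_signature(data: bytes, n: int = 2) -> Counter:
    """Count each n-gram; the counts act as an illustrative file signature.

    Assumption: a frequency table is one plausible signature form,
    not necessarily the one used by the authors.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return Counter(byte_ngrams(data, n))


# Hypothetical sample bytes standing in for the contents of a file.
sample = b"\x4d\x5a\x90\x00\x4d\x5a"
signature = ngram_signature(sample, n=2)

for gram, count in signature.most_common():
    print(gram.hex(), count)
```

Two files could then be compared through their signatures, for instance by measuring the distance between their n-gram frequency tables; which measure and classifier to use is left open here.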

Paper Citation


in Harvard Style

Santos I., Penya Y., Devesa J. and Bringas P. (2009). N-GRAMS-BASED FILE SIGNATURES FOR MALWARE DETECTION. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8111-85-2, pages 317-320. DOI: 10.5220/0001863603170320


in Bibtex Style

@conference{iceis09,
author={Igor Santos and Yoseba K. Penya and Jaime Devesa and Pablo G. Bringas},
title={N-GRAMS-BASED FILE SIGNATURES FOR MALWARE DETECTION},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2009},
pages={317-320},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001863603170320},
isbn={978-989-8111-85-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - N-GRAMS-BASED FILE SIGNATURES FOR MALWARE DETECTION
SN - 978-989-8111-85-2
AU - Santos I.
AU - Penya Y.
AU - Devesa J.
AU - Bringas P.
PY - 2009
SP - 317
EP - 320
DO - 10.5220/0001863603170320