Behavior-based Malware Analysis using Profile Hidden Markov Models

Saradha Ravi, N. Balakrishnan, Bharath Venkatesh

Abstract

Static binary analysis of malware is becoming increasingly difficult because of the code obfuscation and packing techniques that malware authors employ. For this reason, large malware analysis systems rely on behavior-based techniques. In these dynamic analysis systems, malware samples are executed and monitored in a controlled environment using tools such as CWSandbox (Willems et al., 2007). Previous work has applied a number of clustering and classification techniques from machine learning and data mining to behavior reports in order to classify malware into families and to identify new malware families. In this work, we propose using the Profile Hidden Markov Model (PHMM) to classify malware files into families or groups based on their behavior on the host system. PHMMs have been used extensively in bioinformatics to search large databases for similar protein and DNA sequences. We find that this model helps overcome the hurdle posed by polymorphism, which is common in malware today. We show that the classification accuracy is high and comparable with state-of-the-art methods, even when very few training samples are used to build the models. The experiments were conducted initially on a dataset with 24 families and later on a larger dataset with close to 400 different malware families. For the large dataset, a fast clustering method was used to group malware with similar behavior after scoring against the PHMM profile database. We present the challenges in the evaluation methods and metrics for clustering large numbers of malware files and show the effectiveness of profile hidden Markov models for known malware families.
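
The classification step described above (score a behavior sequence against one model per family, then assign the best-scoring family) can be illustrated with a minimal sketch. This is not the authors' implementation: a smoothed first-order Markov chain stands in for the profile HMM, and the event names, family names, and rejection threshold are hypothetical choices made only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical behavior events; a real system would derive these from sandbox reports.
ALPHABET = ["open_file", "write_file", "reg_set", "connect", "send"]


def train_family_model(sequences, alphabet, alpha=1.0):
    """Fit a first-order Markov chain with add-alpha smoothing.

    Stand-in for a per-family PHMM: it only captures event-to-event
    transitions and has no match/insert/delete states.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    model = {}
    for prev in alphabet:
        total = sum(counts[prev].values()) + alpha * len(alphabet)
        model[prev] = {cur: (counts[prev][cur] + alpha) / total for cur in alphabet}
    return model


def log_odds(model, seq, alphabet):
    """Average per-transition log-odds of seq versus a uniform null model.

    Every event in seq must be in the alphabet, otherwise a KeyError is raised.
    """
    null = 1.0 / len(alphabet)
    n = len(seq) - 1
    if n <= 0:
        return 0.0
    total = sum(math.log(model[p][c] / null) for p, c in zip(seq, seq[1:]))
    return total / n


def classify(models, seq, alphabet, threshold=0.0):
    """Return (best family or None, all scores).

    None means no family scored above the (assumed) rejection threshold.
    """
    scores = {fam: log_odds(m, seq, alphabet) for fam, m in models.items()}
    best = max(scores, key=scores.get)
    label = best if scores[best] > threshold else None
    return label, scores


# Toy training data with hypothetical family names.
training = {
    "dropper": [
        ["open_file", "write_file", "reg_set"],
        ["open_file", "write_file", "write_file", "reg_set"],
    ],
    "bot": [
        ["connect", "send", "send"],
        ["connect", "send", "connect", "send"],
    ],
}
models = {fam: train_family_model(seqs, ALPHABET) for fam, seqs in training.items()}

known_label, known_scores = classify(
    models, ["open_file", "write_file", "reg_set"], ALPHABET
)
unknown_label, unknown_scores = classify(
    models, ["reg_set", "connect", "open_file"], ALPHABET
)
print(known_label, unknown_label)
```

Scoring against a null model and rejecting low scores mirrors how profile-based search tools report log-odds scores. A sequence that matches no family model can then be set aside for clustering rather than forced into a known family.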

References

  1. Apel, M., Bockermann, C., and Meier, M. (2009). Measuring similarity of malware behavior. In Local Computer Networks, 2009. LCN 2009. IEEE 34th Conference on, pages 891-898. IEEE.
  2. Attaluri, S., McGhee, S., and Stamp, M. (2009). Profile hidden markov models and metamorphic virus detection. Journal in computer virology, 5(2):151-169.
  3. Bailey, M., Oberheide, J., Andersen, J., Mao, Z., Jahanian, F., and Nazario, J. (2007). Automated classification and analysis of internet malware. In Recent Advances in Intrusion Detection, pages 178-197. Springer.
  4. Bayer, U., Comparetti, P. M., Hlauschek, C., Kruegel, C., and Kirda, E. (2009). Scalable, behavior-based malware clustering. In Network and Distributed System Security Symposium (NDSS). Citeseer.
  5. Bayer, U., Kirda, E., and Kruegel, C. (2010). Improving the efficiency of dynamic malware analysis. In Proceedings of the 2010 ACM Symposium on Applied Computing, pages 1871-1878. ACM.
  6. Durbin, R., Eddy, S. R., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
  7. Eddy, S. (2003). Hmmer: profile hmms for protein sequence analysis. http://hmmer.janelia.org/.
  8. Eddy, S. R. (1998). Profile hidden markov models. Bioinformatics, 14(9):755-763.
  9. Edgar, R. C. (2004). Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research, 32(5):1792-1797.
  10. Lee, T. and Mody, J. J. (2006). Behavioral classification. In EICAR Conference.
  11. Li, P., Liu, L., Gao, D., and Reiter, M. (2010). On challenges in evaluating malware clustering. In Recent Advances in Intrusion Detection, pages 238-255. Springer.
  12. Moser, A., Kruegel, C., and Kirda, E. (2007). Limits of static analysis for malware detection. In Computer Security Applications Conference, 2007. ACSAC 2007. Twenty-Third Annual, pages 421-430. IEEE.
  13. Needleman, S. B., Wunsch, C. D., et al. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology, 48(3):443-453.
  14. Rabiner, L. R. (1989). A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286.
  15. (2008). Learning and classification of malware behavior. Detection of Intrusions and Malware, and Vulnerability Assessment, pages 108-125.
  16. Rieck, K., Trinius, P., Willems, C., and Holz, T. (2011). Automatic analysis of malware behavior using machine learning. Journal of Computer Security, 19(4):639-668.
  17. Trinius, P. (2009). Malheur Dataset. http://pi1.informatik.uni-mannheim.de/malheur/#dldata.
  18. Trinius, P., Willems, C., Holz, T., and Rieck, K. (2010). A malware instruction set for behavior-based analysis. In Proceedings of 5th GI Conference Sicherheit, Schutz und Zuverlässigkeit, Berlin, Germany.
  19. Wagener, G., State, R., and Dulaunoy, A. (2008). Malware behaviour analysis. Journal in computer virology, 4(4):279-287.
  20. Willems, C., Holz, T., and Freiling, F. (2007). Toward automated dynamic malware analysis using cwsandbox. Security & Privacy, IEEE, 5(2):32-39.
  21. Yadwadkar, N. J., Bhattacharyya, C., Gopinath, K., Niranjan, T., and Susarla, S. (2010). Discovery of application workloads from network file traces. In Proceedings of the 8th USENIX conference on File and storage technologies, pages 14-14. USENIX Association.


Paper Citation


in Harvard Style

Ravi S., Balakrishnan N. and Venkatesh B. (2013). Behavior-based Malware Analysis using Profile Hidden Markov Models. In Proceedings of the 10th International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2013) ISBN 978-989-8565-73-0, pages 195-206. DOI: 10.5220/0004528201950206


in Bibtex Style

@conference{secrypt13,
author={Saradha Ravi and N. Balakrishnan and Bharath Venkatesh},
title={Behavior-based Malware Analysis using Profile Hidden Markov Models},
booktitle={Proceedings of the 10th International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2013)},
year={2013},
pages={195-206},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004528201950206},
isbn={978-989-8565-73-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2013)
TI - Behavior-based Malware Analysis using Profile Hidden Markov Models
SN - 978-989-8565-73-0
AU - Ravi S.
AU - Balakrishnan N.
AU - Venkatesh B.
PY - 2013
SP - 195
EP - 206
DO - 10.5220/0004528201950206