5 CONCLUSIONS
In this work, we addressed the problem of detecting
new and unknown malware. We discussed present-day
detection technologies alongside our own approach.
An isolated environment was set up for reverse
engineering, and each executable was reversed
rigorously to determine its properties and behavior.
Different data mining techniques were then applied
to the data extracted during reversing to find
patterns characteristic of malicious executables,
and classification models were generated from those
patterns. To test the models, new executables from
the wild were supplied with the same set of
features. The results thus obtained proved to be
satisfactory.
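The step from extracted features to a classification model can be illustrated with a minimal sketch. It is not the classifier used in this work: it swaps in a small Bernoulli naive Bayes model purely for illustration, and the feature names (`is_packed`, `writes_registry`, `opens_network_socket`) and the four training rows are hypothetical placeholders rather than data from our experiments.

```python
import math
from collections import defaultdict

# Hypothetical binary features; the real feature set comes from the reversing step.
FEATURES = ["is_packed", "writes_registry", "opens_network_socket"]


def train(samples):
    """Count, per class, how many samples exist and how often each feature is set."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for features, label in samples:
        class_counts[label] += 1
        for name in FEATURES:
            if features.get(name):
                feature_counts[label][name] += 1
    return class_counts, feature_counts


def predict(model, features):
    """Return the class with the highest log-posterior under naive Bayes."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        for name in FEATURES:
            # Laplace-smoothed probability that the feature is set in this class.
            p = (feature_counts[label][name] + 1) / (count + 2)
            score += math.log(p if features.get(name) else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative (made-up) training rows: features extracted from known samples.
TRAINING = [
    ({"is_packed": 1, "writes_registry": 1, "opens_network_socket": 1}, "malicious"),
    ({"is_packed": 1, "writes_registry": 0, "opens_network_socket": 1}, "malicious"),
    ({"is_packed": 0, "writes_registry": 0, "opens_network_socket": 0}, "benign"),
    ({"is_packed": 0, "writes_registry": 1, "opens_network_socket": 0}, "benign"),
]

model = train(TRAINING)
unseen = {"is_packed": 1, "writes_registry": 1, "opens_network_socket": 1}
print(predict(model, unseen))
```

A new executable is described with the same feature set as the training samples and passed to `predict`, which mirrors how the models were tested on executables from the wild.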
From the experimental results, we conclude that
extracting the static and behavioral features of
each malware sample through reverse engineering and
applying data mining techniques to the resulting
data helps in detecting new-generation malware.
Given the rapidly increasing amount of malware
appearing each day, this method of detection can be
used alongside current detection techniques.
We reversed each malware strain and each benign
executable to extract every feature we could using
tools common in the computer security profession.
However, we were not able to analyze the process
address space of the executables in physical memory,
because the memory analysis tools were released
after we had completed the reversing step. Analyzing
the address space would reveal more information
about the processes and thereby allow their behavior
to be analyzed more accurately.
Reversing each malware sample manually is a
time-consuming process and demands considerable
effort, given the thousands of new malware samples
being generated. One way to cope with this problem
is to automate the whole reverse engineering
process. Although some tools for automated reverse
engineering exist, they do not record the full
details of malware. A more specialized tool that
performs rigorous reversing would help in combating
large amounts of malware. We consider these two
tasks, memory analysis and automated reversing, as
future work that would aid in detecting new malware
more efficiently.
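As a rough indication of what the simplest layer of such automation might look like, the sketch below extracts a few static features from the raw bytes of an executable. It is only an assumption-laden illustration, not the tool envisioned above: the chosen features (file size, byte entropy, an `MZ` header check that assumes Windows executables, and a count of printable strings) and all function names are hypothetical, and it captures none of the behavioral detail that manual reversing provides.

```python
import math
import re
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII characters at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [m.decode("ascii") for m in pattern.findall(data)]


def extract_static_features(data: bytes) -> dict:
    """Collect a small, illustrative set of static features from raw bytes."""
    return {
        "size": len(data),
        "entropy": shannon_entropy(data),
        # DOS/PE executables begin with the two bytes "MZ".
        "has_mz_header": data[:2] == b"MZ",
        "string_count": len(printable_strings(data)),
    }


# Made-up byte string standing in for the contents of an executable file.
sample = b"MZ\x90\x00" + b"kernel32.dll\x00"
print(extract_static_features(sample))
```

Running such an extractor over a directory of samples would yield one feature row per executable without manual effort, although a practical tool would also need to handle packed binaries and record run-time behavior.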
KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval