API and other features (Xiang et al 2018). Mohaisen
et al. extracted dynamic features of malware during
its execution and used K-Nearest Neighbor (KNN),
Support Vector Machine (SVM), decision tree and
other classification algorithms to build the final
detection model (Mohaisen et al 2015). Xin Wang et al.
proposed a detection model that combines a Recurrent
Neural Network (RNN) with autoencoders (Wang et al
2016). Mehadi Hassen et al. used a supervised learning
algorithm to learn low-dimensional features and then
applied an anomaly detection method to detect
malware (Hassen et al 2018). S. Pai et al. applied the
expectation maximization algorithm to the cluster
analysis of malware (Pai et al 2017).
Current malicious software detection relies on two
main methods: static analysis and dynamic analysis.
Most of these methods use either a single machine
learning algorithm or a combination of several
learning algorithms to construct classification
detection models. Static analysis is efficient but may
have lower accuracy, while dynamic analysis provides
high accuracy but may be less efficient. Relying on
static or dynamic analysis alone therefore may not
meet the dual requirements of high efficiency and high
accuracy. Hence, this paper introduces a malicious
software detection technique based on deep learning
algorithms that fuses dynamic and static detection
techniques, so that the final detection process is both
efficient and accurate.
2 TRADITIONAL MALWARE
DETECTION METHODS
2.1 Static Analysis
Static analysis is the initial phase of the malicious
software analysis process. It examines executable
files without inspecting their actual instructions.
Basic static analysis can indicate whether a file has
malicious characteristics, offer insights into its
expected functionality, or yield simple network
signatures. However, static analysis has limitations
when dealing with complex malicious software and
may overlook critical malicious behaviours.
2.1.1 Message Digest Algorithm 5
Message Digest Algorithm 5 (MD5) is commonly used
to identify malicious software. The malicious file is
passed through the MD5 hash function, which produces
a fixed-length hash value that serves as a fingerprint
for that sample. In the field of deep learning, feature
hashing is a commonly used technique that maps data
of varying sizes into fixed-size representations.
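Both ideas in this subsection can be illustrated with a
minimal Python sketch. The function names
md5_fingerprint and hash_features, the bucket count
dim, and the use of MD5 as the bucketing hash are
assumptions made for illustration only; they are not
taken from the method described in this paper.

```python
import hashlib


def md5_fingerprint(data: bytes) -> str:
    """Return the 32-character hexadecimal MD5 digest of a sample."""
    return hashlib.md5(data).hexdigest()


def hash_features(tokens, dim=16):
    """Map a variable-length token list to a fixed-size count vector.

    Each token is hashed to a bucket index in [0, dim). Tokens that
    collide share a bucket, which is the usual cost of this approach.
    """
    vector = [0] * dim
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "little") % dim
        vector[bucket] += 1
    return vector
```

The first function gives the fingerprint used to look up
a known sample, while the second shows how inputs of
any length (for example, a list of imported API names)
end up as a vector of the same size.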
2.1.2 PEiD Detection
PEiD is a widely used tool for detecting packed files
and is often used to identify the packer or compiler
that produced a file. Because malware is often packed
or obfuscated, malicious files are more difficult to
detect, which can seriously hinder the analysis of
malware. PEiD also carries a security risk: its plug-ins
tend to run the malicious executable automatically, so
analysis must be carried out in a safe, isolated
environment.
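The general idea of signature-based packer
identification can be sketched as follows. This is not
PEiD's actual signature database or matching logic: the
table PACKER_SIGNATURES and the function
identify_packers are hypothetical, and the only
signature included is the pair of section names (UPX0,
UPX1) that UPX-packed files normally carry. The sketch
only reads bytes and never executes the sample.

```python
# Toy signature table, assumed for illustration only. Real tools ship
# far larger databases and match more precisely than a substring test.
PACKER_SIGNATURES = {
    "UPX": [b"UPX0", b"UPX1"],
}


def identify_packers(data: bytes) -> list:
    """Return the packers whose markers all occur in the file bytes."""
    return [
        name
        for name, markers in PACKER_SIGNATURES.items()
        if all(marker in data for marker in markers)
    ]
```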
2.1.3 Executable File Format Analysis
The Portable Executable (PE) file format is a data
structure, and almost all executable files loaded by
Windows systems use it. A PE file begins with a header
that includes information about the code, the
application type, the required library functions, and so
on. The information in this header is valuable to
malware analysts.
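As a minimal sketch of reading this header, the
following Python function parses the DOS stub and the
COFF file header using only the standard library. The
function name parse_pe_header and the keys of the
returned dictionary are illustrative choices; the offsets
follow the published PE layout (the 4-byte offset of the
PE signature is stored at 0x3C, and a 20-byte COFF
header follows the signature).

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics flag: the file is a DLL


def parse_pe_header(data: bytes) -> dict:
    """Extract basic COFF header fields from the bytes of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: file offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader,
    # Characteristics.
    (machine, n_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

Fields such as the section count and the build timestamp
are typical examples of header values that can be fed to
a classifier as static features.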
2.1.4 Interactive Disassembler Professional (IDA
Pro)
IDA Pro is an advanced static analysis tool and the
preferred disassembler for most malware analysts and
vulnerability analysts (Raff et al 2017). Strings are a
common starting point for the static analysis of
malware: the cross-reference feature of IDA Pro shows
exactly where and how each string is used in the code.
The disassembler provides a snapshot of the program
as it exists before the first instruction is executed.
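The string-extraction step can be approximated outside
a disassembler. The sketch below is not IDA Pro's API
and does not resolve cross-references; it only lists
printable ASCII strings together with their file offsets,
which is the raw material an analyst would then look up
in the disassembly. The function name extract_strings
and the default minimum length of 4 are assumptions.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) pairs for printable ASCII runs in data."""
    # Match runs of at least min_len bytes in the range 0x20..0x7E.
    pattern = re.compile(
        rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]
```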
2.2 Dynamic Analysis
Dynamic analysis is the second phase in the process of
analysing malicious software. It is typically employed
when basic static analysis fails to yield definitive
results. Dynamic analysis involves monitoring the
behaviour of malicious software while it actively runs
or examining system changes after the execution of
malicious software. Unlike static analysis, dynamic
analysis provides a deeper understanding of the actual
functionality and internal workings of malicious
software. It has been proven to be an effective method
for identifying malicious software.
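Examining system changes after execution can be
sketched as a comparison of two snapshots. The code
below is a simplified illustration, assuming the sample
is run inside an isolated analysis environment between
the two snapshots; the function names snapshot and
diff_snapshots are hypothetical, and only file contents
are tracked (not registry, process, or network activity).

```python
import hashlib
import os


def snapshot(root: str) -> dict:
    """Map each file path under root to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            state[os.path.relpath(path, root)] = digest
    return state


def diff_snapshots(before: dict, after: dict) -> dict:
    """Report files created, deleted, or modified between snapshots."""
    common = before.keys() & after.keys()
    return {
        "created": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }
```

Taking one snapshot before the sample runs and one
afterwards, then passing both to diff_snapshots, gives a
list of file-level changes that can serve as dynamic
features.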