completely "measure" different types of domain
names generated by the same model.
The results of the above experiments show that the
CNN and Bi-LSTM networks achieve markedly better
detection performance, while the FFN performs
relatively poorly. Comparing the Bi-LSTM network
with the FFN, the Bi-LSTM can combine forward and
backward information; it alleviates the long-distance
dependency problem of the ordinary RNN and can
memorize historical information. The FFN cannot
learn the surrounding "context", because each input
fed to the FFN is unrelated to the previous one and
the network cannot remember earlier context
information. The features captured by the Bi-LSTM
network are therefore more detailed.
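To make the contrast concrete, below is a minimal
sketch of a character-level Bi-LSTM domain-name
classifier written with Keras. It is not the paper's
exact architecture; the alphabet size, embedding
dimension, and hidden size are illustrative
assumptions.

    # Minimal Bi-LSTM sketch for domain-name classification
    # (illustrative hyperparameters, not the paper's configuration).
    from tensorflow.keras import layers, models

    VOCAB_SIZE = 40  # assumed alphabet: a-z, 0-9, '-', '.', padding

    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, 32),        # character -> vector
        layers.Bidirectional(layers.LSTM(64)),   # fuses forward and backward context
        layers.Dense(1, activation="sigmoid"),   # benign vs. DGA score
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

Because the LSTM carries a hidden state across the
whole character sequence, each prediction can draw
on context far from the current position, which is
exactly what the FFN lacks.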
The CNN performs even better. After a domain name
is vectorized, it can be regarded as a matrix of
vectors, which is very similar to the image inputs
that CNNs process. In each convolution, the CNN
processes the data one whole row at a time, much
like an n-gram model: a kernel that covers two rows
at a time corresponds to a 2-gram model. At the same
time, since multiple convolution kernels can be set
in the model to capture different features, the CNN
has stronger feature-extraction capability than the
FFN.
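To illustrate the n-gram analogy, the sketch below
(again with assumed, not reported, hyperparameters)
applies parallel 1D convolutions with kernel sizes 2,
3, and 4 over the character embeddings, so each
branch acts roughly like a 2-, 3-, or 4-gram
detector:

    # CNN sketch: parallel kernel sizes act like n-gram detectors.
    from tensorflow.keras import layers, models

    VOCAB_SIZE = 40  # assumed character alphabet size

    inputs = layers.Input(shape=(None,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, 32)(inputs)  # domain name -> vector matrix

    branches = []
    for n in (2, 3, 4):  # n-character windows
        b = layers.Conv1D(64, kernel_size=n, activation="relu")(x)
        branches.append(layers.GlobalMaxPooling1D()(b))  # strongest match per kernel

    x = layers.Concatenate()(branches)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(inputs, outputs)

Each of the 64 kernels in a branch learns a different
character pattern, which is the "multiple convolution
kernels" advantage described above.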
The results show that, for PCFG-based domain name
detection, the CNN is slightly more effective than
the LSTM. On short sequences, the convolution
operation gives the CNN the ability to summarize the
overall structure of the sequence; on long sequences,
however, the CNN can only process the information
within its window, and information from adjacent
windows can only be combined by subsequent
convolutional layers, which depends heavily on the
kernel size and the stride. The domain names in this
task are not particularly long, so the CNN model is a
suitable choice.
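How far information from adjacent windows can travel
is determined by the receptive field of the stacked
convolutions. The following sketch, with hypothetical
layer configurations, shows how kernel size and
stride compound across layers:

    def receptive_field(layer_cfgs):
        """Receptive field, in input characters, of stacked 1D convolutions.

        layer_cfgs: list of (kernel_size, stride) pairs, one per layer.
        """
        rf, jump = 1, 1
        for kernel, stride in layer_cfgs:
            rf += (kernel - 1) * jump  # each layer widens the visible window
            jump *= stride             # strides compound across layers
        return rf

    # Hypothetical stack: two layers with kernel 3, stride 1.
    print(receptive_field([(3, 1), (3, 1)]))  # -> 5 characters

For a short domain name, even a shallow stack covers
most of the input, which is why the CNN suffices
here.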
6.2 Multi-Head Attention Layer Analysis
The MultiHead-Deep model not only uses neural
networks for feature extraction, but also adopts a
Multi-Head Attention layer, which is the key to
improving detection performance. Multi-Head Attention
employs the self-attention mechanism, whose advantage
is that it captures global connections in a single
step and thus completely sidesteps the long-distance
dependency problem. In addition, multi-head
computation can be regarded as learning in a number
of different subspaces and integrating the
information from those subspaces, so as to capture
the features of each position as fully as possible.
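The sketch below illustrates this mechanism in plain
NumPy: each head projects the input into its own
subspace, and the attention scores connect every
position to every other position in one step. The
dimensions and random projection weights are
illustrative only; this is a simplified version of
the mechanism, not the paper's exact layer.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_self_attention(X, heads=4, seed=0):
        """Simplified multi-head self-attention. X: (seq_len, d_model)."""
        rng = np.random.default_rng(seed)
        seq_len, d_model = X.shape
        d_k = d_model // heads
        outputs = []
        for _ in range(heads):
            # Each head learns (here: randomly draws) its own subspace projections.
            Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
            Q, K, V = X @ Wq, X @ Wk, X @ Wv
            # Every position attends to every other position in one step,
            # so there is no long-distance dependency problem.
            scores = softmax(Q @ K.T / np.sqrt(d_k))
            outputs.append(scores @ V)
        # Concatenating the heads integrates information from all subspaces.
        return np.concatenate(outputs, axis=-1)

    X = np.random.default_rng(1).standard_normal((16, 32))  # 16 positions, d_model=32
    attended = multi_head_self_attention(X)                  # shape (16, 32)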
Figure 5 shows the accuracy curve of the
MultiHead-CNN model. Under different PCFG models,
the accuracy of MultiHead-CNN converges to different
values, because the domain names generated by
different models differ in detection complexity.
Figure 5 also shows that the MultiHead-CNN model
converges quickly: under every PCFG model it
converges before the fifth iteration. From another
perspective, this demonstrates that the model's
feature extraction is excellent.
7 CONCLUSIONS
This paper proposes a model for detecting PCFG-
based domain names using neural networks and the
Multi-Head Attention mechanism. Experiments show
that the MultiHead-Deep model outperforms
traditional detection methods on this class of DGA
domain names. The model takes advantage of the fact
that neural networks require no manual feature
engineering and can exploit the intrinsic structure
of domain names, and it uses the Multi-Head
Attention mechanism to capture the overall features
more deeply. Different PCFG models yield different
detection rates, which also indicates that the PCFG
mode can be extended according to rules established
by the user. In the experiments, MultiHead-Deep
showed decent results on the different PCFG models,
which demonstrates the effectiveness of the model in
PCFG-based domain name detection.