Shift, and VBGMM. The ultimate goal of this re-
search is to implement and evaluate a DoS/DDoS at-
tack detection system using ensemble learning based
on various clustering methods in a real environment.
2 RELATED WORKS
DBSCAN, the clustering method used in this study,
has been used for clustering various types of data, and
Zhang (Yan, 2022) discusses the possibility of realizing
network security using DBSCAN. However, the dataset
targeted in that work is not clearly specified. Our work
differs in that we deal with quantitative data of
real-world DNS traffic. Sabottke et al. (Sabottke et al.,
2019) discuss how DBSCAN can be used to detect a wide
range of cybercrimes. In this study, we partially leverage
their results to discover changes in the shape of clusters
generated from time-series data for the purpose of DNS
attack detection. Najafimehr et al. (Najafimehr et al.,
2022) proposed hybrid machine learning for DDoS
detection and verified its effectiveness. The difference
is that their results are based on dataset analysis,
while this study uses actual network traffic for
verification. Rongfeng et al. (Zheng et al., 2020) analyzed
benign traffic using DBSCAN. In this study, the target
is limited to DNS traffic, and the time-series data are
converted to quantitative data and processed to discover
changes between the stationary and non-stationary states.
Yu et al. (Yu et al., 2015) and An et al. (An et al., 2014)
used Mean Shift to track changes across image sequences.
In this study, we apply this idea and use
Mean Shift to detect changes in the shape of clusters
in the correlation diagram of traffic data.
3 PROPOSED METHODOLOGY
In this study, we investigated 17 types of quantitative
DNS traffic data over 15 months. We examined the
cluster shapes in the correlation diagrams and
focused on four types of traffic.
This section describes the DNS communication
data used in the analysis and the proposed clustering
method.
3.1 DNS Communication Data Used in
Analysis and Labeling
In this study, we focused on DNS server data.
There are more than 200 types of data available
(DOCUMENTATION, 2023b). We focused on
17 types of data and obtained their correlations (Table 1).
In particular, from these 17 types, we focused on
4 types of DNS traffic data (⃝ in the leftmost column
of Table 1). These 4 types were
selected because the shape of their clusters changed
in the correlation charts generated over time. Traffic
for the A, MX, and SOA records and the AD flag is
aggregated on a DNS cache server under the Type names shown
in Table 1. The cache server used in this study is
unbound (DOCUMENTATION, 2023a). The data are
quantitative traffic data, aggregated hourly using the
statistics function of unbound. Here, quantitative data
means that each traffic statistic is a cumulative count. For
example, in the case of A Record data, it is the
accumulated number of times the DNS cache server
processed A Record traffic. Such data were collected
hourly for about one and a half years.
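
As an illustration of how one hourly snapshot of such counters might be read, the following is a minimal sketch that parses "name=value" lines, the form in which unbound reports its statistics. The sample values are invented, and the names num.query.type.MX and num.query.type.SOA are assumed here by analogy with num.query.type.A; the function name parse_unbound_stats is hypothetical and is not part of unbound.

```python
def parse_unbound_stats(text):
    """Parse 'name=value' lines into a dict mapping counter name to float."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        name, _, value = line.partition("=")
        try:
            stats[name.strip()] = float(value)
        except ValueError:
            continue  # skip values that are not numeric
    return stats


# Invented sample snapshot; real counters and values will differ.
SAMPLE = """\
num.query.type.A=120345
num.query.type.MX=812
num.query.type.SOA=97
num.query.flags.AD=4410
"""

# The four statistics of interest (MX and SOA names are assumptions).
FOCUS = [
    "num.query.type.A",
    "num.query.type.MX",
    "num.query.type.SOA",
    "num.query.flags.AD",
]

snapshot = parse_unbound_stats(SAMPLE)
row = [snapshot[name] for name in FOCUS]  # one hourly record
```

Collecting one such row per hour would give the kind of time series that the correlation graphs are built from.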
The correlation graphs in Figures 1 and 2 were
created from 15 months of data with a one-week
window. For example, for the correlation graph
for March 16, 2022, the data from March 10 to
March 16, 2022, were aggregated. A total of
485 correlation graphs were created. For each graph, the
shape of the clusters formed by num.query.type.A and
num.query.flags.AD was checked. These clusters are
zonal (band-shaped) in the stationary state, as shown
in Figure 1. In contrast, they are clumped
in the non-stationary state,
as shown in Figure 2. The cluster formation state was
checked and labeled as 0 for the stationary state and 1
for the non-stationary state.
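
The one-week windowing described above can be sketched as follows. This is only an illustration under stated assumptions: the hourly samples are synthetic, the tuple layout (timestamp, A count, AD count) and the helper names window_bounds and select_window are hypothetical, and the window is taken to cover the end date and the six preceding days, matching the March 10 to March 16 example.

```python
from datetime import date, datetime, timedelta

WINDOW_DAYS = 7  # one-week window, end date included


def window_bounds(end_day):
    """Return (first_day, last_day) of the window ending on end_day."""
    return end_day - timedelta(days=WINDOW_DAYS - 1), end_day


def select_window(samples, end_day):
    """Keep the hourly samples whose date falls inside the window."""
    first_day, last_day = window_bounds(end_day)
    return [s for s in samples if first_day <= s[0].date() <= last_day]


# Synthetic hourly samples for March 2022: (timestamp, A count, AD count).
start = datetime(2022, 3, 1)
samples = [
    (start + timedelta(hours=h), float(h), float(2 * h))
    for h in range(24 * 31)
]

window = select_window(samples, date(2022, 3, 16))
# Points for one correlation graph: num.query.type.A vs num.query.flags.AD.
points = [(a, ad) for _, a, ad in window]
```

Repeating the selection for each successive end date would yield one set of points, and hence one correlation graph, per day.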
3.2 Stationary State and
Non-Stationary State Taxonomy
For the quantitative data of DNS traffic analyzed in
this study, it is difficult to detect changes from corre-
lations. As mentioned, we calculated the correlation
of the time series quantitative data with a one-week
window. In the stationary state, banded clusters are
obtained, as shown in the left plot of Figure 3. In the
non-stationary state, clumped clusters are obtained,
as shown in the right plot of Figure 3. In this study,
we hypothesized that unsupervised machine learning
could be used to calculate the number of clusters and
thereby discover the change in cluster appearance. This
task ultimately requires judgments based on cluster
shape; here, we attempt
to detect the change in cluster shape from the change in
the number of clusters.
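
To make the idea of counting clusters concrete, the following is a minimal pure-Python sketch of DBSCAN applied to synthetic points. It is not the implementation used in this study: the values of eps and min_samples are arbitrary illustrative choices rather than tuned hyperparameters, the data are artificial grids, and the sketch does not attempt to reproduce the banded or clumped shapes of the real correlation graphs.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {NOISE})


# Synthetic data: 5x5 unit grids standing in for groups of points.
blob_a = [(float(x), float(y)) for x in range(5) for y in range(5)]
blob_b = [(x + 100.0, y + 100.0) for x, y in blob_a]
outlier = [(50.0, 50.0)]

n_one = count_clusters(dbscan(blob_a, eps=1.5, min_samples=4))
n_two = count_clusters(dbscan(blob_a + blob_b, eps=1.5, min_samples=4))
labels_with_outlier = dbscan(blob_a + outlier, eps=1.5, min_samples=4)
```

Tracking such a cluster count from one weekly window to the next is one way a change in cluster structure could be flagged; whether a given count corresponds to the stationary or non-stationary state would have to be established on the real data.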
Since the number of clusters to be formed is un-
known for the present data, clustering using unsuper-
vised machine learning is required. In addition, it is
necessary to adjust the hyperparameters of each clus-
tering method to achieve good clustering. DBSCAN,
Mean Shift, and VBGMM were selected as the clus-
DATA 2023 - 12th International Conference on Data Science, Technology and Applications