now reached a point of systematic advancement and
are large-scale enough to hinder darknet analysis. It
is therefore vital to clarify the nature of investigative
scanners and reduce false positives by identifying the
causes of alerts.
To this end, conducting long-term, sequential anal-
ysis of temporal changes in analysis targets may be
more beneficial than one-off analysis in terms of iden-
tifying and understanding the objectives of the target
scanner group’s activities. In this study, we present a
unique approach for tracing the activities of investiga-
tive and offensive scanner groups and capturing the
actual status of constant scanner groups. Specifically,
we repeatedly apply non-negative matrix factorization
(NMF) (Lee and Seung, 2000) to a short window of the
time-series data, shifting the window forward a little
at each step. We constrain each decomposition to
coincide with that of the preceding window over the
interval where the two windows overlap; this makes it
possible to trace analysis targets sequentially over
long-term time-series data.
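The sliding-window idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the data matrix `V` has one row per scanner host and one column per time step, uses the standard multiplicative updates of Lee and Seung for the Frobenius loss, and assumes the overlap constraint takes the form of pinning the temporal factor `H` on the overlapping columns to the values obtained for the preceding window. The names `nmf_window`, `trace_windows`, `window`, and `shift` are hypothetical.

```python
import numpy as np


def nmf_window(V, k, H_fixed=None, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (hosts x time) as V ~= W @ H.

    Uses multiplicative updates for the Frobenius loss. If H_fixed
    (k x m) is given, the first m columns of H are pinned to it
    (an assumed form of the overlap constraint).
    """
    rng = np.random.default_rng(seed)
    n_hosts, n_steps = V.shape
    W = rng.random((n_hosts, k)) + eps
    H = rng.random((k, n_steps)) + eps
    m = 0 if H_fixed is None else H_fixed.shape[1]
    if m:
        H[:, :m] = H_fixed
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        H_update = H * (W.T @ V) / ((W.T @ W) @ H + eps)
        H[:, m:] = H_update[:, m:]  # overlap columns are never modified
    return W, H


def trace_windows(V, k, window, shift):
    """Run NMF on successive windows, inheriting H on each overlap."""
    if not 0 < shift < window:
        raise ValueError("shift must satisfy 0 < shift < window")
    results, H_prev = [], None
    for start in range(0, V.shape[1] - window + 1, shift):
        # Columns shift..window-1 of the previous window are columns
        # 0..window-shift-1 of the current one.
        H_fixed = None if H_prev is None else H_prev[:, shift:]
        W, H = nmf_window(V[:, start:start + window], k, H_fixed)
        results.append((start, W, H))
        H_prev = H
    return results
```

Calling `trace_windows(V, k=2, window=8, shift=2)` on a 6-by-20 non-negative matrix returns one `(start, W, H)` triple per window. Because the overlapping columns of `H` are inherited rather than re-estimated, row `i` of `H` refers to the same basis in every window, which is what allows a basis to be followed over time in this sketch.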
There are three key advantages to our approach:
• Since the NMF is performed sequentially over a
long-term period while inheriting the decomposition
results, the bases are uniquely fixed over the
overlap period and do not change. A ‘basis’ here
refers to a group of scanners that exhibit similar
temporal changes, and the number of bases is a
hyperparameter of the NMF. Because each basis is
carried over from one window to the next, tracing
remains flexible even if the scanner specifications
(scanner IP addresses, scan frequency, etc.) change.
This tracing flexibility is nearly impossible to
achieve with other methods and thus forms the most
important element of our approach.
• Even if analysis targets are not specified in ad-
vance, the NMF decomposes scanner groups with
similar (synchronous) temporal patterns, which
enables us to trace scanner groups that behave
similarly. Of course, tracing a set of targets
specified in advance is also possible.
• Since the NMF is relatively computationally in-
expensive, real-time tracing is possible for large-
scale darknet data.
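As an illustration of how scanner groups might be read off a decomposition, the sketch below assigns each host to the basis on which it has the largest weight. This dominant-basis rule is an illustrative convention, not a rule stated in the paper; it assumes the decomposition yields a non-negative host-by-basis weight matrix, and the names `assign_groups`, `W`, and `min_weight` are hypothetical.

```python
def assign_groups(W, min_weight=0.0):
    """Map each basis index to the hosts whose largest weight falls on it.

    W is a sequence of per-host rows of non-negative basis weights.
    Hosts whose largest weight does not exceed min_weight are left
    unassigned (for example, hosts that were silent in the window).
    """
    groups = {}
    for host, row in enumerate(W):
        weights = list(row)
        best = max(range(len(weights)), key=weights.__getitem__)
        if weights[best] > min_weight:
            groups.setdefault(best, []).append(host)
    return groups


# Toy weights: hosts 0 and 1 load on basis 0, host 2 on basis 1,
# and host 3 has no activity.
W = [[0.9, 0.1], [0.8, 0.0], [0.0, 0.7], [0.0, 0.0]]
print(assign_groups(W))  # {0: [0, 1], 1: [2]}
```

Since no host list is supplied up front, the groups emerge from the weights alone; restricting the rows of `W` to a known set of hosts would correspond to tracing targets specified in advance.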
In this work, we present the details of the proposed
method, discuss our prototype implementation, and
report the results of experiments on real darknet traffic
data to evaluate the feasibility of tracing. Our findings
demonstrate that the proposed method requires less
processing time than the original NMF, yields less
variation in the decomposition results, and makes it
feasible to judge whether tracing has succeeded or failed.
2 TRACING TARGET
The ultimate goal of tracing in this study is to auto-
matically identify scanner groups that behave similarly
over the long term and investigate their activities. For
this purpose, we need a flexible tracing method that
can trace scanners even when their specifications (e.g.,
IP addresses or scanning frequency) change. Since
there are various potential tracing targets, it is difficult
to define them, but here we describe a few specific
ones.
Hosts infected with the same worm-type malware
execute scans with similar temporal patterns over a
long-term period. We want to identify and trace
such similar infected hosts as a group campaign. In
addition to worms, we analyze systemized scanners
from Internet-wide scanning service providers such
as Shodan, Censys (Durumeric et al., 2015), Rapid7,
Onyphe, Shadowserver, and BinaryEdge³, as well as
those from research institutions such as the University
of Michigan⁴. Even if multiple scanner groups use
well-known scanning tools such as ZMap (Durumeric
et al., 2013), Masscan, or Nmap, we want to identify
and trace each as a distinct group rather than tracing
the scanning tools.
Some scanners are considered to use advanced
scanning tools or programs, for example random scans
and stealth scans. Are these tracing targets in this
study? First, a random scan performs reconnaissance on
random destination IP addresses. We have a worldwide
network of darknet observation nodes and monitor a
large-scale darknet. Within a given period, fast random
scans show similar spatiotemporal properties across
this darknet. Thus, random scans fall within the scope
of our analysis.
Next, the stealth scan performs slow and sporadic
reconnaissance to conceal its scanning activity. In
this case, stealth scan hosts may not scan with simi-
lar temporal patterns. However, their slowness makes
them unsuitable for malicious scanning activities that
require rapid scan execution (e.g., spreading malware
infections or probing for vulnerable devices). There-
fore, stealth scanners with malicious intent are consid-
ered to be scarce. On the other hand, stealth scanners
with benign intentions are considered small in scale
and do not cause problems in cyberspace. Although
stealth scanners are outside the scope of our analysis
in this study, groups of stealth scanners that do not
³ From Shodan to BinaryEdge, in order: https://www.shodan.io/, https://censys.io/, https://www.rapid7.com/, https://www.onyphe.io/, https://www.shadowserver.org/, and https://www.binaryedge.io/.
⁴ https://cse.engin.umich.edu/about/resources/connection-attempts/.
ICISSP 2023 - 9th International Conference on Information Systems Security and Privacy