Table 3: Association rules in unified format.
# | Rule | Confidence
1 | 256, 2147483904, 4, 2147483652, 2147484160, 256, 2147483904, 4, 2147483652 | 1
2 | 256, 2147483904, 4, 2147483652, 2147484160, 256, 2147483904, 4, 2147483652 | 0.831050228
3 | 2147483652, 2147484160, 256, 2147483904, 4, 2147483652, 2147484160, 256, 2147483904 | 0.965714286
4 | 2147483904, 4, 2147483652, 2147484160, 256, 2147483904, 4, 2147483652, 2147484160 | 1
5 | 33026, 2147516674, 4096, 256, 258, 32768, 2147516416, 8192, 2147491840 | 1
6 | 4096, 8192, 2147491840, 4096, 8192, 2147491840, 4096, 8192, 2147491840 | 0.886956522
7 | 4, 2147483652, 2147484160, 256, 2147483904, 4, 2147483652, 2147484160, 256 | 0.945945946
8 | 258, 32768, 2147516416, 8192, 2147491840, 33026, 2147516674, 4096, 256 | 0.843137255
9 | 32768, 2147516416, 8192, 2147491840, 33026, 2147516674, 4096, 256, 258 | 0.931818182
10 | 2147484160, 256, 2147483904, 4, 2147483652, 2147484160, 256, 2147483904, 4 | 0.982248521
11 | 8192, 2147491840, 4096, 8192, 2147491840, 4096, 8192, 2147491840, 4096 | 1
12 | 256, 258, 32768, 2147516416, 8192, 2147491840, 33026, 2147516674, 4096 | 1
13 | 49152, 2147532800, 8192, 2147491840, 256, 258, 33026, 2147516674, 4096 | 1
14 | 2147491840, 4096, 8192, 2147491840, 4096, 8192, 2147491840, 4096, 8192 | 0.982142857
a histogram of the affected file types (Table 4). The second and third
most frequent file types are WNCRYT and WNCRY. These file types
represent, respectively, the temporary storage and the final encrypted
container generated by the WannaCry ransomware (Team, 2017). As for the
TMP files, we assume that they are also temporary files generated by the
malware, since they were created in the infected directories (as
indicated by the Parent File Reference entry in the record) and their
timestamps match the timeframe of the attack. The remaining file types
are false positives and account for less than 9% of the detected
records. From this we conclude that the rules correctly detect the
anomalies that malicious activity causes in the file system. To measure
the accuracy of the identification, we compared all unique file entries
affected by the attack with those detected by the rules: of 235 affected
files, we detected 206, an accuracy of 87.7%.
Table 4: Detected file types histogram.
File Type | Number of Hits
tmp | 1020
wncryt | 710
wncry | 411
png | 101
txt | 31
db | 24
docx | 18
zip | 12
js | 6
vbs | 5
gif | 3
lnk | 1
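The two percentages quoted in the text can be rechecked from Table 4 and the reported file counts. The snippet below is only an arithmetic sketch: the variable names are illustrative, and treating tmp, wncryt, and wncry as the attack-related types follows the interpretation given in the text.

```python
# Hit counts copied from Table 4.
hits = {
    "tmp": 1020, "wncryt": 710, "wncry": 411, "png": 101,
    "txt": 31, "db": 24, "docx": 18, "zip": 12,
    "js": 6, "vbs": 5, "gif": 3, "lnk": 1,
}

# Assumption taken from the text: these three types are attributed to the attack.
ATTACK_TYPES = {"tmp", "wncryt", "wncry"}

total_hits = sum(hits.values())
other_hits = sum(n for ftype, n in hits.items() if ftype not in ATTACK_TYPES)
other_share = other_hits / total_hits

# Reported counts: 206 of 235 affected files were detected.
detected_files, affected_files = 206, 235
detection_rate = detected_files / affected_files

print(f"other file types: {other_hits}/{total_hits} = {other_share:.1%}")  # 201/2342 = 8.6%
print(f"detected files:   {detected_files}/{affected_files} = {detection_rate:.1%}")  # 87.7%
```

Note that the 87.7% figure is the fraction of affected files that the rules recovered, whereas the share below 9% is computed over detected records, so the two numbers use different denominators.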
A closer look at the 14 mined rules shows that some of them are just
shifted versions of others, for example rules 1, 2, 3, 4, 7, and 10.
This behavior was expected, since the algorithm can capture contiguous
repetitive patterns in the USN Journal from different starting points.
This leaves us with 4 groups of unique rules: (1, 2, 3, 4, 7, 10),
(6, 11, 14), (5, 8, 9, 12), and (13). Only rule 13 has no shifted
version of itself. We extracted the individual outputs of single rules
from each group; a comparison showed little to no difference in the
identified records. Thus we end up with only 4 distinct rules for
malicious behavior detection. Another aspect worth noting is the
repetitiveness of the patterns within the mined rules. For example,
rule 6 [4096, 8192, 2147491840, 4096, 8192, 2147491840, 4096, 8192,
2147491840] repeats the same 3-value pattern [4096, 8192, 2147491840]
three times. Eliminating shifted rule versions and shortening repeated
patterns are left for future work.
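As an illustration of how such a reduction might look, the sketch below groups the rules of Table 3 by their underlying repeating cycle. It is not the method used in this work, only one possible approach under stated assumptions: each rule is reduced to its shortest repeating prefix, and that prefix is rotated to its lexicographically smallest form so that shifted versions map to the same key. The function names are hypothetical.

```python
from collections import defaultdict

# Rules copied from Table 3 (rule number -> sequence of values as printed).
RULES = {
    1: [256, 2147483904, 4, 2147483652, 2147484160, 256, 2147483904, 4, 2147483652],
    2: [256, 2147483904, 4, 2147483652, 2147484160, 256, 2147483904, 4, 2147483652],
    3: [2147483652, 2147484160, 256, 2147483904, 4, 2147483652, 2147484160, 256, 2147483904],
    4: [2147483904, 4, 2147483652, 2147484160, 256, 2147483904, 4, 2147483652, 2147484160],
    5: [33026, 2147516674, 4096, 256, 258, 32768, 2147516416, 8192, 2147491840],
    6: [4096, 8192, 2147491840, 4096, 8192, 2147491840, 4096, 8192, 2147491840],
    7: [4, 2147483652, 2147484160, 256, 2147483904, 4, 2147483652, 2147484160, 256],
    8: [258, 32768, 2147516416, 8192, 2147491840, 33026, 2147516674, 4096, 256],
    9: [32768, 2147516416, 8192, 2147491840, 33026, 2147516674, 4096, 256, 258],
    10: [2147484160, 256, 2147483904, 4, 2147483652, 2147484160, 256, 2147483904, 4],
    11: [8192, 2147491840, 4096, 8192, 2147491840, 4096, 8192, 2147491840, 4096],
    12: [256, 258, 32768, 2147516416, 8192, 2147491840, 33026, 2147516674, 4096],
    13: [49152, 2147532800, 8192, 2147491840, 256, 258, 33026, 2147516674, 4096],
    14: [2147491840, 4096, 8192, 2147491840, 4096, 8192, 2147491840, 4096, 8192],
}


def minimal_period(seq):
    """Return the smallest p such that seq[i] == seq[i + p] for every valid i."""
    n = len(seq)
    for p in range(1, n):
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return n  # no shorter period: the sequence is its own cycle


def canonical_cycle(seq):
    """Shortest repeating prefix of seq, rotated to its smallest rotation."""
    base = seq[:minimal_period(seq)]
    return min(tuple(base[k:] + base[:k]) for k in range(len(base)))


def group_shifted_rules(rules):
    """Group rule numbers whose sequences share the same canonical cycle."""
    groups = defaultdict(list)
    for number, seq in sorted(rules.items()):
        groups[canonical_cycle(seq)].append(number)
    return sorted(groups.values())


print(group_shifted_rules(RULES))
# [[1, 2, 3, 4, 7, 10], [5, 8, 9, 12], [6, 11, 14], [13]]
print(canonical_cycle(RULES[6]))
# (4096, 8192, 2147491840)
```

On the rules of Table 3 this reproduces the four groups listed above, and the canonical cycle of rule 6 is the 3-value pattern mentioned in the text. Whether a shortened cycle detects the same records as the full 9-value rule is not established here and would need to be checked against the journal data.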
Machine learning methods can be considered an important alternative to
the proposed method. However, there are some obstacles to applying them
in this context. It is easy for a forensic expert to create a snapshot
of a benign file system. The target snapshot, which is the subject of
the investigation, usually contains benign and malicious files blended
into one file system. Supervised learning models require a label for
each file, which is very hard to obtain in digital forensics tasks due
to the high cost of labeling. One-class learning models, which can learn
from the files in the benign snapshot alone, cannot use the target
snapshot while inducing the models, which limits the knowledge that can
be obtained from both snapshots. Unsupervised methods (e.g.,
clustering), which do not use any labeled data, may give the expert some
intuition but do not provide explicit rules. More importantly, machine
learning models generally do not provide human-readable rules, which
severely limits their applicability in this context. Even explainable
methods such as decision trees may require additional steps to generate
rules, and strict pruning strategies must be applied to obtain
comprehensible rule sets at the expense of detection performance.
Anomalous File System Activity Detection Through Temporal Association Rule Mining