usage of binaries. From sensitive locations such as “/etc/passwd” to pseudo-devices such as “/dev/tcp”, paths are a strong indicator that something less than honest is happening on a box. Of course, there are many paths that can be leveraged on a system, but thankfully, most of the paths used in LotL attacks belong to a finite set (see Table 1 for the complete set of paths we use in the feature extraction process);
(d) Networking. Communication with other hosts represents a notable use-case for LotL attacks. While the exact IP address, range or ASN is a strong indicator of compromise, such information is highly volatile, requires a reliable Threat Intelligence (TI) source and becomes obsolete very quickly. Instead, our networking features provide macro-level information about the nature of the communication (internal, external, loop-back or localhost references), trading specificity and accuracy for stability over time and TI independence (a minimal sketch follows this list);
(e) Well-known LotL Patterns. Features based on known LotL patterns are regular expressions that look for important signatures that disambiguate between legitimate and illegitimate usage of a tool. The number of rules is significant and we are unable to include them in the paper; however, they are available in the constants file of our public repository.
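Returning to item (d), the macro-level networking classification can be sketched with the standard library alone. The following is a minimal, illustrative sketch (the helper name is our own; the tag names mirror the convention introduced in Note 1 below), not the exact production code:

import ipaddress

def network_tag(address):
    """Map one address to a coarse, TI-independent networking tag."""
    if address == "localhost":
        return "IP_LOOPBACK"
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return None                      # not an IP literal
    if ip.is_loopback:                   # 127.0.0.0/8 or ::1
        return "IP_LOOPBACK"
    if ip.is_private:                    # RFC 1918 and other non-global ranges
        return "IP_PRIVATE"
    return "IP_PUBLIC"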
For every raw command-line, we perform the fea-
ture extraction process and enrich the examples with
tags that are later used in the classification process.
We do not rely on any text-based features (for instance, n-grams) in our process, in order to reduce data sparsity, avoid classifier over-fitting and generalize better to previously unseen examples.
Note 1. The tags or features are discrete values
formed by concatenating a “class” prefix with the
command, parameter, path, networking or pattern at-
tribute that we detect. For example, if a command line
contains “netcat”, “bash”, “/var/tmp” and con-
nects to a public IP address, the extracted features
are: COMMAND_NC, COMMAND_BASH, PATH_/VAR/TMP
and IP_PUBLIC. While we could skip this step and move directly to an ML-friendly representation (n-hot encoding or embedding), we prefer to do this explicitly in our tool and present the tags alongside the classification, because this increases visibility into the dataset and makes it easier for an analyst to understand why a command line has been classified as LotL.
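To make the tagging concrete, the following is a minimal sketch of the extraction step. The command and path tables below are illustrative subsets we chose for the example (the complete sets live in Table 1 and in the constants file of our public repository), and network_tag() is the helper sketched under item (d) above:

import re

# Illustrative subsets only; the complete tables are in Table 1 and in
# the constants file of our public repository.
KNOWN_COMMANDS = {"nc": "NC", "netcat": "NC", "bash": "BASH", "ssh": "SSH"}
KNOWN_PATHS = ("/var/tmp", "/etc/passwd", "/dev/tcp")

def extract_tags(cmdline):
    tags = set()
    for token in cmdline.split():
        name = token.rsplit("/", 1)[-1]            # drop any leading path
        if name in KNOWN_COMMANDS:
            tags.add("COMMAND_" + KNOWN_COMMANDS[name])
    for path in KNOWN_PATHS:
        if path in cmdline:
            tags.add("PATH_" + path.upper())
    for ip in re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", cmdline):
        tag = network_tag(ip)                      # helper from item (d) above
        if tag:
            tags.add(tag)
    return tags

# extract_tags("netcat 8.8.8.8 4444 -e bash </var/tmp/x")
# -> {"COMMAND_NC", "COMMAND_BASH", "PATH_/VAR/TMP", "IP_PUBLIC"}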
Note 2. Sometimes, connection information is not present in the command line itself. However, most implementations for collecting system logs rely on Endpoint Detection and Response (EDR) solutions that enrich the data with inbound/outbound connections. In such cases, we recommend concatenating the IP address(es) to the command line, so that the feature extraction process picks them up and generates the appropriate tags. All our training data has been semi-automatically enhanced with this information (automatically for the benign examples and manually for the LotL examples).
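For illustration, the enrichment amounts to a simple concatenation. In the sketch below, the event field names “cmdline” and “remote_ips” are hypothetical and depend on the EDR at hand:

def enrich_with_connections(event):
    """Append EDR-reported connection IPs to the raw command line so
    the feature extractor can emit the corresponding IP_* tags.
    The keys 'cmdline' and 'remote_ips' are hypothetical field names."""
    ips = event.get("remote_ips", [])
    if not ips:
        return event["cmdline"]
    return event["cmdline"] + " " + " ".join(ips)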
Note 3. There is one more tag, called “LOOKS_LIKE_KNOWN_LOL”, which is not used in the training process, but is used at run-time to override the decision of the classifier when a previously unseen command resembles something malicious in our dataset. More details about this tag can be found in our initial work. However, we must mention that it is not used in the model evaluation or in the ablation study.
Note 4. Tags for patterns do not follow the same naming convention. Instead, we allow the rule author to select the name of the tag that is generated whenever a regular expression matches, and multiple expressions can yield the same tag.
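A minimal sketch of this convention follows; the patterns shown are simplified illustrations of our own, not the production rules. Note how two different expressions yield the same analyst-chosen tag:

import re

PATTERN_TAGS = [
    (re.compile(r"bash\s+-i"), "REVERSE_SHELL"),
    (re.compile(r"/dev/tcp/"), "REVERSE_SHELL"),   # same tag, different rule
    (re.compile(r"base64\s+(?:-d|--decode)"), "DECODES_PAYLOAD"),
]

def pattern_tags(cmdline):
    return {tag for regex, tag in PATTERN_TAGS if regex.search(cmdline)}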
5 ENHANCED CLASSIFICATION
USING CUSTOM DROPOUT
Dropout is a well-established regularization technique (Srivastava et al., 2014) that is predominantly used in the training process to prevent overfitting. During each training iteration, every unit (neuron) has a pre-defined probability of being masked (dropped out) or, in some cases, of being scaled up. We extend this framework to perform full input-feature dropout, which is aimed at reducing the impact of two major issues:
(a) Because we have a complex feature extraction process, some of the features might be redundant and increase model instability, since they are not linearly independent: “COMMAND_SSH”, “KEYWORD_-R” and “SSH_R” are usually triggered at the same time;
(b) Regular expression-based feature extraction is not
error-free and some rules might not trigger in all
cases, while their binary, keyword and IP-based
counterparts will work.
To clarify, by full-feature dropout we mean that we randomly mask input features or their corresponding embeddings, and we do not perform any scaling on the features that are left untouched. This way, the model is able to learn to predict whether an example is LotL or not using fewer features than the superfluous full set extracted at input.
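The following is a minimal PyTorch sketch of the idea, under our own simplifying assumptions (the module name and the batch/features/embedding tensor layout are illustrative): whole feature embeddings are zeroed with probability p during training and, unlike standard dropout, the surviving features are not rescaled.

import torch
import torch.nn as nn

class FeatureDropout(nn.Module):
    """Mask whole input features (or their embeddings) at random during
    training; unlike standard dropout, survivors are not rescaled."""
    def __init__(self, p=0.1):
        super().__init__()
        self.p = p

    def forward(self, x):
        # x has shape (batch, n_features, embedding_dim)
        if not self.training or self.p == 0.0:
            return x                     # identity at inference time
        # One Bernoulli draw per feature, broadcast over the embedding.
        keep = torch.rand(x.shape[:-1], device=x.device) > self.p
        return x * keep.unsqueeze(-1).to(x.dtype)

At inference time the module is the identity, so the classifier always sees the complete set of extracted tags.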