We started by differentiating between malware,
tools and utilities, all of which are instances of
software. However, we saw that the definitions of
these groups varied between sources and became
difficult to maintain. This is consistent with the
known problem of classifying malware, and we do not
attempt to solve this in our data model. When the
content of the CTI was not consistent, we found that
there was no value in using it at all. Therefore, in
our platform these concepts were all rolled up into
tool, with the possibility of tagging them as malware
or utility as appropriate.
4.2.2 Enrichment and Query/Analysis Across Sources
One of our first observations was that our graph ended
up being a series of subgraphs, and we wanted to be
able to connect them. The simple solution was en-
richment. As we added more enrichment sources,
the graph gradually became more and more intercon-
nected, and we could find new connections between
clusters of information that were originally separate.
Pivoting on an object is useful, as it lets you find
related information and gives you a more comprehensive
context. One simple example is from DNS: start
with a domain name, find all of the IP addresses that it
has resolved to, and then find all other domain names
that have resolved to those IP addresses.
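The DNS pivot described above can be sketched in a few
lines of Python. This is a minimal illustration and not
the platform's implementation: the record format (pairs
of domain name and IP address), the function names and
the sample data are all assumptions made for the
example, and the domains and addresses come from ranges
reserved for documentation.

```python
from collections import defaultdict

# Hypothetical resolution records: (domain, ip) pairs.
RESOLUTIONS = [
    ("evil.example", "192.0.2.10"),
    ("evil.example", "192.0.2.11"),
    ("other.example", "192.0.2.10"),
    ("benign.example", "198.51.100.5"),
]


def build_index(records):
    """Index the records in both directions so we can pivot either way."""
    domain_to_ips = defaultdict(set)
    ip_to_domains = defaultdict(set)
    for domain, ip in records:
        domain_to_ips[domain].add(ip)
        ip_to_domains[ip].add(domain)
    return domain_to_ips, ip_to_domains


def pivot_on_domain(domain, records):
    """Return all other domains that resolved to any IP used by `domain`."""
    domain_to_ips, ip_to_domains = build_index(records)
    related = set()
    for ip in domain_to_ips.get(domain, set()):
        related |= ip_to_domains[ip]
    related.discard(domain)  # the starting point is not a new finding
    return related


related = pivot_on_domain("evil.example", RESOLUTIONS)
print(related)
```

Here other.example is returned because it shares the
address 192.0.2.10 with the starting domain, while
benign.example is not, since it has no address in common.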
Passive DNS (pDNS) data is a historical record of
DNS lookup resolutions and is important for an
investigation. mnemonic has collected pDNS data since
2013. By 2017, when we had the initial version of
the platform ready for data consumption, we had a
TLP:White data set of approximately 100 GB.
By analyzing super nodes (nodes with an unusually large
number of connections) in the data set, we have
discovered previously unknown sinkholes. We tag known
sinkholes with a fact connected to the object so that we
can filter them out when traversing the graph further.
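One way to picture the two steps above, finding super
nodes and skipping tagged sinkholes during traversal, is
the following sketch. It assumes that a super node is
simply an IP address with at least a chosen number of
distinct domains resolving to it; the threshold, the
function names and the sample records are illustrative
assumptions, and a candidate found this way would still
need review before being tagged as a sinkhole.

```python
from collections import defaultdict


def candidate_super_nodes(records, min_domains):
    """Return IPs that at least `min_domains` distinct domains resolved to."""
    domains_per_ip = defaultdict(set)
    for domain, ip in records:
        domains_per_ip[ip].add(domain)
    return {ip for ip, doms in domains_per_ip.items() if len(doms) >= min_domains}


def related_domains(domain, records, sinkholes):
    """Pivot domain -> IPs -> domains, ignoring IPs tagged as sinkholes."""
    ips = {ip for d, ip in records if d == domain and ip not in sinkholes}
    return {d for d, ip in records if ip in ips and d != domain}


# Hypothetical records; names and addresses are reserved for documentation.
records = [
    ("a.example", "192.0.2.1"),
    ("b.example", "192.0.2.1"),
    ("c.example", "192.0.2.1"),
    ("a.example", "192.0.2.2"),
    ("d.example", "192.0.2.2"),
]

candidates = candidate_super_nodes(records, min_domains=3)
unfiltered = related_domains("a.example", records, sinkholes=set())
filtered = related_domains("a.example", records, sinkholes=candidates)
```

Without the filter, the traversal from a.example reaches
every domain that shares the heavily used address; with
the address tagged, only d.example remains.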
A more advanced solution was to use classifiers
to bridge technical, tactical, operational and strate-
gic threat intelligence. An example of this is using
VirusTotal to bridge technical indicators to tactical
information in MITRE ATT&CK. We extracted the
malware family name from antivirus signatures and
normalized it. We then normalized the Software
entries from MITRE ATT&CK, e.g. “TrickBot” became
“trickbot”. Automated enrichment with VirusTotal
then connects file hashes and network infrastructure
to the “trickbot” object, which is in turn linked to
the tactical threat intelligence in ATT&CK.
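The normalization step can be illustrated with a small
sketch. The text above only states that “TrickBot”
becomes “trickbot”; the rule used here (lower-casing and
dropping everything except letters and digits), the
token-based matching and the sample signature string are
assumptions made for the example and are not the exact
procedure used in the platform.

```python
import re


def normalize(name):
    """Lower-case a name and keep only letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def families_in_signature(signature, known_families):
    """Return the known family names that appear as tokens in a signature."""
    tokens = re.split(r"[^A-Za-z0-9]+", signature)
    return {normalize(t) for t in tokens if normalize(t) in known_families}


# Normalized Software names; in practice these would be read from ATT&CK.
attack_software = {normalize(n) for n in ["TrickBot", "Emotet"]}

# Hypothetical antivirus signature label, not taken from a real product.
signature = "Trojan.Win32.TrickBot.abc"
matches = families_in_signature(signature, attack_software)
print(matches)
```

Because both sides are normalized the same way, a file
hash carrying this signature can be connected to the same
“trickbot” object that the ATT&CK Software entry maps to.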
We also observed that we could create uncommon
pivot points, and our URI object type is an example of
this. A URI object is just a UUID that connects the
components of a complete URI to each other.
Figure 2 shows the facts connecting to a URI in red and
blue. Given a URL, we split it into the host
(domain/IP) part, the path and the query parameters.
Pivoting on query parameters proved useful when tracking
spam campaigns with specific phishing kits, as all
of the other pivot points changed for each spam run,
but the query parameter stayed the same.
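A rough sketch of this decomposition is given below. It
uses Python's standard urlsplit function and represents
each fact as a (URI id, fact type, value) tuple; the
fact type names, the tuple layout and the sample URLs are
assumptions for illustration, and the whole query string
is kept as one value for simplicity.

```python
import uuid
from urllib.parse import urlsplit


def uri_facts(url):
    """Split a URL into components, each linked to a fresh URI id (a UUID)."""
    parts = urlsplit(url)
    uri_id = str(uuid.uuid4())
    facts = []
    if parts.hostname:
        facts.append((uri_id, "host", parts.hostname))
    if parts.path:
        facts.append((uri_id, "path", parts.path))
    if parts.query:
        facts.append((uri_id, "query", parts.query))
    return facts


def uris_sharing_query(all_facts, query):
    """Pivot on a query string: return ids of all URIs that contain it."""
    return {uri for uri, kind, value in all_facts if kind == "query" and value == query}


# Hypothetical URLs: host and path differ per run, the query stays the same.
urls = [
    "http://login.example/a/index.php?kit=abc123",
    "http://192.0.2.7/z/verify.php?kit=abc123",
    "http://shop.example/cart?item=1",
]
all_facts = [fact for url in urls for fact in uri_facts(url)]
shared = uris_sharing_query(all_facts, "kit=abc123")
```

The first two URLs have no host or path in common, yet
the pivot on the query value groups them together.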
4.2.3 Aliasing
Our data model allows for aliasing different names for
the same object.
Instead of giving a threat actor a primary name,
as in MISP Galaxy, we use alias as a fact type between
threat actor names that are known or suggested
to be the same. This is shown in green in Figure 2.
Information about a threat actor is then added by
linking it to the name given by the
source. In this way, if an alias turns out to be wrong,
you only need to retract that one alias, and the rest of
your information is still correct.
Different names for the same object are a common
situation in CTI. Often, we find that different
providers of CTI give the object a primary name
and connect all information about this
object to that name. For instance, if a provider selects
“APT28” as the main name for a threat actor and receives
information about “Fancy Bear” (an alias for APT28),
then such a solution will connect the information to
“APT28”. This connection can be wrong. If you at
some point in the future decide that “Fancy Bear” is
not an alias for “APT28”, then you would have a large
manual task in correcting your data.
The alias fact type is currently used for threat actors
and for tools, and might be applied to other object
types in the future.
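The benefit of treating an alias as a fact of its own
can be shown with a small sketch. The class and method
names below are hypothetical and the storage is a plain
in-memory structure; the point is only that information
stays attached to the name given by the source, so
retracting one alias requires no other corrections.

```python
class AliasGraph:
    """Names carry their own information; aliases are separate, retractable facts."""

    def __init__(self):
        self.aliases = set()  # each alias is an unordered pair of names
        self.info = {}        # name -> list of information items

    def add_info(self, name, item):
        self.info.setdefault(name, []).append(item)

    def add_alias(self, a, b):
        self.aliases.add(frozenset((a, b)))

    def retract_alias(self, a, b):
        self.aliases.discard(frozenset((a, b)))

    def names_for(self, name):
        """Follow alias facts transitively from `name`."""
        seen, queue = {name}, [name]
        while queue:
            current = queue.pop()
            for pair in self.aliases:
                if current in pair:
                    for other in pair - {current}:
                        if other not in seen:
                            seen.add(other)
                            queue.append(other)
        return seen

    def info_for(self, name):
        """Collect information for a name and all of its current aliases."""
        return sorted(i for n in self.names_for(name) for i in self.info.get(n, []))


graph = AliasGraph()
graph.add_info("APT28", "report A")
graph.add_info("Fancy Bear", "report B")
graph.add_alias("APT28", "Fancy Bear")
with_alias = graph.info_for("APT28")

graph.retract_alias("APT28", "Fancy Bear")
after_retraction = graph.info_for("APT28")
```

After the retraction, each name still has exactly the
information that its own sources provided.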
4.2.4 What is Content?
The concept of content is an example of where we
need to be precise in order to enable automation. In
the context of CTI, we handle not just files, but also
stream segments, text strings and parts of content that
have been found in memory. This is all “content”, but
it should not all be classified as files. Furthermore,
even in the case of a file, we find that uniqueness
depends on more than one property. We argue that the
file name, the actual content, and the location of the
content together are what we refer to when we describe
something as a unique file.
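The distinction between content and a unique file can be
made concrete with a short sketch. It assumes that the
content is identified by its SHA-256 digest and that the
location is a host identifier; the class name, the field
names and the sample values are illustrative and are not
taken from the data model itself.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileObservation:
    """A file is identified by name, content and location together."""
    name: str            # file name or path
    content_sha256: str  # stands in for the actual content
    location: str        # where the content resides, e.g. a host


def observe(name, data, location):
    """Build a file observation from raw bytes seen at a location."""
    digest = hashlib.sha256(data).hexdigest()
    return FileObservation(name, digest, location)


# Hypothetical observations of identical bytes on two different hosts.
first = observe("/tmp/sample.bin", b"same bytes", "host-a")
second = observe("/tmp/sample.bin", b"same bytes", "host-b")

same_content = first.content_sha256 == second.content_sha256
same_file = first == second
```

The two observations share one content object, identified
by the digest, but they compare as different files because
the location differs.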
To illustrate this, consider two
files with the file system path /etc/hosts on two
different Linux machines. In a given situation, the name
and content may be the same, but they are still not
the same file because they reside on
different machines. In a different scenario you can find
Modeling Cyber Threat Intelligence