Linkage Between CVE and ATT&CK with Public Information

Tomoaki Mimoto, Yuta Gempei, Kentaro Kita, Takamasa Isohara, Shinsaku Kiyomoto and Toshiaki Tanaka

KDDI Research, Inc., 2–1–15 Ohara, Fujimino-shi, Saitama, 356–8502, Japan

University of Hyogo, 7–1–28, Minatojima-minamimachi, Chuo-ku Kobe, Hyogo 650–0047, Japan

Keywords:

CVE, ATT&CK, NVD, CWE, CAPEC, LLM.

Abstract:

Rapid and effective methodologies for collecting and analyzing cyber threat intelligence (CTI) are required to counter the rapidly growing sophistication of cyberattacks. An overview of known vulnerabilities and related information can be found in databases such as NVD. However, the relationships between vulnerabilities and TTPs, which are effective CTI, must be analyzed individually by experts, and many of these relationships remain unknown. In this study, we attempt to connect vulnerability information keyed to CVE-IDs with ATT&CK, a knowledge base of TTPs. Specifically, vulnerabilities and ATT&CK techniques are each converted, together with related information, into embedding representations, and the similarities between these embeddings are evaluated to estimate the techniques related to each CVE-ID. To address the reproducibility problem caused by the lack of ground truth in the cybersecurity field, this study uses only information available on the surface Web.

1 INTRODUCTION

Common Vulnerabilities and Exposures (CVE) provides vulnerability information, and each vulnerability is assigned a CVE-ID, mainly by MITRE, a non-profit organization in the US. The number of reported vulnerabilities that could be used in cyberattacks has continued to increase rapidly since 2017, and, moreover, the reported vulnerabilities are just the tip of the iceberg (https://flashpoint.io/blog/vulndb-uncovers-hidden-vulnerabilities-cve/). The traditional reactive approach to cyberattacks is becoming antiquated, and a shift to proactive countermeasures against potential attacks is in demand.

Tactics, techniques, and procedures (TTPs) are a concept in threat assessment that focuses on the principles of attacker behavior and attack scenarios. TTPs represent the most fundamental ideas of attackers, and analyzing and evaluating TTPs is considered the most important step in understanding the nature of cyberattacks. Understanding TTPs facilitates an understanding of the attacker's behavior and helps in determining defense policies. Identifying early the potential ways in which an attacker can exploit a vulnerability, and knowing where that exploitation sits in the attack life-cycle, leads to accurate vulnerability assessments. Therefore, linking CVEs and TTPs to predict attack scenarios that exploit the vulnerabilities, estimate risks, and prioritize responses is expected to be one of the proactive measures against cyber threats.

Prior to the advocacy of the Pyramid of Pain, the ATT&CK framework was created by MITRE based on the observation that the techniques attackers exploit converge to some extent under the constraints of the target object. ATT&CK is a knowledge base that organizes TTPs and can be used for developing threat models, determining countermeasures, and active threat hunting in cybersecurity. In addition to ATT&CK, there are several databases related to cyberattacks and vulnerabilities, such as the National Vulnerability Database (NVD) managed by NIST, and the Common Weakness Enumeration (CWE) and the Common Attack Pattern Enumeration and Classification (CAPEC) managed by MITRE. For reproducibility, this paper constructs evaluation data from structured databases and discusses how well vulnerabilities can be represented using an LLM. Our experiments showed that aggregating multiple pieces of connectable information improves expressiveness, which suggests that expressiveness could be enhanced further by using unique, higher-quality information.

Mimoto, T., Gempei, Y., Kita, K., Isohara, T., Kiyomoto, S. and Tanaka, T.

Linkage Between CVE and ATT&CK with Public Information.

DOI: 10.5220/0012722600003767

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 21st International Conference on Security and Cryptography (SECRYPT 2024), pages 655-660

ISBN: 978-989-758-709-2; ISSN: 2184-7711


1.1 Research Topic

There are many challenges in collecting and analyzing cyber threat information to generate Cyber Threat Intelligence (CTI) (Rahman et al., 2023), and we consider two of them here. The first is that the large amount of threat information in its various forms is poorly coordinated. Even in structured databases managed by organizations, information may be missing or insufficiently linked. For example, connecting vulnerability information with TTPs is considered effective for accurate risk assessment and attack detection, but methods for making these connections have not been established, and there is little existing research. The second is the scarcity of correct and reproducible data in the cybersecurity field; for example, there are few tagged corpora. In a previous study (Kuppa et al., 2021), experts manually linked CVE-IDs to techniques as ground-truth data, and the results were not disclosed publicly. In addition, that study uses multiple non-public information sources generated by experts and is not reproducible. Rahman et al. also show that the sources of information handled by existing research are often unclear (Rahman et al., 2023). This poses issues of generality and reproducibility and may hinder the development of the cybersecurity field as a whole.

In this study, we target ATT&CK as information that is not clearly connected to vulnerability information and examine methods for linking CVE and ATT&CK. We also explain a method to extract, from publicly available information, a subset of the links between CVE-IDs and techniques to serve as ground-truth data, which guarantees the reproducibility of the experiment.

1.2 Contribution

This study discusses the possibility of enriching vulnerability information by improving its embedding representation. Vulnerabilities assigned a CVE-ID can be connected to multiple pieces of information by traversing public databases, which can be leveraged to improve the embedding representation of vulnerability information. Furthermore, the same kind of embedding representation can be used to map related information that is not easily associated with the vulnerability into the same space as the vulnerability information, and the similarity between them enables linkage. In this study, we use CWE and CAPEC as information that can be connected to CVE-IDs starting from NVD, and discuss the possibility of connecting CVE-IDs to ATT&CK techniques by evaluating similarity. In our experiments, we confirmed that by combining information, it is possible to connect CVE-IDs to TTP chains, which are clusters of related techniques, with about 87% accuracy (top 4 predictions), even from only the most basic information sources.

This study also argues for the need to ensure generality and reproducibility through the use of publicly available data. This allows the methods themselves to be compared independently of the value of the data used, and encourages the development of research in the cybersecurity field. The lack of ground truth is a challenge in the cybersecurity field. For this research target, we construct a dataset by extracting the necessary information from public sources, and guarantee that all information about the experiment can be reconstructed from public data only.

2 DATASET

2.1 Public Dataset

In the cybersecurity field, public databases have been established to share information among organizations in order to combat the vast number of vulnerabilities and the attacks that exploit them. In this study, we use NVD, CWE, CAPEC, and ATT&CK, which are used by security vendors.

NVD provides an overview of each vulnerability and exposure, including URLs with related information, the organization that registered the CVE, the CVSS score, related CWE-IDs, affected software versions, and the update history. In this study, we use information that can be extracted from NVD regarding vulnerabilities with IDs assigned in 2023, as of 1/23/2024. CWE is a database that systematizes vulnerability types; in version 4.13, vulnerability types are classified into 934 categories, each of which is assigned a CWE-ID as an identifier. CWE provides an overview of each vulnerability type together with related vulnerability types and CAPEC-IDs. CWE is organized from multiple viewpoints, such as software development and hardware design, and we use the CWE-1000 view, which contains all vulnerability types organized from the perspective of research. CAPEC is a database that systematizes attack patterns; in version 3.9, there are 559 types, each with a CAPEC-ID as an identifier. CAPEC provides an overview of each attack pattern, the related attack patterns and CWE-IDs, the prerequisites for a successful attack, the attack flow, and mitigation measures. Like CWE, CAPEC is organized in multiple views, and we use CAPEC-1000, which includes all attack patterns, as our dataset. ATT&CK is a knowledge base that organizes TTPs and can be used for developing threat models, determining countermeasures, and active threat hunting in cybersecurity. In ATT&CK, as in other structured databases, identifiers are assigned to each of the TTPs, and we focus on the techniques of the Enterprise domain. Techniques have sub-techniques that provide further detail, and in v14 they are organized into 201 techniques and 424 sub-techniques. In the dataset available from ATT&CK, effective mitigations are linked to each technique. We also use group information to construct the TTP chains described below. Groups represent activity clusters known as threat actors, and each group is linked to the techniques it primarily uses.

2.2 Dataset for Evaluation

Ground truth, i.e., evaluation data, is necessary to conduct an evaluation experiment. Evaluation data can be easily constructed when information contained in structured databases is used as the objective variable. However, when making predictions about information with unknown connections, experts often have to tag the information manually. Although this work incurs a significant cost, it also yields highly accurate training data, and as a result, a highly accurate model can be expected. However, the generated ground-truth data are often not disclosed, in which case the superiority of a model's design cannot be compared and the results are not reproducible. In this paper, the ground truth is also constructed only from publicly available data, and the design guarantees generality and reproducibility. Specifically, we use AlienVault's Open Threat Exchange (OTX), a crowdsourced threat information sharing platform that is open to anyone with a registered account. OTX provides an SDK (https://github.com/AlienVault-OTX/OTX-Python-SDK) to collect threat information units called pulses. Pulses include IoCs such as IP addresses and URLs, and may also include CVE-IDs and related ATT&CK techniques. In this paper, we extracted the pulses from OTX's AlienVault account from 1/1/2023 to 1/11/2023 that contain both a CVE-ID and a technique, and treated the combinations of CVE-IDs and techniques in these pulses as the evaluation dataset.
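The pulse filtering described above can be sketched as follows. This is a minimal sketch, not the authors' code: it does not call the OTX SDK, and the pulse layout (a dict with a `created` date string, an `indicators` list whose entries carry `type` and `indicator`, and an `attack_ids` list) is a hypothetical assumption made for illustration. The function keeps only pulses inside a given date window that mention at least one CVE-ID and at least one technique, and returns every (CVE-ID, technique) combination.

```python
import re
from datetime import date
from itertools import product

# Assumed ID formats: CVE-YYYY-NNNN(+) and technique IDs such as T1059 or T1059.001.
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")
TECH_RE = re.compile(r"^T\d{4}(?:\.\d{3})?$")


def extract_pairs(pulses, start, end):
    """Return the set of (CVE-ID, technique ID) pairs from pulses created in [start, end].

    Each pulse is assumed (hypothetically) to be a dict with:
      - "created": ISO date string, e.g. "2023-05-04" or "2023-05-04T10:00:00"
      - "indicators": list of {"type": ..., "indicator": ...}
      - "attack_ids": list of ATT&CK technique IDs
    """
    pairs = set()
    for pulse in pulses:
        created = date.fromisoformat(pulse["created"][:10])
        if not (start <= created <= end):
            continue
        cves = {
            ind["indicator"].upper()
            for ind in pulse.get("indicators", [])
            if ind.get("type") == "CVE" and CVE_RE.match(ind["indicator"].upper())
        }
        techniques = {t for t in pulse.get("attack_ids", []) if TECH_RE.match(t)}
        # Keep only pulses that contain both a CVE-ID and a technique.
        if cves and techniques:
            pairs.update(product(sorted(cves), sorted(techniques)))
    return pairs
```

Taking the Cartesian product treats every CVE-ID in a pulse as linked to every technique in it, which matches the pulse-level granularity of the ground truth described in the text but may over-link pulses that bundle several unrelated vulnerabilities.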

2.3 Building TTP Chains

Inferring the techniques used by attackers for CVE-IDs may allow us to predict the sequence of attack methods, i.e., TTP chains. If a TTP chain can be identified through a technique associated with a CVE-ID, it is possible to predict likely subsequent attack methods and to proactively address the vulnerability if it could pose a significant risk later on. Therefore, we discuss the linkage between CVE-IDs and techniques as well as the linkage between CVE-IDs and TTP chains.

We apply a method that reproduces TTP chains from techniques (Al-Shaer et al., 2020). We focus on the 143 groups in the ATT&CK dataset and represent them as binary (multi-hot) vectors based on the techniques they use. Since ATT&CK classifies techniques into 201 types, excluding sub-techniques, each attacker group is represented by a vector $\mathbf{g} \in \{0,1\}^{201}$. Considering the group-technique matrix $M \in \{0,1\}^{143 \times 201}$, each technique is represented by a column vector $\mathbf{t} \in \{0,1\}^{143}$ of $M$ over the groups. Highly related technique clusters can be constructed by evaluating the similarity of these technique vectors and clustering them, and each cluster can be regarded as a TTP chain. In our experiment, the same settings as in (Al-Shaer et al., 2020) were used, and the final number of clusters was 37. Note that a technique association of about 90% has been reported for clusters obtained with this method, and this study treats the resulting clusters as true (Al-Shaer et al., 2020).
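The technique-vector construction above can be sketched with NumPy. This is a simplified stand-in, not the procedure or settings of Al-Shaer et al. (2020): the toy matrix is made up, and instead of their clustering, the sketch links any two techniques whose cosine similarity is at least a hypothetical `threshold` and takes the connected components as clusters.

```python
import numpy as np


def technique_vectors(M):
    """Columns of the group-by-technique matrix M (groups x techniques).

    Row g of M is the binary usage vector of group g; column t is the
    representation of technique t over groups, i.e. t in {0,1}^{#groups}.
    """
    return np.asarray(M, dtype=float).T  # shape: (techniques, groups)


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X (zero rows get similarity 0)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    return Xn @ Xn.T


def threshold_clusters(S, threshold):
    """Connected components of the graph with an edge wherever S[i, j] >= threshold.

    A hypothetical stand-in for the clustering used by Al-Shaer et al. (2020).
    """
    n = S.shape[0]
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:  # depth-first search over similar techniques
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and S[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

For example, with four groups and five techniques where techniques 0 and 1 are always used together, `threshold_clusters(cosine_matrix(technique_vectors(M)), 0.8)` puts them in the same cluster. Techniques that co-occur in the same groups end up in the same TTP chain, which is the intuition behind clustering the columns of $M$.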

3 EMBEDDING VULNERABILITIES

This study discusses the linkage between vulnerabilities assigned a CVE-ID and information that cannot be directly connected to the CVE-ID. As a case study, we attempt to connect CVE-IDs to ATT&CK techniques or TTPs as information that cannot be directly connected. Specifically, the vulnerability to which the CVE-ID is assigned and the information to be connected, in this case the technique, are each converted into an embedding representation, and the technique associated with the CVE-ID is inferred from the similarity between them.

3.1 Embedding Representations

Embedding representations of words and sentences had been realized by deep learning models using CNNs and RNNs, but since the transformer architecture (Vaswani et al., 2017) was proposed in 2017, various fast and accurate NLP models have been proposed, including BERT (Devlin et al., 2018). This paper uses BERT, one of the major NLP models, to obtain embedding representations of vulnerability information.

The pre-trained BERT model uses BooksCorpus and English Wikipedia as pre-training data; it therefore comprehends general terms and sentences well but understands terms and contexts in the cybersecurity field poorly. We therefore use SecBERT (Liberato, 2022), a model pre-trained on documents from the cybersecurity field. Using custom heads is generally more accurate than using the CLS token of the final layer, and in this paper we obtain the embedding representation of a sentence as the average (mean pooling) of all token vectors in the final layer. In the databases used in this study, a single description contains multiple sentences, and because these sentences are input together, the number of tokens in some cases exceeds 512, the upper limit that BERT can process. There are various ways to truncate the input; here, if the number of tokens exceeds 512, we keep the first 256 and the last 256 tokens. With the above heuristic tuning, an embedding representation $v_{\mathrm{cve}}$ of a vulnerability assigned a CVE-ID is obtained. Similarly, an embedding representation $v_{\mathrm{tec}}$ of an ATT&CK technique is obtained. Since $v_{\mathrm{cve}}$ and $v_{\mathrm{tec}}$ are both 768-dimensional vectors, their similarity can be evaluated directly.
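The truncation and pooling steps above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: it works on a list of token IDs and on a precomputed final-layer hidden-state array rather than calling SecBERT, and the text does not say how special tokens such as [CLS] and [SEP] count towards the 512-token limit, so the sketch ignores them.

```python
import numpy as np


def head_tail_truncate(token_ids, max_len=512):
    """Keep the first and last max_len // 2 tokens if the input is too long.

    With max_len=512 this keeps 256 tokens from the beginning and 256 from the end.
    Special tokens ([CLS], [SEP]) are not handled here (assumption).
    """
    if len(token_ids) <= max_len:
        return list(token_ids)
    half = max_len // 2
    return list(token_ids[:half]) + list(token_ids[-half:])


def mean_pool(hidden_states, attention_mask):
    """Average the final-layer token vectors, ignoring padded positions.

    hidden_states: (seq_len, hidden_dim), e.g. hidden_dim = 768 for BERT-base.
    attention_mask: (seq_len,), 1 for real tokens and 0 for padding.
    """
    h = np.asarray(hidden_states, dtype=float)
    m = np.asarray(attention_mask, dtype=float)[:, None]
    return (h * m).sum(axis=0) / max(m.sum(), 1.0)
```

Keeping both ends of a long description is a common heuristic because the opening sentences of a description tend to summarize it while the closing ones often add context; masking out padding keeps batched inputs of different lengths from diluting the average.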

3.2 Use of Multiple Resources

We consider improving the expressiveness of vulnerability representations by adding relevant information. NVD, CWE, and CAPEC can be linked with each other using the CWE-ID and CAPEC-ID as keys, so they can be used as additional information to express the vulnerability to which the CVE-ID is assigned. ATT&CK also interlinks tactics, techniques, mitigations, etc., so that, for example, it is possible to check which mitigations are valid for a given technique. Hence, as with CVE-IDs, ATT&CK techniques can also use related information, such as mitigations and tactics, to represent the technique.
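The key-based linkage of NVD, CWE, and CAPEC described above can be sketched with plain dictionaries. The record layouts and field names here (`cwe_ids`, `capec_ids`, `description`, `mitigation`) are hypothetical simplifications, not the actual NVD, CWE, or CAPEC export schemas; the sketch only shows how following CWE-IDs and then CAPEC-IDs collects the texts that can be embedded alongside the CVE description.

```python
def collect_related_texts(cve_id, nvd, cwe, capec):
    """Gather texts reachable from a CVE-ID via its CWE-IDs and then CAPEC-IDs.

    nvd:   {cve_id: {"description": str, "cwe_ids": [str, ...]}}
    cwe:   {cwe_id: {"description": str, "mitigation": str, "capec_ids": [str, ...]}}
    capec: {capec_id: {"description": str, "mitigation": str}}
    All layouts are hypothetical simplifications of the real databases.
    """
    entry = nvd[cve_id]
    texts = {"cve_description": [entry["description"]], "cwe_description": [],
             "cwe_mitigation": [], "capec_description": [], "capec_mitigation": []}
    seen_capec = set()
    for cwe_id in entry.get("cwe_ids", []):
        weakness = cwe.get(cwe_id)
        if weakness is None:  # e.g. NVD-CWE-Other, or an ID outside the chosen view
            continue
        texts["cwe_description"].append(weakness["description"])
        texts["cwe_mitigation"].append(weakness["mitigation"])
        for capec_id in weakness.get("capec_ids", []):
            # Skip duplicate CAPEC-IDs and IDs missing from the CAPEC data.
            if capec_id in seen_capec or capec_id not in capec:
                continue
            seen_capec.add(capec_id)
            texts["capec_description"].append(capec[capec_id]["description"])
            texts["capec_mitigation"].append(capec[capec_id]["mitigation"])
    return texts
```

Each list can then be embedded separately and combined, so the same traversal serves every combination of CVE, CWE, and CAPEC information evaluated in the experiments.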

In this paper, we use $n$ pieces of information related to CVE-IDs and ATT&CK techniques. The final embedding representations of vulnerabilities and techniques are computed as a weighted average of the embedding representation of a single piece of information and those of the additional information. Let $v_0$ be the embedding representation of the single piece of information and $v_i$ ($i \in \{1, 2, \ldots, n\}$) be the embedding representations of the $n$ pieces of additional information; the final embedding representation $v$ is then expressed as follows.

$$ v = \frac{w_0 \cdot v_0 + \sum_{i=1}^{n} w_i \cdot v_i}{w_0 + \sum_{i=1}^{n} w_i} \qquad (1) $$

Here, $w_i$ is the weight for each piece of information, and the weights are fixed at $w_i = 1$ for all $i$ in the following experiments. By obtaining $v_{\mathrm{tec}}$ in the same way as $v_{\mathrm{cve}}$, both can be mapped onto the same space, and thus $v_{\mathrm{cve}}$ and $v_{\mathrm{tec}}$ can be compared directly. There are currently 625 techniques, of which 424 are sub-techniques. The embedding representation $v_{\mathrm{tec}}$ of each technique is obtained by formula (1) using the embedding representation $v_0$ of the technique and the $n$ embedding representations $v_i$ of its sub-techniques. Cosine similarity is used as the similarity measure in this study.
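Equation (1) and the cosine-similarity comparison can be sketched as follows. This is a minimal NumPy sketch, assuming the embeddings have already been computed (for example, by mean pooling as in Section 3.1); the function names are illustrative. With all weights equal to 1, as in the experiments, Equation (1) reduces to the plain mean of the $n + 1$ vectors.

```python
import numpy as np


def combine(v0, extras, w0=1.0, weights=None):
    """Weighted average of Equation (1).

    v0: embedding of the single (primary) piece of information.
    extras: list of n embeddings of additional information.
    weights: n weights w_1..w_n; defaults to all ones, as in the experiments.
    """
    vectors = [np.asarray(v0, dtype=float)] + [np.asarray(v, dtype=float) for v in extras]
    if weights is None:
        weights = [1.0] * len(extras)
    w = np.array([w0] + list(weights), dtype=float)
    # Numerator: w_0 v_0 + sum_i w_i v_i; denominator: w_0 + sum_i w_i.
    return (w[:, None] * np.stack(vectors)).sum(axis=0) / w.sum()


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because cosine similarity ignores vector length, the normalizing denominator of Equation (1) does not change the ranking of techniques; it matters only if the combined vectors are compared with another measure or weighted against each other later.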

4 EXPERIMENT

4.1 Estimation of Technique

This paper uses the CVE description together with the descriptions and mitigations from CWE and CAPEC to build the embedding representations of vulnerabilities. Each vulnerability representation is denoted by the tuple (CVE's description, CWE's description, CWE's mitigation, CAPEC's description, CAPEC's mitigation), where each item is set to 1 if that information is used and 0 if not. For example, if only the CVE's description is used, the representation is (1, 0, 0, 0, 0). Similarly, the descriptions of techniques and mitigations are used for the embedding representations of techniques, and the representation using both is denoted (technique, mitigation) = (1, 1). We first tested the linkage between CVE-IDs and individual techniques. For evaluation, the top k techniques most similar to each CVE-ID are selected, and a prediction is considered correct if at least one of these techniques is included in the ground-truth data.
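The top-k evaluation described above can be sketched as follows. This is a hedged NumPy sketch rather than the authors' evaluation code: it assumes one combined embedding per CVE-ID and per technique, ranks techniques by cosine similarity, and counts a CVE-ID as a hit when any of its top k techniques appears in the ground truth. It also returns the ratio of correct to predicted techniques; treating one minus this ratio as the "error rate" discussed below is an assumption based on the wording of the text.

```python
import numpy as np


def top_k_indices(query, candidates, k):
    """Indices of the k candidate rows most cosine-similar to query, best first."""
    q = np.asarray(query, dtype=float)
    C = np.asarray(candidates, dtype=float)
    sims = C @ q / (np.linalg.norm(C, axis=1) * np.linalg.norm(q))
    return [int(i) for i in np.argsort(-sims, kind="stable")[:k]]


def evaluate_top_k(cve_vecs, tec_vecs, tec_ids, gold, k):
    """Hit rate and precision of top-k technique prediction.

    cve_vecs: {cve_id: embedding}
    tec_vecs: (num_techniques, dim) array whose rows follow tec_ids
    gold:     {cve_id: set of ground-truth technique IDs}
    Returns (hit_rate, precision); 1 - precision is the assumed "error rate".
    """
    hits = correct = predicted = 0
    for cve_id, vec in cve_vecs.items():
        preds = {tec_ids[i] for i in top_k_indices(vec, tec_vecs, k)}
        n_correct = len(preds & gold[cve_id])
        hits += int(n_correct > 0)
        correct += n_correct
        predicted += len(preds)
    return hits / len(cve_vecs), correct / predicted
```

Raising k can only keep or increase the hit rate, while precision usually falls because each extra prediction is more likely to be wrong, which is the trade-off between accuracy and error rate discussed below.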

Table 1 shows the results of the similarity evaluation between vulnerabilities represented by CVE-IDs and ATT&CK techniques. Each row represents the prediction accuracy at k ∈ {1, 2, 3, 4} for one combination of CVE-ID and technique representations. For each k, the top three scores are shown in bold. The accuracy increases as k increases, but not linearly. On the other hand, as k increases, the error rate also increases. In our experiments on technique estimation, the error rate was lowest for k = 1 in most cases, i.e., the ratio of the total number of correct techniques to the total number of predicted techniques was largest. Surprisingly, for technique estimation we confirmed that (1, 0, 0, 0, 0), i.e., the case in which only the CVE-ID's description is used, is highly accurate. One reason for this may be that the description of a mitigation relates to more than one technique. For example, M1018 concerns managing user accounts properly, and there are nearly 100 techniques for which this mitigation is effective. Therefore, even if the mitigation can be estimated, this does not lead to technique estimation. A result supporting this consideration is that when the technique representation is (0, 1), the CVE-ID is rarely tied to a specific technique. It appears that, in order to connect a CVE-ID to a unique technique, information that includes a description clearly associated with that technique is necessary.

Table 1: Prediction of techniques related to CVE-ID.

CVE, ATT&CK          k = 1   k = 2   k = 3   k = 4
(1,0,0,0,0), (0,1)   0.055   0.055   0.091   0.272
(1,0,0,0,0), (1,0)   0.073   0.182   0.291   0.309
(1,0,0,0,0), (1,1)   0.109   0.291   0.382   0.491
(1,1,0,1,0), (0,1)   0.055   0.055   0.091   0.255
(1,1,0,1,0), (1,0)   0.073   0.145   0.182   0.182
(1,1,0,1,0), (1,1)   0.164   0.273   0.291   0.327
(1,1,1,1,1), (0,1)   0.055   0.055   0.091   0.200
(1,1,1,1,1), (1,0)   0.073   0.164   0.200   0.236
(1,1,1,1,1), (1,1)   0.127   0.255   0.309   0.345
(0,1,1,0,0), (0,1)   0.036   0.055   0.109   0.291
(0,1,1,0,0), (1,1)   0.055   0.218   0.273   0.345
(0,0,0,1,1), (0,1)   0.036   0.055   0.109   0.200
(0,0,0,1,1), (1,1)   0.073   0.164   0.182   0.238
(0,0,1,0,1), (0,1)   0.036   0.055   0.109   0.236
(0,0,1,0,1), (1,1)   0.073   0.091   0.127   0.182

4.2 Estimation of TTP Chain

We then tested the linkage between CVE-IDs and TTP chains. In the estimation of TTP chains, we first determine the TTP-chain clusters to which the ground-truth techniques of each CVE-ID belong. We then select the top k techniques most similar to the embedding representation of the CVE-ID, as before, and determine the clusters of their TTP chains. In the experiment, a vulnerability is regarded as correctly predicted when at least one of the predicted clusters is included in the ground-truth clusters. The results are shown in Table 2. We obtain higher accuracy in estimating TTP chains than in estimating techniques, and even with k = 1, the accuracy exceeds 56% at maximum. One of the main reasons for the improved accuracy is that TTP-chain estimation is a 37-class classification task, whereas technique estimation is a 201-class classification task, which makes the former easier to guess. In addition, there may be a reason specific to TTP chains: some techniques in the same TTP chain are used selectively, and these can be handled with the same mitigation. For example, in our experiment, T1008 and T1104 are included in the same TTP chain. These are techniques that can be used selectively or simultaneously to make command and control difficult to detect. Their mitigations are common and characterize the TTP chain. Therefore, unlike in technique estimation, including mitigations is considered to contribute to the evaluation of similarity at the cluster level. In fact, TTP-chain estimation tends to be slightly more accurate when multiple pieces of information are included, especially mitigations, than when only a single piece of information is used. As with technique estimation, the accuracy of TTP-chain estimation increases gradually as k increases, but the error rate also increases, so an appropriate k must be chosen depending on the nature of the task. In our experiments on TTP-chain estimation, the error rate was lowest for k = 3 in most cases. The highest accuracy at k = 3 is about 80%, which is sufficient considering that the embedding representation is constructed using only the most basic information (NVD, CWE, CAPEC, and ATT&CK).

Table 2: Prediction of TTP chains related to CVE-ID (1).

CVE, ATT&CK          k = 1   k = 2   k = 3   k = 4
(1,0,0,0,0), (0,1)   0.055   0.400   0.436   0.564
(1,0,0,0,0), (1,0)   0.291   0.509   0.636   0.745
(1,0,0,0,0), (1,1)   0.345   0.600   0.727   0.800
(1,1,0,1,0), (0,1)   0.164   0.400   0.436   0.655
(1,1,0,1,0), (1,0)   0.527   0.600   0.655   0.745
(1,1,0,1,0), (1,1)   0.509   0.709   0.764   0.873
(1,1,1,1,1), (0,1)   0.364   0.455   0.491   0.673
(1,1,1,1,1), (1,0)   0.527   0.600   0.655   0.745
(1,1,1,1,1), (1,1)   0.545   0.655   0.800   0.800
(0,1,1,0,0), (0,1)   0.255   0.455   0.491   0.564
(0,1,1,0,0), (1,1)   0.491   0.727   0.782   0.873
(0,0,0,1,1), (0,1)   0.164   0.400   0.436   0.636
(0,0,0,1,1), (1,1)   0.491   0.636   0.673   0.745
(0,0,1,0,1), (0,1)   0.364   0.636   0.491   0.709
(0,0,1,0,1), (1,1)   0.564   0.709   0.709   0.818
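The cluster-level evaluation used for TTP chains can be sketched in the same style. This is an illustrative sketch, assuming a hypothetical `cluster_of` mapping from technique IDs to TTP-chain labels (for example, produced by the clustering in Section 2.3) and precomputed top-k technique predictions per CVE-ID. A CVE-ID counts as a hit when any predicted technique falls into a cluster that contains one of its ground-truth techniques.

```python
def chain_hit_rate(predictions, gold, cluster_of):
    """Top-k hit rate at the TTP-chain (cluster) level.

    predictions: {cve_id: list of top-k predicted technique IDs}
    gold:        {cve_id: set of ground-truth technique IDs}
    cluster_of:  {technique_id: cluster label} (hypothetical mapping)
    """
    hits = 0
    for cve_id, predicted in predictions.items():
        gold_chains = {cluster_of[t] for t in gold[cve_id] if t in cluster_of}
        predicted_chains = {cluster_of[t] for t in predicted if t in cluster_of}
        # A hit if the predicted and ground-truth chains overlap.
        hits += int(bool(gold_chains & predicted_chains))
    return hits / len(predictions)
```

For instance, if T1008 and T1104 share a cluster, predicting T1104 for a CVE-ID whose ground truth is T1008 counts as correct here even though it is wrong at the technique level, which is one way the cluster-level accuracy in Table 2 can exceed the technique-level accuracy in Table 1.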

The experimental results so far indicate that the NVD description is the most important source in terms of estimating techniques, and that mitigation, especially CWE's mitigation, contributes to the connection between CVE-IDs and TTP chains. With this in mind, Table 3 shows the results when the vulnerability representation is (1,1,1,0,0). The result when the technique representation is (1,1) shows nearly the highest accuracy in the experiments so far. For ATT&CK in particular, combining technique descriptions and mitigations improves the accuracy by 5.4 to 16.4 percentage points, confirming the effect of combining information. Our experimental results show that, when embedding vulnerabilities, it is possible to construct embedding representations better suited to the purpose by selecting and incorporating sufficient information according to that purpose, even with a simple linear combination. On the other hand, the results also suggest that including unnecessary information reduces the expressive power of the embedding representation.

Table 3: Prediction of TTP chains related to CVE-ID (2).

CVE, ATT&CK          k = 1   k = 2   k = 3   k = 4
(1,1,1,0,0), (1,0)   0.400   0.673   0.691   0.764
(1,1,1,0,0), (1,1)   0.564   0.745   0.782   0.818
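The range of improvement quoted above follows directly from Table 3. The short calculation below copies the two rows of Table 3 and subtracts the technique-only row, (1,0), from the technique-plus-mitigation row, (1,1), for each k.

```python
# Accuracy values copied from Table 3, vulnerability representation (1,1,1,0,0).
technique_only = {1: 0.400, 2: 0.673, 3: 0.691, 4: 0.764}           # ATT&CK (1,0)
technique_and_mitigation = {1: 0.564, 2: 0.745, 3: 0.782, 4: 0.818}  # ATT&CK (1,1)

# Gain in accuracy (as a fraction) from adding mitigations, for each k.
gains = {k: round(technique_and_mitigation[k] - technique_only[k], 3) for k in technique_only}
print(gains)
print(min(gains.values()), max(gains.values()))
```

The gains are 0.164, 0.072, 0.091, and 0.054 for k = 1 to 4, so the improvement is largest at k = 1, where a single prediction has to be right.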

5 RELATED WORK

Here, we introduce some related studies on CVE-IDs and TTPs. BRON (Hemberg et al., 2020) is an initiative that attempts to connect various types of information starting from tactics and products, and it shows that the connections between CWE and techniques are insufficient. A similar study by MITRE, with results connecting CVE-IDs to techniques, can be found on GitHub (https://github.com/center-for-threat-informed-defense/ to cve), but it covers only some vulnerabilities up to 2020 and does not allow evaluation on new vulnerabilities. In addition, Kuppa et al. developed a predictive model of the ATT&CK techniques associated with CVE-IDs (Kuppa et al., 2021). In their experiment, CVE-IDs were manually linked to techniques in advance, multiple other information sources were used to suggest the possibility of connecting to unknown techniques, and the model was designed to handle concept drift. Among studies related to TTPs, Ayoade et al. proposed a bias-corrected SVM classifier to classify tactics and techniques in reports from multiple security-related companies (Ayoade et al., 2018), and Li et al. attempted multi-label classification of TTPs based on the semantic similarity of texts using TF-IDF (Li et al., 2019).

6 CONCLUSION

This study proposed a method to improve the representations of vulnerability information using BERT. We evaluated similarity by applying a weighted average of multiple embedding representations of related information to both the vulnerabilities and the expected connection destinations. This study differs from previous works in that it is highly reproducible, because all information is collected from publicly available sources. Therefore, it is possible to discuss the superiority of the model construction method itself, independent of the data. As a connection to ATT&CK, we evaluated the linkability of techniques and TTP chains associated with vulnerabilities and confirmed an improvement in accuracy of up to 16.4 percentage points through the use of additional information, especially for TTP-chain estimation. Since unnecessary information may be included in the embedding representation, its accuracy is expected to improve further by using documents with higher information content and by varying the weights according to the reliability of the information.

REFERENCES

Al-Shaer, R., Spring, J. M., and Christou, E. (2020). Learning the associations of MITRE ATT&CK adversarial techniques. In 2020 IEEE Conference on Communications and Network Security (CNS), pages 1–9. IEEE.

Ayoade, G., Chandra, S., Khan, L., Hamlen, K., and Thuraisingham, B. (2018). Automated threat report classification over multi-source data. In 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC), pages 236–245. IEEE.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Hemberg, E., Kelly, J., Shlapentokh-Rothman, M., Reinstadler, B., Xu, K., Rutar, N., and O'Reilly, U.-M. (2020). Linking threat tactics, techniques, and patterns with defensive weaknesses, vulnerabilities and affected platform configurations for cyber hunting. arXiv preprint arXiv:2010.00533.

Kuppa, A., Aouad, L., and Le-Khac, N.-A. (2021). Linking CVE's to MITRE ATT&CK techniques. In Proceedings of the 16th International Conference on Availability, Reliability and Security, pages 1–12.

Li, M., Zheng, R., Liu, L., and Yang, P. (2019). Extraction of threat actions from threat-related articles using multi-label machine learning classification method. In 2019 2nd International Conference on Safety Produce Informatization (IICSPI), pages 428–431. IEEE.

Liberato, M. (2022). SecBERT: Analyzing reports using BERT-like models. Master's thesis, University of Twente.

Rahman, M. R., Hezaveh, R. M., and Williams, L. (2023). What are the attackers doing now? Automating cyberthreat intelligence extraction from text on pace with the changing threat landscape: A survey. ACM Computing Surveys, 55(12):1–36.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
