
Table 1: Prediction of techniques related to CVE-ID.

CVE          ATT&CK   k=1    k=2    k=3    k=4
(1,0,0,0,0)  (0,1)    0.055  0.055  0.091  0.272
(1,0,0,0,0)  (1,0)    0.073  0.182  0.291  0.309
(1,0,0,0,0)  (1,1)    0.109  0.291  0.382  0.491
(1,1,0,1,0)  (0,1)    0.055  0.055  0.091  0.255
(1,1,0,1,0)  (1,0)    0.073  0.145  0.182  0.182
(1,1,0,1,0)  (1,1)    0.164  0.273  0.291  0.327
(1,1,1,1,1)  (0,1)    0.055  0.055  0.091  0.200
(1,1,1,1,1)  (1,0)    0.073  0.164  0.200  0.236
(1,1,1,1,1)  (1,1)    0.127  0.255  0.309  0.345
(0,1,1,0,0)  (0,1)    0.036  0.055  0.109  0.291
(0,1,1,0,0)  (1,1)    0.055  0.218  0.273  0.345
(0,0,0,1,1)  (0,1)    0.036  0.055  0.109  0.200
(0,0,0,1,1)  (1,1)    0.073  0.164  0.182  0.238
(0,0,1,0,1)  (0,1)    0.036  0.055  0.109  0.236
(0,0,1,0,1)  (1,1)    0.073  0.091  0.127  0.182

techniques that this approach is effective. Therefore,
even if a mitigation can be estimated, that alone does
not identify the technique. This is supported by the
result that when the technique representation is (0,1),
the CVE-ID is rarely tied to a specific technique. To
connect a CVE-ID to a unique technique, the
representation apparently needs information that
includes a description clearly associated with that
technique.
4.2 Estimation of TTP Chain
We then tested the linkage between CVE-IDs and TTP
chains. To estimate TTP chains, we first determine the
clusters of TTP chains to which the ground-truth
techniques of the CVE-ID belong. We then select the
top k techniques most similar to the embedding
representation of the CVE-ID, as before, and determine
the clusters of their TTP chains. In the experiment, we
count a vulnerability as correctly predicted when at
least one of the predicted clusters is among the
ground-truth clusters. The results of the experiment are shown
in Table 2. We obtain higher accuracy in estimating
TTP chains than in estimating techniques, and even
with k = 1, the best accuracy exceeds 56%. One of the
main reasons for the improved accuracy is that TTP
chain estimation is a 37-class classification task,
while technique estimation is a 201-class
classification task, so the former is easier to guess.
In addition, there may be a reason specific to the TTP
chain. Some techniques in the same TTP chain are used
selectively, and these can be handled with the same
mitigation. For example,
in our experiment, T1008 and T1104 are included in

Table 2: Prediction of TTP chains related to CVE-ID (1).

CVE          ATT&CK   k=1    k=2    k=3    k=4
(1,0,0,0,0)  (0,1)    0.055  0.400  0.436  0.564
(1,0,0,0,0)  (1,0)    0.291  0.509  0.636  0.745
(1,0,0,0,0)  (1,1)    0.345  0.600  0.727  0.800
(1,1,0,1,0)  (0,1)    0.164  0.400  0.436  0.655
(1,1,0,1,0)  (1,0)    0.527  0.600  0.655  0.745
(1,1,0,1,0)  (1,1)    0.509  0.709  0.764  0.873
(1,1,1,1,1)  (0,1)    0.364  0.455  0.491  0.673
(1,1,1,1,1)  (1,0)    0.527  0.600  0.655  0.745
(1,1,1,1,1)  (1,1)    0.545  0.655  0.800  0.800
(0,1,1,0,0)  (0,1)    0.255  0.455  0.491  0.564
(0,1,1,0,0)  (1,1)    0.491  0.727  0.782  0.873
(0,0,0,1,1)  (0,1)    0.164  0.400  0.436  0.636
(0,0,0,1,1)  (1,1)    0.491  0.636  0.673  0.745
(0,0,1,0,1)  (0,1)    0.364  0.636  0.491  0.709
(0,0,1,0,1)  (1,1)    0.564  0.709  0.709  0.818

the same TTP chain. These techniques can be used
selectively or simultaneously to make command and
control difficult to detect. Their mitigations are
shared and characterize the TTP chain. Therefore,
unlike in technique estimation, including the
mitigation is considered to contribute to the
similarity evaluation at the cluster level. In fact,
TTP chain estimation tends to be slightly more accurate
when multiple pieces of information are included,
especially mitigation, than when only a single
piece of information is included. As with technique
estimation, the accuracy of TTP chain estimation
increases gradually as k increases, but the error rate
also increases, so an appropriate k must be chosen
depending on the nature of the task. In our experiments
on TTP chain estimation, the error rate was lowest at
k = 3 in most cases. The highest accuracy at k = 3 is
about 80%, which we consider sufficient given that the
embedding representation is constructed using only the
most basic information
(NVD, CWE, CAPEC, and ATT&CK).
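The evaluation procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: cosine similarity is assumed as the similarity measure, and the two-dimensional vectors, the cluster labels (chain_A, chain_B), the inclusion of T1566, and all function names are hypothetical toy choices. Only the grouping of T1008 and T1104 in one TTP chain follows the text.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k_techniques(cve_vec, technique_vecs, k):
    """Return the k technique IDs whose embeddings are most similar to the CVE."""
    ranked = sorted(
        technique_vecs,
        key=lambda tid: cosine(cve_vec, technique_vecs[tid]),
        reverse=True,
    )
    return ranked[:k]

def chain_hit(cve_vec, true_techniques, technique_vecs, cluster_of, k):
    """A CVE counts as correct if any predicted cluster is a ground-truth cluster."""
    true_clusters = {cluster_of[t] for t in true_techniques}
    predicted = {cluster_of[t] for t in top_k_techniques(cve_vec, technique_vecs, k)}
    return bool(true_clusters & predicted)

def accuracy_at_k(samples, technique_vecs, cluster_of, k):
    """Fraction of (cve_vec, true_techniques) samples with a cluster hit."""
    hits = sum(
        chain_hit(vec, truth, technique_vecs, cluster_of, k) for vec, truth in samples
    )
    return hits / len(samples)

# Toy data (hypothetical embeddings and cluster labels).
technique_vecs = {
    "T1008": [1.0, 0.0],
    "T1104": [0.9, 0.1],
    "T1566": [0.0, 1.0],
}
cluster_of = {"T1008": "chain_A", "T1104": "chain_A", "T1566": "chain_B"}
samples = [
    ([0.8, 0.2], {"T1008"}),
    ([0.1, 0.9], {"T1008"}),
]

acc_k1 = accuracy_at_k(samples, technique_vecs, cluster_of, 1)
acc_k2 = accuracy_at_k(samples, technique_vecs, cluster_of, 2)
```

In this toy setting, the nearest technique to the first CVE vector is T1104 rather than the ground-truth T1008, so technique estimation would miss at k = 1, while TTP chain estimation still succeeds because both techniques fall in the same cluster.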
The experimental results so far indicate that the
NVD’s description is the most important source for
representing techniques, and that mitigation,
especially CWE, contributes to the connection between
CVE-ID and TTP chain. With the above in mind, Table 3
shows the results when the representation of
vulnerabilities is (1,1,1,0,0). The result when the
representation of the technique is (1,1) shows almost
the highest accuracy in the experiments so far.
Especially for ATT&CK, the combination of technique and
mitigation improves the accuracy by 5.4 to 16.4
percentage points, confirming the effect of combining
information. Our experimental results show that when
embedding representations of vulnerabilities, it is
possible to construct