To justify the choice of an SVM classifier for the SSC, the same
data was used to train and evaluate a Decision tree-based classifier.
The average results of 10-fold cross validation for the SVM-based and
Decision tree-based classification experiments are shown in Table 5.
Table 5: Performance comparison of the SVM and Decision tree-based classifiers.

             Prec. (%)   Rec. (%)   F1 Score (%)   Accuracy (%)
SVM          76.51       81.53      78.92          78.28
Dec. tree    73.69       74.40      74.03          73.96
The comparison shows that the SVM classifier outperforms the Decision
tree-based classifier on all four metrics, which supports the choice
of SVM for the SSC.
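As a quick consistency check on Table 5, the F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The sketch below is illustrative only and is not part of the original experiment; the function name `f1_from` is an assumption. Because the table reports averages over 10 folds, the recomputed values need not match the reported F1 exactly: the mean of per-fold F1 scores generally differs slightly from the F1 of the mean precision and recall.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from Table 5 (percent, averaged over 10 folds).
table5 = {
    "SVM": {"prec": 76.51, "rec": 81.53, "f1": 78.92},
    "Dec. tree": {"prec": 73.69, "rec": 74.40, "f1": 74.03},
}

for name, row in table5.items():
    recomputed = f1_from(row["prec"], row["rec"])
    gap = recomputed - row["f1"]
    print(f"{name}: reported F1 = {row['f1']:.2f}, "
          f"recomputed = {recomputed:.2f}, gap = {gap:+.2f}")
```

For both classifiers the recomputed value lies within 0.03 percentage points of the reported F1, which is consistent with per-fold averaging.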
Overall, the experimental results provide evidence that the
significance score of cyber-threat information can be calculated by
an SVM classifier using features generated by semantic textual
similarity and a custom-trained NER model.
5 CONCLUSION
In this paper, we proposed a novel approach to quantifying the
relevance and significance of cyber-threat information in text format
by extracting features such as the maximum similarity scores against
pre-defined “significant” text and the number of relevant named
entities. The experimental results show the potential of our approach,
with an accuracy of 78.28% in 10-fold cross validation.
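The feature construction summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes sentence embeddings are already available as plain lists of floats and uses cosine similarity as the similarity measure. The names `cosine` and `build_features`, the class labels, the entity strings, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_features(input_vec, reference_classes, named_entities):
    """Return one max-similarity score per Reference class plus an entity count.

    reference_classes: dict mapping a class label to a list of embeddings
    (hypothetical layout). named_entities: entities found in the input text
    by an NER model (assumed to be computed elsewhere).
    """
    features = []
    for label in sorted(reference_classes):  # fixed order keeps features aligned
        sims = [cosine(input_vec, ref) for ref in reference_classes[label]]
        features.append(max(sims) if sims else 0.0)
    features.append(float(len(named_entities)))
    return features


# Toy 3-dimensional embeddings, for illustration only.
reference_classes = {
    "significant": [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]],
    "not_significant": [[0.0, 0.0, 1.0]],
}
input_vec = [0.6, 0.8, 0.0]
entities = ["Apache Struts", "OpenSSL"]  # hypothetical NER output

features = build_features(input_vec, reference_classes, entities)
print(features)
```

A vector of this form, one maximum similarity per Reference class followed by the entity count, could then be passed to a classifier such as an SVM.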
Because the sentences used in the classes of Reference text come from
the same source and are homogeneous in nature, the semantic
similarities between the different classes of Reference text are too
close, as shown in Table 2. This drawback affects the features of the
input text and thus reduces the overall performance. The problem
could be addressed by selectively using various sources and assigning
different weight scores depending on their relevance to the classes
of Reference text. In addition, the Named Entity Analyzer contributes
only one feature to the overall model prediction; changing the
experimental design to generate more features from named entities
could therefore improve the effectiveness of the Named Entity Analyzer.
In future work, we plan to improve our experimental design by
incorporating the changes mentioned above and by training a new NER
model that uses different algorithms and extends the scope of the
entities beyond the single label “ITProduct”.
ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy