threshold value of 0.5 for all evaluation metrics. In
this case, it achieved excellent results, reaching 98%
for all metrics.
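To make the role of the 0.5 threshold concrete, the following minimal sketch shows how precision, recall, and F1 can be computed once model confidence scores are cut at a threshold. It is an illustration only: the function name `precision_recall_f1`, the toy scores, and the choice of treating scores equal to the threshold as positive are assumptions, not details taken from the IDEA-C2 implementation.

```python
from typing import List, Tuple


def precision_recall_f1(
    scores: List[float], gold: List[bool], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 after thresholding confidence scores.

    Assumption: a prediction counts as positive when score >= threshold.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(p and g for p, g in zip(predicted, gold))          # true positives
    fp = sum(p and not g for p, g in zip(predicted, gold))      # false positives
    fn = sum((not p) and g for p, g in zip(predicted, gold))    # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy data (hypothetical): five candidate relations with confidence scores.
scores = [0.91, 0.40, 0.75, 0.55, 0.10]
gold = [True, True, True, False, False]

precision, recall, f1 = precision_recall_f1(scores, gold, threshold=0.5)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Raising the threshold generally trades recall for precision, which is why the threshold should be reported together with the metrics.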
Therefore, the results of this experiment showed that
the pre-annotation and training sub-processes of the
IDEA-C2 approach have matured to the point of reaching
very good performance. However, improvements can still
be made, such as extending the pre-annotation task with
new rules and replacing the spaCy pipeline to use other
existing architectures, such as BERT Large. Additionally,
new experiments using other C2 corpora may consolidate
these initial good performance results.
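As a rough illustration of what rule-based pre-annotation can look like, the sketch below labels text spans by matching them against a term dictionary, in the spirit of distant supervision. It is not the IDEA-C2 pre-annotation code: the `GLOSSARY` entries, the category names, and the `pre_annotate` function are hypothetical and chosen only for the example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical term-to-category dictionary (not taken from the paper).
GLOSSARY: Dict[str, str] = {
    "brigade": "ORGANIZATION",
    "commander": "ROLE",
    "operation order": "DOCUMENT",
}


def pre_annotate(text: str, glossary: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, category) spans found in text."""
    spans: List[Tuple[int, int, str]] = []
    # Match longer terms first so multi-word terms win over their substrings.
    for term in sorted(glossary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start, end = match.span()
            # Keep the span only if it does not overlap one already accepted.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, glossary[term]))
    return sorted(spans)


text = "The commander issues the operation order to the brigade."
annotations = pre_annotate(text, GLOSSARY)
for start, end, category in annotations:
    print(text[start:end], category)
```

Adding a new rule in this sketch amounts to adding dictionary entries or further patterns. The resulting spans would still need human review before being used for fine-tuning.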
7 CONCLUSION
This article presented IDEA-C2, a supervised
knowledge graph generation approach supported by
a high-level metamodel with Command and Control
Relations constructs, called C2RM. This metamodel
gives the approach high flexibility, since the
categories of domain entities are not fixed in advance.
In the experiments carried out, promising results were
obtained, achieving more than 70% precision and recall
in the training of the LM based on the corpus from
other published works. The approach uses distant
supervision methods to pre-annotate Command and
Control doctrinal text for model fine-tuning. Likewise,
the implemented IDEA-C2-Model application
showed remarkable results in training NER and RE
models, achieving over 80% precision and 98% recall,
using the Glossary C2 corpus as input. Finally, these
experiments using the IDEA-C2-Tool demonstrated the
usefulness and feasibility of the proposed approach,
which can already generate the IDEA-C2-KG, available
for queries and inferences. Future work includes
improving pre-annotation tasks and evaluating
entity and relation categories statistically.
ACKNOWLEDGEMENTS
This research has been funded by
FINEP/DCT/FAPEB (no. 2904/20-01.20.0272.00)
under the S2C2 project.
REFERENCES
Augenstein, I., Das, M., Riedel, S., Vikraman, L., et al.
(2017). ScienceIE - Extracting keyphrases and rela-
tions from Scientific Publications. In Proc Int Work on
Semantic Evaluation, pages 546–555, Canada. ACL.
BRASIL (2009). Glossário de Termos e Expressões para
uso no Exército. Estado Maior do Exército.
Chaudhri, V. K., Cheng, B., Overholtzer, A., et al. (2013).
Inquire biology: A textbook that answers questions.
AI Magazine, 34(3):55–72.
Dang, L. D., Phan, U. T., and Nguyen, N. T. (2023). GENA:
A knowledge graph for nutrition and mental health.
Journal of Biomedical Informatics, 145:104460.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2019). BERT: Pre-training of deep bidirectional
transformers for language understanding. In Proc the
Conf of the North American Chapter of the ACL: Hu-
man Language Technologies, Volume 1, pages 4171–
4186, Minnesota. ACL.
Hogan, A., Blomqvist, E., Cochez, M., et al. (2021).
Knowledge Graphs. ACM Computing Surveys, 54(4).
Kent, W. (2012). Data and Reality: A Timeless Perspective
on Perceiving and Managing Information. Technics
publications.
Lee, J., Yoon, W., Kim, S., Kim, D., et al. (2019).
BioBERT: a pre-trained biomedical language repre-
sentation model for biomedical text mining. Bioin-
formatics, 36(4):1234–1240.
Liu, P., Qian, L., Zhao, X., and Tao, B. (2023). The
construction of knowledge graphs in the aviation as-
sembly domain based on a joint knowledge extraction
model. IEEE Access, 11:26483–26495.
Luan, Y., He, L., Ostendorf, M., and Hajishirzi, H. (2018).
Multi-task identification of entities, relations, and
coreference for scientific knowledge graph construc-
tion. In Proc Conf on Empirical Methods in NLP,
pages 3219–3232, Brussels, Belgium. ACL.
Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (2009). Dis-
tant supervision for relation extraction without labeled
data. In Proc of the Joint Conf of the 47th Annual
Meeting of the ACL and the Int Joint Conf on NLP of
the AFNLP, pages 1003–1011, Singapore. ACL.
Russell, S. and Norvig, P. (2010). Artificial Intelligence: A
Modern Approach. 3ed. Prentice Hall.
Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTim-
bau: Pretrained BERT Models for Brazilian Por-
tuguese. In Cerri, R. and Prati, R. C., editors, Intelli-
gent Systems, pages 403–417, Cham. Springer Int Pub.
Spala, S., Miller, N., Dernoncourt, F., and Dockhorn, C.
(2020). SemEval-2020 task 6: Definition extraction
from free text with the DEFT corpus. In Proc of the
Fourteenth Workshop on Semantic Evaluation, pages
336–345, Barcelona. ICCL.
Weston, L., Tshitoyan, V., Dagdelen, J., Kononova, O., et al.
(2019). Named entity recognition and normalization
applied to large-scale information extraction from the
materials science literature. Journal of Chemical In-
formation and Modeling, 59(9):3692–3702.
Zhao, Q., Huang, H., and Ding, H. (2021). Study on
military regulations knowledge construction based on
knowledge graph. In 2021 7th Int Conf on Big Data
and Information Analytics (BigDIA), pages 180–184.
Zhou, J., Li, X., Wang, S., and Song, X. (2022). NER-
based military simulation scenario development pro-
cess. Journal of Defense Modeling and Simulation.
ICEIS 2024 - 26th International Conference on Enterprise Information Systems