TI-NERmerger: Semi-Automated Framework for Integrating NER Datasets in Cybersecurity
Inoussa Mouiche, Sherif Saad
2024
Abstract
Recent advancements highlight the crucial role of high-quality data in developing accurate AI models, especially in threat intelligence named entity recognition (TI-NER), which automates the detection and classification of information from extensive cyber reports. However, the lack of scalable annotated security datasets hinders TI-NER system development. To overcome this, researchers often use data augmentation techniques such as merging multiple annotated NER datasets to improve variety and scalability. Integrating these datasets poses challenges, such as maintaining consistent entity annotations and entity categories and adhering to standardized tagging schemes, and merging them manually is time-consuming and impractical at scale. Our paper presents TI-NERmerger, a semi-automated framework that integrates diverse TI-NER datasets into scalable, compliant datasets aligned with cybersecurity standards like STIX-2.1. We validated the framework’s efficiency and effectiveness by comparing it with manual processes using the DNRTI and APTNER datasets, producing Augmented APTNER (2APTNER). The results demonstrate a reduction of over 94% in manual labour, completing in minutes work that would otherwise take several months. Additionally, we applied advanced ML algorithms to validate the effectiveness of the integrated NER datasets. We also provide publicly accessible datasets and resources to support further research in threat intelligence and AI model development.
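The integration challenges named in the abstract (consistent entity categories and a standardized tagging scheme) can be illustrated with a minimal sketch. It is not the TI-NERmerger implementation: it assumes both sources use BIO-style tags, and the mapping `LABEL_MAP`, the label names in it, and the helpers `harmonize_tag`, `repair_bio`, and `merge_datasets` are hypothetical names introduced only for this example. Categories with no mapping are collected for manual review, which is one plausible reading of "semi-automated".

```python
from typing import Dict, List, Set, Tuple

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs

# Hypothetical mapping from source categories to one unified label set.
# Illustrative only; not the mapping used by the paper's framework.
LABEL_MAP: Dict[str, str] = {
    "HackOrg": "APT", "Tool": "TOOL", "Area": "LOC",  # source-specific names
    "APT": "APT", "TOOL": "TOOL", "LOC": "LOC",       # already unified
}


def harmonize_tag(tag: str, label_map: Dict[str, str], unmapped: Set[str]) -> str:
    """Rewrite one BIO tag into the unified label set.

    Unknown categories are recorded in `unmapped` and tagged 'O'.
    """
    if tag == "O":
        return "O"
    prefix, _, category = tag.partition("-")
    unified = label_map.get(category)
    if prefix not in {"B", "I"} or unified is None:
        unmapped.add(category or tag)
        return "O"
    return f"{prefix}-{unified}"


def repair_bio(tags: List[str]) -> List[str]:
    """Turn any I- tag that does not continue the previous entity into B-."""
    repaired: List[str] = []
    previous = "O"
    for tag in tags:
        if tag.startswith("I-") and previous[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        repaired.append(tag)
        previous = tag
    return repaired


def merge_datasets(
    datasets: List[List[Sentence]], label_map: Dict[str, str]
) -> Tuple[List[Sentence], Set[str]]:
    """Concatenate datasets after harmonizing labels and repairing BIO tags."""
    merged: List[Sentence] = []
    unmapped: Set[str] = set()
    for dataset in datasets:
        for sentence in dataset:
            tokens = [token for token, _ in sentence]
            tags = [harmonize_tag(tag, label_map, unmapped) for _, tag in sentence]
            merged.append(list(zip(tokens, repair_bio(tags))))
    return merged, unmapped
```

Returning the unmapped categories alongside the merged sentences keeps the automated step from silently discarding annotations: a reviewer can extend the mapping and rerun the merge.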
Paper Citation
in Harvard Style
Mouiche I. and Saad S. (2024). TI-NERmerger: Semi-Automated Framework for Integrating NER Datasets in Cybersecurity. In Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT; ISBN 978-989-758-709-2, SciTePress, pages 357-370. DOI: 10.5220/0012867900003767
in Bibtex Style
@conference{secrypt24,
author={Inoussa Mouiche and Sherif Saad},
title={TI-NERmerger: Semi-Automated Framework for Integrating NER Datasets in Cybersecurity},
booktitle={Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT},
year={2024},
pages={357-370},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012867900003767},
isbn={978-989-758-709-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT
TI - TI-NERmerger: Semi-Automated Framework for Integrating NER Datasets in Cybersecurity
SN - 978-989-758-709-2
AU - Mouiche I.
AU - Saad S.
PY - 2024
SP - 357
EP - 370
DO - 10.5220/0012867900003767
PB - SciTePress