DISC: A Dataset for Information Security Classification

Elijah Bass, Massimiliano Albanese, Marcos Zampieri

2024

Abstract

Research in information security classification has traditionally relied on carefully curated datasets. However, the sensitive nature of the classified information in such documents makes these datasets hard to access and the resulting research hard to reproduce. Existing data sources often lack openly available resources for automated data collection and quality review, which further hinders reproducible research. Additionally, datasets constructed from declassified information, though valuable, are not readily available to the public, and their creation methods remain poorly documented, rendering them non-reproducible. This paper addresses these challenges by introducing DISC, a dataset and framework, driven by artificial intelligence principles, for information security classification. The framework aims to streamline all stages of dataset creation, from preprocessing of raw documents to annotation. By enabling reproducibility and augmentation, this approach enhances the utility of available document collections for information security classification research and allows researchers to create new datasets in a principled way.

Paper Citation


in Harvard Style

Bass E., Albanese M. and Zampieri M. (2024). DISC: A Dataset for Information Security Classification. In Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT; ISBN 978-989-758-709-2, SciTePress, pages 175-185. DOI: 10.5220/0012763400003767


in Bibtex Style

@conference{secrypt24,
author={Elijah Bass and Massimiliano Albanese and Marcos Zampieri},
title={DISC: A Dataset for Information Security Classification},
booktitle={Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT},
year={2024},
pages={175-185},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012763400003767},
isbn={978-989-758-709-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 21st International Conference on Security and Cryptography - Volume 1: SECRYPT
TI - DISC: A Dataset for Information Security Classification
SN - 978-989-758-709-2
AU - Bass E.
AU - Albanese M.
AU - Zampieri M.
PY - 2024
SP - 175
EP - 185
DO - 10.5220/0012763400003767
PB - SciTePress