Data Sets for Cyber Security Machine Learning Models: A Methodological Approach

Innocent Mbona, Jan Eloff

2024

Abstract

Discovering cyber security threats is becoming increasingly complex, if not impossible. Recent advances in artificial intelligence (AI) can be leveraged for the intelligent discovery of cyber security threats. AI and machine learning (ML) models depend on the availability of relevant data. ML-based cyber security solutions should be trained and tested on real-world attack data so that they produce trusted results. The problem is that most organisations do not have access to usable, relevant, and reliable real-world data. This problem is exacerbated when training ML models to discover novel attacks, such as zero-day attacks. Furthermore, the availability of cyber security data sets is negatively affected by privacy laws and regulations. The solution proposed in this paper is CySecML, a methodological approach that guides organisations in developing cyber security ML solutions. CySecML provides guidance for obtaining real-world data or generating synthetic data, checking data quality, and identifying features that optimise ML models. Network intrusion detection systems (NIDS) were employed to illustrate the convergence of cyber security and AI concepts.
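To make the "checking data quality" step more concrete, the following is a minimal sketch of the kind of checks one might run on labelled network-flow records before training a NIDS model. It is not the CySecML procedure itself, which the abstract does not detail; the function name `quality_report`, the field names (`duration`, `bytes`, `protocol`, `label`), and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def quality_report(rows, label_key="label"):
    """Summarise basic quality indicators for a list of dict records.

    Hypothetical helper: the checks below (missing values, duplicates,
    class balance, constant features) are common practice, not steps
    taken from the paper.
    """
    total = len(rows)

    # Records with at least one missing (None or empty-string) value.
    missing = sum(
        1 for r in rows if any(v is None or v == "" for v in r.values())
    )

    # Exact duplicate records: every copy beyond the first counts once.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(count - 1 for count in seen.values())

    # Class balance, e.g. benign versus attack traffic.
    label_counts = Counter(r.get(label_key) for r in rows)

    # Features that never vary carry no information for a model.
    feature_keys = [k for k in rows[0] if k != label_key] if rows else []
    constant = [k for k in feature_keys if len({r.get(k) for r in rows}) <= 1]

    return {
        "total": total,
        "rows_with_missing": missing,
        "duplicate_rows": duplicates,
        "label_counts": dict(label_counts),
        "constant_features": constant,
    }


# Illustrative records only; not taken from any real data set.
rows = [
    {"duration": 1.2, "bytes": 300, "protocol": "tcp", "label": "benign"},
    {"duration": 1.2, "bytes": 300, "protocol": "tcp", "label": "benign"},
    {"duration": None, "bytes": 4500, "protocol": "tcp", "label": "attack"},
    {"duration": 0.4, "bytes": 120, "protocol": "tcp", "label": "benign"},
]

report = quality_report(rows)
print(report)
```

A report like this flags issues such as duplicated flows, missing measurements, a skewed benign-to-attack ratio, or a feature that is constant across all records, any of which would typically be addressed before feature selection and model training.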



Paper Citation


in Harvard Style

Mbona I. and Eloff J. (2024). Data Sets for Cyber Security Machine Learning Models: A Methodological Approach. In Proceedings of the 9th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS; ISBN 978-989-758-699-6, SciTePress, pages 149-156. DOI: 10.5220/0012598400003705


in Bibtex Style

@conference{iotbds24,
author={Innocent Mbona and Jan Eloff},
title={Data Sets for Cyber Security Machine Learning Models: A Methodological Approach},
booktitle={Proceedings of the 9th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2024},
pages={149-156},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012598400003705},
isbn={978-989-758-699-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - Data Sets for Cyber Security Machine Learning Models: A Methodological Approach
SN - 978-989-758-699-6
AU - Mbona I.
AU - Eloff J.
PY - 2024
SP - 149
EP - 156
DO - 10.5220/0012598400003705
PB - SciTePress