GANMCMCRO: A Generative Adversarial Network Markov Chain Monte Carlo Random Oversampling Algorithm for Imbalance Datasets

Najmeh Abedzadeh, Matthew Jacobs

2023

Abstract

Machine learning techniques have demonstrated their ability to identify patterns in data, yet their effectiveness diminishes on imbalanced datasets, a pervasive problem that is especially apparent in Intrusion Detection Systems (IDS). IDS, which are pivotal for monitoring networks and systems for malicious activity, require strategies that address dataset imbalance in order to improve the accuracy of machine learning models. In particular, imbalanced IDS datasets conceal cyber-attacks within heavily skewed class distributions, which makes these attacks difficult for conventional machine learning methods to detect. This study introduces two novel algorithms for correcting class imbalance in IDS datasets. The first, Markov Chain Monte Carlo Random Oversampling (MCMCRO), combines Markov Chain Monte Carlo (MCMC) with random oversampling to synthesize new data systematically. The second, GANMCMCRO (Generative Adversarial Network Markov Chain Monte Carlo Random Oversampling), embeds MCMCRO's data synthesis capability in the generator model of a Generative Adversarial Network (GAN) framework, strengthening the generator's data generation. An evaluation on the CSE-CIC-IDS2018 dataset substantiates the effectiveness of both algorithms. MCMCRO achieves a recall of 0.66, a precision of 1, an F1 score of 0.79, and an overall accuracy of 0.91. GANMCMCRO achieves a recall of 0.81, a precision of 0.82, an F1 score of 0.81, and an overall accuracy of 0.88. These results provide evidence that both algorithms mitigate the challenges posed by imbalanced datasets. This research advances the field with techniques that show substantial potential for improving the accuracy of machine learning models on imbalanced data, particularly IDS datasets.
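The abstract does not specify how MCMCRO's sampler is constructed. The Python sketch below illustrates one way MCMC and random oversampling could be combined, under stated assumptions: each synthetic minority sample is the final state of a random-walk Metropolis chain that starts at a uniformly chosen minority record (the random-oversampling step) and targets an unnormalized Gaussian kernel density estimate of the minority class. The function names and the `bandwidth`, `step`, and `n_steps` parameters are hypothetical; this is not the authors' implementation, and the GAN component of GANMCMCRO is not shown.

```python
import math
import random


def kde_density(x, minority, bandwidth):
    """Unnormalized Gaussian kernel density of point x under the minority samples (assumed target)."""
    total = 0.0
    for m in minority:
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, m))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total


def metropolis_sample(start, minority, bandwidth, step, n_steps, rng):
    """Random-walk Metropolis chain started at `start`; returns the final state."""
    current = list(start)
    # Starting at a minority point guarantees a strictly positive density.
    current_p = kde_density(current, minority, bandwidth)
    for _ in range(n_steps):
        proposal = [xi + rng.gauss(0.0, step) for xi in current]
        proposal_p = kde_density(proposal, minority, bandwidth)
        # Symmetric Gaussian proposal, so the acceptance ratio is just the density ratio.
        if rng.random() < min(1.0, proposal_p / current_p):
            current, current_p = proposal, proposal_p
    return current


def mcmc_random_oversample(minority, n_majority, bandwidth=0.5, step=0.3,
                           n_steps=50, seed=0):
    """Hypothetical MCMC-based random oversampler (illustrative sketch only).

    Adds synthetic minority samples until the minority class reaches n_majority.
    """
    rng = random.Random(seed)
    originals = [list(m) for m in minority]
    synthetic = []
    while len(originals) + len(synthetic) < n_majority:
        # Random-oversampling step: pick a minority seed uniformly at random.
        start = rng.choice(originals)
        synthetic.append(metropolis_sample(start, originals, bandwidth,
                                           step, n_steps, rng))
    return originals + synthetic


minority = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]]
balanced = mcmc_random_oversample(minority, n_majority=10)
print(len(balanced))  # 10: the 3 originals plus 7 synthetic samples
```

Drawing each synthetic point from a short MCMC chain, rather than duplicating a minority record as plain random oversampling does, yields new points that stay in high-density minority regions without being exact copies.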



Paper Citation


in Harvard Style

Abedzadeh N. and Jacobs M. (2023). GANMCMCRO: A Generative Adversarial Network Markov Chain Monte Carlo Random Oversampling Algorithm for Imbalance Datasets. In Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: DMMLACS; ISBN 978-989-758-672-9, SciTePress, pages 587-594. DOI: 10.5220/0012259600003584


in Bibtex Style

@conference{dmmlacs23,
author={Najmeh Abedzadeh and Matthew Jacobs},
title={GANMCMCRO: A Generative Adversarial Network Markov Chain Monte Carlo Random Oversampling Algorithm for Imbalance Datasets},
booktitle={Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: DMMLACS},
year={2023},
pages={587-594},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012259600003584},
isbn={978-989-758-672-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 19th International Conference on Web Information Systems and Technologies - Volume 1: DMMLACS
TI - GANMCMCRO: A Generative Adversarial Network Markov Chain Monte Carlo Random Oversampling Algorithm for Imbalance Datasets
SN - 978-989-758-672-9
AU - Abedzadeh N.
AU - Jacobs M.
PY - 2023
SP - 587
EP - 594
DO - 10.5220/0012259600003584
PB - SciTePress