and anomalous traffic. International Journal of
Communication Systems, 30(1).
https://doi.org/10.1002/dac.2881
Asadi, A. N. (2016). An approach for detecting anomalies
by assessing the inter-arrival time of UDP packets and
flows using Benford’s law. Conference Proceedings of
2015 2nd International Conference on Knowledge-
Based Engineering and Innovation, KBEI 2015, 2(6),
257–262. https://doi.org/10.1109/KBEI.2015.7436057
Bonaccorso, G. (2020). Mastering Machine Learning
Algorithms: Expert techniques for implementing
popular machine learning algorithms, fine-tuning your
models, and understanding how they work. O'Reilly
Media.
Bowles, J. K. F., Silvina, A., Bin, E., & Vinov, M. (2020).
On Defining Rules for Cancer Data Fabrication. 168–
176. https://www.ndc.scot.nhs.uk/National-Datasets/
Brownlee, J. (2020). Data preparation for machine
learning: data cleaning, feature selection, and data
transforms in Python. Machine Learning Mastery.
https://books.google.co.za/books?hl=en&lr=&id=uAPuDwAAQBAJ&oi=fnd&pg=PP1&dq=Data+preparation+for+machine+learning:+data+cleaning,+feature+selection,+and+data+transforms+in+Python&ots=Cl8GuchLpT&sig=MmuT6WgKuVEHvbXGQj91vPH2M_k
Cai, L., & Zhu, Y. (2015). The challenges of data quality
and data quality assessment in the big data era. Data
Science Journal, 14. https://doi.org/10.5334/dsj-2015-002
Dankar, F. K., & Ibrahim, M. (2021). Fake it till you make
it: Guidelines for effective synthetic data generation.
Applied Sciences (Switzerland), 11(5).
https://doi.org/10.3390/app11052158
Figueira, A., & Vaz, B. (2022). Survey on Synthetic Data
Generation, Evaluation Methods and GANs. MDPI
Mathematics, 10(15).
https://doi.org/10.3390/math10152733
Financial Times. (2023, July 21). AI systems create
synthetic data to train next generation models.
https://ft.pressreader.com/v99c/20230721/281732683968639
García, S., Luengo, J., & Herrera, F. (2015). Feature
selection. Intelligent Systems Reference Library, 72(6),
163–193. https://doi.org/10.1007/978-3-319-10247-4_7
Kaushik, R., & Dave, M. (2021). Malware Detection
System Using Ensemble Learning: Tested Using
Synthetic Data. Data Engineering and Communication
Technology: Proceedings of ICDECT 2021, 63, 153–
164. https://doi.org/10.1007/978-981-16-0081-4_16
Khalid, S., Khalil, T., & Nasreen, S. (2014). A survey of
feature selection and feature extraction techniques in
machine learning. Proceedings of 2014 Science and
Information Conference, SAI 2014, 1(October), 372–
378. https://doi.org/10.1109/SAI.2014.6918213
Khodjaeva, Y., & Zincir-Heywood, N. (2021, August 17).
Network Flow Entropy for Identifying Malicious
Behaviours in DNS Tunnels. ACM International
Conference Proceeding Series.
https://doi.org/10.1145/3465481.3470089
Kilincer, I. F., Ertam, F., & Sengur, A. (2021). Machine
learning methods for cyber security intrusion detection:
Datasets and comparative study. Computer Networks,
188. https://doi.org/10.1016/j.comnet.2021.107840
Kim, S., Hwang, C., & Lee, T. (2020). Anomaly based
unknown intrusion detection in endpoint environments.
Electronics (Switzerland), 9(6), 1–21.
https://doi.org/10.3390/electronics9061022
Kumar, V., & Sinha, D. (2023). Synthetic attack data
generation model applying generative adversarial
network for intrusion detection. Computers and
Security, 125.
https://doi.org/10.1016/j.cose.2022.103054
Kurniabudi, Stiawan, D., Darmawijoyo, Bin Idris, M. Y.
Bin, Bamhdi, A. M., & Budiarto, R. (2020). CICIDS-
2017 Dataset Feature Analysis with Information Gain
for Anomaly Detection. IEEE Access, 8, 132911–
132921.
https://doi.org/10.1109/ACCESS.2020.3009843
Lu, Y., Shen, M., Wang, H., & Wei, W. (2023). Machine
Learning for Synthetic Data Generation: A Review.
ArXiv Preprint ArXiv:2302.04062.
http://arxiv.org/abs/2302.04062
Mbona, I., & Eloff, J. H. P. (2022a). Detecting Zero-Day
Intrusion Attacks Using Semi-Supervised Machine
Learning Approaches. IEEE Access, 10(July), 69822–
69838.
https://doi.org/10.1109/ACCESS.2022.3187116
Mbona, I., & Eloff, J. H. P. (2022b). Feature selection using
Benford’s law to support detection of malicious social
media bots. Information Sciences, 582, 369–381.
https://doi.org/10.1016/j.ins.2021.09.038
Montazerishatoori, M., Davidson, L., Kaur, G., & Habibi
Lashkari, A. (2020). Detection of DoH Tunnels using
Time-series Classification of Encrypted Traffic. 2020
IEEE Intl Conf on Dependable, Autonomic and Secure
Computing, Intl Conf on Pervasive Intelligence and
Computing, Intl Conf on Cloud and Big Data
Computing, Intl Conf on Cyber Science and Technology
Congress (DASC/PiCom/CBDCom/CyberSciTech),
63–70. https://doi.org/10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00026
Moustafa, N., & Slay, J. (2017). The significant features of
the UNSW-NB15 and the KDD99 data sets for
Network Intrusion Detection Systems. Proceedings -
2015 4th International Workshop on Building Analysis
Datasets and Gathering Experience Returns for
Security, BADGERS 2015, 1(November), 25–31.
https://doi.org/10.1109/BADGERS.2015.14
Mukkamala, S., & Sung, A. H. (2003). Identifying
Significant Features For Network Forensic Analysis
Using Artificial Intelligent Techniques. International
Journal of Digital Evidence, 1(4), 1–17.
NetFort Technologies Limited. (2014). Flow Analysis
Versus Packet Analysis. What Should You Choose?
White Paper, 6.
Pu, G., Wang, L., Shen, J., & Dong, F. (2021). A hybrid
unsupervised clustering-based anomaly detection
method. Tsinghua Science and Technology, 26(2), 146–
153. https://doi.org/10.26599/TST.2019.9010051