struggled when challenged with highly specific and
technical configurations.
When repeatedly posed the same questions, the
LLMs displayed a degree of randomness in their
responses. The study underscores the significance of
prompt engineering and engaging with the models to
achieve improved results. GPT-4 converges more
quickly to the desired answer and generally requires
fewer prompts. Additionally, the model demonstrated
strong capabilities in learning from documents,
though this learning is limited to a single session, as
noted by Al-Hawawreh et al. (2023).
While the models proved effective in suggesting
security configurations in compliance with
international standards, the potential for missing or
incorrect responses highlights their limitations. This
suggests that, although helpful, the models cannot
replace domain expertise.
6 FUTURE WORK
Future research should broaden the evaluation scope
to include a wider range of databases and operating
systems, improving the generalisability of the
findings and uncovering strengths and weaknesses
across diverse platforms. Additionally, developing
more objective evaluation criteria using advanced
metrics and automated tools can reduce subjective
bias and enhance accuracy in assessing LLM
compliance. Finally, exploring advanced prompt
engineering techniques could further refine LLM
performance, particularly in complex scenarios.
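The automated evaluation direction above could be prototyped as a simple baseline-comparison script that scores an LLM-suggested configuration against a fixed set of expected hardening settings. The settings and function names below are illustrative assumptions for the sketch, not an excerpt from the actual CIS benchmarks or from this study's methodology.

```python
# Minimal sketch (hypothetical names): scoring an LLM-suggested
# database configuration against a CIS-style baseline, so that
# compliance can be measured without subjective manual review.

# Illustrative subset of expected hardening settings; a real tool
# would load these from the relevant published benchmark.
BASELINE = {
    "local_infile": "OFF",
    "skip_symbolic_links": "ON",
    "require_secure_transport": "ON",
}


def compliance_score(suggested: dict) -> tuple[float, list[str]]:
    """Return the fraction of baseline settings the suggestion
    satisfies, plus a list of deviations for manual follow-up."""
    deviations = []
    for key, expected in BASELINE.items():
        actual = suggested.get(key)  # None if the setting is absent
        if actual != expected:
            deviations.append(f"{key}: expected {expected}, got {actual}")
    score = 1 - len(deviations) / len(BASELINE)
    return score, deviations


# Example: an LLM response already parsed into key-value pairs.
llm_suggestion = {"local_infile": "OFF", "require_secure_transport": "OFF"}
score, issues = compliance_score(llm_suggestion)
```

A scorer of this kind would make repeated trials comparable across models and sessions, directly addressing the response randomness noted in the conclusions.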
REFERENCES
Beheshti, A. (2023). Empowering Generative AI with
Knowledge Base 4.0: Towards Linking Analytical,
Cognitive, and Generative Intelligence.
doi:10.1109/icws60048.2023.00103
Sarker, I. H., Furhad, M. H., & Nowrozy, R. (2021). AI-
Driven Cybersecurity: An Overview, security
intelligence modeling and research directions. SN
Computer Science, 2(3). doi:10.1007/s42979-021-00557-0.
Shanthi, R. R., Sasi, N. K., & Gouthaman, P. (2023). A New
Era of Cybersecurity: The Influence of Artificial
Intelligence. doi:10.1109/icnwc57852.2023.10127453.
Sharma, P., & Dash, B. (2023b). Impact of Big Data
Analytics and ChatGPT on Cybersecurity.
doi:10.1109/i3cs58314.2023.10127411.
Al-Hawawreh, M., Aljuhani, A., & Jararweh, Y. (2023).
ChatGPT for cybersecurity: practical applications,
challenges, and future directions. Cluster Computing,
26(6), 3421–3436. doi:10.1007/s10586-023-04124-5.
Oh, S., & Shon, T. (2023b). Cybersecurity Issues in
Generative AI.
doi:10.1109/platcon60102.2023.10255179.
Sobania, D., Briesch, M., Hanna, C., & Petke, J. (2023d).
An Analysis of the Automatic Bug Fixing Performance
of ChatGPT. doi:10.1109/apr59189.2023.00012.
Gupta, M., Akiri, C., Aryal, K., Parker, E., & Praharaj, L.
(2023). From ChatGPT to ThreatGPT: Impact of
Generative AI in Cybersecurity and Privacy. IEEE
Access, 11, 80218–80245. doi:10.1109/access.2023.3300381.
Ali, T., & Kostakos, P. (2023, September 27). HuntGPT:
Integrating Machine Learning-Based Anomaly
Detection and Explainable AI with Large Language
Models (LLMs). arXiv.org. doi:10.48550/arXiv.2309.16021.
Pearce, H., Tan, B., Ahmad, B., Karri, R., & Dolan-Gavitt,
B. (2023). Examining Zero-Shot Vulnerability
Repair with Large Language Models.
doi:10.1109/sp46215.2023.10179324.
ISO/IEC 27001:2013. (n.d.). https://www.iso.org/obp/ui/#
iso:std:iso-iec:27001:ed-2:v1:en
Zeadally, S., Adi, E., Baig, Z., & Khan, I. A. (2020).
Harnessing artificial intelligence capabilities to
improve cybersecurity. IEEE Access, 8, 23817–23837.
doi:10.1109/access.2020.2968045.
CIS Oracle MySQL Benchmarks. (n.d.). CIS.
https://www.cisecurity.org/benchmark/oracle_mysql
CIS MongoDB Benchmarks. (n.d.). CIS.
https://www.cisecurity.org/benchmark/mongodb
Ye, J., Chen, X., Xu, N., Zu, C., Shao, Z., Liu, S., Cui, Y.,
Zhou, Z., Gong, C., Shen, Y., Zhou, J., Chen, S., Gui,
T., Zhang, Q., & Huang, X. (2023). A comprehensive
capability analysis of GPT-3 and GPT-3.5 series
models. arXiv (Cornell University). doi:10.48550/arxiv.2303.10420.