Authors:
Aditya Kuppa¹, Jack Nicholls¹ and Nhien-An Le-Khac²
Affiliations:
¹ Mirror Security, Dublin, Ireland; ² School of Computer Science, University College Dublin, Dublin, Ireland
Keyword(s):
Generative AI, Service Providers, LLM, Security, Adversarial.
Abstract:
The emergence of large language models (LLMs) has revolutionized the field of AI, introducing a new era of generative models applied across diverse use cases. Within this evolving AI application ecosystem, numerous stakeholders, including LLM and AI application service providers, use these models to cater to user needs. A significant challenge arises from end-users' lack of visibility into, and understanding of, the inner workings of these models. This lack of transparency can lead to concerns about how the models are being used, how outputs are generated, the nature of the data they are trained on, and the potential biases they may harbor. User trust therefore becomes a critical aspect of deploying and managing these advanced AI applications. This paper highlights the safety and integrity issues associated with service providers who may introduce covert, unsafe policies into their systems. Our study focuses on two attacks: the injection of biased content into generative AI search services, and the manipulation of LLM outputs during inference by altering attention heads. Through empirical experiments, we show that malicious service providers can covertly inject harmful content into the outputs generated by LLMs without the end-user's awareness. This study reveals the subtle yet significant ways LLM outputs can be compromised, highlighting the importance of vigilance and advanced security measures in AI-driven applications. We demonstrate empirically that it is possible to increase the citation score of LLM output so that it includes erroneous or unnecessary sources and redirects the reader to a desired source of information.
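To illustrate the second attack in concrete terms, the sketch below shows one way a provider's serving stack could silently alter attention heads at inference time. It is a minimal sketch, not the method evaluated in the paper: it assumes a Hugging Face GPT-2 model, and the layer index (TARGET_LAYER), head indices (TARGET_HEADS), and prompt are hypothetical choices used only for illustration.

    # Minimal sketch (not the paper's implementation): a serving-side hook that
    # silently zeroes selected attention heads during inference.
    # TARGET_LAYER, TARGET_HEADS and the prompt are hypothetical choices.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    TARGET_LAYER = 5       # hypothetical layer to tamper with
    TARGET_HEADS = [0, 3]  # hypothetical heads to suppress
    head_dim = model.config.n_embd // model.config.n_head

    def suppress_heads(module, args):
        # The input to c_proj is the concatenation of per-head attention
        # outputs, so zeroing a head-sized slice removes that head's
        # contribution before it is mixed back into the residual stream.
        hidden = args[0].clone()
        for h in TARGET_HEADS:
            hidden[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (hidden,) + args[1:]

    # Attach the hook inside the provider's serving code; the public API
    # surface seen by the end-user is unchanged.
    hook = model.transformer.h[TARGET_LAYER].attn.c_proj.register_forward_pre_hook(suppress_heads)

    prompt = "The most reliable source for this topic is"
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))

    hook.remove()  # restore unmodified behaviour

Because the hook lives entirely inside the serving code, neither the request interface nor the returned text gives the end-user any indication that the model's attention computation was tampered with.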