Authors:
Navya Martin Kollapally (1) and James Geller (2)
Affiliations:
(1) Department of Computer Science, New Jersey Institute of Technology, Newark, U.S.A.
(2) Department of Data Science, New Jersey Institute of Technology, Newark, U.S.A.
Keyword(s):
Natural Language Processing, Redaction, Re-identification of EHR Entries, Large Language Models, Privacy-Preserving Machine Learning, HIPAA, Social Determinants of Health.
Abstract:
Research on privacy-preserving Machine Learning (ML) is essential to prevent the re-identification of health data, ensuring the confidentiality and security of sensitive patient information. In this era of unprecedented use of large language models (LLMs), these models carry inherent risks when applied to sensitive data, especially because they are trained on trillions of words from the internet without a global standard for data selection. This lack of standardization in LLM training poses a significant risk in health informatics, as it may result in the inadvertent release of sensitive information even when context-aware redaction is available. The research goal of this paper is to determine whether sensitive information can be re-identified from electronic health records during Natural Language Processing (NLP) tasks, such as text classification, without using any dedicated re-identification techniques. We performed zero-shot and 8-shot learning with the quantized LLMs FLAN, Llama2, Mistral, and Vicuna to classify social context data extracted from MIMIC-III. In this text classification task, our focus was on detecting potential re-identification of sensitive data and the generation of misleading or abusive content during the fine-tuning and prompting stages, along with evaluating classification performance.
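To make the described setup concrete, the following is a minimal Python sketch of zero-shot and k-shot prompting of a 4-bit-quantized causal LLM for social-context classification, using the Hugging Face transformers API. The model name, label set, and prompt wording are illustrative assumptions, not the authors' exact configuration; FLAN-T5, being a sequence-to-sequence model, would require AutoModelForSeq2SeqLM instead of AutoModelForCausalLM.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative choice; Llama2 or Vicuna checkpoints would be loaded the same way.
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization, matching the paper's use of quantized LLMs.
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, quantization_config=quant, device_map="auto"
)

# Hypothetical label set for social determinants of health; the paper's
# actual categories may differ.
LABELS = ["housing insecurity", "food insecurity", "social isolation", "none"]

def classify(note, shots=()):
    """Zero-shot when `shots` is empty; k-shot when k (text, label) pairs are given."""
    prompt = ("Classify the social context in the clinical note into one of: "
              + ", ".join(LABELS) + ".\n")
    for text, label in shots:  # eight exemplars reproduce the 8-shot condition
        prompt += f"Note: {text}\nLabel: {label}\n"
    prompt += f"Note: {note}\nLabel:"
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    # Decode only the newly generated tokens, i.e., the predicted label.
    return tok.decode(out[0][inputs.input_ids.shape[1]:],
                      skip_special_tokens=True).strip()

Calling classify(note) yields the zero-shot prediction; passing eight exemplar (text, label) pairs in shots corresponds to the 8-shot condition. Inspecting the raw generations from such a loop is one way to check for leaked identifiers or abusive content alongside the classification output.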