Antibiotic Resistance Gene Identification from Metagenomic Data Using Ensemble of Finetuned Large Language Models
Syama K., J. Jothi
2024
Abstract
Antibiotic resistance is a challenge to global health because it limits the effectiveness of antibiotics in humans. Antibiotic resistance genes (ARGs) are primarily associated with acquired resistance, where bacteria gain resistance through horizontal gene transfer or mutation. Hence, the identification of ARGs is essential for treating infections and understanding resistance mechanisms. Although several methods for ARG identification exist, the majority are based on sequence alignment and hence fail to provide accurate results when the ARGs diverge from those in the reference ARG databases. Additionally, a significant fraction of proteins remains unaccounted for in public repositories. This work introduces ARG-LLM, a multi-task ensemble of multiple large language models (LLMs) for ARG identification and antibiotic category prediction. We fine-tuned three pre-trained protein language models, ProtBert, ProtAlbert, and Evolutionary Scale Modelling (ESM), on the ARG prediction data. The predictions of the fine-tuned models are combined using a majority-vote ensembling approach to identify the ARG sequences. Then, another ProtBert model is fine-tuned for the antibiotic category prediction task. Experiments on the PLM-ARGDB dataset compare the proposed ARG-LLM against existing methods. Results demonstrate that ARG-LLM outperforms other state-of-the-art methods, with the best Recall of 96.2%, F1-score of 94.4%, and MCC of 90%.
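The majority-vote step and the reported metrics (Recall, F1-score, MCC) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `majority_vote` and `binary_metrics` are hypothetical, the label lists are made-up toy values standing in for the outputs of three fine-tuned classifiers, and labels are assumed to be binary (1 = ARG, 0 = non-ARG).

```python
from math import sqrt


def majority_vote(predictions):
    """Combine binary label lists (one per model) into one label per sequence.

    A sequence is labelled 1 when more than half of the models predict 1.
    With an odd number of models, ties cannot occur.
    """
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must predict the same number of sequences")
    n_models = len(predictions)
    return [1 if sum(votes) * 2 > n_models else 0 for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred):
    """Compute recall, F1-score, and Matthews correlation coefficient (MCC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "f1": f1, "mcc": mcc}


# Toy predictions (assumed values, not results from the paper).
protbert_pred = [1, 0, 1, 1, 0, 0]
protalbert_pred = [1, 1, 0, 1, 0, 0]
esm_pred = [0, 0, 1, 1, 0, 1]
true_labels = [1, 0, 1, 1, 0, 1]

ensemble_pred = majority_vote([protbert_pred, protalbert_pred, esm_pred])
print(ensemble_pred)  # [1, 0, 1, 1, 0, 0]
print(binary_metrics(true_labels, ensemble_pred))
```

In this toy case each individual model makes at least one error that the other two outvote, which is the intuition behind combining an odd number of classifiers by majority vote.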
Paper Citation
in Harvard Style
K. S. and Jothi J. (2024). Antibiotic Resistance Gene Identification from Metagenomic Data Using Ensemble of Finetuned Large Language Models. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; ISBN 978-989-758-716-0, SciTePress, pages 102-112. DOI: 10.5220/0012999100003838
in Bibtex Style
@conference{kdir24,
author={Syama K. and J. Jothi},
title={Antibiotic Resistance Gene Identification from Metagenomic Data Using Ensemble of Finetuned Large Language Models},
booktitle={Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2024},
pages={102-112},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012999100003838},
isbn={978-989-758-716-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - Antibiotic Resistance Gene Identification from Metagenomic Data Using Ensemble of Finetuned Large Language Models
SN - 978-989-758-716-0
AU - K. S.
AU - Jothi J.
PY - 2024
SP - 102
EP - 112
DO - 10.5220/0012999100003838
PB - SciTePress