Leveraging Large Language Models and RNNs for Accurate Ontology-Based Text Annotation

Pratik Devkota, Somya Mohanty, Prashanti Manda

2025

Abstract

This study investigates the performance of large language models (LLMs) and RNN-based architectures for automated ontology annotation, focusing on Gene Ontology (GO) concepts. Using the Colorado Richly Annotated Full-Text (CRAFT) dataset, we evaluated models on metrics such as F1 score and semantic similarity to measure both annotation accuracy and how well predictions reflect ontological relationships. The Boosted Bi-GRU, a lightweight model with only 38M parameters, achieved the highest performance, with an F1 score of 0.850 and a semantic similarity of 0.900, combining high accuracy with computational efficiency. In comparison, LLMs such as Phi (1.5B) performed competitively, balancing moderate GPU usage with strong annotation accuracy. Larger models, including Mistral, Meditron, and Llama 2 (7B), delivered comparable results but required significantly more computational resources for fine-tuning and inference, with GPU usage exceeding 125 GB during fine-tuning. Fine-tuned ChatGPT 3.5 Turbo underperformed relative to the other models, while ChatGPT 4 showed limited applicability for this domain-specific task. To enhance model performance, we employed prompt tuning and full fine-tuning, incorporating hierarchical ontology information and domain-specific prompts. These findings highlight the trade-offs between model size, resource efficiency, and accuracy in specialized tasks. This work provides insights into optimizing ontology annotation workflows and advancing domain-specific natural language processing in biomedical research.
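
The abstract reports F1 score and semantic similarity without defining how either is computed. As a rough illustration only, the sketch below assumes token-level F1 over predicted concept labels and a Jaccard similarity over ancestor sets in the ontology hierarchy. Both are common choices for ontology annotation, but they are assumptions here and may differ from the paper's definitions. The function names, the "O" (no annotation) label, and the toy ontology with its `T:` identifiers are hypothetical.

```python
def f1_score(gold, pred, outside="O"):
    """Token-level F1 over concept labels, ignoring the 'outside' label."""
    # A true positive is a non-outside prediction that exactly matches gold.
    tp = sum(1 for g, p in zip(gold, pred) if p != outside and g == p)
    pred_pos = sum(1 for p in pred if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ancestors(term, parents):
    """Return the term together with all of its transitive ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def jaccard_similarity(term_a, term_b, parents):
    """Shared ancestors divided by all ancestors of either term."""
    a = ancestors(term_a, parents)
    b = ancestors(term_b, parents)
    return len(a & b) / len(a | b)


# Hypothetical toy ontology: child -> list of parents (not real GO terms).
toy_parents = {
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:a1": ["T:a"],
}

gold = ["O", "T:a1", "O", "T:b"]
pred = ["O", "T:a", "O", "T:b"]

exact_f1 = f1_score(gold, pred)
near_miss = jaccard_similarity("T:a1", "T:a", toy_parents)
far_miss = jaccard_similarity("T:a1", "T:b", toy_parents)
```

In this toy case, predicting the parent `T:a` instead of `T:a1` counts as a plain error for F1 but still earns partial credit under the ancestor-based similarity, while the unrelated `T:b` earns much less. That difference is why a hierarchy-aware metric is useful alongside F1 for ontology annotation.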

Paper Citation


in Harvard Style

Devkota P., Mohanty S. and Manda P. (2025). Leveraging Large Language Models and RNNs for Accurate Ontology-Based Text Annotation. In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS; ISBN 978-989-758-731-3, SciTePress, pages 489-494. DOI: 10.5220/0013267100003911


in Bibtex Style

@conference{bioinformatics25,
author={Pratik Devkota and Somya Mohanty and Prashanti Manda},
title={Leveraging Large Language Models and RNNs for Accurate Ontology-Based Text Annotation},
booktitle={Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS},
year={2025},
pages={489-494},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013267100003911},
isbn={978-989-758-731-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS
TI - Leveraging Large Language Models and RNNs for Accurate Ontology-Based Text Annotation
SN - 978-989-758-731-3
AU - Devkota P.
AU - Mohanty S.
AU - Manda P.
PY - 2025
SP - 489
EP - 494
DO - 10.5220/0013267100003911
PB - SciTePress