SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature
Vinícius Di Oliveira, Yuri Bezerra, Li Weigang, Pedro Brom, Victor Celestino
2024
Abstract
Natural language processing (NLP) has seen significant advancements with the advent of large language models (LLMs). However, substantial improvements are still needed for languages other than English, especially in specific domains such as applications of the Mercosur Common Nomenclature (NCM), a Brazilian implementation of the Harmonized System (HS). To address this gap, this study uses TeenyTinyLLaMA, a foundational Portuguese LLM, as the base model for processing NCM applications. Additionally, a simplified Retrieval-Augmented Fine-Tuning (RAFT) technique, termed SLIM-RAFT, is proposed for task-specific fine-tuning of LLMs. This approach retains the chain-of-thought (CoT) methodology for prompt development in a more concise and streamlined form, using brief and focused documents for training. The proposed model offers an efficient and cost-effective alternative for fine-tuning smaller LLMs, significantly outperforming TeenyTinyLLaMA and ChatGPT-4 on the same task. Although the research focuses on NCM applications, the methodology can be easily adapted for HS applications worldwide.
Paper Citation
in Harvard Style
Di Oliveira V., Bezerra Y., Weigang L., Brom P. and Celestino V. (2024). SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature. In Proceedings of the 20th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-718-4, SciTePress, pages 234-241. DOI: 10.5220/0012943400003825
in Bibtex Style
@conference{webist24,
author={Vinícius Di Oliveira and Yuri Bezerra and Li Weigang and Pedro Brom and Victor Celestino},
title={SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature},
booktitle={Proceedings of the 20th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2024},
pages={234-241},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012943400003825},
isbn={978-989-758-718-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 20th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature
SN - 978-989-758-718-4
AU - Di Oliveira V.
AU - Bezerra Y.
AU - Weigang L.
AU - Brom P.
AU - Celestino V.
PY - 2024
SP - 234
EP - 241
DO - 10.5220/0012943400003825
PB - SciTePress