SLIM-RAFT: A Novel Fine-Tuning Approach to Improve
Cross-Linguistic Performance for Mercosur Common Nomenclature
Vinícius Di Oliveira¹,², Yuri Façanha Bezerra¹, Li Weigang¹, Pedro Carvalho Brom¹,³ and Victor Rafael R. Celestino⁴
¹TransLab, University of Brasilia, Brasilia, Federal District, Brazil
²Secretary of Economy, Brasilia, Federal District, Brazil
³Federal Institute of Brasilia, Brasilia, Federal District, Brazil
⁴LAMFO, Department of Administration, University of Brasilia, Brasilia, Federal District, Brazil
Keywords:
Fine-Tuning, HS, Large Language Model, NCM, Portuguese Language, Retrieval Augmented Generation.
Abstract:
Natural language processing (NLP) has seen significant advancements with the advent of large language models (LLMs). However, substantial improvements are still needed for languages other than English, especially in specific domains such as the Mercosur Common Nomenclature (NCM), the Mercosur extension of the Harmonized System (HS) used in Brazil. To address this gap, this study uses TeenyTinyLLaMA, a foundational Portuguese LLM, as the source model for NCM application processing. Additionally, a simplified Retrieval-Augmented Fine-Tuning (RAFT) technique, termed SLIM-RAFT, is proposed for task-specific fine-tuning of LLMs. This approach retains the chain-of-thought (CoT) methodology for prompt development in a more concise and streamlined manner, using brief and focused documents for training. The proposed model is an efficient and cost-effective alternative for fine-tuning smaller LLMs, significantly outperforming TeenyTinyLLaMA and ChatGPT-4 on the same task. Although the research focuses on NCM applications, the methodology can easily be adapted for HS applications worldwide.
1 INTRODUCTION
Generative Artificial Intelligence (GenAI) has accel-
erated AI development, particularly through large lan-
guage models (LLMs) like ChatGPT, which support
multilingual and multimodal processing (Schulhoff
et al., 2024; Radosavovic et al., 2024). However, us-
ing these models often requires prompt engineering
skills, while open-source LLMs like LLaMA 3 offer
flexibility through local fine-tuning. Yet, non-English
users face limitations due to LLaMA's English-centric
training data (90%) (Souza et al., 2020).
Although LLMs can process related languages,
these capabilities are limited for technical tasks, espe-
cially when handling enterprise-specific privacy data.
The small pre-trained Portuguese corpus in models
like LLaMA highlights these challenges. Smaller
models, such as TeenyTinyLLaMA (Corrêa et al., 2024), offer an alternative, fine-tuned approach for
well-defined tasks, as explored in this work.
Retrieval-Augmented Generation (RAG) miti-
gates LLM challenges such as hallucinations and out-
dated knowledge by integrating external databases,
enhancing content accuracy for knowledge-intensive
tasks (Lewis et al., 2020; Gao et al., 2023). Retrieval-
Augmented Fine-Tuning (RAFT) goes further, focus-
ing on relevant documents to improve model reason-
ing and performance by ignoring distractors and cit-
ing necessary sequences (Zhang et al., 2024; War-
nakulasuriya and Hapuarachchi, 2024). However,
generating chain-of-thought training data is costly
and complex.
This work focuses on the MERCOSUR Common
Nomenclature (NCM), essential for regional trade.
NCM classification involves more than translation; it
demands advanced Portuguese language processing
due to product classification intricacies. Initial experi-
ments show that LLMs like ChatGPT or TeenyTinyL-
LaMA are insufficient for this task. While neural net-
Figure 1: The SLIM-RAFT diagram.
works handle simple classification (Du et al., 2021),
this study aims to extract deeper semantic knowledge
within the NCM system for enhanced commerce and
taxation applications.
The “Simplified Logical Intelligent Model”
(SLIM-RAFT) is introduced to address NCM chal-
lenges efficiently. It applies the RAFT methodol-
ogy in a streamlined form, capable of handling multi-
classification tasks while minimizing training com-
plexity.
A distinguishing feature of the SLIM-RAFT model is that its source LLM is significantly smaller than those used in traditional models. Specifically, the TeenyTinyLLaMA model, comprising only 160 million parameters, was used as the fine-tuned LLM in constructing SLIM-RAFT. Our model's results substantially outperformed those of ChatGPT 4 in the proposed challenge: ChatGPT 4 scored 4.5/10, whereas SLIM-RAFT achieved 8.63/10. The
SLIM-RAFT scheme can be seen in Figure 1.
This paper is organized as follows: Section 2 re-
views the works directly related to the concepts be-
hind the construction of SLIM-RAFT. Section 3 out-
lines the structure and functionality of the HS and
NCM codes. Section 4 details the construction of the
SLIM-RAFT model. Section 5 presents the results
from comparative evaluations of the models and dis-
cusses the findings. Finally, Section 6 concludes the
paper and suggests directions for future work related
to the proposed model.
2 RELATED WORKS
This section presents related work in two areas:
1) research on the implementation of LLMs with
Portuguese as the primary language, and 2) stud-
ies on Retrieval-Augmented Generation (RAG) and
Retrieval-Augmented Fine-Tuning (RAFT).
2.1 Portuguese LLMs
The introduction of LLaMA (Touvron et al., 2023a) as
an open foundational model marked a key step in lan-
guage processing. With models ranging from 7B to
65B parameters, LLaMA proved that state-of-the-art
models could be trained on public datasets. Despite
having fewer parameters, the LLaMA-13B model
outperformed GPT-3 on several benchmarks. Later
versions, LLaMA 2 and 3 (Touvron et al., 2023b;
Meta, 2024), further refined these models, particu-
larly for chat-based applications, establishing a solid
foundation for future NLP advancements.
Despite available data for training Portuguese-
language models, native speakers still notice limita-
tions in models primarily trained on English data.
Growing interest in creating large-scale models for
Portuguese has driven recent advancements.
In European Portuguese (PT-PT), the initiative known as GlórIA (Lopes et al., 2024) merits particular attention. This project involves a decoder language model meticulously trained on a corpus comprising 35 billion tokens from various sources.
For Brazilian Portuguese (PT-BR), Sabiá (Pires et al., 2023) was the first relevant LLM encountered.
This initiative underscores the development of robust
and scalable language models for the Portuguese lan-
guage. Leveraging advanced machine learning archi-
tectures, these models have been instrumental in ad-
vancing natural language processing applications in
Brazilian Portuguese.
The Cabrita model (Larcher et al., 2023) was
launched as a low-cost alternative for training LLMs.
The authors posited that their methodology could be
extended to any transformer-like architecture. To sub-
stantiate their hypothesis, they undertook continuous
pre-training exclusively on Portuguese text using a
3-billion-parameter model known as OpenLLaMA.
This effort culminated in the creation of openCabrita
3B. Remarkably, openCabrita 3B incorporates a novel
tokenizer, significantly reducing the number of to-
kens necessary to represent the text. Subsequently,
in a comparable approach, a new study introduced
a model predicated on LLaMA 2, designed specifi-
cally for handling prompts in Portuguese. This model,
named Bode (Garcia et al., 2024), is available in two versions, with 7B and 13B parameters. Both models used the LoRA (Hu et al., 2021) fine-tuning method over an open-source LLM. This technique keeps the original parameters frozen and trains only small low-rank matrices added to the model's weights to achieve the desired fine-tuning outcome.
A noteworthy recent publication entitled
“TeenyTinyLlama: Open-Source Tiny Language
Models Trained in Brazilian Portuguese” (Corrêa et al., 2024) offers a valuable perspective on develop-
ing compact, open-source language models tailored
to Brazilian Portuguese. Despite their reduced
scale, these models hold significant potential for
democratizing access to natural language processing
technology, particularly within resource-limited
communities.
Collectively, these works signify substantial ad-
vancements in implementing language models for the
Portuguese language. They underscore the diversity
of methodologies and the abundance of resources that
bolster research and applications in NLP and related
fields. These ongoing initiatives are poised to con-
tinue influencing the future of language technology
for Portuguese speakers globally.
2.2 Retrieval-Augmented Approach
Retrieval-Augmented Generation (RAG) (Lewis
et al., 2020) enhances content generation by inte-
grating external knowledge, improving coherence,
factual accuracy, and utility. This approach benefits
tasks like question-answering, summarization, and
dialogue systems by retrieving relevant information
for more precise outputs.
Retrieval-Augmented Fine-Tuning (RAFT)
(Zhang et al., 2024) combines RAG with fine-tuning,
allowing models to gain domain-specific knowledge
and retrieve essential external contexts. RAFT em-
ploys chain-of-thought prompting, enabling models
to provide more explainable, structured reasoning in
their responses.
RAG and RAFT were designed to confront the
complexity of tailoring LLMs to specialized domains.
Within these realms, the emphasis pivots from general
knowledge reasoning to optimizing accuracy vis-à-vis a meticulously defined array of domain-specific documents.
2.3 NCM Data Set
The ELEVEN data set, ELEctronic inVoicEs in the
Portuguese language (Di Oliveira et al., 2022), was
meticulously curated to furnish researchers and en-
trepreneurs with a repository of product descriptions
categorized under the Mercosur Common Nomencla-
ture (NCM). This extensive database comprises over
a million meticulously labelled records, each scru-
tinized by taxation experts. These descriptions are
short texts, limited to 120 characters, and extracted
from authentic electronic invoices documenting pur-
chase and sales transactions.
Labelled datasets are rare, yet they provide indis-
pensable resources for applications reliant on super-
vised learning (Van Engelen and Hoos, 2020). The
ELEVEN dataset has served as a cornerstone for sev-
eral noteworthy academic endeavours: 1) the devel-
opment of a CNN-based system for classifying goods
(Kieckbusch et al., 2021); 2) the creation of data vi-
sualization tools aimed at identifying outliers and de-
tecting fraud (Marinho et al., 2022); and 3) the estab-
lishment of a framework utilizing automatic encoders
to cluster short-text data extracted from electronic in-
voices, thereby enhancing anomaly detection within
numeric fields (Schulte et al., 2022).
3 HS AND NCM CODES
In international trade, customs brokers, exporters, and
importers must accurately classify goods under the
Harmonized System (HS), which forms the basis of
the Mercosur Common Nomenclature (NCM) code
(Valença et al., 2023). The HS underpins customs tar-
iffs and trade statistics in over 200 countries, while
also aiding in the monitoring of controlled goods, es-
tablishing rules of origin, and supporting trade nego-
tiations (WCO, 2024).
The NCM, used by all MERCOSUR mem-
bers—Argentina, Brazil, Paraguay, Uruguay, and
Venezuela—is legally required for all commercial
transactions in Brazil, including on electronic in-
voices (MERCOSUR, 2024b; Brazil, 2016). Devel-
oped by the World Customs Organization, the HS
and its hierarchical structure, which NCM mirrors,
assign numerical codes to products for streamlined
WEBIST 2024 - 20th International Conference on Web Information Systems and Technologies
236
identification and classification in customs processes
(WCO, 2018; MERCOSUR, 2024a). The HS struc-
ture is shown in Table 1.
Table 1: Structure of the HS codes (WCO, 2018).
2 digits (01-97): Chapter
4 digits (01.01 - 97.06): Heading
6 digits (0101.21 - 9706.00): Subheading
Table 2 shows an HS list cutout, showing the dis-
tinction in classification codes for fresh and dried ap-
ples. While this differentiation may appear trivial, in
an import operation where each type of apple is sub-
ject to different tax treatments, an error in code desig-
nation can result in significant repercussions. On one
hand, the seller faces the risk of tax penalties, while
on the other, customs authorities may encounter a loss
of revenue.
Table 2: Headings 08.08 and 08.13 in the HS (WCO, 2018).
08.08      Apples, pears and quinces, fresh.
0808.10    - Apples
0808.30    - Pears
0808.40    - Quinces
...        ...
08.13      Fruit, dried, other than that of headings 08.01 to 08.06; mixtures of nuts or dried fruits of this Chapter.
0813.10    - Apricots
0813.20    - Prunes
0813.30    - Apples
0813.40    - Other fruit
0813.50    - Mixtures of nuts or dried fruits of this Chapter
Accurate goods classification is crucial, affecting
taxation, regulatory compliance, and eligibility for in-
ternational trade benefits. Misclassification can result
in penalties, customs delays, and financial losses (Ya-
dav, 2023).
The MERCOSUR Common Nomenclature
(NCM) extends the HS system by adding two digits,
allowing for a more precise classification of products
based on specific characteristics, as shown in Table 3.
Table 3: Structure of the NCM codes (MERCOSUR, 2024a).
2 digits (01-97): Chapter
4 digits (01.01 - 97.06): Heading
6 digits (0101.21 - 9706.00): Subheading
7 digits (0101.21.1 - 9706.00.9): Item
8 digits (0101.21.10 - 9706.00.90): Sub-item
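To make this layered structure concrete, the short Python sketch below (an illustration of ours, not part of any official NCM tooling; the function name and example values are assumptions) splits a full 8-digit NCM code into the levels listed in Tables 1 and 3:

def split_ncm(code: str) -> dict:
    """Split a full 8-digit NCM code into the levels of Tables 1 and 3.

    Illustrative helper only (hypothetical name): chapter (2 digits),
    heading (4), subheading (6, the HS level), item (7), sub-item (8).
    """
    digits = code.replace(".", "")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"expected an 8-digit NCM code, got {code!r}")
    return {
        "chapter": digits[:2],
        "heading": digits[:4],
        "subheading": digits[:6],
        "item": digits[:7],
        "sub_item": digits,
    }

print(split_ncm("0101.21.10"))
# {'chapter': '01', 'heading': '0101', 'subheading': '010121',
#  'item': '0101211', 'sub_item': '01012110'}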
The task of classifying product descriptions within
the HS or NCM system has been explored in aca-
demic literature (Du et al., 2021; Kieckbusch et al.,
2021; Schulte et al., 2022). Techniques such as neu-
ral networks with hierarchical learning and convolu-
tional neural networks (CNNs) have been effectively
employed to address this task. Nevertheless, specific
challenges associated with this domain remain insuf-
ficiently addressed.
One notable challenge is the variability in product
descriptions: the same product can be described in
multiple ways, and context-dependent synonyms and
abbreviations further complicate classification. This
complexity makes using LLMs a compelling alter-
native for interpreting product descriptions. Beyond
simple classification, LLMs offer the potential to ex-
tract deeper knowledge by identifying relationships
between products. Table 4 shows some commonly encountered abbreviations.
Table 4: Abbreviation Examples.
English
  Coc. 2L = Coca-Cola 2 Liters
  P. W. Rice = Parboiled White Rice
Portuguese
  Fr. Desc. = Fralda descartável (disposable diaper)
  T. Pap. FDupla = Toalha de Papel Folha Dupla (double-ply paper towel)
French
  EDT = Eau de Toilette
  EDP = Eau de Parfum
Context is crucial in language processing, as it
can help resolve ambiguities that often arise when
considering abbreviations in isolation. This is where
Transformer-based algorithms shine, as their
ability to understand context is key (Vaswani et al.,
2017). For example, the abbreviation “fr.” in Portuguese could refer to “fralda” (diaper), as shown in Table 4, or it could mean “fruta” (fruit). However, when the term “desc.” (meaning “descartável”, or disposable) follows, the context effectively resolves the ambiguity between “fralda” and “fruta”.
Developing LLMs capable of handling NCM and
HS codes is valuable in fields like compliance and tax
inspection. Import and export companies must ensure
accurate product descriptions and classifications un-
der HS and NCM codes to avoid financial penalties or
legal trading restrictions. Customs authorities closely
monitor these codes as they define the tax treatment
of products. Misclassification can be seen as tax eva-
sion, leading to fines or sanctions.
Therefore, AI models can improve the control and correction of invoice issuance and related documents (Kieckbusch et al., 2021; Marinho et al., 2022).
4 SLIM-RAFT MODEL
The SLIM-RAFT model simplifies RAFT logically
and intelligently. Just as RAFT maintains the RAG
in its designed form, SLIM-RAFT also maintains the
RAG mechanism in its structure. See Figure 1.
The preceding sections have elucidated that con-
structing the training base in the original RAFT model
is an expensive endeavour, frequently necessitating
the deployment of another powerful LLM. This sub-
stantial cost is predominantly attributable to two fea-
tures of RAFT: the chain-of-thought reasoning and
the inclusion of irrelevant documents. While learning
to disregard irrelevant documents is valid and logical
within RAFT’s objectives, it is not a requisite for all
applications. This insight prompted the exclusion of
this feature in the development of SLIM-RAFT.
SLIM-RAFT retains the chain-of-thought concept
in its fine-tuning process, albeit simplified. Instead of
using lengthy texts or entire documents as input, the
approach employs logical arguments derived from the
knowledge base. For instance: 1) element “a” belongs
to set A; 2) set A is contained within set B; 3) conse-
quently, “a” belongs to set B. The next subsection will
explain how it was done.
4.1 FT Database and Prompting
A theoretical example of a list of arguments for con-
structing the simplified chain of thought:
Doc. 1: a ∈ A
Doc. 2: A ⊆ B
Doc. 3: Consequently, a ∈ B
This pattern was applied when building the training base for fine-tuning, following the idea of the simplified chain-of-thought. See below for a generic example of a
prompt:
[{ "content":
[context 1 ]....\n
[context 2 ]...\n
[context 3 ]...\n
[...] ... \n
[context n]... \n
\n
Answer the following question
using information from the
previous context: question",
"role": "user"},
{"content": "response + reasoning
based on context",
"role": "assistant"}]
An expert in the NCM code developed a series of
question-and-answer sets, complete with their respec-
tive arguments. Utilising the open version of Chat-
GPT 3.5, numerous variations of these questions were
generated. Subsequently, a Python script was em-
ployed to create thousands of pairs [question + argu-
ment, answer + argument], wherein information de-
rived from the NCM database populated the generic
questions.
Then, for building a data training base in SLIM-
RAFT mode, there are three steps:
1. A domain expert creates a small question-and-
answer set, e.g. “What is the category of the prod-
uct ’product’?”;
2. Construct question-and-answer set variations (an
LLM could be used), e.g. “Could you specify
the category to which the product ’product’ be-
longs?”;
3. Populate the question-and-answer set masks, e.g.
“What is the category of the product ’fresh ap-
ple package’?”, “Could you specify the category
to which the product ’fresh apple package’ be-
longs?”
The total number of records in the data training base
will be:
N = q × v × n    (1)
where q is the number of question-and-answer pairs created by the domain expert, v is the number of variations generated for each pair, and n is the number of samples taken from the NCM database.
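As a rough illustration of this pipeline (the masks, helper names, and toy NCM entries below are ours, not the actual training data), the following Python sketch fills expert-written question-and-answer masks with database rows, producing the q × v × n records counted by Equation (1):

from itertools import product

# Hypothetical expert-written question masks (steps 1 and 2) and an answer mask
# carrying the simplified chain-of-thought argument.
question_masks = [
    "What is the NCM category of the product '{product}'?",
    "Could you specify the NCM category to which the product '{product}' belongs?",
]
answer_mask = (
    "The product '{product}' belongs to NCM code {ncm}. "
    "Reasoning: '{product}' falls under heading {heading}, "
    "and heading {heading} is contained in NCM code {ncm}."
)

# Toy stand-in for the NCM/ELEVEN database rows used in step 3.
ncm_samples = [
    {"product": "fresh apple package", "ncm": "0808.10.00", "heading": "08.08"},
    {"product": "dried apple package", "ncm": "0813.30.00", "heading": "08.13"},
]

# One training record per (question variation, database sample) pair,
# i.e. N = q x v x n records as in Equation (1).
training_base = [
    {"messages": [
        {"role": "user", "content": mask.format(**row)},
        {"role": "assistant", "content": answer_mask.format(**row)},
    ]}
    for mask, row in product(question_masks, ncm_samples)
]
print(len(training_base))  # 4 records in this toy example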
As delineated earlier, a notable distinction be-
tween SLIM-RAFT and the original RAFT lies in the
simplified approach to constructing the fine-tuning
base while preserving the chain-of-thought method-
ology.
4.2 Source LLM and Fine-Tuning
The LLM source chosen for this work was
TeenyTinyLLaMA (TTL), available in two sizes: 460
million and 160 million parameters. Two primary
characteristics of TTL guided this selection: its compact size and its training on a Brazilian Portuguese corpus.
While other source models trained in Brazilian
Portuguese exist, as discussed in Section 2, their sub-
stantial size can make fine-tuning costly, even when
employing optimised techniques such as LoRa (Hu
et al., 2021). In contrast, the compact size of TTL
made our fine-tuning process more cost-effective,
demonstrating its practicality and potential for wider
application.
The fine-tuning process adjusts all model parameters; the reduced size of TTL facilitated this task. The code employed was adapted, with minor modifications, from that provided by the authors of the original TTL paper (available on GitHub: https://github.com/Nkluge-correa/TeenyTinyLlama).
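As a hedged sketch of what such a full-parameter fine-tuning run can look like with the Hugging Face stack (the model identifier, hyperparameters, and dataset path below are placeholders and assumptions, not the exact values used in this work):

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Assumed Hugging Face identifier for the 160m TTL checkpoint.
model_name = "nicholasKluge/TeenyTinyLlama-160m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder path: a JSONL file with one {"text": "<prompt + answer>"} record
# per SLIM-RAFT training pair.
data = load_dataset("json", data_files="slim_raft_train.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slim-raft-ttl-160m",
                           num_train_epochs=3,
                           per_device_train_batch_size=8,
                           learning_rate=5e-5),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # full-parameter fine-tuning: no adapters, all weights updated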
All codes developed for SLIM-RAFT are accessi-
ble on SLIM-RAFT’s GitHub repository. Both TTL
models, 160 million and 460 million parameters, were
fine-tuned to create SLIM-RAFT. The 160 million pa-
rameter version was used in SLIM-RAFT, while the
460 million parameter version was used for compara-
tive analysis during the final model evaluation.
The SLIM-RAFT GitHub repository (https://github.com/yurifacanha/ncmrag) is a valuable resource that provides the code used in this
study. This open access not only allows the commu-
nity to reproduce and assess this experiment but also
encourages further collaboration and potential contri-
butions to natural language processing.
5 RESULTS AND DISCUSSION
The results were evaluated through a comparative
analysis of the responses delivered by the tested mod-
els. Three other models were chosen for this compar-
ison: TeenyTinyLLaMA 460m, TeenyTinyLLaMA
460m + NCM fine-tuning, and ChatGPT 4.0. In the
end, four models were tested and evaluated by Chat-
GPT 4.0:
Model 1: TeenyTinyLlama 460m without fine-tuning on the dataset, defined as TTL.
Model 2: ChatGPT 4.0, defined as GPT.
Model 3: TeenyTinyLlama 460m with fine-tuning on the NCM dataset, defined as NCM-TTL.
Model 4: TeenyTinyLlama 160m with fine-tuning on the NCM dataset and using SLIM-RAFT, defined as SLIM-RAFT.
The models’ responses were then submitted to
ChatGPT-4.0, which compared the generated outputs
with the desired outputs. To ensure impartiality in the
evaluation, it is important to note that ChatGPT-4.0
was not informed of which model each response was
associated with.
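As an illustration of this blind-judge protocol (the prompt wording, judge model name, and client calls below are our assumptions, not the exact evaluation script), each anonymised answer can be sent to the judge together with the reference answer and scored from 0 to 10:

from statistics import mean, stdev
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge(question: str, reference: str, candidate: str) -> float:
    """Score one anonymised answer from 0 to 10 against the reference answer."""
    prompt = (f"Question: {question}\n"
              f"Reference answer: {reference}\n"
              f"Candidate answer: {candidate}\n"
              "Score the candidate from 0 to 10 for correctness against the "
              "reference. Reply with the number only.")
    reply = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}])
    return float(reply.choices[0].message.content.strip())

def evaluate(model_outputs, qa_pairs):
    """qa_pairs[i] = (question, reference); model_outputs[i] is the tested model's answer."""
    scores = [judge(q, ref, out) for (q, ref), out in zip(qa_pairs, model_outputs)]
    return mean(scores), stdev(scores), min(scores), max(scores)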
5.1 Results Presentation
The evaluation used 100 questions and answers
(Q/As) not included in the fine-tuning training base.
These 100 questions were presented to various mod-
els, and their responses were recorded and compared.
ChatGPT-4 assessed the quality of each response,
scoring it on a scale from 0 to 10. The final score
for each model represents the average of the scores
assigned to each response. Table 5 presents the results of this evaluation. Model 4, SLIM-RAFT, clearly achieved the best score, 8.63, with a standard deviation of 2.30 across the 100 Q/As.
Table 5: Score results for the four models.
Model        Average   Std. Dev.   Min.   Max.
TTL          0.20      0.98        0      5
NCM-TTL      4.71      3.53        0      10
GPT          4.50      1.39        0      5
SLIM-RAFT    8.63      2.30        0      10
5.2 LLM Justification
It is essential to underscore the potential utility of an
LLM specialized in this type of task, as it extends
beyond mere classification. A straightforward input-
output classification system is confined to specific
subjects and input formats. However, an LLM sys-
tem can extract semantic knowledge from the training
base and demonstrates flexibility in handling various
input formats.
Answering a simple, direct question like “What is the NCM code for the product fresh apple package?” is not enough. The question can come in several forms
or be embedded in a larger context, for example: “I
don’t know the NCM code for fresh apple package,
can you help me?”.
Another pertinent scenario involves cases of at-
tempted tax evasion. For instance, if the product fresh
apple package is exempt from taxation while apple
juice package is subject to tax, it could be mislead-
ingly described in a tax document as apple j. pack.
but assigned the NCM code for fresh apples package,
which is tax-exempt. If this discrepancy goes unde-
tected by customs authorities, it could result in a loss
of revenue due to the uncollected tax.
The SLIM-RAFT model can effectively support a system for controlling and inspecting documents regarding NCM code misuse. However, because of its reduced size, its capability to extract the right question embedded in a larger context is limited. Let us
consider the following example:
Portuguese - na padaria e comprei um suco de laranja, percebi que na nota fiscal aparecia um código chamado NCM, mas estava com a impressão borrada. Qual seria o código impresso?
English - I was at the bakery and bought some or-
ange juice. I noticed that a code called NCM appeared
on the invoice, but the print was blurry. What would
be the printed code?
When presented with this query, our model may
not discern the central issue: “What is the NCM category for orange juice?” Integrating an additional
LLM into the system could mitigate this limitation.
Few-shot prompting (Ma et al., 2024; Gu et al., 2021) can enable a larger LLM to reformulate queries, adapting the context to enhance comprehension by the smaller LLM integrated within the SLIM-RAFT model.
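A minimal sketch of such a reformulation step, assuming a larger general-purpose LLM is available as a callable preprocessor (the few-shot examples and helper below are hypothetical):

FEW_SHOT_REWRITE = """Rewrite the user's message as a direct NCM question.

Message: I was at the bakery and bought some orange juice; the NCM code on the invoice was blurry. What would the printed code be?
Question: What is the NCM code for orange juice?

Message: I don't know the NCM code for fresh apple package, can you help me?
Question: What is the NCM code for the product 'fresh apple package'?

Message: {message}
Question:"""

def reformulate(message: str, general_llm) -> str:
    """Use a larger LLM (few-shot) to distil the embedded NCM question before
    passing it on to the smaller SLIM-RAFT model."""
    return general_llm(FEW_SHOT_REWRITE.format(message=message)).strip()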
Nonetheless, given its current limited size, it is im-
portant to acknowledge that our model may not fully
comprehend all contexts. The objective, however, is
to expand the model as more resources become avail-
able, thereby enhancing its capacity and performance;
as the model becomes larger, the need for integration
with another model will diminish.
5.3 Limitations
The SLIM-RAFT model is a prototype developed to
illustrate the original RAFT methodology’s simplifi-
cation and propose its application within the NCM
domain. Consequently, this model is a highly spe-
cialised tool tailored for its designated task.
It is not recommended for use in chatbot applications, as the TeenyTinyLLaMA (TTL) creators have advised against employing the TTL 160m model for this purpose; the TTL 460m model is recommended for chatbot functionalities.
When constructing the training base for fine-
tuning, a simplified chain-of-thought approach is em-
ployed. However, it is crucial to remember that the
involvement of a domain expert is beneficial and nec-
essary for developing the reference lines of reasoning,
highlighting the irreplaceable role of human expertise
in this process.
6 CONCLUSIONS
The SLIM-RAFT model demonstrated significantly
superior performance to ChatGPT 4 in interpreting
and classifying product descriptions according to the
NCM code.
This outcome indicates that a smaller-scale LLM
with specific domain knowledge can surpass a more
powerful LLM in specialized tasks, provided it is
appropriately adjusted and trained while maintaining
low execution costs.
The technique for simplifying the construction of
the chain-of-thought, as proposed in SLIM-RAFT,
not only reduces costs but also proves to be a vi-
able alternative for developing specialized LLMs with
high accuracy.
Since NCM coding is used not only for manag-
ing, transporting, paying, and taxing various goods
in the import and export trade between MERCO-
SUR countries but also for most tax bills for goods,
commodities, and restaurants in the Brazilian mar-
ket, the practical value of this research is substan-
tial. The findings provide convenience for govern-
ment departments involved in import and export, tax-
ation, banks, transportation, and manufacturers. Ad-
ditionally, since over 200 countries use the HS system
for import and export trade, the LLM-NCM solution
proposed in this article can also facilitate the effective
promotion of LLM-HS applications worldwide.
Future research will explore applying SLIM-
RAFT to LLaMA 3, with a focus on multilingual
tasks and comparisons with other techniques like
LoRa.
ACKNOWLEDGEMENTS
ChatGPT 4 was used in all sections of this work to
standardize and improve the writing in British En-
glish. This research is partially funded by the Brazil-
ian National Council for Scientific and Technological
Development (CNPq).
REFERENCES
Brazil, C. (2016). Ajuste sinief 17. https://www.confaz.
fazenda.gov.br/legislacao/ajustes/2016/AJ\ 017\ 16
Accessed on Jun 6th, 2024.
Corrêa, N. K., Falk, S., Fatimah, S., Sen, A., and
de Oliveira, N. (2024). Teenytinyllama: open-source
tiny language models trained in brazilian portuguese.
arXiv preprint arXiv:2401.16640.
Di Oliveira, V., Weigang, L., and Rocha Filho, G. P. (2022).
Eleven data-set: A labeled set of descriptions of goods
captured from brazilian electronic invoices. In WE-
BIST, pages 257–264.
Du, S., Wu, Z., Wan, H., and Lin, Y. (2021). Hscodenet:
Combining hierarchical sequential and global spatial
information of text for commodity hs code classifica-
tion. In Pacific-Asia Conference on Knowledge Dis-
covery and Data Mining, pages 676–689. Springer.
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y.,
Sun, J., and Wang, H. (2023). Retrieval-augmented
generation for large language models: A survey. arXiv
preprint arXiv:2312.10997.
Garcia, G. L., Paiola, P. H., Morelli, L. H., Candido, G.,
Júnior, A. C., Jodas, D. S., Afonso, L., Guilherme,
I. R., Penteado, B. E., and Papa, J. P. (2024). In-
troducing bode: A fine-tuned large language model
for portuguese prompt-based task. arXiv preprint
arXiv:2401.02909.
Gu, Y., Han, X., Liu, Z., and Huang, M. (2021). Ppt: Pre-
trained prompt tuning for few-shot learning. arXiv
preprint arXiv:2109.04332.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang,
S., Wang, L., and Chen, W. (2021). Lora: Low-rank
adaptation of large language models. arXiv preprint
arXiv:2106.09685.
Kieckbusch, D. S., Geraldo Filho, P., Di Oliveira, V., and
Weigang, L. (2021). Scan-nf: A cnn-based system for
the classification of electronic invoices through short-
text product description. In WEBIST, pages 501–508.
Larcher, C., Piau, M., Finardi, P., Gengo, P., Esposito, P.,
and Caridá, V. (2023). Cabrita: closing the gap for
foreign languages. arXiv preprint arXiv:2308.11878.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin,
V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. (2020). Retrieval-augmented
generation for knowledge-intensive nlp tasks. Ad-
vances in Neural Information Processing Systems,
33:9459–9474.
Lopes, R., Magalhães, J., and Semedo, D. (2024). Glória - a generative and open large language model for portuguese. arXiv preprint arXiv:2402.12969.
Ma, H., Zhang, C., Bian, Y., Liu, L., Zhang, Z., Zhao,
P., Zhang, S., Fu, H., Hu, Q., and Wu, B. (2024).
Fairness-guided few-shot prompting for large lan-
guage models. Advances in Neural Information Pro-
cessing Systems, 36.
Marinho, M. C., Di Oliveira, V., Neto, S. A., Weigang, L.,
and Borges, V. R. (2022). Visual analysis of electronic
invoices to identify suspicious cases of tax frauds. In
International Conference on Information Technology
& Systems, pages 185–195. Springer.
MERCOSUR (2024a). Mercosur - consultas à nomenclatura comum e à tarifa externa. https://www.
mercosur.int/pt-br/politica-comercial/ncm/ Accessed
on Jun 4th, 2024.
MERCOSUR (2024b). Mercosur countries. https://www.
mercosur.int/en/about-mercosur/mercosur-countries/
Accessed on Jun 6th, 2024.
Meta, A. (2024). Introducing meta llama 3: The most ca-
pable openly available llm to date. Last accessed June 3rd, 2024, https://ai.meta.com/blog/meta-llama-3/.
Pires, R., Abonizio, H., Almeida, T. S., and Nogueira, R.
(2023). Sabiá: Portuguese large language models.
In Brazilian Conference on Intelligent Systems, pages
226–240. Springer.
Radosavovic, I., Zhang, B., Shi, B., Rajasegaran, J., Kamat,
S., Darrell, T., Sreenath, K., and Malik, J. (2024). Hu-
manoid locomotion as next token prediction. arXiv
preprint arXiv:2402.19469.
Schulhoff, S., Ilie, M., Balepur, N., Kahadze, K., Liu,
A., Si, C., Li, Y., Gupta, A., Han, H., Schulhoff,
S., et al. (2024). The prompt report: A system-
atic survey of prompting techniques. arXiv preprint
arXiv:2406.06608.
Schulte, J. P., Giuntini, F. T., Nobre, R. A., Nascimento, K.
C. d., Meneguette, R. I., Li, W., Gonçalves, V. P., and
Rocha Filho, G. P. (2022). Elinac: autoencoder ap-
proach for electronic invoices data clustering. Applied
Sciences, 12(6):3008.
Souza, F., Nogueira, R., and Lotufo, R. (2020). Bertimbau:
pretrained bert models for brazilian portuguese. In In-
telligent Systems: 9th Brazilian Conference, BRACIS
2020, Rio Grande, Brazil, October 20–23, 2020, Pro-
ceedings, Part I 9, pages 403–417. Springer.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux,
M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro,
E., Azhar, F., et al. (2023a). Llama: Open and ef-
ficient foundation language models. arXiv preprint
arXiv:2302.13971.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi,
A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava,
P., Bhosale, S., et al. (2023b). Llama 2: Open foun-
dation and fine-tuned chat models. arXiv preprint
arXiv:2307.09288.
Valença, P. R. M. et al. (2023). Essays on foreign trade,
labor, innovation and environment.
Van Engelen, J. E. and Hoos, H. H. (2020). A sur-
vey on semi-supervised learning. Machine learning,
109(2):373–440.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in neural
information processing systems, 30.
Warnakulasuriya, S. and Hapuarachchi, K. (2024). From
knowledge to action: Leveraging retrieval augmented
fine tuning (raft) to empower quick and confident first-
aid decisions in emergencies. Researchgate (Preprint)
http://dx.doi.org/10.13140/RG.2.2.35911.30888.
WCO (2018). THE HARMONIZED SYSTEM A universal
language for international trade. World Customs
Organization. https://www.wcoomd.org/-/media/
wco/public/global/pdf/topics/nomenclature/activities-
and-programmes/30-years-hs/hs-compendium.pdf
(Visited 2024-06-04).
WCO (2024). List of contracting parties to the hs con-
vention and countries using the hs - world customs
organization. https://www.wcoomd.org/en/topics/
nomenclature/overview/list-of-contracting-parties-
to-the-hs-convention-and-countries-using-the-
hs.aspx Accessed on Jun 6th, 2024.
Yadav, B. K. (2023). Impact of regulation and conformity
assessment procedures on global trade. In Handbook
of Quality System, Accreditation and Conformity As-
sessment, pages 1–21. Springer.
Zhang, T., Patil, S. G., Jain, N., Shen, S., Zaharia, M., Sto-
ica, I., and Gonzalez, J. E. (2024). Raft: Adapting
language model to domain specific rag. arXiv preprint
arXiv:2403.10131.