Learning Knowledge Representation by Aligning Text and Triples via
Finetuned Pretrained Language Models
Víctor Jesús Sotelo Chico (a) and Julio Cesar dos Reis (b)
Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Brazil
(a) https://orcid.org/0000-0001-9245-8753
(b) https://orcid.org/0000-0002-9545-2098
Keywords: Language Model, Knowledge Graph, Multimodal Encoders.
Abstract: Representation learning has produced embeddings for structured and unstructured knowledge independently, without a shared vector space. Alignment between text and RDF triples has been explored in natural language generation, from RDF verbalizers to generative models. Existing approaches have treated the semantics in these data via unsupervised approaches, lacking adequate application studies of semantic alignment. The existing text-triple datasets are limited and have only been applied to text-to-triple generation rather than to representation learning. This research proposes a supervised approach for representing triples. Our approach feeds an existing pretrained model with triple-text pairs, exploring measures of the semantic alignment between the pair elements. Our solution employs a data augmentation technique with contrastive loss to address the dataset limitation. We also applied a loss function that requires only positive examples, which suits the explored dataset. Our experimental evaluation measures the effectiveness of the fine-tuned models in two main tasks, Semantic Similarity and Information Retrieval, to assess whether our models can learn triple representations while maintaining the semantics learned by the text encoder models. Our contribution paves the way for better embeddings targeting text-triple alignment without huge amounts of data, bridging unstructured text and knowledge graph data.
1 INTRODUCTION
The evolution of natural language processing in recent years has been driven by the introduction of large language models (LLMs) (Min et al., 2023), which demonstrate the ability to process large amounts of text. They appear to understand human language well, and their main application lies in developing more powerful question-answering systems. In these applications, LLMs suffer from hallucinations (Perković et al., 2024), answering with inaccurate information. This behavior can be explained by the nature of the knowledge acquired by LLMs from large corpora of unstructured text, which might not be appropriately curated.
Existing research has investigated the possibility of integrating structured knowledge with LLMs. For instance, Pan et al. (Pan et al., 2024) proposed a roadmap to incorporate Knowledge Graphs (KGs) with LLMs to overcome hallucination issues, taking advantage of the structure and modeling of KGs and their interpretability.
Working with LLMs and KGs means dealing with their input elements (text and triples), heterogeneous data with different underlying semantics. One of the most well-known tasks involving the alignment of these data is Natural Language Generation (NLG), converting triples into natural text (Lapalme, 2020; Zhu et al., 2019) and natural text into triples (Regino et al., 2023). These applications align triples with text to generate one from the other. Existing projects make datasets available for this challenge (Castro Ferreira et al., 2020).
We argue that existing NLG approaches need to address the semantics of text and triples in terms of embedding representations. Constructing embeddings that represent text has advanced and achieved new state-of-the-art results with the introduction of pre-trained Transformer models (Xia et al., 2020), in which a text is mapped to a numerical vector. Similarly, knowledge graph embeddings (KGE) (Cao et al., 2024) create vector representations for KG elements, entities and relations.
Representing triples as a single embedding has
been investigated in the literature (Fionda and Pirrò,
2020; Kalinowski and An, 2022) due to their importance in triple classification, clustering, and recommendation systems. Generating vectors that represent triples as a whole, instead of separate vectors for specific triple elements (entities and relations), changes how KG elements are represented in a vector space. We argue that such an approach is more adequate and advances representation learning.
These developments have produced new state-of-the-art encoders (Patil et al., 2023). However, the investigation of encoders capable of representing facts, whether as triples or as texts, has followed independent paths. This separates two approaches to defining knowledge. Without a common representation for such knowledge, intermediate mechanisms, such as SPARQL queries (generated by humans or generative models), are necessary. Treating languages such as RDF as plain text, without a proper knowledge representation in vector spaces, might produce poor results based on incorrect information.
This article presents an investigation into developing a method for convergent semantic alignment using a small dataset of triples and natural texts. Our study develops a more robust solution capable of representing knowledge in the same vector space, considering KG triples and unstructured texts in an integrated way. Our approach takes pre-trained language models and fine-tunes them using parallel data of triples and their respective equivalents in natural text. To improve the alignment quality, we propose a data augmentation technique over the small WebNLG dataset (Gardent et al., 2017), which provides examples of triples with their respective meanings in natural text. The developed solution aims to preserve the semantic representation.
We evaluate our proposal by considering different NLP tasks: Semantic Textual Similarity (STS) and Information Retrieval (IR). The first collects and assesses textual data, processing their embeddings over the STS-12 dataset and detecting whether the fine-tuning process degrades the STS benchmark. IR, in turn, creates embeddings for triples and texts from WebNLG. We conduct two main IR configurations: (1) we recover text using a triple as a query, and (2) we use text to recover the adequate triples.
Our results demonstrate the need to improve the alignment between text and triples rather than applying pretrained models directly, because the direct approach only captures shared grammar and not a genuine semantic alignment. We found that fine-tuning improved model effectiveness, particularly in the retrieval task (MRR@1 score). Furthermore, using contrastive loss with augmentation enabled effective semantic learning without losing effectiveness in the semantic similarity task. Our findings establish that learning the alignment between text and triples is possible using only positive examples (pairs with the same meaning) in small datasets.
The remainder of this article is organized as follows: Section 2 presents the background concepts and discusses related studies; Section 3 presents our original method for constructing a semantic alignment model for text and triples; Section 4 presents our experimental evaluation methodology based on two evaluation scenarios; Section 5 reports our obtained results; Section 6 discusses our findings and the study's strengths; finally, Section 7 wraps up the article and points out directions for future investigations.
2 LITERATURE REVIEW
Subsection 2.1 addresses background concepts relevant to our study, and Subsection 2.2 presents a synthesis of studies related to our proposal.
2.1 Background Concepts and Techniques
Language Models. A language model (Chang and Bergen, 2024) is a probabilistic model that learns language properties from unstructured text, trained on tasks that do not require human annotation. These training tasks provide the model with underlying knowledge of how human languages are represented by learning how text is composed. The model learns from raw data rather than relying on annotations of grammar and syntax properties.
Large Language Models (LLMs). LLMs (Min et al., 2023) are massive artificial intelligence models with billions of parameters that can understand human language and perform NLP tasks such as writing and summarizing. These models improve on traditional language models, such as BERT (Devlin et al., 2019), which require a fine-tuning process for a specific task.
Text Embeddings. Creating vector representations for the meaning of texts has been pursued since the beginning of Natural Language Processing (Patil et al., 2023). This ranges from techniques such as Word2vec (Mikolov et al., 2013), which maps each word to a unique vector, to pre-trained language models such as BERT (Devlin et al., 2019), which consider the context surrounding each word when assigning a vector and apply subword tokenization to avoid mapping each word to a numerical id. The development of language models allowed the construction of more robust embedding models, which create different vector representations for a word depending on the surrounding words; we refer to this as context because it determines the vector creation. This represents an advancement over initial approaches such as bag of words (Qader et al., 2019) by solving the problem of polysemy (the same word having different meanings).
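To make this notion concrete, the following minimal sketch (assuming the sentence-transformers library and one of the public checkpoints discussed later; the example sentences are our own illustration) encodes two sentences and compares them with cosine similarity:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(
        "sentence-transformers/distiluse-base-multilingual-cased-v2")

    sentences = [
        "He deposited money at the bank.",        # financial sense of "bank"
        "They rested on the bank of the river.",  # riverside sense of "bank"
    ]
    embeddings = model.encode(sentences)  # one dense vector per sentence

    # A contextual encoder keeps these relatively far apart despite the
    # shared word "bank".
    print(util.cos_sim(embeddings[0], embeddings[1]))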
Knowledge Graphs. KGs (Ji et al., 2022) are directed graphs that model real-world facts. In such graphs, each node represents an entity, and the edges represent relations between them. Formally, we define a KG as a set of entities (E), relations (R), and triples (T), such that:
KG = {E, R, T}, with e_i ∈ E, r_i ∈ R, t_j ∈ T, where t_j = (e_x, r_y, e_z)
For example, the fact “The Star Wars film was directed by George Lucas” can be represented by the triple (starWarsFilm, directed, GeorgeLucas). Over the same KG, we can obtain more triples with different facts about other relevant information, such as the film’s release date. KGs require knowledge modeling, giving well-defined and correct semantic definitions for the concepts in a given domain. This is accomplished by using an ontology as an artifact that computationally represents knowledge (Ding et al., 2007). KGs support explanation and reasoning over the inferences extracted from the graph.
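As a minimal illustration of the definition above (using the identifiers from the running example), a KG can be held as plain sets of entities, relations, and triples:

    entities = {"starWarsFilm", "GeorgeLucas"}
    relations = {"directed"}
    triples = {("starWarsFilm", "directed", "GeorgeLucas")}

    # Every triple draws its head and tail from E and its relation from R.
    assert all(h in entities and r in relations and t in entities
               for (h, r, t) in triples)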
Knowledge Graph Embeddings (KGE). The structured knowledge encoded in KGs requires a query language such as SPARQL (Hogan, 2020) to process and extract information. This language allows the creation of specific queries over KGs. However, it creates a new issue: the queries must be written manually or produced by generative models. To overcome this problem, KG embeddings (KGE) (Yan et al., 2022) aim to represent KG components (entities and relations) in a semantic vector form. Such vectors aim to accurately describe the semantics involved in the knowledge modeling. In this sense, similar entities must have nearby embeddings. The main application of KG embeddings is creating an alternative format through which machine learning models can consume KGs. This makes it possible to apply KGs to link prediction, triple classification, recommendation systems, and other tasks (Wang et al., 2017).
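As a toy sketch of one classic KGE formulation (a TransE-style score, one of the approaches covered by the cited surveys, not the method developed in this paper), entities and relations receive separate vectors and a triple is scored by a translation distance:

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 50
    # Untrained random embeddings, purely for illustration.
    entity_vecs = {e: rng.normal(size=dim)
                   for e in ["starWarsFilm", "GeorgeLucas"]}
    relation_vecs = {r: rng.normal(size=dim) for r in ["directed"]}

    def transe_score(head, relation, tail):
        # TransE assumes head + relation ≈ tail; lower distance means
        # a more plausible triple.
        return np.linalg.norm(entity_vecs[head] + relation_vecs[relation]
                              - entity_vecs[tail])

    print(transe_score("starWarsFilm", "directed", "GeorgeLucas"))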
2.2 Related Studies
Lapalme (Lapalme, 2020) proposed an English RDF verbalizer that uses a symbolic approach to process an RDF triple, extracting the subject and predicate that correspond to the subject and object of a sentence. The predicate is then mapped to a verb phrase, which determines the structure of the final sentence. Human evaluation showed that this approach lacks fluency and that the human work involved is difficult to scale for more customized applications.
Similarly, Abhishek et al. (Abhishek et al., 2022) proposed a technique for triple-to-text generation covering scenarios beyond the English language. Addressing this demands more datasets for non-English languages, so they created XAlign, a multilingual dataset constructed for fact-to-text generation, aligning triple facts with natural text. Their work demonstrated the relevance of triple-text alignment for low-resource languages to reduce human effort.
The opposite direction can also be explored (from text to triples instead of triples to text). The study conducted by Regino et al. (Regino et al., 2023) developed a framework called QART for generating RDF triples from E-commerce Product Question Answering, using an E-commerce dataset with pretrained LMs and LLMs applied in few-shot learning.
Daw et al. (Daw et al., 2021) proposed the alignment of English triples to Hindi sentences using NER-based filtering and semantic similarity. Their approach maps Hindi and English into the same vector space to recognize the most relevant English words for a given Hindi word. They used key phrase extraction to obtain the key phrases from the Hindi text and then applied POS-tag-based heuristics; the similarity is then computed between the key phrases and the triples. This approach creates a heuristic for aligning text and triples. However, these intermediate steps map all words into the same language without training the embedding spaces for semantic alignment.
Moreover, Pahuja et al. (Pahuja et al., 2021) proposed a method for aligning knowledge bases (KBs) with texts; the authors used data from Wikidata for the structured knowledge and Wikipedia for the unstructured part, which enables testing the proposed alignment methods. They considered alignment using the same embedding space, mapping the entities from the KB to the vector space used for the text data.
A synergy between LMs and KGs has also been explored; Zhu et al. (Zhu et al., 2023) focused on finding a unique representation, creating a heterogeneous language model trained on unstructured, semi-structured, and structured data. The work trained on a text corpus of tourism websites and constructed a KG for the tourism context from those websites. They proposed a pre-trained model that handles text as well as triples. The authors proposed different training objectives, masked language modeling, title matching (whether a title matches a paragraph), and triple classification, for unstructured, semi-structured, and structured data, respectively. As a consequence, these data were mapped into the same contextual representation. A notable aspect of their study is that they do not use KGE directly; they use it to prepare training objectives for pretrained models. For example, they train a model to perform the triple classification task.
Table 1 summarizes our related work review. We identified that existing studies either focus on alignment for NLG tasks or handle semantic representation by training from scratch, which is computationally expensive.
Our present study focuses on the challenge of dealing with diverse data using various methods. In contrast with other researchers, we do not address the issue by using a pretrained encoder directly, by employing intermediate steps to map triples, or by creating pretrained models from scratch, which is often unsuitable. To the best of our knowledge, studies of text-to-triple alignment still need to investigate whether pre-trained encoders are suitable for unsupervised alignments. Moreover, a pre-trained model's tuning process can change the model's semantic learning (its capability to understand a text and adequately recover a similar one).
Our study complements the presented investigations by asking how well the current pre-trained model strategies align triples and text for direct application in vector representations. Additionally, we attempt to determine whether it is possible to create embedding representations using small datasets. We propose supervised training on small datasets for fine-tuning currently open state-of-the-art models for semantic representation between text and triples. We construct a triple vector representation while maintaining an adequate text vector representation for downstream NLP applications. This maps texts and triples into the same vector spaces obtained by pretrained models. This might enhance the current triple-alignment literature and open the field for future research toward robust triple-text alignment.
3 SUPERVISED SEMANTIC ALIGNMENT FOR TRIPLE-TEXT EMBEDDING CONSTRUCTION
We present our proposal for aligning text and triples using supervised learning. Subsection 3.1 presents the datasets explored in our study. Subsection 3.2 describes the models used in the fine-tuning process. Subsection 3.3 presents our data augmentation techniques. Subsection 3.4 presents the specific procedure designed to conduct the representation learning process and its outcomes, including our specific decisions.
3.1 Datasets
WebNLG. We chose WebNLG (Gardent et al., 2017), a dataset from a challenge competition on transforming triples into text; to the best of our knowledge, it is the only dataset that aligns triples and text. It comprises examples of triples and their equivalents in natural language text. A collection of triples can be expressed in natural text; for example, the two triples (Leonardo da Vinci, Profession, Painter) and (Leonardo da Vinci, Born, Italy) can be expressed with the sentence 'Leonardo da Vinci is an Italian painter.'
In this study, we reduce WebNLG's original one-to-many relationship between a single triple and multiple texts, ensuring each triple is aligned with a unique text.
STS-12. We selected the STS-12 dataset for the semantic textual similarity (STS) task; it is composed of sentence pairs labeled with values from 0 to 5, where five expresses higher similarity. This dataset was selected to maintain the models' capability to handle purely textual data.
Table 2 presents the distribution of the datasets at each stage; for STS-12, we reduced the original dataset to guarantee a training/validation distribution similar to WebNLG's. Finally, we split the original dataset using 10% for validation.
3.2 Pretrained Language Models
Creating an intermediate vector representation for text
and triples requires significant training data to train
models from scratch. Limited data is available for
alignment, and triples are scarce.
Bringing triple representations to the same-
dimensional spaces for text might improve knowl-
edge representation, creating a bridge between KG el-
ements and natural language texts.
Table 1: Summary of related studies and their characteristics (columns: Text-triple alignment; Pretrained from scratch; Unsupervised; Supervised; NLG; Semantic Representation).
Abhishek et al. (Abhishek et al., 2022)
Regino et al. (Regino et al., 2023)
Lapalme (Lapalme, 2020)
Daw et al. (Daw et al., 2021)
Pahuja et al. (Pahuja et al., 2021)
Zhu et al. (Zhu et al., 2023)
Ours - This work
Table 2: Dataset distribution for fine-tuning the models in the siamese networks.
Dataset    Train/val   Test
STS-12     2,234       3,108
WebNLG     3,598       774
We analyze the effects and suitability of pretrained state-of-the-art encoder models; we fine-tune these models on bimodal data (triple, text) to construct shared dimensional spaces. In the following, we present the chosen models and the rationale for our decisions.
E5 encoder. E5 (Wang et al., 2024) refers to a family of encoders trained with contrastive pre-training and weak supervision using approximately a billion text pairs from multilingual sources such as Wikipedia, Reddit, and others. These E5 models then pass through supervised fine-tuning with high-quality annotated datasets; a knowledge distillation technique is used to improve the quality of the embeddings in this family. In particular, we selected the multilingual e5-base (me5-base) and e5-small (me5-small) models, which map text to 768- and 384-dimensional vectors, respectively. We chose E5 because these models were trained on curated datasets that underwent a rigorous process, making them more stable than options trained with unsupervised learning.
DistilUSE (1). This model (Reimers and Gurevych, 2019) is a multilingual knowledge-distilled version of the Multilingual Universal Sentence Encoder (Yang et al., 2020), supporting more than 50 languages and mapping text into 512-dimensional vectors. It is a light version of bigger models, while still achieving good scores on STS benchmarks.
Paraphrase. This is another model fine-tuned for semantic textual similarity, mapping text into 384-dimensional vectors (Reimers and Gurevych, 2019). These models perform well on the semantic similarity (STS) task of the MTEB benchmark (Muennighoff et al., 2023), which is why we included it in our study.
(1) https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2
3.3 Data Augmentation Techniques
Our STS-12 dataset focuses on the semantic similarity task and covers only text-text alignment. Meanwhile, WebNLG covers only the triple-to-text direction and contains only similarity samples (label 1). Further examples are needed to teach the models when triples and texts differ. Figure 1 presents the ties between triple and text data. The green lines represent the relations existing in our datasets; to overcome the missing relations (dashed lines), we propose a data augmentation technique to increase our dataset.
Triple-Triple (Negative): In contrast with texts, each triple represents unique knowledge. To create negative triple-triple pairs, our technique chooses a triple and randomly selects another one as a negative example. Additionally, to improve the quality of the augmentation, the randomly chosen triple belongs to the same category; we guarantee this using the category metadata from the dataset.
Triple-Text (Negative): Using the same principle that triples are unique (each represents unique knowledge) and that, in WebNLG, each triple is aligned to a unique text (guaranteed during preprocessing by assigning a unique text to each triple), our technique selects a random sentence from the WebNLG dataset to pair the triple with a negative sentence that does not share its semantics.
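A minimal sketch of these two negative-sampling rules (the example records and field names are hypothetical stand-ins for the preprocessed WebNLG data):

    import random

    examples = [
        {"triple": "(Leonardo_da_Vinci, profession, Painter)",
         "text": "Leonardo da Vinci is a painter.", "category": "Artist"},
        {"triple": "(Leonardo_da_Vinci, birthPlace, Italy)",
         "text": "Leonardo da Vinci was born in Italy.", "category": "Artist"},
    ]

    def augment_negatives(examples, seed=13):
        rng = random.Random(seed)
        negatives = []
        for ex in examples:
            # Triple-Triple (Negative): another triple, same category.
            same_cat = [o for o in examples
                        if o is not ex and o["category"] == ex["category"]]
            if same_cat:
                negatives.append((ex["triple"],
                                  rng.choice(same_cat)["triple"], 0))
            # Triple-Text (Negative): a sentence aligned with another triple.
            others = [o for o in examples if o is not ex]
            if others:
                negatives.append((ex["triple"],
                                  rng.choice(others)["text"], 0))
        return negatives

    for pair in augment_negatives(examples):
        print(pair)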
Figure 1: Existing relations between our bimodal datasets. Green lines refer to relations existing in our datasets: triple-text (WebNLG) and text-text (STS). The orange dashed lines represent relations missing from our datasets. Label 1 describes a perfect similarity between elements, while 0 indicates no similarity.
3.4 Finetune Procedure for Aligning Triples and Text
To fine-tune our semantic alignment for triples and text, we propose adapting bi-encoder strategies from Sentence Transformers (2), because their strategy for semantic textual similarity is state-of-the-art (Reimers and Gurevych, 2019), allowing the alignment of text embeddings across different languages. In this sense, we select this architecture to align semantics over bimodal data (triple-text).
(2) https://www.sbert.net
Figure 2 presents our fine-tuning approach using an adapted bi-encoder network containing two identical models. Each creates an embedding representation, u for the left side and v for the right side. During training, similarity functions such as cosine measure the similarity between the embeddings.
To perform the alignment, we pass the described WebNLG training dataset, which provides the triple-text pair alignments. Feeding the model only this dataset could overfit it, losing the semantics learned by the pretrained models. To overcome this problem, we use the STS-12 dataset during fine-tuning to maintain effectiveness on semantic textual similarity tasks. This allows a common representation between text and triples while remaining capable of using the text vectors and creating representations for triples.
Additionally, as encoder models treat the input as a sequence of characters, we aim to create a semantic embedding from the fed dataset. We changed the model structure by adding two unique tokens to the vocabulary: [TEXT] and [TRIPLES]. Such tokens surround the specific data to better characterize it and allow flexible identification.
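A sketch of how the two marker tokens could be registered (assuming a Hugging Face tokenizer underneath the sentence-transformers model; the checkpoint and the wrapping format are our illustration of how the tokens surround each input):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/multilingual-e5-small")
    transformer = model[0]  # the underlying Transformer module

    # Register the new marker tokens and grow the embedding matrix.
    transformer.tokenizer.add_tokens(["[TEXT]", "[TRIPLES]"],
                                     special_tokens=True)
    transformer.auto_model.resize_token_embeddings(
        len(transformer.tokenizer))

    triple_input = "[TRIPLES] (starWarsFilm, directed, GeorgeLucas) [TRIPLES]"
    text_input = "[TEXT] Star Wars was directed by George Lucas. [TEXT]"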
We fine-tuned the models using 90% of the data for training and 10% for validation, with the same split for each dataset (WebNLG and STS-12). As our main objective is to align triples and text, we use the embedding Information Retrieval evaluator from Sentence Transformers to assess the models' capacity to recover the other modality, to decide whether to continue training, and to save the best models.
Triples and textual data represent knowledge as structured and unstructured data, respectively. We tested our supervised learning approach with two different losses to identify the more suitable one.
Contrastive Loss (Hadsell et al., 2006). This loss does not follow a graded similarity score. To compute it, contrastive loss requires a pair of sentences: the first is called the anchor and is linked to another sentence that is either a positive example (similar to the anchor, label 1) or a negative example (dissimilar, label 0). In our scenario, all triple-text pairs are linked with a positive label. For STS-12, we apply a filter: sentence pairs with similarity values greater than 0.9 receive label one, while the remaining pairs are labeled zero.
Figure 2: Fine-tuning a bi-encoder to align triples and texts.
Multiple Negatives Symmetric Ranking Loss (MNSRL) (Henderson et al., 2017). This loss function considers only pairs of an anchor and a positive (a sentence similar to the anchor); it first computes the loss for finding the positive given the anchor and then the loss for finding the anchor given the positive example.
Algorithm 1 presents our fine-tuning process using different loss functions: (1) contrastive, (2) contrastive with data augmentation, and (3) Multiple Negatives Symmetric Ranking Loss (MNSRL); for the latter, we do not apply data augmentation because it needs only positive examples.
First, we pass the four pre-trained models (line 1); each pretrained encoder is selected to start fine-tuning. We then add the STS-12 and WebNLG datasets (line 2); afterward, we choose each loss function configuration (line 3). Depending on the loss, we conduct intermediate steps to filter the adequate data (lines 4 and 8): for example, we increase the dataset with our data augmentation, or we keep only the positive examples for MNSRL. Finally, the process runs the fine-tuning with the information retrieval evaluation, passing the training data, the pretrained encoder, and the loss (line 10).
Algorithm 1: Iterative fine-tuning process for text-triple alignment using Contrastive and Multiple Negatives Symmetric Ranking (MNSRL) losses.
Input: DistilUSE, Paraphrase, e5-base, e5-small
Data: STS-12, WEBNLG
Result: 12 fine-tuned text-triple encoders FineEncoder_loss, one per model configuration
1  foreach Encoder in {DistilUSE, Paraphrase, e5-base, e5-small} do
2    TrainData <- joinData(STS-12, WEBNLG)
3    foreach loss in {Contrastive, ContrastiveAug, MNSRL} do
4      if loss == ContrastiveAug then
5        TrainData <- DA(TrainData)
6      end
7      if loss == MNSRL then
8        TrainData <- FilterPositive(TrainData)
9      end
10     FineEncoder_loss <- Finetuned_STS(TrainData, Encoder, loss)
11   end
12 end
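A condensed Python sketch of Algorithm 1's inner loop using the sentence-transformers training interface (the pre-3.0 fit() API); the augment hook stands in for our DA step, and the hyperparameters are illustrative:

    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, losses

    def finetune(encoder_name, train_examples, loss_name, ir_evaluator,
                 augment=None):
        # train_examples: list of InputExample(texts=[a, b], label=0/1)
        model = SentenceTransformer(encoder_name)
        if loss_name == "ContrastiveAug" and augment is not None:
            train_examples = augment(train_examples)      # lines 4-5: DA
        if loss_name == "MNSRL":
            train_examples = [e for e in train_examples
                              if e.label == 1]            # lines 7-8
            loss = losses.MultipleNegativesSymmetricRankingLoss(model)
        else:
            loss = losses.ContrastiveLoss(model)
        loader = DataLoader(train_examples, shuffle=True, batch_size=16)
        model.fit(train_objectives=[(loader, loss)],      # line 10
                  evaluator=ir_evaluator,  # keeps the best checkpoint
                  epochs=4, warmup_steps=100)
        return model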
4 EVALUATION METHODOLOGY
This section presents the evaluation methodology for the fine-tuned models created via our approach. Our solution creates embeddings for text and triples, and we evaluate them using two main tasks. Figure 3 presents the evaluation methodology using the two primary datasets explored in this study, emphasizing validation of the models' effectiveness on the specific tasks described in the following sections. We compare the results of our fine-tuned models with the original models without any tuning. In the following, we explain each task and its metrics.
4.1 Task 1: Sentence Similarity
The semantic similarity task consists of computing the similarity between the encoder representations of two sentences and comparing the result with a human-labeled similarity score (measuring the correlation). Figure 4 presents how we perform this evaluation. First, we extract samples from the STS-12 test partition; each sample is composed of a pair of sentences with a similarity score. We then use an encoder (original pre-trained or one of our fine-tuned models) to encode the sentences (S1 and S2). Afterward, we compute the similarity between the encoded vectors (V1 and V2) using the cosine function. Finally, we compare the cosine score with the human score and compute the correlation. This shows whether the models' embeddings agree with human judgments when scoring a grade of similarity.
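A minimal sketch of this protocol (the sentence pairs and gold scores are toy values; the checkpoint is one public model consistent with those above, and scipy provides the Pearson correlation):

    from scipy.stats import pearsonr
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("intfloat/multilingual-e5-small")

    pairs = [
        ("A man is playing a guitar.", "A person plays the guitar.", 4.8),
        ("A dog runs in the park.", "The stock market fell today.", 0.2),
        ("Two kids are cooking.", "Children prepare a meal.", 4.0),
    ]

    cosine_scores, human_scores = [], []
    for s1, s2, gold in pairs:
        v1, v2 = model.encode([s1, s2])
        cosine_scores.append(float(util.cos_sim(v1, v2)))
        human_scores.append(gold)

    # Pearson correlation between model similarities and human judgments.
    correlation, _ = pearsonr(cosine_scores, human_scores)
    print(correlation)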
4.2 Task 2: Information Retrieval
This evaluation aims to assess the system's capacity to correctly retrieve the alignment for a particular query. For example, when presented with a triple, the system should provide the associated text that accurately represents its semantic meaning. Conversely, when given an input text query, the system should retrieve the appropriate triple. Figure 5 presents the two key steps: (A) the population step, which creates the vectors from the test datasets, and (B) the semantic search step, which recovers the item with the highest similarity.
In this evaluation, we start from an alignment (X, Y). We conduct two main paths: one with X being a triple and Y a text, covering the triples-to-text direction; we then replicate the steps with X as text and Y as a triple to cover the text-to-triples direction.
We describe each step for validating the information retrieval task in the following.
A) Population
Figure 5-A presents the process for our alignment evaluation in the IR task. First, we extract the texts from the test portion of the WebNLG dataset; each sentence is associated with a unique triple. We pass X (a triple or a sentence) through our encoder models (zero-shot or fine-tuned). This creates a vector for each X (X_vector), and we save each vector in a database to enable the semantic search.
B) Semantic Search
Figure 5-B presents the information retrieval evaluation: given a Y_i (a triple or a text, depending on the direction of the evaluation) from WebNLG, we represent it in vector form using our encoder; the vector then passes to the semantic search module, which recovers the stored sentence with the highest cosine similarity between the query vector and the vectors stored in our database.
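A sketch of this populate-then-search pipeline (using sentence-transformers' in-memory semantic search as a stand-in for the vector database; the corpus, query, and checkpoint are illustrative):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("intfloat/multilingual-e5-small")

    # (A) Population: encode the corpus side (texts here; triples when
    # the direction is reversed).
    corpus = ["Star Wars was directed by George Lucas.",
              "Leonardo da Vinci was born in Italy."]
    corpus_vectors = model.encode(corpus, convert_to_tensor=True)

    # (B) Semantic search: encode the query and rank the corpus by cosine.
    query = "[TRIPLES] (starWarsFilm, directed, GeorgeLucas) [TRIPLES]"
    query_vector = model.encode(query, convert_to_tensor=True)
    for hit in util.semantic_search(query_vector, corpus_vectors, top_k=2)[0]:
        print(corpus[hit["corpus_id"]], hit["score"])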
Figure 3: Overall evaluation procedure regarding the generated encoders through two main tasks: Semantic similarity and
Information Retrieval. Specific metrics are computed for these tasks.
Figure 4: Evaluation for Task 1: Sentence Similarity.
4.3 Evaluation Metrics
We selected the most suitable metrics for each of the
investigated evaluation tasks.
Pearson Correlation Coefficient - Task 1
We choose Pearson's correlation coefficient for STS, which evaluates the linear correlation between two datasets. This enables us to compare the human annotation score with the cosine similarity of the trained models' embeddings. The metric takes values between -1 and +1, with higher values representing a stronger correlation.
Mean Reciprocal Rank (MRR) - Task 2
We approach the alignment problem as a retrieval problem, opting to use Mean Reciprocal Rank (MRR) as our primary metric for evaluating the accuracy of the alignment between text and triples. We chose MRR because it is a ranking metric that considers the position of the correct candidate when assigning scores. This fits our needs because it rewards correctly identifying the top candidates. We use MRR@1 (indicating whether the system identified the right candidate first) and MRR@3 (determining whether the right candidate appears among the top 3 retrieved documents) as our evaluation criteria.
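A short sketch of how MRR@k can be computed (the ranked lists are toy values; MRR@1 credits a query only when the correct item is ranked first):

    def mrr_at_k(rankings, gold, k):
        # rankings: per-query ranked candidate ids; gold: correct id per query.
        total = 0.0
        for ranked, correct in zip(rankings, gold):
            for rank, candidate in enumerate(ranked[:k], start=1):
                if candidate == correct:
                    total += 1.0 / rank
                    break
        return total / len(gold)

    # First query: correct at rank 1 (1.0); second: rank 2 (0.5) -> mean 0.75.
    print(mrr_at_k([["t1", "t7"], ["t9", "t2"]], ["t1", "t2"], k=3))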
5 RESULTS
This section presents the experimental results of evaluating the pretrained models (without fine-tuning) and of applying our fine-tuning approaches using two different loss functions and our data augmentation technique. We show the results for each evaluation task: Sentence Similarity (cf. Subsection 5.1) and Information Retrieval (cf. Subsection 5.2).
Figure 5: Evaluation concerning Task 2: Information Retrieval; A) populating the triples from the WebNLG dataset into a vector store database; and B) semantic search evaluation of the models in the Information Retrieval task.
5.1 Result - Task 1: Sentence Similarity
Table 3 presents the results on the STS-12 test dataset for the fine-tuned models. Among the original pre-trained models, the Paraphrase model achieved the best correlation, followed by the models from the E5 family. For the models fine-tuned with contrastive loss, only distil-use increases the Pearson coefficient, while the others decrease.
Applying data augmentation, we notice a gain in Pearson correlation for me5-small and Paraphrase. The other models' Pearson coefficients decreased, but their values did not deviate significantly from the originals.
Finally, MNSRL achieved the best result with me5-base and increased effectiveness for almost every model except distil-use.
Table 3: Results of the similarity evaluation using the STS-12 test dataset and Pearson correlation to measure coherence with the human annotation, over models without fine-tuning (Original), with Contrastive Loss (CL) without and with data augmentation (DA), and with Multiple Negatives Symmetric Ranking Loss (MNSRL).
STS-12 Pearson Correlation, test set
Models       Original   CL       CL+DA    MNSRL
me5-base     0.8426     0.8167   0.8358   0.8535
me5-small    0.8416     0.8152   0.8523   0.8373
distil-use   0.7935     0.8275   0.7907   0.8246
Paraphrase   0.8470     0.8275   0.8533   0.8510
5.2 Result - Task 2: Information Retrieval
Task 2: Information Retrieval Text-Triples
Table 4 presents the results in terms of MRR@1 and MRR@3 for IR using text embeddings to recover triples. Among the pretrained models (Original), me5-small achieved the highest values: 0.6602 and 0.7965 for MRR@1 and MRR@3, respectively.
Fine-tuning with contrastive loss positively affects the Paraphrase multilingual model, while the others suffer degradations in MRR@1. Moreover, when applying the data augmentation technique, the degradation persists in MRR@1, but we also notice slight growth in MRR@3 (Paraphrase and distil-use).
We observed that fine-tuning with MNSRL slightly improved all the models in MRR@1 and MRR@3, with notable effects on the Paraphrase model.
Task 2: Information Retrieval Triples-Text
Table 5 presents the results of retrieving similar texts given a query triple. The pretrained models perform slightly worse than in the text-triples direction (cf. Table 4). As in the previous table, the pre-trained models from the E5 family achieved the best results without any fine-tuning.
The E5 models reduce their MRR scores with contrastive loss, while the others increase them. We also notice that data augmentation produces positive effects compared to the CL fine-tuned models.
Table 4: Mean Reciprocal Rank (MRR@[1,3]) for the information retrieval evaluation over the WebNLG challenge test dataset, using text as query and triples as the retrieval corpus, for encoders without fine-tuning (Original), fine-tuned with contrastive loss (CL) without and with data augmentation (DA), and with Multiple Negatives Symmetric Ranking Loss (MNSRL).
Text as query, triples as retrieval corpus (MRR)
             MRR@1                                   MRR@3
Models       Original  CL      CL+DA   MNSRL    |   Original  CL      CL+DA   MNSRL
me5-base     0.6576    0.6382  0.6499  0.6693   |   0.7937    0.7786  0.7929  0.8058
me5-small    0.6602    0.6447  0.6499  0.6654   |   0.7965    0.7883  0.7907  0.8019
distil-use   0.6395    0.6318  0.6499  0.6408   |   0.7816    0.7877  0.7866  0.7892
Paraphrase   0.5917    0.6473  0.6370  0.6460   |   0.7334    0.7705  0.7810  0.8531
Table 5: Mean Reciprocal Rank (MRR@[1,3]) for the information retrieval evaluation over the WebNLG challenge test dataset, using triples as query and text as the retrieval corpus, for encoders without fine-tuning (Original), fine-tuned with contrastive loss (CL) without and with data augmentation (DA), and with Multiple Negatives Symmetric Ranking Loss (MNSRL).
Triples as query, text as retrieval corpus (MRR)
             MRR@1                                   MRR@3
Models       Original  CL      CL+DA   MNSRL    |   Original  CL      CL+DA   MNSRL
me5-base     0.6537    0.5969  0.6499  0.6615   |   0.7935    0.7418  0.7892  0.7991
me5-small    0.6576    0.5736  0.6395  0.6525   |   0.7922    0.7149  0.7784  0.7907
distil-use   0.6279    0.6408  0.6473  0.6486   |   0.7689    0.7780  0.7827  0.7901
Paraphrase   0.5646    0.5930  0.6292  0.6447   |   0.7110    0.7330  0.7661  0.7817
However, this improvement does not reach values higher than those of the pretrained models without tuning.
Finally, when we applied the MNSRL loss function, me5-base achieved the highest MRR@1 (0.6615) of all the models tested. Additionally, this loss function boosted all the models, resulting in similar MRR@3 values.
6 DISCUSSION
The pretrained encoder models achieved reasonable results on the two tested tasks; we notice that some of them, such as the E5 variants, perform well on Task 2 without any tuning, which might be explained by the grammatical similarity between entities and texts.
We observed that the models achieved good overall results in Task 1, with all Pearson coefficients higher than 0.79 (cf. Table 3), correlating well with the human annotation. The results show that preserving the semantics (STS) was possible in all configurations: the training did not considerably decrease the Pearson correlation score.
We evaluated our models performing retrieval in two directions: returning the most similar triple given a text query (text-triples, cf. Table 4) and retrieving the most similar text given a triple (triples-text, cf. Table 5), where me5-base fine-tuned with MNSRL achieved the best effectiveness in terms of MRR@1, with 0.6693 and 0.6615 for text-triples and triples-text retrieval, respectively. The models can reasonably recover the associated triple or text.
Paraphrase shows an increase in the MRR metrics (cf. Tables 4 and 5), and a comparison with the STS results (cf. Table 3) reveals that its Pearson correlation is maintained. This suggests that the observed improvements come from learning the triple-text alignment while preserving the models' capability in Task 1. Our results for the information retrieval task showed only slight improvements. This happens because the WebNLG test dataset contains relations and entities not present in the training dataset; as a result, it can be more challenging to accurately identify the connections between the texts and the triples to achieve correct alignments.
Additionally, the data augmentation technique demonstrated positive effects when used in combination with contrastive loss, improving the results in the triples-text IR direction (Table 5). This improvement can also be appreciated in the MRR@3 metric, which reflects the retrieval system having the correct sentence among the three most relevant retrieved sentences.
Furthermore, the MNSRL loss was the most adequate for this experiment. This evidence shows we can align text and triples by providing only positive examples; the results were comparable to and even outperformed our results with data augmentation.
We found that datasets containing both triple and text examples are necessary to accelerate the development of stronger models. Creating such datasets could further impact the performance of our tested losses, particularly MNSRL. We observed that the models maintain semantic integrity in Task 1 while delivering improved results in Task 2.
7 CONCLUSION
Embeddings are crucial in combining KGs and language models in modern digital applications. Nevertheless, the literature lacks investigations into creating embeddings that align triple and text representations using small datasets. This study advanced the state-of-the-art by exploring triple embeddings using pretrained models and attaching the evaluation to two relevant tasks. We identified the need to improve the alignment between text and triples before directly applying the pretrained models. Our findings revealed that some models responded well to fine-tuning, improving their MRR@1 scores in the retrieval task. This suggests that the nature of the models and more refined techniques could contribute to better alignment strategies for triples and texts. We demonstrated that using Multiple Negatives Symmetric Ranking loss enables semantic learning with all the pretrained models on a small dataset. Our findings indicate that triple embeddings benefited from data augmentation with contrastive loss in combination with text-text data (STS). Future research involves creating our own triple-text datasets to increase their richness for alignment. We plan to explore languages other than English to measure whether low-resource-language applications are feasible. Finally, we intend to examine the potential of self-supervised learning techniques for enhancing text-triple alignment.
ACKNOWLEDGEMENTS
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. This work is also supported by the PIND/FAEPEX - "Programa de Incentivo a Novos Docentes da Unicamp" (#2560/23) and the São Paulo Research Foundation (FAPESP) (Grant #2022/15816-5). (3)
(3) The opinions expressed in this work do not necessarily reflect those of the funding agencies.
REFERENCES
Abhishek, T., Sagare, S., Singh, B., Sharma, A., Gupta, M.,
and Varma, V. (2022). Xalign: Cross-lingual fact-to-
text alignment and generation for low-resource lan-
guages. In Companion Proceedings of the Web Con-
ference 2022, WWW ’22, page 171–175, New York,
NY, USA. Association for Computing Machinery.
Cao, J., Fang, J., Meng, Z., and Liang, S. (2024). Knowl-
edge graph embedding: A survey from the perspective
of representation spaces. ACM Comput. Surv., 56(6).
Castro Ferreira, T., Gardent, C., Ilinykh, N., van der Lee, C.,
Mille, S., Moussallem, D., and Shimorina, A. (2020).
The 2020 bilingual, bi-directional WebNLG+ shared
task: Overview and evaluation results (WebNLG+
2020). In Castro Ferreira, T., Gardent, C., Ilinykh,
N., van der Lee, C., Mille, S., Moussallem, D., and
Shimorina, A., editors, Proceedings of the 3rd Inter-
national Workshop on Natural Language Generation
from the Semantic Web (WebNLG+), pages 55–76,
Dublin, Ireland (Virtual). Association for Computa-
tional Linguistics.
Chang, T. A. and Bergen, B. K. (2024). Language Model
Behavior: A Comprehensive Survey. Computational
Linguistics, pages 1–58.
Daw, S., Sagare, S., Abhishek, T., Pudi, V., and Varma, V.
(2021). Cross-lingual alignment of knowledge graph
triples with sentences. In Bandyopadhyay, S., Devi,
S. L., and Bhattacharyya, P., editors, Proceedings of
the 18th International Conference on Natural Lan-
guage Processing (ICON), pages 629–637, National
Institute of Technology Silchar, Silchar, India. NLP
Association of India (NLPAI).
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2019). BERT: Pre-training of deep bidirectional
transformers for language understanding. In Burstein,
J., Doran, C., and Solorio, T., editors, Proceedings
of the 2019 Conference of the North American Chap-
ter of the Association for Computational Linguistics:
Human Language Technologies, Volume 1 (Long and
Short Papers), pages 4171–4186, Minneapolis, Min-
nesota. Association for Computational Linguistics.
Ding, L., Kolari, P., Ding, Z., and Avancha, S. (2007). Us-
ing Ontologies in the Semantic Web: A Survey, pages
79–113. Springer US, Boston, MA.
Fionda, V. and Pirrò, G. (2020). Learning triple embeddings
from knowledge graphs. Proceedings of the AAAI
Conference on Artificial Intelligence, 34(04):3874–
3881.
Gardent, C., Shimorina, A., Narayan, S., and Perez-
Beltrachini, L. (2017). Creating training corpora for
NLG micro-planners. In Barzilay, R. and Kan, M.-Y.,
editors, Proceedings of the 55th Annual Meeting of the
Association for Computational Linguistics (Volume 1:
Long Papers), pages 179–188, Vancouver, Canada.
Association for Computational Linguistics.
Hadsell, R., Chopra, S., and LeCun, Y. (2006). Dimen-
sionality reduction by learning an invariant mapping.
In 2006 IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition (CVPR’06), vol-
ume 2, pages 1735–1742.
Henderson, M., Al-Rfou, R., Strope, B., Sung, Y.-h.,
Lukacs, L., Guo, R., Kumar, S., Miklos, B., and
Kurzweil, R. (2017). Efficient natural language re-
sponse suggestion for smart reply.
Hogan, A. (2020). SPARQL Query Language, pages 323–
448. Springer International Publishing, Cham.
Ji, S., Pan, S., Cambria, E., Marttinen, P., and Yu, P. S.
(2022). A survey on knowledge graphs: Represen-
tation, acquisition, and applications. IEEE Trans-
actions on Neural Networks and Learning Systems,
33(2):494–514.
Kalinowski, A. and An, Y. (2022). Repurposing knowledge
graph embeddings for triple representation via weak
supervision. In 2022 International Conference on In-
telligent Data Science Technologies and Applications
(IDSTA), pages 129–137.
Lapalme, G. (2020). RDFjsRealB: a symbolic approach
for generating text from RDF triples. In Castro Fer-
reira, T., Gardent, C., Ilinykh, N., van der Lee, C.,
Mille, S., Moussallem, D., and Shimorina, A., edi-
tors, Proceedings of the 3rd International Workshop
on Natural Language Generation from the Seman-
tic Web (WebNLG+), pages 144–153, Dublin, Ireland
(Virtual). Association for Computational Linguistics.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space.
Min, B., Ross, H., Sulem, E., Veyseh, A. P. B., Nguyen,
T. H., Sainz, O., Agirre, E., Heintz, I., and Roth, D.
(2023). Recent advances in natural language process-
ing via large pre-trained language models: A survey.
ACM Comput. Surv., 56(2).
Muennighoff, N., Tazi, N., Magne, L., and Reimers, N.
(2023). MTEB: Massive text embedding benchmark.
In Vlachos, A. and Augenstein, I., editors, Proceed-
ings of the 17th Conference of the European Chap-
ter of the Association for Computational Linguistics,
pages 2014–2037, Dubrovnik, Croatia. Association
for Computational Linguistics.
Pahuja, V., Gu, Y., Chen, W., Bahrami, M., Liu, L., Chen,
W.-P., and Su, Y. (2021). A systematic investigation
of KB-text embedding alignment at scale. In Zong,
C., Xia, F., Li, W., and Navigli, R., editors, Pro-
ceedings of the 59th Annual Meeting of the Associa-
tion for Computational Linguistics and the 11th Inter-
national Joint Conference on Natural Language Pro-
cessing (Volume 1: Long Papers), pages 1764–1774,
Online. Association for Computational Linguistics.
Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., and Wu, X.
(2024). Unifying large language models and knowl-
edge graphs: A roadmap. IEEE Transactions on
Knowledge and Data Engineering, pages 1–20.
Patil, R., Boit, S., Gudivada, V., and Nandigam, J. (2023).
A survey of text representation and embedding tech-
niques in nlp. IEEE Access, 11:36120–36146.
Perković, G., Drobnjak, A., and Botički, I. (2024). Halluci-
nations in LLMs: Understanding and addressing chal-
lenges. In 2024 47th MIPRO ICT and Electronics
Convention (MIPRO), pages 2084–2088.
Qader, W. A., Ameen, M. M., and Ahmed, B. I. (2019).
An overview of bag of words: importance, implemen-
tation, applications, and challenges. In 2019 Interna-
tional Engineering Conference (IEC), pages 200–204.
Regino, A. G., Caus, R. O., Hochgreb, V., and dos Reis,
J. C. (2023). From natural language texts to rdf triples:
A novel approach to generating e-commerce knowl-
edge graphs. In Coenen, F., Fred, A., Aveiro, D., Di-
etz, J., Bernardino, J., Masciari, E., and Filipe, J., ed-
itors, Knowledge Discovery, Knowledge Engineering
and Knowledge Management, pages 149–174, Cham.
Springer Nature Switzerland.
Reimers, N. and Gurevych, I. (2019). Sentence-BERT: Sen-
tence embeddings using Siamese BERT-networks. In
Inui, K., Jiang, J., Ng, V., and Wan, X., editors, Pro-
ceedings of the 2019 Conference on Empirical Meth-
ods in Natural Language Processing and the 9th Inter-
national Joint Conference on Natural Language Pro-
cessing (EMNLP-IJCNLP), pages 3982–3992, Hong
Kong, China. Association for Computational Linguis-
tics.
Wang, L., Yang, N., Huang, X., Jiao, B., Yang, L., Jiang, D.,
Majumder, R., and Wei, F. (2024). Text embeddings
by weakly-supervised contrastive pre-training.
Wang, Q., Mao, Z., Wang, B., and Guo, L. (2017). Knowl-
edge graph embedding: A survey of approaches and
applications. IEEE Transactions on Knowledge and
Data Engineering, 29(12):2724–2743.
Xia, P., Wu, S., and Van Durme, B. (2020). Which *BERT?
A survey organizing contextualized encoders. In Web-
ber, B., Cohn, T., He, Y., and Liu, Y., editors, Proceed-
ings of the 2020 Conference on Empirical Methods in
Natural Language Processing (EMNLP), pages 7516–
7533, Online. Association for Computational Linguis-
tics.
Yan, Q., Fan, J., Li, M., Qu, G., and Xiao, Y. (2022). A
survey on knowledge graph embedding. In 2022 7th
IEEE International Conference on Data Science in
Cyberspace (DSC), pages 576–583.
Yang, Y., Cer, D., Ahmad, A., Guo, M., Law, J., Constant,
N., Hernandez Abrego, G., Yuan, S., Tar, C., Sung,
Y.-h., Strope, B., and Kurzweil, R. (2020). Multilin-
gual universal sentence encoder for semantic retrieval.
In Celikyilmaz, A. and Wen, T.-H., editors, Proceed-
ings of the 58th Annual Meeting of the Association for
Computational Linguistics: System Demonstrations,
pages 87–94, Online. Association for Computational
Linguistics.
Zhu, H., Peng, H., Lyu, Z., Hou, L., Li, J., and Xiao, J.
(2023). Pre-training language model incorporating
domain-specific heterogeneous knowledge into a uni-
fied representation. Expert Systems with Applications,
215:119369.
Zhu, Y., Wan, J., Zhou, Z., Chen, L., Qiu, L., Zhang, W.,
Jiang, X., and Yu, Y. (2019). Triple-to-text: Convert-
ing rdf triples into high-quality natural languages via
optimizing an inverse kl divergence. In Proceedings
of the 42nd International ACM SIGIR Conference on
Research and Development in Information Retrieval,
SIGIR’19, page 455–464, New York, NY, USA. As-
sociation for Computing Machinery.