Entity alignment (GLEE) framework. Even though
graph hierarchies (Zhang et al., 2020) and datatypes
(Kim et al., 2022; Shen et al., 2021) contain crucial
information about entities, this information is not
directly reflected in KGRL EA methods. In the GLEE
framework, both are used to improve EA
performance and to provide explanations. The GLEE
framework consists of three steps.
In the first step, after selecting two KGs to be
aligned, candidate entities that can be aligned with a
specific entity in one KG (hereafter, target entity) are
found in the other KG. To do so, this paper calculates
graph structural similarity (hereafter, S-similarity)
using an existing embedding-based EA method.
S-similarity is not used to perform EA directly, but to
find candidate entities.
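The candidate search in this first step can be sketched as a top-k nearest-neighbor lookup over entity embeddings. The sketch below assumes that S-similarity is the cosine similarity between embeddings and that a fixed number k of candidates is kept. The function names, the toy vectors, and the value of k are illustrative assumptions, not details given in this paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k_candidates(target_vec, other_kg_vecs, k):
    """Return the k entities of the other KG most similar to the target."""
    scored = [(name, cosine(target_vec, vec)) for name, vec in other_kg_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (hypothetical values standing in for a trained EA model).
target = [1.0, 0.0]
kg2 = {"e1": [0.9, 0.1], "e2": [0.0, 1.0], "e3": [0.7, 0.7]}
candidates = top_k_candidates(target, kg2, k=2)
print(candidates)
```

Only the entities returned here would be passed on to the later, more expensive similarity computations.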
In the second step, EA is performed after finding
the entity to be aligned with the target entity among
candidate entities. To do so, this paper devises graph
hierarchical similarity (H-similarity) and datatype
similarity (D-similarity). H-similarity is calculated to
discover common hyper-concepts shared by the target
entity and the candidate entity. It has a larger value
when the same hyper-entities appear in fewer hops.
For this purpose, BERT (Devlin et al., 2018)-based word
embedding is applied to the entity names to determine
whether two hyper-entities are the same. If the two
KGs are in different languages, a multilingual BERT
model (Pires et al., 2019) is used. D-similarity is
calculated by discovering the common properties of the
entities and counting the common properties
that have the same datatype values. To determine whether
two properties are the same, BERT-based word
embedding is likewise applied to the property names. The
second step yields the similarities between the target
entity and the candidate entities.
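The two similarities of this step can be illustrated with a minimal sketch. It makes three assumptions that this section does not confirm: name matching is reduced to a case-insensitive string comparison as a stand-in for the BERT-based match, each shared hyper-entity contributes the reciprocal of the summed hop counts to H-similarity, and "the same datatype values" is read as equal literal values. All function names and toy data are hypothetical.

```python
def same_name(a, b):
    """Stand-in for the BERT-based name match: case-insensitive equality."""
    return a.strip().lower() == b.strip().lower()

def h_similarity(target_hypers, candidate_hypers):
    """Both arguments map a hyper-entity name to its hop distance (>= 1).

    Each shared hyper-entity adds 1 / (hops_t + hops_c), so a concept reached
    in fewer hops adds more. This weighting is an assumption of the sketch.
    """
    score = 0.0
    for name_t, hops_t in target_hypers.items():
        for name_c, hops_c in candidate_hypers.items():
            if same_name(name_t, name_c):
                score += 1.0 / (hops_t + hops_c)
    return score

def d_similarity(target_props, candidate_props):
    """Count shared properties whose values are equal (property name -> value)."""
    count = 0
    for name_t, value_t in target_props.items():
        for name_c, value_c in candidate_props.items():
            if same_name(name_t, name_c) and value_t == value_c:
                count += 1
    return count

# Toy data (hypothetical).
target_hypers = {"City": 1, "Place": 2}
h_near = h_similarity(target_hypers, {"city": 1, "place": 2})
h_far = h_similarity(target_hypers, {"Place": 3})

target_props = {"population": 1000, "timezone": "UTC+9"}
cand_props = {"Population": 1000, "timezone": "UTC+1"}
d_score = d_similarity(target_props, cand_props)
print(h_near, h_far, d_score)
```

In this toy case the candidate that shares hyper-entities at shorter hop distances receives the larger H-similarity, and only the property with a matching name and an equal value is counted in D-similarity.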
In the third step, EA is performed using the
three kinds of similarity, and the explanation is
generated using two subgraphs, one for the target entity
and one for its aligned entity, extracted from the two
KGs. The subgraphs are extracted from different
KGs but have the same graph structure and contents. In
the GLEE framework, each subgraph consists of the paths
with the aligned entity as the starting node and the
hyper-concept common to both subgraphs as the
ending node, together with the datatype properties and
data values of the aligned entity. The subgraphs are
used to explain the EA results.
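The shape of such an explanation subgraph can be sketched as follows. The sketch assumes that each KG exposes its hierarchy as a mapping from an entity to its direct hyper-entities and that the common hyper-concept is already known. It uses a breadth-first search to recover one shortest path, which is a simplification chosen for this illustration; the data structures and names are hypothetical.

```python
from collections import deque

def path_to_concept(hypernyms, start, concept):
    """Shortest path from start to concept along hypernym edges, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == concept:
            return path
        for parent in hypernyms.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None

def explanation_subgraph(hypernyms, entity, concept, properties):
    """Bundle the hierarchy path and datatype properties of one aligned entity."""
    return {
        "path": path_to_concept(hypernyms, entity, concept),
        "properties": dict(properties.get(entity, {})),
    }

# Toy KGs (hypothetical): entities "a" and "b" are assumed to be aligned.
kg1_hier = {"a": ["Capital"], "Capital": ["City"]}
kg2_hier = {"b": ["Capital"], "Capital": ["City"]}
kg1_props = {"a": {"population": 1000}}
kg2_props = {"b": {"population": 1000}}

sub1 = explanation_subgraph(kg1_hier, "a", "City", kg1_props)
sub2 = explanation_subgraph(kg2_hier, "b", "City", kg2_props)
print(sub1)
print(sub2)
```

Apart from the starting node, the two subgraphs have identical paths and property values, which is the kind of side-by-side evidence the explanation relies on.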
The main contribution of this paper is to provide
detailed steps for utilizing three kinds of similarity to
improve EA performance and provide explanations.
Since S-similarity has a lower computational
cost than H-similarity and D-similarity when finding
similar entities in large KGs, S-similarity is
used first to discover the candidate pairs of
entities to be aligned. On the other hand, H-similarity
and D-similarity are more accurate and interpretable
than S-similarity, but applying them directly to all
entity pairs in large KGs is computationally
prohibitive. Therefore, they are applied only to the
candidate pairs obtained by using S-similarity.
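The two-stage idea, in which a cheap filter is followed by a more expensive re-ranking, can be sketched as below. This section does not state how the three similarities are combined, so the weighted sum, the weights, and all names in the sketch are assumptions made purely for illustration.

```python
def rerank(candidates, h_sim, d_sim, weights=(1.0, 1.0, 1.0)):
    """Pick the best candidate by a weighted sum of the three similarities.

    candidates: list of (entity, s_similarity) pairs from the cheap first stage.
    h_sim, d_sim: callables mapping an entity to its H- and D-similarity.
    """
    w_s, w_h, w_d = weights
    scored = []
    for entity, s_score in candidates:
        total = w_s * s_score + w_h * h_sim(entity) + w_d * d_sim(entity)
        scored.append((entity, total))
    return max(scored, key=lambda pair: pair[1])

# Toy scores (hypothetical): "c1" leads on S-similarity alone.
candidates = [("c1", 0.95), ("c2", 0.90)]
h_scores = {"c1": 0.2, "c2": 0.75}
d_scores = {"c1": 0, "c2": 1}

best = rerank(candidates, h_scores.get, d_scores.get)
print(best)
```

Here the expensive similarities are evaluated for two candidates only, and they overturn the ranking that S-similarity alone would have produced.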
This paper is organized as follows. Related
work is presented in Section 2. In
Section 3, an illustrative scenario is provided to show
how the GLEE framework works. Section 4 describes
the GLEE framework in detail. The superiority of
the framework is demonstrated in Section 5 through
experiments. Finally, Section 6 presents the
conclusions and directions for further research.
2 RELATED WORKS
Many studies have been performed to automatically
align entities of different knowledge graphs (KGs).
TransE (Bordes et al., 2013) treats the relation in a
relationship triple as a translation from the head
entity to the tail entity, achieving promising results on
one-to-one relations. However, it can neither consider
multi-hop relationships nor handle complex relationships
such as one-to-many, many-to-one, and many-to-many.
Therefore, many variants of TransE have
subsequently appeared, such as TransR (Lin et al.,
2015), TransH (Wang et al., 2014), TransD (Ji et al.,
2015), TransG (Xiao et al., 2015a), TransA (Xiao et
al., 2015b), and PTransE (Li et al., 2020). These
studies tackle different limitations of TransE and
enhance the ability to model structures within KGs.
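The translation idea behind TransE, and the reason one-to-many relations are problematic for it, can be illustrated with a small sketch. It scores a triple by the L2 norm of head + relation - tail, where a smaller value means a more plausible triple. The vectors are toy values chosen for this illustration.

```python
import math

def transe_distance(head, relation, tail):
    """L2 norm of head + relation - tail; smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

head = [1.0, 0.0]
relation = [0.0, 1.0]
tail_a = [1.0, 1.0]  # equals head + relation exactly
tail_b = [2.0, 1.0]  # a second, distinct tail for the same head and relation

dist_a = transe_distance(head, relation, tail_a)
dist_b = transe_distance(head, relation, tail_b)
print(dist_a, dist_b)
```

Because head + relation is a single point, two distinct tails cannot both reach a distance of zero. A one-to-many relation therefore pushes its tails toward the same embedding, which is one of the limitations the variants listed above try to relax.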
Recently, with the growth of deep learning
methods, graph convolutional network (GCN) (Kipf
& Welling, 2016)-based methods have been widely adopted
for entity alignment. GCN-Align (Wang et al., 2018)
was the first to capture the entities’ neighborhood
structures by utilizing GCNs, achieving promising
results. RN-GCN (Wu et al., 2019a) incorporates
relation information via attentive interactions
between the KG and its dual relation counterpart.
HGCN (Wu et al., 2019b) jointly learns entity and
relation representations without requiring pre-aligned
relations. EMGCN (Tam et al., 2020) performs
unsupervised EA by using both relation and attribute
information.
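How a GCN captures neighborhood structure can be illustrated with one propagation step of the layer described by Kipf & Welling (2016): self-loops are added to the adjacency matrix, the matrix is symmetrically normalized by node degree, and the aggregated features are passed through a linear map and a ReLU. The sketch below is written in plain Python for readability. The toy graph, features, and weights are hypothetical, and it does not reproduce any of the specific EA models cited above.

```python
import math

def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features.
    n_feat = len(features[0])
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(n_feat)]
           for i in range(n)]
    # Apply the linear map and the ReLU non-linearity.
    n_out = len(weight[0])
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_feat)))
             for o in range(n_out)] for i in range(n)]

# Toy graph with two connected nodes, one-hot features, identity weights.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(adj, features, weight)
print(out)
```

After one step, each node's representation mixes its own features with those of its neighbor, which is the mechanism that lets GCN-based EA methods compare entities by their neighborhoods.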
Several recent studies use semi-supervised
learning that generates new pre-aligned seeds during
the iteration process. MCEA (Qi et al., 2023) uses a
multiscale graph convolution model to embed the
graph and uses intermediate results to guide the
negative sampling process. PEEA (Tang et al., 2023)