change individual components of the semantic model creation process in order to test and evaluate new as well as already existing approaches within the same system.
Our contributions in this paper are as follows: First, we introduce semantic refinement as a phase of the semantic model creation process and describe the characteristics of this phase. Second, we present the concept of auxiliary approaches and show how they can be utilized to build a comprehensive semantic modeling environment. Third, we release PLASMA, which includes a refinement UI, to serve as a backbone for future semantic modeling.
In the remainder of this paper, we discuss related work in Section 2 and present our concept of semantic refinement as a separate phase in Section 3. Afterwards, we introduce PLASMA in Section 4, followed by current and future usage scenarios of the platform in Section 5. We conclude and give an outlook in Section 6.
2 RELATED WORK
In recent years, several approaches for creating semantic models have emerged. So far, the process of creating a semantic model has been divided into two phases: a basic annotation of a data source is achieved by a semantic labeling step, followed by a semantic modeling step, which aims to relate the initially mapped concepts and equip the annotation with additional context (cf. (Vu et al., 2019)).
Semantic labeling, i.e., the initial mapping from labels (e.g., table headers or leaves in a structured data set) to a conceptualization (e.g., an ontology), is mainly performed following either a label-driven or a data-driven strategy. Examples of label-driven mappings have been proposed by (Polfliet and Ichise, 2010), (Pinkel et al., 2017), (Paulus et al., 2018) and (Papapanagiotou et al., 2006). In addition, there are data-driven mapping approaches that rely mainly on the data values of a data set. For instance, (Syed et al., 2010) and (Wang et al., 2012) proposed approaches that derive labels from the data values within a data set in conjunction with external (knowledge) databases to identify the most reasonable concept. (Ramnandan et al., 2015) and (Abdelmageed and Schindler, 2020) use algorithmic approaches, whereas (Pham et al., 2016), (Rümmele et al., 2018), (Chen et al., 2019), (Hulsebos et al., 2019), (Takeoka et al., 2019) and (Pomp et al., 2020) use machine learning approaches to solve the task.
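As a minimal sketch of the label-driven strategy (the concept list, similarity measure, and threshold are illustrative assumptions, not the method of any cited approach), input labels can be matched against concept names by string similarity:

from difflib import SequenceMatcher

# Illustrative concept names; a real approach would draw these from the
# target conceptualization.
CONCEPTS = ["Person", "City", "Organization", "Country"]

def label_driven_mapping(labels, concepts=CONCEPTS, threshold=0.6):
    """Naive label-driven mapper: assign each label the most similar
    concept name, or None if no concept is similar enough. The cited
    approaches use far richer techniques (lexical databases, embeddings,
    ontology structure, ...)."""
    mapping = {}
    for label in labels:
        best, score = None, 0.0
        for concept in concepts:
            s = SequenceMatcher(None, label.lower(), concept.lower()).ratio()
            if s > score:
                best, score = concept, s
        mapping[label] = best if score >= threshold else None
    return mapping

print(label_driven_mapping(["city", "person_name", "org"]))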
All previously mentioned semantic labeling approaches rely on a predefined conceptualization, although approaches such as (Pham et al., 2016), (Pomp et al., 2020) or (Jiménez-Ruiz et al., 2015) are able to create concepts during the mapping process by integrating the user into the labeling process. Using such approaches in platforms like Optique (Giese et al., 2015), it becomes possible to redefine the found mappings using an R2RML editor (Sengupta et al., 2013).
Following the semantic labeling phase, the initial mappings can be extended with additional meaningful concepts, and the most suitable relationships between the different concepts can be selected, resulting in a semantic model for the data set. Approaches presented by (Knoblock et al., 2012), (Taheriyan et al., 2013) and (Uña et al., 2018) have shown significant improvements in this area in recent years. More recent approaches such as (Vu et al., 2019) and (Futia et al., 2020) show the ongoing research interest in this topic. However, none of these approaches solves the task perfectly, due to ambiguities in the data sets or cases not covered by the training data.
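Some of these approaches, e.g., the line of work by Taheriyan et al., cast the selection of relationships as finding a minimum-weight tree that connects the labeled concepts in a graph of candidate relations. The following rough sketch only illustrates this idea; the graph, relation names, and weights are invented, and the cited papers use considerably more elaborate scoring:

import networkx as nx
from networkx.algorithms.approximation import steiner_tree

# Candidate relation graph: nodes are ontology classes, edges are possible
# relations weighted by how implausible they are (all values invented).
G = nx.Graph()
G.add_edge("Person", "City", relation="livesIn", weight=1.0)
G.add_edge("Person", "Organization", relation="worksFor", weight=1.0)
G.add_edge("Organization", "City", relation="locatedIn", weight=2.0)
G.add_edge("City", "Country", relation="partOf", weight=1.0)

# Concepts produced by the labeling phase; the semantic model has to
# connect exactly these.
labeled_concepts = ["Person", "Organization", "Country"]

# Approximate the minimum-weight tree spanning the labeled concepts.
tree = steiner_tree(G, labeled_concepts, weight="weight")

for u, v, data in tree.edges(data=True):
    print(u, data["relation"], v)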
To finalize the model and correct potential errors, human interaction is needed. However, there are only a few approaches that provide tools to inspect and modify a semantic model after its automated generation. A first approach to support semantic model modification was Karma, presented by (Knoblock et al., 2012) and (Gupta et al., 2012). Karma is described as a data integration tool that enables users to feed in data and create a semantic model for it. The automated semantic modeling approach it uses is based on the method developed by Taheriyan et al. Karma has been used in various projects (cf. (Szekely et al., 2015)). A similar approach to Karma is followed by ESKAPE (Pomp et al., 2017), a semantic data platform for enterprises that handles the full data management process from ingestion to extraction. However, ESKAPE only performs basic semantic labeling automatically; the semantic modeling has to be done manually by the users. Furthermore, there exist multiple commercial tools, such as PoolParty (http://poolparty.biz) or Grafo (http://gra.fo), that focus on creating knowledge graphs but are neither available for free use nor extendable with custom algorithms.
Both Karma and ESKAPE are capable of performing a data schema analysis, i.e., analyzing live input data to extract the labels and structure on which the labeling step is performed. Both also support multiple data formats during this phase and provide a GUI (cf. (Szekely et al., 2015) and (Pomp et al., 2018)) that allows users to modify the semantic model. However, al-