3 RELATED WORK
Several works have addressed the task of graph
modeling based on data from relational databases.
One of them (Virgilio et al., 2014b) focuses on
graph modeling aimed at improving query performance
on the graph database. In contrast, another work
(Wardani and Küng, 2014) focuses on avoiding
semantic losses during modeling, whereas the focus
in (Bordoloi and Kalita, 2013) is on avoiding data
redundancy.
The main idea of Wardani and Küng’s work (Wardani
and Küng, 2014) is to build a graph as close as
possible to the relational database conceptual schema,
avoiding semantic losses. The authors use both the
relational and the conceptual schemas to create
specific mapping rules, such as using foreign key
attributes to map a relationship (edge) between two
nodes. Based on such rules, the graph can be
generated.
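To make the foreign key rule mentioned above concrete, the following minimal sketch maps each row to a node and each foreign key value to an edge. It is only an illustration under stated assumptions: the table names, the `foreign_keys` dictionary, and the function `relational_to_graph` are hypothetical and are not taken from Wardani and Küng's rule set.

```python
# Hypothetical toy relational data: table name -> list of rows (dicts).
tables = {
    "author": [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}],
    "paper": [
        {"id": 10, "title": "Graphs", "author_id": 1},
        {"id": 11, "title": "Links", "author_id": 2},
    ],
}

# Hypothetical foreign keys: (table, column) -> referenced table.
foreign_keys = {("paper", "author_id"): "author"}


def relational_to_graph(tables, foreign_keys, pk="id"):
    """Map each row to a node and each non-null foreign key value to an edge."""
    nodes = {}
    edges = []
    for table, rows in tables.items():
        # Foreign key columns declared for this table.
        fk_cols = {col: ref for (t, col), ref in foreign_keys.items() if t == table}
        for row in rows:
            node_id = (table, row[pk])
            # Remaining attributes become node properties.
            nodes[node_id] = {
                k: v for k, v in row.items() if k != pk and k not in fk_cols
            }
            # Each foreign key value becomes an edge labeled with the column name.
            for col, ref in fk_cols.items():
                if row[col] is not None:
                    edges.append((node_id, (ref, row[col]), col))
    return nodes, edges


nodes, edges = relational_to_graph(tables, foreign_keys)
```

In this sketch the edge label is simply the foreign key column name; a mapping that also consults the conceptual schema, as the authors do, could choose a more meaningful relationship name.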
Similarly, De Virgilio et al.’s work (Virgilio et al.,
2014a) begins with an analysis of the relational
schema. In another work (Virgilio et al., 2014b), the
same authors highlight that a careful conceptual
analysis, based on the conceptual schema (ER), is
needed to perform the graph modeling. Initially, a
"template" graph (schema) is conceived. In this
schema, entities and relationships are conveniently
grouped into a single node, respecting the referential
integrity rules defined in the relational schema
and/or some other rules defined by the authors. Based
on this schema, mapping rules are created, enabling
the graph generation. The idea is to optimize query
processing by grouping instances that are likely to
be retrieved together by a query.
While the previously mentioned works propose the
creation of property graphs, Bordoloi and Kalita
(Bordoloi and Kalita, 2013) present a method for
constructing a hypergraph from a relational schema.
First, star and dependence graphs are built, exposing
the dependence relations between table attributes.
Second, these graphs are merged into a single
hypergraph. Like what De Virgilio et al. call a
"template graph", the hypergraph represents the
database schema, where the nodes are the relations’
attributes and the edges are the attributes’
(functional and referential) dependencies. Based on
that hypergraph, a new one is generated from the
original data, where each attribute value from the
relation tuples becomes a node, and the dependence
relations are instantiated as well. However, the
result is a complex graph with too many nodes. To
simplify this graph and avoid node redundancy, the
authors suggest analyzing the domains shared between
attributes. Another schema hypergraph is then built
taking that analysis into account, in which attributes
from the same domain that appear in different tables
are represented just once. Finally, a data hypergraph
is built based on the schema hypergraph, where a
single node represents a value from a specific
domain. Although all attribute values are available
for analysis in this approach, a hypergraph is not
easy to analyze, since most algorithms and tools are
not able to deal with hypergraphs.
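The effect of the domain analysis described above can be illustrated with a small sketch that counts value nodes before and after attributes are merged by domain. The rows, the `domain_of` assignment, and the function names are hypothetical, and treating each tuple as one hyperedge is a simplification made for this illustration; it is not the dependency-based construction of Bordoloi and Kalita.

```python
# Hypothetical rows from two tables whose attributes share domains.
rows = {
    "employee": [{"emp_name": "Ana", "city": "Recife"}],
    "customer": [{"cust_name": "Bo", "town": "Recife"}],
}

# Hypothetical assignment of each (table, attribute) to a domain.
domain_of = {
    ("employee", "emp_name"): "person_name",
    ("customer", "cust_name"): "person_name",
    ("employee", "city"): "city",
    ("customer", "town"): "city",
}


def value_nodes_per_attribute(rows):
    """One node per (table, attribute, value): equal values are duplicated."""
    return {
        (table, attr, value)
        for table, tuples in rows.items()
        for tup in tuples
        for attr, value in tup.items()
    }


def value_hyperedges_per_domain(rows, domain_of):
    """One node per (domain, value); each tuple becomes one hyperedge (a simplification)."""
    hyperedges = []
    for table, tuples in rows.items():
        for tup in tuples:
            hyperedges.append(
                frozenset((domain_of[(table, attr)], value) for attr, value in tup.items())
            )
    # The node set is the union of all hyperedges.
    nodes = set().union(*hyperedges) if hyperedges else set()
    return nodes, hyperedges


per_attribute = value_nodes_per_attribute(rows)
per_domain, hyperedges = value_hyperedges_per_domain(rows, domain_of)
```

With this toy input, the value "Recife" yields two nodes in the per-attribute version and a single shared node in the per-domain version, which is the kind of redundancy the domain analysis is meant to remove.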
Some other authors, although they state that
relational data can be represented as a graph, argue
that it is hard to bring the content stored in
relational storage into a graph structure. Vertexica
(Jindal et al., 2014), for instance, is a relational
database system that provides a vertex-centric
interface, which helps the user/programmer to analyze
data contained in a relational database using
graph-based queries. The authors state that Vertexica
allows easy-to-use, efficient and rich analysis on top
of a relational engine. They report good performance
results, handling graphs with more than 80 thousand
nodes and over 1.5 million edges.
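As a rough illustration of what a vertex-centric computation over relationally stored data looks like, the sketch below propagates the minimum vertex identifier through an edge table in synchronous supersteps, which labels connected components. It is a generic example written for this purpose; the function `vertex_centric_min_label` and the toy tables are hypothetical and do not reproduce Vertexica's interface.

```python
def vertex_centric_min_label(vertices, edges, max_supersteps=50):
    """Label each vertex with the smallest vertex id in its connected component.

    `vertices` is a list of ids (a one-column vertex table) and `edges` is a
    list of (src, dst) pairs (an edge table), treated as undirected.
    """
    neighbors = {v: set() for v in vertices}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    label = {v: v for v in vertices}

    # Superstep 0: every vertex sends its own label to its neighbors.
    inbox = {v: [] for v in vertices}
    for v in vertices:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    for _ in range(max_supersteps):
        outbox = {v: [] for v in vertices}
        changed = False
        for v in vertices:
            # A vertex only acts on the messages it received.
            if inbox[v] and min(inbox[v]) < label[v]:
                label[v] = min(inbox[v])
                changed = True
                for n in neighbors[v]:
                    outbox[n].append(label[v])
        if not changed:
            break  # no vertex updated its label: the computation has converged
        inbox = outbox
    return label


labels = vertex_centric_min_label([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

The per-vertex logic only reads incoming messages and writes outgoing ones, which is what allows such programs to be expressed over vertex, edge, and message tables in a relational engine.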
Another work, similar to Vertexica, is Aster 6 from
Teradata (Simmen et al., 2014), which enables the
user/programmer, through a vertex-centric programming
abstraction, to combine different analysis
techniques, such as embedding graph functions within
SQL queries. The solution proposed in this work is an
extended multi-engine processing architecture, able
to handle large-scale graph analytics.
A recent work by Xirogiannopoulos et al.
(Xirogiannopoulos et al., 2015; Xirogiannopoulos et
al., 2017) presents a graph analysis framework called
GraphGen, which converts relational data into a graph
data model and allows the user to perform graph
analysis tasks or execute algorithms over the
obtained graph. This framework uses a domain-specific
language (DSL), based on Datalog, to perform the
extractions from the relational database. To the best
of our knowledge, this is the only work that discusses
the relevance of obtaining different feasible graph
models from the same dataset. However, it does not
guide the user on the graph modeling choices.
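The point that one dataset admits several feasible graph models can be shown with a small sketch that derives two different graphs from the same relation. The `authorship` relation and both functions are hypothetical examples written in plain Python; they do not use GraphGen or its Datalog-based DSL.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical relation: (author, paper) pairs.
authorship = [("ana", "p1"), ("bo", "p1"), ("bo", "p2"), ("cy", "p2")]


def bipartite_model(authorship):
    """Model A: authors and papers are both nodes; each row is an edge."""
    return {(("author", a), ("paper", p)) for a, p in authorship}


def coauthor_model(authorship):
    """Model B: only authors are nodes; an edge links authors sharing a paper."""
    authors_by_paper = defaultdict(set)
    for author, paper in authorship:
        authors_by_paper[paper].add(author)
    edges = set()
    for authors in authors_by_paper.values():
        # Sorting gives each undirected edge a single canonical orientation.
        for a, b in combinations(sorted(authors), 2):
            edges.add((a, b))
    return edges


model_a = bipartite_model(authorship)
model_b = coauthor_model(authorship)
```

The two models have different node sets and different topologies, so measures computed on one do not carry over directly to the other.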
Thus, despite addressing the graph modeling task,
none of these works takes topological analysis into
account when choosing a graph modeling alternative.
4 MANAGING GRAPH ALTERNATIVES APPROACH
Figure 2 summarizes the proposed approach. First,
we assume that it is possible to obtain an ER schema
from a relational logical schema (LS) using some
reverse engineering technique (Heuser, 2009). Sometimes