Authors:
Sepideh Sadat Sobhgol (1); Gabriel Campero Durand (2); Lutz Rauchhaupt (1) and Gunter Saake (2)
Affiliations:
(1) Ifak e.V. Magdeburg, Germany; (2) Otto-von-Guericke University, Magdeburg, Germany
Keyword(s):
Graph Analysis, Network Analysis, Link Prediction, Supervised Learning.
Abstract:
When data management and ML tools are combined, a common problem is that ML frameworks may require moving the data out of its traditional storage (i.e., databases) for model building. In such scenarios, it could be more effective to adopt in-database statistical functionality (Cohen et al., 2009). Such functionality has received attention for relational databases, but for graph database systems there are insufficient studies to guide users, either by clarifying the role of the database or by identifying the pain points that require attention. In this paper we make an early feasibility assessment of such processing for the graph domain, prototyping an in-database, ML-driven case study on link prediction on a state-of-the-art graph database (Neo4j). We identify a general series of steps and a common-sense approach for database support. We find limited differences between the processing setups in most steps, suggesting a need for further evaluation. We identify bulk feature calculation as the most time-consuming task, at both the model building and inference stages, and hence we define it as a focus area for improving how graph databases support ML workloads.
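To make "bulk feature calculation" concrete, the following is a minimal sketch of what such a step might look like for link prediction. It does not reproduce the paper's implementation or its Neo4j setup: the feature set (common neighbors, Jaccard coefficient, preferential attachment) is an assumption drawn from standard link prediction practice, and the function names and the toy edge list are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(adj, u, v):
    """Topological features for one candidate link (u, v).

    The three features are assumed for illustration; the abstract
    does not state which features the authors compute.
    """
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(nu) * len(nv),
    }


def bulk_features(edges):
    """Compute features for every pair of nodes not yet connected.

    Enumerating all candidate pairs is quadratic in the number of
    nodes, which hints at why this step can dominate running time.
    """
    adj = build_adjacency(edges)
    feats = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            feats[(u, v)] = pair_features(adj, u, v)
    return feats


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on c.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
FEATURES = bulk_features(EDGES)
```

In a supervised setting, feature rows like these would be labeled (link present or absent in a later snapshot) and passed to a classifier. The same calculation has to be repeated for candidate pairs at inference time, which is consistent with the abstract's observation that it is costly at both stages.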