Authors:
Lorena Pujante-Otalora (1); Manuel Campos (1, 2); Jose Juarez (1) and Maria-Esther Vidal (3)
Affiliations:
(1) MedAILab, Department of IT and Systems, University of Murcia, Campus Espinardo, Murcia, 30100, Spain
(2) Murcian Bio-Health Institute, IMIB-Arrixaca, El Palmar, Murcia, 30120, Spain
(3) Leibniz University of Hannover and L3S Research Center and TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Keyword(s):
Graph Database, Knowledge Graphs, Epidemiology, Benchmark, Neo4j, GraphDB.
Abstract:
We have evaluated the performance of property graph and knowledge graph databases in the context of a spatiotemporal epidemiological investigation of an infection outbreak in a hospital. Specifically, we chose Neo4j as the property graph database, and GraphDB for knowledge graphs defined following RDF and its extension RDF*. We defined a domain model describing a hospital layout and patient movements. For the performance comparison, we created ten graphs of different sizes based on MIMIC-III, implemented three epidemiological queries in Cypher, SPARQL and SPARQL*, and defined three benchmarks that measure the execution time and main-memory consumption of each of the three queries on each graph and database engine. Our results suggest that query complexity is a stronger determinant of query performance than graph size. Neo4j achieves lower execution times and memory consumption than GraphDB for simple queries, but GraphDB is more efficient when traversing large subgraphs. Between RDF and RDF*, RDF* offers more compact and human-friendly modelling and better query execution performance.
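The abstract does not show how the RDF and RDF* models differ, so the following is only a hypothetical sketch of why RDF* can be more compact. It assumes that a plain-RDF model annotates a fact (here, a patient staying in a room) through standard rdf:Statement reification, which is one common baseline but not necessarily the one used in the paper. All identifiers (`:patient1`, `:stayedIn`, `:room101`, `:start`, `:end`, `:stmt1`) and the helper functions `reified` and `rdf_star` are made up for illustration.

```python
# Hypothetical sketch: count the triples needed to annotate one fact
# (a patient stay with start/end times) in plain RDF vs. RDF*.
# All IRIs below are illustrative, not taken from the paper's model.
from typing import Dict, List, Tuple, Union

# In RDF*, a subject or object may itself be a quoted triple,
# so a term is either a plain string (IRI/literal) or a nested tuple.
Term = Union[str, tuple]
Triple = Tuple[Term, str, Term]


def reified(stmt_id: str, s: str, p: str, o: str,
            annotations: Dict[str, str]) -> List[Triple]:
    """Plain RDF: annotate a statement via standard reification (rdf:Statement)."""
    triples: List[Triple] = [
        (s, p, o),                               # the asserted base fact
        (stmt_id, "rdf:type", "rdf:Statement"),  # a resource describing it
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]
    # One extra triple per annotation, attached to the statement resource.
    triples += [(stmt_id, k, v) for k, v in annotations.items()]
    return triples


def rdf_star(s: str, p: str, o: str,
             annotations: Dict[str, str]) -> List[Triple]:
    """RDF*: annotate the quoted triple << s p o >> directly."""
    quoted: Triple = (s, p, o)
    triples: List[Triple] = [quoted]             # assert the base fact
    # Each annotation uses the quoted triple itself as its subject.
    triples += [(quoted, k, v) for k, v in annotations.items()]
    return triples


if __name__ == "__main__":
    stay = {":start": '"t0"', ":end": '"t1"'}
    plain = reified(":stmt1", ":patient1", ":stayedIn", ":room101", stay)
    star = rdf_star(":patient1", ":stayedIn", ":room101", stay)
    print(len(plain), len(star))  # 7 3
```

Under these assumptions, a fact with k annotations costs 5 + k triples with reification but only 1 + k with RDF*, and a query no longer has to join through the `rdf:subject`/`rdf:predicate`/`rdf:object` triples to recover it.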