return p.Name as Patient,
d.Name as Diagnoses,
o.Name as Institution,
g.Name as City,
r.Date as Date
;
// List patients with hypertension
match (p:Patient) -[:ATTENDED_AN]-> (e:Encounter)
match (:Diagnoses {Name: 'Hypertension'}) <-[:HAS_DIAGNOSES]-
      (e) <-[:PROVIDED_AN]- (o:Organization)
return p, e, o
;
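To make the semantics of the hypertension query concrete, the following minimal Python sketch reproduces its pattern match over an in-memory edge list. It is an illustration under stated assumptions, not how Neo4j executes the query: edges are assumed to be (source, relationship type, target) triples, labels and Name properties are held in plain dictionaries, and the function name and all node ids are hypothetical.

```python
# Minimal in-memory sketch of the hypertension pattern query (not Neo4j's
# actual execution). Assumes edges are (source, type, target) triples and that
# node labels and Name properties live in plain dicts; ids are hypothetical.
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source id, relationship type, target id)


def match_diagnosis(
    edges: List[Edge],
    labels: Dict[str, str],
    names: Dict[str, str],
    diagnosis: str,
) -> Set[Tuple[str, str, str]]:
    """Return (patient, encounter, organization) triples for the pattern
    (p:Patient)-[:ATTENDED_AN]->(e:Encounter),
    (:Diagnoses {Name: diagnosis})<-[:HAS_DIAGNOSES]-(e)<-[:PROVIDED_AN]-(o:Organization).
    Results form a set, whereas Cypher returns one row per match."""
    # Step 1: encounters that point to a Diagnoses node with the given name.
    diagnosed = {
        src
        for src, rel, dst in edges
        if rel == "HAS_DIAGNOSES"
        and labels.get(src) == "Encounter"
        and labels.get(dst) == "Diagnoses"
        and names.get(dst) == diagnosis
    }
    # Step 2: organizations that provided each of those encounters.
    providers: Dict[str, List[str]] = {}
    for src, rel, dst in edges:
        if rel == "PROVIDED_AN" and labels.get(src) == "Organization" and dst in diagnosed:
            providers.setdefault(dst, []).append(src)
    # Step 3: join with the patients who attended those encounters.
    matches: Set[Tuple[str, str, str]] = set()
    for src, rel, dst in edges:
        if rel == "ATTENDED_AN" and labels.get(src) == "Patient" and dst in diagnosed:
            for org in providers.get(dst, []):
                matches.add((src, dst, org))
    return matches


# Toy graph: e1 is a hypertension encounter, e2 an asthma encounter.
labels = {"p1": "Patient", "p2": "Patient", "e1": "Encounter", "e2": "Encounter",
          "d1": "Diagnoses", "d2": "Diagnoses", "o1": "Organization"}
names = {"d1": "Hypertension", "d2": "Asthma"}
edges = [("p1", "ATTENDED_AN", "e1"), ("p2", "ATTENDED_AN", "e2"),
         ("e1", "HAS_DIAGNOSES", "d1"), ("e2", "HAS_DIAGNOSES", "d2"),
         ("o1", "PROVIDED_AN", "e1"), ("o1", "PROVIDED_AN", "e2")]
print(match_diagnosis(edges, labels, names, "Hypertension"))  # {('p1', 'e1', 'o1')}
```

In the toy graph only the hypertension encounter e1 survives the diagnosis filter, so the single match joins patient p1 with provider o1, mirroring the network of patient, encounter and institution that the Cypher query returns.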
3 RESULTS
The constructed database was able to answer queries requiring complex relationships. The queries above assume data with complex relationships; each returned a network of patients, institutions and encounters. Figure and depict the profiles of database hits for both queries. The figure depicts query scalability on data of different sizes. The fitted GLM, presented as Figure 4, shows that relationships and graph properties are most strongly associated with data input runtime (ρ = 0.79 and ρ = 0.84, respectively).
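The text does not state how ρ was computed. Assuming it denotes Spearman's rank correlation between a predictor (for example, the number of relationships created) and data input runtime, a minimal pure-Python sketch of that statistic is given below; the function names are illustrative and any inputs passed to it are placeholders, not the study's measurements.

```python
# Spearman's rank correlation in pure Python, assuming the reported rho is
# Spearman's coefficient (the study does not say). Inputs are placeholders.
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the two rank vectors (tie-corrected Spearman rho)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)
```

Usage would look like `spearman_rho(relationship_counts, input_runtimes)`, where both lists are hypothetical per-run measurements. Because it works on ranks, the coefficient captures any monotonic association, not only a linear one; where SciPy is available, `scipy.stats.spearmanr` computes the same statistic.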
4 DISCUSSION
Our study demonstrates that a graph database is a potential platform for storing life science research data. Previous studies emphasized the strength of graph databases in storing interconnected data, on which graph pattern queries may outperform a relational database (RDB) (Medhi and Baruah, 2017; Fabregat et al., 2018; Mathew and Kumar, 2014). However, in other cases requiring analytical queries, the RDB outperformed the graph database: Hölsch, Schmidt and Grossniklaus (2017) argued that Neo4j was less performant than the RDB because of its less advanced disk and buffer management. Based on our results, we conclude that a graph database is quite performant for integrating medical health records generated for 5,000 subjects using the Synthea program.
Our simulation demonstrated that storing and querying a large dataset is viable. As data size increased exponentially, the time consumed by particular queries also increased exponentially, as shown in Figure 5. However, further optimization appears essential, considering the increase in query runtime between the 20,000 and 50,000 datasets. While the database was being prepared, the log captured the created objects, which placed an immense burden on data input. These objects include relationships and graph properties: as mentioned previously, a graph database stores each relationship explicitly as an object instead of implying it. This feature helps a graph database answer queries on complex relationships; as such, the longer time spent creating these objects within the database need not be an issue. During data preparation, we also observed longer input times for bigger datasets.
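One way to quantify how steeply runtime grows with data size is to fit a power law, runtime ≈ c · size^k, by least squares on log-transformed values, since log(runtime) = log(c) + k · log(size). A k near 1 indicates runtime growing in proportion to data size, and a k above 1 indicates faster growth. The sketch below assumes such a power-law model, which the study itself does not specify; the function name is hypothetical.

```python
# Least-squares estimate of k in the assumed power law runtime ~ c * size**k,
# fitted on log-transformed values. The model is an assumption, not the study's.
import math
from typing import Sequence


def scaling_exponent(sizes: Sequence[float], runtimes: Sequence[float]) -> float:
    """Slope of log(runtime) regressed on log(size).

    k near 1: runtime grows in proportion to data size;
    k above 1: runtime grows faster than the data.
    """
    if len(sizes) != len(runtimes) or len(sizes) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    if any(v <= 0 for v in list(sizes) + list(runtimes)):
        raise ValueError("sizes and runtimes must be positive")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in runtimes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Passing the runtimes measured for each dataset size would yield an estimate of k; the study's own measurements are not reproduced here. Fitting on logarithms turns the power law into a straight line, so an ordinary least-squares slope suffices.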
It seems that Neo4j performs better with smaller data, so we suggest dividing the data into smaller chunks to improve data input performance.
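The suggested chunking can be sketched as follows. The helper splits an iterable of rows into fixed-size batches so that each batch can be written and committed separately. `write_batch` is a hypothetical callable standing in for whatever performs the write (for instance, a parameterized Cypher `UNWIND $rows AS row ...` statement run through a Neo4j driver), and the batch size is an assumption to be tuned.

```python
# Split rows into fixed-size chunks so each chunk can be written (and
# committed) separately. `write_batch` is a hypothetical callable.
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_chunks(
    rows: Iterable[T], size: int, write_batch: Callable[[List[T]], None]
) -> int:
    """Pass each chunk to `write_batch` and return the number of chunks sent.

    In a Neo4j setting, write_batch might run one parameterized statement such
    as `UNWIND $rows AS row ...` per chunk in its own transaction (assumption).
    """
    count = 0
    for batch in chunked(rows, size):
        write_batch(batch)
        count += 1
    return count
```

Keeping each transaction small bounds the amount of uncommitted state held at once, which is the motivation for chunking. On Python 3.12 and later, `itertools.batched` provides similar batching but yields tuples rather than lists.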
5 CONCLUSIONS
In conclusion, a graph database is quite performant for integrating medical health records generated with Synthea for 5,000 to 50,000 subjects.
REFERENCES
Berg, K. L., Seymour, T. L., and Goel, R. L. (2012). History
of Databases. International Journal of Management &
Information Systems (IJMIS), 17(1), 29–36.
http://doi.org/10.19030/ijmis.v17i1.7587.
Fabregat, A., Korninger, F., Viteri, G., Sidiropoulos, K.,
Marin-Garcia, P., Ping, P., … Hermjakob, H. (2018).
Reactome graph database: Efficient access to complex
pathway data. PLoS Computational Biology, 14(1),
e1005968. http://doi.org/10.1371/journal.pcbi.1005968.
Hölsch, J., Schmidt, T., and Grossniklaus, M. (2017). On
the Performance of Analytical and Pattern Matching
Graph Queries in Neo4j and a Relational Database. In
Workshop Proceedings of the EDBT/ICDT 2017 Joint
Conference. Venice.
Mathew, A. B., and Madhu Kumar, S. D. (2014). An
Efficient Index Based Query Handling Model for
Neo4j. International Journal of Advances in Computer
Science and Technology (IJACST), 3(2), 12–18.
Maula, A. W., Fuad, A., and Utarini, A. (2018). Ten-years
trend of dengue research in Indonesia and South-east
Asian countries: a bibliometric analysis. Global Health
Action, 11(1), 1504398. http://doi.org/10.1080/16549716.2018.1504398.
Medhi, S., and Baruah, H. K. (2017). Relational database
and graph database: A comparative analysis. (JPMNT)
Journal of Process Management – New Technologies,
5(2), 1–9. http://doi.org/10.5937/jouproman5-13553.
Oakden-Rayner, L., Beam, A. L., and Palmer, L. J. (2018).
Medical journals should embrace preprints to address
the reproducibility crisis. International Journal of
Epidemiology, 47(5), 1363–1365. http://doi.org/10.1093/ije/dyy105.