Graph Database Application using Neo4j

Railroad Planner Simulation

Steve Ataky Tsham Mpinda, Luis Gustavo Maschietto, Marilde Terezinha Santos Prado

and Marcela Xavier Ribeiro

Universidade Federal de S

ao Carlos, Rodovia Washington Luis, S

ao Carlos, Brazil

Keywords:

Graph Database, Relational Database, Railroad Planner, Simulation, Neo4j, Cypher.

Abstract:

Such as relational databases, most graphs databases are OLTP databases (online transaction processing) of

generic use and can be used to produce a wide range of solutions. That said, they shine particularly when the

solution depends, ﬁrst, on our understanding of how things are connected. This is more common than one may

think. And in many cases it is not only how things are connected but often one wants to know something about

the different relationships in our ﬁeld - their names, qualities, weight and so on. Brieﬂy, connectivity is the

key. The graphs are the best abstraction one has to model and query the connectivity; databases graphs in turn

give developers and the data specialists the ability to apply this abstraction to their speciﬁc problems. For this

purpose, in this paper one used this approach to simulate the route planner application, capable of querying

connected data. Merely having keys and values is not enough; no more having data partially connected through

joins semantically poor. We need both the connectivity and contextual richness to operate these solutions.

The case study herein simulates a railway network railway stations connected with one another where each

connection between two stations may have some properties. And one answers the question: how to ﬁnd the

optimized route (path) and know whether a station is reachable from one station or not and in which depth.

1 INTRODUCTION

A graph database is a database speciﬁcally dedicated

to the storage type of graph data structures. It is there-

fore necessary to store only the data in the nodes and

arcs. By deﬁnition, a basic graph is any storage sys-

tem providing an adjacency between neighboring el-

ements without indexation: any neighboring entity is

directly accessible by a physical pointer. The types of

graphs that can be stored vary, the undirected graph

”single standard” to hyper-graph, including of course

the propertygraphs.

Such a database therefore meets generally the fol-

lowing criteria (Domenjoud, 2012): i) Optimized

storage for data represented in a graph, with the op-

tion to store the nodes and arcs; ii) Optimized stor-

age for reading and clickstream data in the graph (or

Traversal), without using an index to browse rela-

tions; iii) Flexible data model for certain products: no

need to explicitly create an entity for nodes or edges,

unlike the rigid model tables in a relational database;

iv) Integrated API to use some standard algorithms of

graph theory (shortest path, Dijsktra, A *, calculating

centrality ...).

A graph database is optimized for searching oper-

ator data locality, from one or more root nodes, rather

than global searches.

2 CURRENT POSITION

The NoSQL movement reached its heyday in recent

years, particularly as it seeks to address several issues

that relational databases do not respond adequately:

• availability to handle very large volumes and Par-

titioning;

• ﬂexibility scheme;

• difﬁculty to represent and process complex struc-

tures such as trees, graphs, or relationships

in large numbers (In the databases ecosystem,

graphs bases are often positioned mainly in the

last two points:);

• process highly connected data;

• easily manage a complex and ﬂexible model;

• deliver outstanding performance for local read-

ings, for graph traversal;

399

Ataky Tsham Mpinda S., Maschietto L., Santos Prado M. and Ribeiro M..

Graph Database Application using Neo4j - Railroad Planner Simulation.

DOI: 10.5220/0005469003990403

In Proceedings of the 17th International Conference on Enterprise Information Systems (ICEIS-2015), pages 399-403

ISBN: 978-989-758-096-3

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

3 GRAPH DATABASE,

RELATIONAL DATABASE AND

OTHER NoSQL

(Jones, 2012) coined the term ”relational hubs” to de-

scribe the differences. According to Alistair we are

at a crossroads. One of the paths leads to the ap-

proach adopted by most NoSQL databases, in which

the data are highly denormalized and we rely on the

application to gather with typically high latency and

understanding. The other path leads to the approach

adopted by the graphs databases in which we use

the expressive power of the graph to build a ﬂexible

model and connected to the problem in question we

then request with low latency to better understand.

Relational databases are included. As the graphs

databases, relational databases have a model centered

around the queries. But this model is not as powerful

as bases graphs. In particular, it does not create on

the ﬂy arbitrarily large structures, semantically rich

and connected. To create any broad structure with

a relational database we need to plan our knuckles

in advance. To authorize changes, you end up cre-

ating a lot of columns that can be zero. Result: Tables

”dotted” fanciful joins (expensive), object-relational

impedance problems even with simple applications.

Furthermore, a graph database is adapted to the

use of graph-type data structures like trees or derived,

especially if it is to exploit the relationships between

data. The case of perfect use of a search is to start

with one or more nodes and browse the graph. It is al-

ways possible to make myEntity.ﬁndAll type of read-

ings (”ﬁnd all the entities of a kind”), but in this case

it is necessary to use an indexing system, which can

be internal as appropriate to the graph (super-nodes

for indexing) or above the graph (via Apache Lucene

for example).

Conversely, relational databases are well suited to

ﬁndAll queries through the internal structures of ta-

bles, especially if it is to perform aggregations of op-

erations on all the rows in a table.

Despite their names, they are, however, less effec-

tive on the holding relationships, which must be op-

timized by the index creation, including foreign key.

As mentioned previously, a graph database offers the

ability to browse by physical pointers relations where

foreign keys offer only logical pointer.

4 SOME GRAPH DATABASE

APPLICATION

Social networks modeling has obviously in recent

years become one of the most visible when using the

graph databases. LinkedIn comes easily to display

the degree of separation between each contact, which

is ultimately only the distance between nodes in the

graph representing people and their relationships. Al-

though very interesting, this problem is not very com-

mon because of the small number of actors in this

market.

Use cases that should reveal most common are

(Figuiere, 2010): i) modeling a set of knowledge

about people, and a market-sector organizations or

more generally ecosystem; ii) the speciﬁc business

data representation such as cinema (ﬁlms, actors, di-

rectors, and so on), publishing (books, authors, pub-

lisher, and so on) or the description of all the parts

of an industrial machine how they are interconnected;

iii) In any case, such a database will be conveniently

integrated into a heterogeneous environment Persis-

tence (thus speaks of ”polyglot persistence”) that

would address the various problems the best solution,

etc.

5 Neo4j

Neo4j is a graph database in Java designed to be em-

bedded in an application or accessed in client/server

via a REST API. The graph manipulation in Java

Neo4j is very natural with its API: Node and Rela-

tionship are the major classes used to model a graph

while adding a set of properties for each node and re-

lationship.

Imagine a social networking application such as

Facebook, Linkedin and Viadeo in which the user can

bind with friends. This user wants to know what

friends he has in common with other friends. With

a graph, he could easily see the relationship. Here is

a basic example (Figure 1):

Figure 1: example of social network relationship.

The implementation of the below scheme in a re-

lational database is not easy. There are several ways

to do so, as the pattern Querie to make resolutions,

ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems

400

but it is complicated to use. To solve these problems

several type of graph databases exist, including the fa-

mous base Neo4j and HyperGraphDB and InfoGrid.

In a graph we can store two types of information:

nodes and links, or , in other words, nodes and edges.

Each node can have multiple links that point to other

nodes. It is through this that relationships can be be-

tween nodes. They will thus allow us to organize

them. In addition, each node can have multiple prop-

erties or attributes for stoker as key / value our data.

In brief: A graph store data in nodes that have

properties. The nodes are organized by relations that

have they same properties. A traversal allow navigat-

ing in the graph from a node and identiﬁes roads or

paths with as nodes ordered according options. An

index is mapped by the properties of nodes and rela-

tionships.

And where is Neo4j? The database used to man-

age all types of objects, nodes, relationships and in-

dexes. And through the algorithms, internal and ex-

ternal tools such as Apache modules Lucuene, Cypher

or Gremlin recovery of our data is easier.

Developing applications on Neo4j is a breeze.

These language guides help you connect to Neo4j

from your preferred programming language. In this

work we used Java language.

6 CYPHER

Neo4j is generating much interest among NoSQL

database users for its features, performance and scal-

ability, and robustness. The software also provides

users with a very natural and expressive graph model

and ACID transactions with rollbacks(PANZARINO,

2014). However, utilizing Neo4j in a real-world

project can be difﬁcult compared to a traditional rela-

tional database. Cypher ﬁlls this gap with SQL, pro-

viding a declarative syntax and the expressiveness of

pattern matching. This relatively simple but powerful

language allows you to focus on your domain instead

of getting lost in database access. With cypher, very

complicated database queries can easily be expressed

through.

7 CASE STUDY

In our case study (ﬁgure2) we are simulating a rail-

way network railway stations connected with one an-

other and each connection between two stations has a

property: the distance. The question is: how one can

ﬁnd the optimized path and know whether a station is

reachable from one station or not. To do so, we need

to use two approaches:

1 breadth-ﬁrst (Figure 3) search (BFS), earch al-

gorithm that begins at the root node and ex-

plores all the neighboring nodes. Then for each

of those nearest nodes, it explores their unex-

plored neighbor nodes, and so on, until it ﬁnds the

goal. In Neo4j-Cypher, one is traversing the nodes

given some additional criteria only relations with

a property visibility=public should be traversed,

aiming at ﬁltering nodes.;

2 in addition of BFS, one is using Dijkstra algorithm

(Figure 4) to calculate the shortest route from one

station to another. It picks the unvisited vertex

with the lowestdistance, calculates the distance

through it to each unvisited neighbor, and updates

the neighbor’s distance if smaller. Mark visited

(set to red) when done with neighbors.

Figure 2: Railroad simulation.

Figure 3: DFS in Cypher.

8 EXPERIMENT AND RESULTS

The projects implementation using Java language

with Neo4j database access can occur in two different

ways: using JDBC connection driver (Java DataBase

Connectivity) or with the use of connection class be-

longing to Neo4j libraries imported with build depen-

dencies in MAVEN project speciﬁed in the pom.xml

ﬁle.

GraphDatabaseApplicationusingNeo4j-RailroadPlannerSimulation

401

Figure 4: Dijkstra’s algorithm in Cyphe for ﬁnding the

shortest path between Amazonas and Minas considering

both direction.

Figure 5: Running the code above (Figure 3 and 4), we will

get such cypher output, that shows the shortest path between

Amazonas an Minas and the cost (depht and distance).

For connection to Neo4j with JDBC one must im-

port the JDBC connection driver for the project and

start the Neo4j service on the local machine on the

default port 7474 (http: // localhost: 7474 /) if the

project is under local basis. Thus, requests for access

to the database management system occur by using

already established connection on the local machine,

i.e, this process avoids the opening and closing the

connection each request performed by the developed

system.

Using the JDBC driver, it is possible to use

Cypher, the speciﬁc language for data manipulation

in Neo4j environment and return results using Result-

set. The process occurs using established connection

for both queries (MATCH) and inserts (CREATE) in

the database. This process denotes a shorter duration

compared to the process using EmbeddedDatabase

method of GraphDatabaseFactory class.

The use of the method GraphDatabaseFactory of

the EmbeddedDatabase class creates a ﬁle if it does

not exist, which will store the data managed by Neo4j

and opens the connection. When the database already

exists the connection is established in all required pro-

cess. In such cases it is always necessary to initiate

a connection and at the end of the process to close

it. This task causes an excessive consumption of time

compared to using JDBC connection driver.

Importantly, for the use of the method Embedded-

Database Neo4j service should not be started out of

the application, therefore, the application tries to es-

tablish a connection that is already established, this

attempt will result in a blocked connection error.

Although it seems a good option to use JDBC con-

nection driver due to the gain in response time, some

methods belonging to Neo4J library may not be used,

in the case of searches for depth and using dijkstra

algorithms. This fact is because the dijkstra algo-

rithm, for example, requires the identiﬁcation of an

initial node to check nodes related and as the search

for JDBC returns the data nodes as String and not as

Node object, it would be impossible to identify and

begin the search. Now with the use of Neo4j libraries

one can open the connection and perform a search us-

ing the ExecutionResult class that will receive the re-

turn of the data as a list of Nodes objects.

9 CONCLUSION AND FUTURE

WORKS

The resulting model and associated queries are simply

projections of questions you want to ask about your

data. With Cypher, language query Neo4j, the com-

plementarity of these projections becomes apparent:

the paths used to create the structure of the graph are

the same as those used for querying.

One noted that this application, even though it be

possible to be implemented with relational databases,

the performance would not be as good as the graph

one. Moreover, it has been observed the when pro-

cessing many concurrent transactions, the nature of

the graph data structures helps distribute the transac-

tional cost through the graph. Usually as the graph

grows, the transactional conﬂict disappears. In other

words, the higher the graph, the larger the ﬂow rate is

important, which is an interesting result.

As future works, it is being developed and im-

plemented some strategies capable of replacing some

meta-heuristic for railway and railroad optimization

since both of scenarios can be represented as graphs.

Moreover, some prediction techniques, such as esti-

mate the planning that better ﬁt for some period of

time based on similar past plannings or datas are be-

ing implemented as well.

REFERENCES

Domenjoud, M. (2012). Bases de donnes graphes : un tour

dhorizon. France.

Figuiere, M. (2010). cember 5th, 2014 MICHAL FIGU-

IRE, NoSQL Europe : Bases de donnes graphe et

Neo4j. http://blog.xebia.fr/2010/05/03/nosql-europe-

bases-dedonnees-graphe-et-neo4j/, acessed on De-

cember 6th, 2014.

ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems

402

Jones, A. (2012). http://www.infoq.com/fr/articles/graph-

databases-bookreview, acessed on December 5th,

2014.

PANZARINO, O. (2014). Learning Cypher, Onofrio Pan-

zarino. Packt, 1st edition.

GraphDatabaseApplicationusingNeo4j-RailroadPlannerSimulation

403