stores, first began to gain popularity in early 2009.
The topic has gained credibility in the IT community
but has yet to receive large-scale academic study.
The advantages provided by these systems are
evident: the freedom to organize data structures
without rigid constraints, and the speed of
processing stored information (faster than an
RDBMS, and increasingly so as the amount of
information processed in a query grows). Table 2
reports results for queries that simulate some of the
query types used in our provenance systems. For
example, traversals are necessary to determine the
data objects (nodes) derived from or affected by
some starting object or node:
• Q0: Find all orphan nodes. That is, find all
nodes in the graph that are singletons, with
no incoming edges and no outgoing edges.
• Q4: Traverse the graph to a depth of 4 and
count the number of nodes reachable.
• Q128: Traverse the graph to a depth of 128
and count the number of nodes reachable.
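The three query types above can be illustrated on an in-memory graph. The following is a minimal sketch, not the benchmark code behind Table 2: the function names, the toy graph, the choice to follow outgoing edges only, and the choice to exclude the start node from the count are all assumptions made for illustration.

```python
from collections import deque


def orphan_nodes(nodes, edges):
    """Q0-style query: nodes with no incoming and no outgoing edges."""
    connected = set()
    for src, dst in edges:
        connected.add(src)
        connected.add(dst)
    return sorted(n for n in nodes if n not in connected)


def count_reachable(edges, start, max_depth):
    """Q4/Q128-style query: count nodes reachable within max_depth hops.

    Assumption: edges are directed and followed from src to dst, and the
    start node itself is not counted.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen) - 1


# Hypothetical toy graph: a chain 1 -> 2 -> ... -> 6 plus isolated nodes 7 and 8.
nodes = list(range(1, 9))
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]

print(orphan_nodes(nodes, edges))
print(count_reachable(edges, 1, 4))
print(count_reachable(edges, 1, 128))
```

A breadth-first search is used here so that each node is recorded at its shortest distance from the start, which keeps the depth limit meaningful when several paths lead to the same node.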
Table 2: Structural query results (in milliseconds).
Database       MySQL Q0  Neo4j Q0  MySQL Q4  Neo4j Q4  MySQL Q128  Neo4j Q128
1000int             1.5       9.6      38.9       2.8        80.4        15.5
5000int             7.4      10.6      14.3       1.4        97.3        30.5
10000int           14.8      23.5      10.5       0.5        75.5        12.5
100000int         187.1     161.8       6.8       2.4        69.8        18.0
1000char8K          1.1       1.1       1.1       0.1        21.4         1.3
5000char8K          7.6       7.5       1.0       0.1        34.8         1.9
10000char8K        14.9      14.6       1.1       0.6        37.4         4.3
100000char8K      187.1     146.8       1.1       6.5        40.9        13.5
1000char32K         1.3       1.0       1.0       0.1        12.5         0.5
5000char32K         7.6       7.5       2.1       0.5        29.0         1.6
10000char32K       15.1      15.5       1.1       0.8        38.1         2.5
100000char32K     183.4     170.0       6.8       4.4        39.8         8.1
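For the traversal queries Q4 and Q128, Neo4j was
faster in nearly every case, often by a factor of 10
or more, as detailed in Table 2. For Q0 the two
systems were broadly comparable, with MySQL
faster on the smaller integer datasets. The traversal
result was expected, since relational databases are
not designed to do traversals.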
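To illustrate what a depth-limited traversal looks like on the relational side, the sketch below stores edges in a table and walks them with a recursive common table expression, so that each additional hop is another join against the edge table. This is an assumption-laden illustration: the SQL used in the benchmark is not shown in the text, SQLite stands in for MySQL so that the example is self-contained, and the table name `edge`, its columns, and the function name are hypothetical.

```python
import sqlite3


def count_reachable_sql(edges, start, max_depth):
    """Count nodes reachable from start within max_depth hops using SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)

    # Each recursion step joins the rows found so far against the edge
    # table; the depth column bounds the recursion, so cycles terminate.
    query = """
        WITH RECURSIVE reach(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT edge.dst, reach.depth + 1
            FROM reach JOIN edge ON edge.src = reach.node
            WHERE reach.depth < ?
        )
        SELECT COUNT(DISTINCT node) FROM reach WHERE node <> ?
    """
    (count,) = conn.execute(query, (start, max_depth, start)).fetchone()
    conn.close()
    return count


# Hypothetical chain 1 -> 2 -> ... -> 6, queried to depth 4.
chain = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
print(count_reachable_sql(chain, 1, 4))
```

The point of the sketch is structural rather than a performance claim: the traversal has to be expressed as repeated joins over a table of edges, whereas a graph store can follow the stored relationships from node to node.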
We conclude that graph database systems are the
best suited to searching for existing connections
between nodes, providing response times much
lower than those obtained with relational databases.
Finding such connections is the critical problem in
flight management systems.
3.3 Graph Database Tool
Based on the previous experiments, we have chosen
the graph DBMS architecture. In this section, we
discuss which tool is most suitable for implementing
a system of this type, which imposes several
requirements: speed of data processing,
heterogeneous access to information, platform
migration, and possible scalability of the system.
Among all the possibilities, and based on a study of
the efficiency of graph DBMS platforms, we have
determined that the most efficient tool could be
Neo4j (Woodie, 2015). It is a graph DBMS available
in a free Community edition or in an Enterprise
edition, which carries a cost but includes technical
assistance that can be of great help on certain
occasions.
After choosing the graph DBMS tool used to develop
the application, it is necessary to decide on which
platform to deploy it. When choosing where to host
the database, there are several possibilities:
hosting on AWS / EC2, Windows Azure, or cloud
hosting providers such as GrapheneDB, GraphStory,
Structr, etc. Considering the flexibility of the
system, the best option may be to acquire a server
from a hosting provider. This option makes it
possible to deploy databases, implement a data
mining system, and develop diverse features under
the same hosting.
3.4 Problem Definition
When thinking about how to approach the problem of
connections between different flights, we picture a
large network in which not all points are directly
connected, but in which there are a variety of ways
to reach the desired point from a given origin. The
problem can be modeled in this way, and the best
option is to use graphs, because the essential
information resides in the interaction between
connections, which graph DBMSs express naturally.
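The idea of a network in which not every pair of points is directly connected, yet a destination can still be reached through intermediate points, can be sketched as a route search over an adjacency map. This is a minimal illustration only: the airport labels A to D, the connections between them, and the function name `find_routes` are hypothetical and are not taken from the data or the implementation described in this paper.

```python
def find_routes(connections, origin, destination, max_legs):
    """List routes from origin to destination using at most max_legs flights.

    Routes are simple paths: no airport is visited twice.
    """
    routes = []

    def extend(path):
        current = path[-1]
        if current == destination:
            routes.append(path)
            return
        if len(path) - 1 == max_legs:
            return  # leg budget used up without reaching the destination
        for nxt in connections.get(current, []):
            if nxt not in path:
                extend(path + [nxt])

    extend([origin])
    return routes


# Hypothetical network: there is no direct flight from A to D,
# but D can be reached through B or through C.
connections = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["B", "D"],
    "D": [],
}

for route in find_routes(connections, "A", "D", max_legs=2):
    print(" -> ".join(route))
```

Bounding the number of legs keeps the search small and reflects the practical limit on how many connections a traveler would accept; in a graph DBMS the same question would be posed as a bounded path query rather than coded by hand.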
An example of the nature of the problem is shown
in Figure 5. To determine the possible ways of
getting from Paris-Orly Airport (ORY, Paris,
France) to the International Brussels Airport (BRU,
Brussels, Belgium), we have not defined a direct