representations are intuitive presentations for these
conceptual representations where nodes represent
entities and arcs or edges represent the relationships.
To facilitate working with the representations,
entities and relationships have labels; however, the
labels are merely for the purpose of discussion and
do not suggest the “meaning” of the node or
relationship. Nodes and edges derive their meaning
strictly from their connections.
4.2 Graph-based Information Representation
Graphs, as a mathematical construct, have been
studied for hundreds of years. More recently, graphs
have been applied to practical problems involving
networks, particularly in transportation and
communication. The key observation is that network
problems focus not on the things themselves but on
the nature of the connections between them. The essential
information in the traveling salesman problem is not
the destination cities, but the ways in which those
cities are connected in a transportation network and
the cost of making a trip between two particular
cities. As we observed earlier, the interpretive
frameworks that enable us to operate in terms of
information emphasize relationships. A graph-based
representation is the natural choice for expressing
relationships (Ebert, 1996).
In a graph-based information representation scheme,
each node is a labeled end-point that represents a
single, atomic entity, and each arc asserts an
association between two nodes. Arcs are typed so
that multiple kinds of association may be expressed
in a single representation. Values may be associated
with each node to carry information that may be
needed at other levels of the system (e.g., a string
label to be displayed to a user), but the graph
representation itself treats these values as opaque
blocks of data.
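The scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class name `Graph`, its methods, and the node and arc names are all hypothetical. It shows labeled nodes that carry an opaque value, and typed arcs that let several kinds of association coexist in one graph.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple


@dataclass
class Graph:
    # Node label -> opaque value. The graph stores the value but never inspects it.
    values: Dict[str, Any] = field(default_factory=dict)
    # Each arc is a triple: (source label, arc type, target label).
    arcs: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_node(self, label: str, value: Any = None) -> None:
        self.values[label] = value

    def add_arc(self, source: str, arc_type: str, target: str) -> None:
        # An arc asserts an association between two existing nodes.
        if source not in self.values or target not in self.values:
            raise KeyError("both end-points must exist before an arc is asserted")
        self.arcs.add((source, arc_type, target))

    def targets(self, source: str, arc_type: str) -> Set[str]:
        # Follow only arcs of the requested type leaving `source`.
        return {t for (s, a, t) in self.arcs if s == source and a == arc_type}


g = Graph()
g.add_node("n1", b"Alice")   # the bytes are an opaque payload, e.g. a display label
g.add_node("n2", b"Acme")
g.add_node("n3", b"Bob")
g.add_arc("n1", "works_for", "n2")
g.add_arc("n1", "knows", "n3")
```

Because arcs are typed, `g.targets("n1", "works_for")` and `g.targets("n1", "knows")` select different associations that leave the same node.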
Because information is expressed in relationships,
systems that implement a graph-based information
representation will be optimized to store and manage
networks of relationships. Graph theory considers
directed and undirected arcs. We have found that a
pair of directed arcs, where one arc points from the
first node to the second and another points from the
second to the first, gives us a general construct that
can be used as either a directed or undirected
connection. More importantly, this representation
captures the fact that if we can assert that one object
has a relationship with another, we also implicitly
assert that the other object has a reciprocal
relationship with the first. By making the reciprocal
relationship explicit, the graph-based representation
naturally provides back-links that double the
possible traversal patterns. We call this pair of
directed arcs a relationship.
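As a rough sketch of this construct, the Python below stores every relationship as two directed arcs, one in each direction. The names `RelationshipStore`, `relate`, and `follow`, and the arc types used, are assumptions made for illustration and do not come from the original system.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class RelationshipStore:
    def __init__(self) -> None:
        # (node, arc type) -> nodes reached by following arcs of that type
        self._out: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def relate(self, a: str, forward: str, b: str, reverse: str) -> None:
        # One relationship is a pair of directed arcs: a -> b and b -> a.
        self._out[(a, forward)].add(b)
        self._out[(b, reverse)].add(a)

    def follow(self, node: str, arc_type: str) -> Set[str]:
        # Return a copy so that callers cannot mutate the stored arcs.
        return set(self._out.get((node, arc_type), set()))


store = RelationshipStore()
# A directed connection: the two arcs carry different types.
store.relate("alice", "parent_of", "bob", "child_of")
# An undirected connection: both arcs carry the same type.
store.relate("alice", "sibling_of", "carol", "sibling_of")
```

Asserting "alice is parent_of bob" makes the back-link available at once, so `store.follow("bob", "child_of")` reaches `alice` without scanning every forward arc.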
With the majority of the information residing in the
networks of relationships, nodes must represent
single, finer-grained entities. Because any two nodes
in a graph may be linked by a relationship, a concept
need only be expressed once and represented by a
single node. This has the important side effect of
naturally creating a fully normalized representation.
The notion that nodes represent atomic entities can
be difficult to grasp. In a graph-based information
representation scheme, each node should represent
one atomic thing. In practice, this generally means
that what would be an object in an object-oriented
system or a row in a relational system becomes a
network in a graph: the graph representation of, say,
an employee record would have a node for each field
in the record, and all of those nodes would be
connected with the node that represents the
employee record.
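The employee example can be sketched as follows. This is a minimal illustration under stated assumptions: the `Network` class, the field names, and the sample values are hypothetical, and the sketch assumes that equal values denote the same concept and therefore share one node.

```python
from typing import Dict, Hashable, List, Tuple

Arc = Tuple[int, str, int]  # (source node id, arc type, target node id)


class Network:
    def __init__(self) -> None:
        self.node_ids: Dict[Hashable, int] = {}  # atomic value -> node id
        self.arcs: List[Arc] = []

    def node(self, value: Hashable) -> int:
        # Reuse the existing node if this value has been seen before,
        # so each concept is represented exactly once.
        return self.node_ids.setdefault(value, len(self.node_ids))

    def add_record(self, record_key: Hashable, fields: Dict[str, Hashable]) -> int:
        # The record becomes a hub node linked to one node per field value.
        hub = self.node(record_key)
        for field_name, value in fields.items():
            self.arcs.append((hub, field_name, self.node(value)))
        return hub


net = Network()
e1 = net.add_record("employee:1", {"name": "Alice", "department": "Sales"})
e2 = net.add_record("employee:2", {"name": "Bob", "department": "Sales"})
```

Both employee hubs point at the same "Sales" node, which is the normalization effect described above: the department is expressed once and linked twice.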
4.3 Performance of Graph-Based Information Representation
Extracting precise information can be
time-consuming and expensive when working with
complex data sets. For example, consider the
following comparison between a modeling paradigm
implemented in a graph-based storage system and
current solutions in a relational database system.
Administrators using relational database technology
strive to optimize queries across multiple tables, but
this often involves iterative cycles of filtering out
irrelevant information and structuring statements
that reduce the answer set based on ordered
sequences. Because of this, relational queries
through chained data are often limited to four or five
connection levels. In many cases, a four- or
five-degree search becomes unmanageable and overly
time-consuming, and it requires additional hardware
and software.
Queries against a graph database are significantly
simpler because they traverse data that was never
de-structured to fit into tables. To a large degree,
data in a graph follows its natural pattern of
existence, with relevant information related through
close association. This pattern is preserved even
when the data is committed to disk.
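To make the idea of a multi-degree query concrete, here is a small Python sketch of a bounded breadth-first traversal over an in-memory adjacency map. It is an illustration only: the function name `within_degrees` and the sample data are hypothetical, and the sketch makes no claim about how any particular graph database executes such a query or how fast it runs.

```python
from typing import Dict, List, Set


def within_degrees(adjacency: Dict[str, List[str]], start: str, max_depth: int) -> Set[str]:
    """Return every node reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # nothing new was reached, so deeper levels are empty too
        frontier = next_frontier
    seen.discard(start)  # report only the nodes reached, not the origin
    return seen


# Hypothetical chained data: each node links to the next one.
chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"]}
five_degrees = within_degrees(chain, "a", 5)
```

Each additional degree is one more pass over the current frontier. In a relational formulation, the same search would typically need one further self-join (or a recursive query) per level.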
To illustrate, assume a large data set with records
indicating parent-child relationships but no extended