imported data can be arbitrarily structured and the
end user is typically only interested in a subset of the
attribute hierarchy, a separate metadata collection
was used to track the desired attributes. This proved
useful in reducing network bandwidth, and
considerably simplified client-side processing and
display.
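
For illustration, such a metadata document might look like the following sketch, written with the Python MongoDB driver; the database, collection, field and attribute names are assumptions, not the actual schema used in the implementation:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["geodata"]                                  # assumed database name

# One metadata document per data collection, listing only the attributes
# the client application is interested in and how they should be presented.
db["metadata"].insert_one({
    "collection": "sensor_readings",                    # data collection it describes
    "attributes": [
        {"name": "temperature", "type": "numeric"},     # rendered as a slider
        {"name": "category",    "type": "enum"},        # rendered as check boxes
        {"name": "comment",     "type": "string"},      # rendered as a text search
    ],
})
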
Access to the data is provided through an interactive web page that allows the user to formulate geospatial queries against a selected database collection. The metadata information is used to provide the user with automatically generated, meaningful filter criteria (sliders for numeric values, check boxes for enumerations and text searches for arbitrary strings) and to specify the format of the result set. To minimize network latency and round-trip delays, only geospatial queries are processed within the database itself. The result set is then processed by JavaScript code within the web page, allowing near-instant refinement and updates as the user explores varying filter criteria. Unfortunately, support for geospatial queries was very limited at the time of this implementation, allowing us to filter only for a coarse bounding box (min/max tests for latitude and longitude) within the database query itself. The required precise distance-from-polygon metric is therefore calculated in a separate pass through the result set before it is returned to the client.
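
The following Python sketch illustrates this two-step approach: a coarse bounding-box query in MongoDB, followed by a precise distance pass over the result set. Field names, coordinates and the distance helper are illustrative assumptions, not the actual implementation:

from pymongo import MongoClient

def distance_to_polygon(lat, lon, polygon):
    # Placeholder metric: distance to the nearest polygon vertex; a real
    # implementation would use a proper point-to-polygon distance.
    return min(((lat - p[0]) ** 2 + (lon - p[1]) ** 2) ** 0.5 for p in polygon)

coll = MongoClient()["geodata"]["sensor_readings"]      # assumed collection

# 1) Coarse filter inside the database: min/max tests on latitude and longitude.
candidates = coll.find({
    "lat": {"$gte": 47.0, "$lte": 49.0},
    "lon": {"$gte": 10.0, "$lte": 13.0},
})

# 2) Precise filter: keep only documents close enough to the query polygon.
polygon = [(47.2, 10.1), (48.9, 10.3), (48.5, 12.8)]    # assumed query polygon
result = [doc for doc in candidates
          if distance_to_polygon(doc["lat"], doc["lon"], polygon) < 5.0]
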
The MongoDB query language allows queries of moderate complexity; it can be extended with JavaScript code, which is slow but may be executed in parallel following the map-reduce pattern.
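
As an example of this pattern, a simple aggregation could be expressed as server-side JavaScript map and reduce functions and submitted from Python; this sketch uses the classic map-reduce helper available in older driver versions, and the collection and field names are assumptions:

from pymongo import MongoClient
from bson.code import Code

coll = MongoClient()["geodata"]["sensor_readings"]      # assumed collection

# Count documents per category; both functions run as JavaScript on the server.
map_fn = Code("function () { emit(this.category, 1); }")
reduce_fn = Code("function (key, values) { return Array.sum(values); }")

result = coll.map_reduce(map_fn, reduce_fn, out="category_counts")
for doc in result.find():
    print(doc["_id"], doc["value"])
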
Multiple instances can be combined to form
database clusters for resilience against server or
connection failure. Databases can also be split
(sharded), using an arbitrary attribute as a key for
distribution over several instances (shards). Each
shard then only carries a fragment of the full
database and the system automatically attempts to
equalize the workload carried by all participating
shards. Clustering and sharding are entirely
transparent to the client, allowing the database
backend to be rearranged and extended as required.
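
As a rough sketch, sharding a collection over such a cluster could be set up as follows, run against the mongos router of an already configured cluster; the router address, database, collection and shard-key names are assumptions:

from pymongo import MongoClient

admin = MongoClient("mongodb://mongos-host:27017").admin    # assumed router address

# Enable sharding for the database, then shard the collection,
# using an arbitrary attribute ("region") as the distribution key.
admin.command("enableSharding", "geodata")
admin.command("shardCollection", "geodata.sensor_readings", key={"region": 1})
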
In this way, MongoDB allows large databases to be distributed over multiple networked instances, with queries processed in parallel. This is desirable for compute-intensive queries and aggregations, which can be formulated as map-reduce patterns.
2.3 Graph Oriented Database
Graph-oriented databases were considered because of the required functionality to add new information and links between existing data.
Of the several database systems available, the RDF (Resource Description Framework) database AllegroGraph (Allegro, n. d.) was chosen because it supports several programming interfaces, e.g. Java Jena, Java Sesame, Python, Lisp and others, and additionally offers geospatial types and distributed queries. The reference implementation uses the Java Jena interface for importing data, and the Java Sesame interface with SPARQL to query the database.
Source files contain flat tables of data. The import supports three different operations. The most basic operation is the import of new entities: each imported row is treated as a new instance of a definable type, the column header becomes the predicate and the cell value the object of the triple. The second possibility is to append attributes to an existing entity; the user defines which columns in the source file must match which properties in the existing database. The third mode adds links between existing data; the user defines which columns in the source table must match which properties of the subject instance and of the object instance. In all three modes the user can define a data type for each column. If possible, the value in the table is automatically converted to this type and can then be queried appropriately. If a conversion is not possible, e.g. because the source field contains values like “15 million” instead of “15,000,000”, the value is still imported, but without a type. Such values can be displayed in tables like all others but are not available for queries.
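
The following Python/rdflib sketch illustrates the most basic import mode (new entities); the actual implementation uses the Java Jena interface, and the file, namespace and type names here are assumptions:

import csv
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/data#")              # assumed namespace
g = Graph()

with open("cities.csv", newline="") as f:               # assumed flat source table
    for i, row in enumerate(csv.DictReader(f)):
        entity = EX[f"city_{i}"]                        # each row becomes a new instance ...
        g.add((entity, RDF.type, EX.City))              # ... of a definable type
        for column, value in row.items():
            try:
                # Naive stand-in for the per-column type definition:
                # try to convert to an integer, otherwise import untyped
                # (e.g. "15 million" falls back to an untyped literal).
                obj = Literal(int(value), datatype=XSD.integer)
            except ValueError:
                obj = Literal(value)
            g.add((entity, EX[column], obj))            # header = predicate, value = object
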
The user can access the data through an automatically generated query interface and two automatically generated report formats. SPARQL is used as the query language, and queries are generated from the filters set by the user. However, in the implementation of AllegroGraph, the extensions for geospatial queries are not yet available in SPARQL, so other functions have to be combined to filter the original query result.
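
For illustration, a query generated from such filter settings might look like the following; the predicate names, the repository URL and the use of the SPARQLWrapper client are assumptions, and the geospatial restriction would have to be applied separately, as described above:

from SPARQLWrapper import SPARQLWrapper, JSON

query = """
PREFIX ex: <http://example.org/data#>
SELECT ?entity ?name ?population WHERE {
  ?entity a ex:City ;
          ex:name ?name ;
          ex:population ?population .
  FILTER (?population > 1000000)    # numeric filter set by the user
}
"""

sparql = SPARQLWrapper("http://localhost:10035/repositories/geo")  # assumed repository URL
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["name"]["value"], row["population"]["value"])
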
Two different user interfaces were implemented. In the first method, the query interface consists of a filter table that displays input fields for all predicates. The controls for entering a filter depend on the data type defined during the import, e.g. a text box for texts or a check box for Boolean values. The result of this preliminary search is a table containing all entities that match the search criteria. Values in the table that have additional triples with further information associated with them can be clicked to show a more detailed report. This second report format shows all triples associated with the selected entity, both those where the entity is the subject as well as