can be used by any means provided to access the
Oracle database, such as the Oracle Application Server,
ODBC, etc. The Redland framework is not designed
as a standalone solution that accepts common access
methods. Rather, it provides APIs for use
from within other applications.
Indexing method: Oracle implements two three-column
B*-Tree indexes on the triple table, based on
typical query patterns. The first is ordered by Predicate,
Subject, Object, the second by Predicate, Object,
Subject. Because Sesame DB uses the underlying database
in an abstract manner, all storage and indexing issues
are handled by that database. Sesame’s native
version, however, uses separate indexes on subject,
predicate, and object. Kowari uses a combination
of AVL trees and B*-Trees to incorporate the
advantages of both approaches. Frequently used subsets
of graph patterns are indexed directly. Redland uses
three Hash indexes, each mapping two triple elements
to the third one. YARS implements B+-Tree indexes
for all combinations of quad patterns.
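The effect of a three-column ordered index can be illustrated with a small sketch. It is not Oracle's implementation: a sorted Python list stands in for the B*-Tree, and the names `build_index`, `prefix_scan`, and the `ex:` sample data are hypothetical. It only shows why a (Predicate, Subject, Object) and a (Predicate, Object, Subject) ordering answer patterns with a bound predicate by a contiguous range scan.

```python
from bisect import bisect_left

def build_index(triples, order):
    """Sort the triples by the column order given, e.g. "pso" or "pos".

    A sorted list is used here as a stand-in for a B*-Tree (assumption).
    """
    slot = {"s": 0, "p": 1, "o": 2}
    return sorted(tuple(t[slot[c]] for c in order) for t in triples)

def prefix_scan(index, prefix):
    """Return all index keys that start with the given prefix tuple."""
    start = bisect_left(index, prefix)  # first key >= prefix
    matches = []
    for key in index[start:]:
        if key[:len(prefix)] != prefix:
            break  # keys are sorted, so the matching range has ended
        matches.append(key)
    return matches

# Hypothetical sample data, stored as (subject, predicate, object).
triples = [
    ("ex:p1", "rdf:type", "ex:Protein"),
    ("ex:p2", "rdf:type", "ex:Protein"),
    ("ex:p1", "ex:name", "Insulin"),
]
pso_index = build_index(triples, "pso")
pos_index = build_index(triples, "pos")

# Pattern (?s rdf:type ex:Protein): predicate and object bound, use POS.
proteins = prefix_scan(pos_index, ("rdf:type", "ex:Protein"))
# Pattern (ex:p1 ex:name ?o): predicate and subject bound, use PSO.
names = prefix_scan(pso_index, ("ex:name", "ex:p1"))
```

Both orderings lead with the predicate, so a pattern with an unbound predicate cannot use a prefix scan in this sketch and would have to read the whole index.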
Implementation: Sesame, Kowari, and YARS are
all implemented in Java™, whereas Redland is implemented
in C. The Oracle table function, however, is
seamlessly integrated in the table function interface
of the Oracle Kernel.
RDFS Semantics: YARS does not support RDFS semantics
or other rule bases for reasoning on the RDF
data. Oracle implicitly supports RDF Schema rules
and can additionally load user-defined rule bases.
Kowari further supports OWL constructs. The Redland
framework supports RDF Schema only in terms
of translation of specific vocabulary into pure RDF,
and does not support any RDFS reasoning.
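To make concrete what RDFS reasoning adds on top of plain triple storage, the following sketch applies two of the standard RDFS entailment rules, rdfs11 (transitivity of rdfs:subClassOf) and rdfs9 (instances of a subclass are instances of its superclass), until no new triples appear. It is a naive illustration only and does not reflect how any of the systems discussed here implements inference; the function name `rdfs_closure` and the `ex:` data are hypothetical.

```python
def rdfs_closure(triples):
    """Naive forward chaining for rdfs9 and rdfs11 until a fixpoint is reached."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in closed if p == "rdfs:subClassOf"]
        new = set()
        for a, b in sub:
            # rdfs11: (a subClassOf b), (b subClassOf d) => (a subClassOf d)
            for c, d in sub:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
            # rdfs9: (s type a), (a subClassOf b) => (s type b)
            for s, p, o in closed:
                if p == "rdf:type" and o == a:
                    new.add((s, "rdf:type", b))
        if not new <= closed:
            closed |= new
            changed = True
    return closed

# Hypothetical sample data.
data = {
    ("ex:Enzyme", "rdfs:subClassOf", "ex:Protein"),
    ("ex:Protein", "rdfs:subClassOf", "ex:Molecule"),
    ("ex:e1", "rdf:type", "ex:Enzyme"),
}
inferred = rdfs_closure(data)
```

A store without RDFS support would answer a query for all instances of ex:Molecule with an empty result on this data, whereas the closure contains ex:e1.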
Table 1 gives an overview of the features
discussed in this section for all large-scale RDF storage
approaches considered in this study.
5 PERFORMANCE EVALUATION
For testing, an RDF representation of the UniProt data
(UniProt, 2007) is used, which scales up to 80 million
triples. Oracle’s RDF_MATCH function (Chong et al., 2005)
has been tested and demonstrated reasonable performance
on large-scale RDF data. The tests showed that query
performance scales well: query runtime does
not change significantly when the data size grows from
10 to 80 million triples. In fact, the longest-running query
in these tests, which matched 6 triples with
5 variables and limited results to 15,000 rows,
returned its answer in about one second.
Testing of YARS, Sesame, and Redland was performed
with 4 different queries representing typical
query patterns. The results show that YARS generally
outperforms the other approaches, except for
queries where a simple Hash lookup is possible. In this
case, Redland performs better due to its Hash
indexing method. This means that Redland is optimised
for getting the sources, the targets, or
the arcs in an RDF graph, given the other two
elements of a triple. Note that only triples are considered in
this test, although YARS and Kowari operate on a
quad structure. The fourth element, however, only adds
context information and can be regarded as constant for
the purposes of these tests.
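The single-lookup case described above can be sketched as follows. This is an illustration of the idea of three hash indexes that each map two triple elements to the third, not Redland's C implementation; the class `HashTripleStore` and its method names are hypothetical, and Python dictionaries stand in for the on-disk hashes.

```python
from collections import defaultdict

class HashTripleStore:
    """Three hash indexes, each keyed by two triple elements (sketch)."""

    def __init__(self):
        self.sp2o = defaultdict(set)  # (subject, predicate) -> objects
        self.po2s = defaultdict(set)  # (predicate, object)  -> subjects
        self.so2p = defaultdict(set)  # (subject, object)    -> predicates

    def add(self, s, p, o):
        # Every triple is written to all three hashes.
        self.sp2o[(s, p)].add(o)
        self.po2s[(p, o)].add(s)
        self.so2p[(s, o)].add(p)

    def sources(self, p, o):
        """Subjects for a given predicate and object: one hash lookup."""
        return set(self.po2s.get((p, o), ()))

    def targets(self, s, p):
        """Objects for a given subject and predicate: one hash lookup."""
        return set(self.sp2o.get((s, p), ()))

    def arcs(self, s, o):
        """Predicates for a given subject and object: one hash lookup."""
        return set(self.so2p.get((s, o), ()))

# Hypothetical sample data.
store = HashTripleStore()
store.add("ex:p1", "rdf:type", "ex:Protein")
store.add("ex:p2", "rdf:type", "ex:Protein")
store.add("ex:p1", "ex:name", "Insulin")
```

A pattern with two or three variables has no matching key in any of the three hashes, so in this sketch it could only be answered by iterating over one of the indexes and filtering.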
Because Kowari could not be implemented, its performance
can only be discussed in isolation,
based on the authors’ own test results (Wood et al., 2005).
The test consisted of building data and index structures for
up to 235 million triples, as well as simultaneous requests
by 250 simulated clients. The authors state
that Kowari compares with MySQL using a simplistic
schema. Apart from that, neither a comparison to competing
systems nor qualitative query performance results
are given.
5.1 Analysis
Large-scale testing has been carried out using the
UniProt (UniProt, 2007) data of about 80 million
triples. Oracle’s RDF_MATCH function, YARS, Redland,
Sesame MySQL, and Sesame Native were compared.
Table 2 presents the index space requirements
of the different approaches to storing and managing
large-scale RDF data. One reason for the differences
is that Sesame Native uses 4-byte IDs, whereas Kowari
and YARS use 8-byte IDs. Redland does not use IDs
at all and operates directly on URIs, which results in
a relatively large index.
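The influence of the ID width can be made tangible with a back-of-the-envelope calculation. The sketch below only counts the raw ID payload of one three-column index over 80 million triples; it ignores tree or hash overhead, the number of indexes each system maintains, the fourth quad element, and the dictionary that maps IDs to URIs, so it does not reproduce the figures in Table 2. The helper name `raw_id_bytes` is hypothetical.

```python
def raw_id_bytes(num_triples, id_bytes, elements=3):
    """Bytes needed to store one ID per triple element, without any overhead."""
    return num_triples * elements * id_bytes

NUM_TRIPLES = 80_000_000  # approximate size of the UniProt data set used here

four_byte = raw_id_bytes(NUM_TRIPLES, 4)   # 4-byte IDs
eight_byte = raw_id_bytes(NUM_TRIPLES, 8)  # 8-byte IDs

print(f"4-byte IDs: {four_byte / 1e9:.2f} GB per index")
print(f"8-byte IDs: {eight_byte / 1e9:.2f} GB per index")
```

Under these assumptions, doubling the ID width doubles the payload of every index. Storing URIs directly, as Redland does, replaces each fixed-width ID with a variable-length string that is typically much longer than 8 bytes.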
All discussed approaches to large-scale RDF storage
except Redland use variations of the B-Tree as their
index structure. Redland uses three Hashes for fast
lookup of graph patterns with only one variable in the
triple. In the comparison, Redland performed best on a
query in which only the subject was requested for a
constant predicate and object, which Redland carries out
as a simple Hash lookup. Since Redland only provides
three Hash indexes, all other queries involving
more complex graph patterns need cumbersome
joins and combinations make Redland’s performance
ICSOFT 2007 - International Conference on Software and Data Technologies