VERSION CONTROL FOR RDF TRIPLE STORES

Steve Cassidy and James Ballantine

Department of Computing, Macquarie Univeristy, Sydney, Australia

Keywords:

RDF Version Control Annotation.

Abstract:

RDF, the core data format for the Semantic Web, is increasingly being deployed both from automated sources

and via human authoring either directly or through tools that generate RDF output. As individuals build up

large amounts of RDF data and as groups begin to collaborate on authoring knowledge stores in RDF, the

need for some kind of version management becomes apparent. While there are many version control systems

available for program source code and even for XML data, the use of version control for RDF data is not a

widely explored area. This paper examines an existing version control system for program source code, Darcs,

which is grounded in a semi-formal theory of patches, and proposes an adaptation to directly manage versions

of an RDF triple store.

1 INTRODUCTION

Semantic web applications store core data as RDF

triples. In many cases, this data is authored manually

in some form or other; as such, it can be seen as a form

of program code and should be subject to the same

kind of management. In particular keeping track of

different versions of a code-base is especially impor-

tant when more than one author is involved. Tools are

needed to allow authors to synchronise their working

copies, to exchange updates with each other, and to al-

low changes to be undone if necessary when problems

are discovered. While it would be possible to manage

RDF under an existing text-based version control sys-

tem, it would be hard to guarantee consistency and

correctness of a triple store managed in this way. This

paper explores a novel model of version control em-

bodied in the Darcs Roundy (2006) system to manage

RDF data in the triple store.

The context of this work is a project to sup-

port Linguistic Annotation Cassidy and Harrington

(2000); Bird and Liberman (2000) using RDF as the

back-end data store. This application is similar in

some ways to the Annotea and Vannotea projects

R. Schroeter (2003) but the annotations themselves

are often inter-related and represent linguistic inter-

pretation of source data rather than comments and

opinions on media. Our current work is aimed at

providing a collaborative annotation facility to allow

groups to develop and share annotations over the web.

The provision of a version control mechanism will

ensure that different copies of the same annotations

can be kept synchronised where appropriate and that

changes and updates to annotations can be managed.

In our application, it is common for a community to

settle on a ‘correct’ annotation through negotiation;

hence the RDF store will commonly be modiﬁed over

time with contributions from different parties. Even

though our use of RDF isn’t entirely aligned with its

more general use in the Semantic Web, we feel that

the need for version control for RDF is broader than

just our project and that the proposals in this paper

will have more general applicability.

2 BACKGROUND

A version control system is used to manage changes

to a body of work, for example a set of source code

documents in a project. Its major function is to

record changes such that the changes can be rolled

back and changes made by different authors can be

Cassidy S. and Ballantine J. (2007).

VERSION CONTROL FOR RDF TRIPLE STORES.

In Proceedings of the Second International Conference on Software and Data Technologies - Volume ISDM/WsEHST/DC, pages 5-12

DOI: 10.5220/0001340100050012

 SciTePress

merged into a single code-base. The most common

application for version control is the management of

changes in program source code and here the most

widely used system is CVS Cederqvist (2003) al-

though more recently competitors such as Subversion

Collins-Sussman et al. (2004) have been developed.

The majority of version control systems record

different versions of documents in their store, opti-

mising the store to make common operations inex-

pensive. For example, CVS stores the most recent

version of each ﬁle and a set of patch ﬁles that encode

the changes needed to revert to the previous version.

Stepping back to the previous version is fast, but step-

ping back a long way requires the sequential applica-

tion of patch ﬁles.

A recent development in version control has been

the use of a distributed development model where

multiple copies of the version control repository are

maintained. In the CVS model, there is one central

canonical copy of the version history that all develop-

ers use to send and receive updates. In the distributed

model supported by Darcs, Arch Lord (2006) and Bit-

keeper

among others, each developer has a copy of

the version history and changes can be registered lo-

cally and passed between developers without the need

for a central server.

2.1 Version Control and RDF

The On-to-Knowledge project has published a num-

ber of papers on managing changes in RDF ontologies

Klein et al. (2002); Kiryakov and Ognyanov (2002).

The focus of much of their work is on the description

of different versions of ontologies although Kiryakov

and Ognyanov (2002) note that much of their method-

ology applies equally to plain RDF descriptions as

it does to RDF(S) schema expressed in RDF. How-

ever, their main goal is to track changes, not manip-

ulate them and their proposal doesn’t claim offer a

full version control mechanism. Meta-data is asso-

ciated with speciﬁc versions of an ontology allow-

ing users to identify appropriate versions and reason

about the compatibility of newer and older versions.

Their papers contain some relevant work on automat-

ically comparing ontologies to ﬁnd differences.

When semantic web data is viewed as a knowl-

edge store from which new knowledge will be de-

duced, a major concern is maintaining the consis-

tency of the knowledge store. This problem has a lot

in common with the version control problem in that

consistency must be maintained in the face of addi-

tion and deletion of assertions. Deletion especially

http://www.bitmover.com/

presents the problem of ﬁnding what deduced knowl-

edge should be removed to keep the knowledge store

consistent. Broekstra and Kampman’s work on truth

maintenance Broekstra and Kampman (2003) is con-

cerned with tracking the deductive dependencies be-

tween statements in the RDF graph to avoid having to

re-run deductive processes when statements are added

or deleted from the graph.

The source or provenance of data is also a ma-

jor concern when reasoning about knowledge drawn

from diverse locations around the Semantic Web. A

number of projects are investigating the representa-

tion of the provenance of statements within the RDF

model enabling reasoning about both a fragment of

data and its source. While the RDF reiﬁcation mech-

anism provides one way of making statements about

statements, as pointed out by Watkins and Nicole

Watkins and Nicole (2006) reiﬁed statements cannot

be used in semantic inferences. They propose the use

of the Named Graph mechanism as implemented in

Jena Bizer et al. (2005) to record provenance infor-

mation about statements in the graph. They have used

this mechanism to implement a software version con-

trol system Watkins and Nicole (2005) which uses

RDF to describe changes to text based source code.

An interesting result of this work is the ability to rea-

son about the version meta-data using RDF tools to,

for example, ﬁnd instances where a developer reverts

a change made by another developer. The use of RDF

in this role adds a useful additional capability to the

normal version control model.

Some recent work has directly directly addressed

the problem of version management for RDF knowl-

edge bases. The recently released IBM BOCA Sys-

tem

is an RDF repository that supports rollback of

transactions in the RDF store. Very little informa-

tion is available at this time about how this is imple-

mented or the capabilities of the model used. Auer

and Herre Auer and Herre (2006) present a model

based on atomic changes to RDF triple stores which

enables a kind of transaction based version manage-

ment enabling changes to be rolled back if neces-

sary. The main focus of the paper is on the appli-

cation of change management to the evolution of on-

tologies and the authors discuss some ideas for evolu-

tion patterns which enable them to characterise sets of

changes to an ontology as, for example, adding a new

class or changing the cardinality of a property. The

intention of these patterns is to provide more a useful

change history to a human author than if the raw RDF

changes were shown. The goals of this work are very

close to that of our own project and there are many

http://ibm-slrp.sourceforge.net/v1/wiki/

index.php/BocaUsersGuide

ICSOFT 2007 - International Conference on Software and Data Technologies

overlaps in the solutions we put forward. This will be

discussed further at the end of this paper.

2.2 Darcs and the Theory of Patches

One issue with common version control systems such

as CVS and Subversion is that they are not based on

any formal model of how changes may or may not be

combined with each other. This means that they are

unable to convincingly assert that the result of merg-

ing a change into a working version maintains any

kind of consistency constraint. One system which at

least attempts to address this issue is the recently de-

veloped Darcs Roundy (2006) system.

Darcs is a distributed version control system built

to manage software source code; Darcs views a ver-

sion as a sequence of patches which when applied to

an empty source tree, builds a particular version of

the code. Also unlike other version control systems,

the design of Darcs is built on a theory of patches – a

semi-formal model of the way that individual patches

can be manipulated to implement the required version

control operations. This theory of patches is indepen-

dent of the nature of a patch or change and so we can

make use of it in thinking about how changes to an

RDF store might be managed.

In Darcs, a patch is a set of changes made to a doc-

ument (or collection of documents) which has been

recorded by a user. A patch will tend to implement

one meaningful modiﬁcation to program source code,

such as adding a new function or ﬁxing a bug. The

patch may include changes to multiple documents and

to different parts of individual documents. A version

of the document collection can be described by the

sequence of patches that led up to that version; this is

called the context of the version. A patch is said to

move the working copy from one context to another.

A patch is written as an uppercase letter, option-

ally annotated with the left and right contexts. The

sequence of patches A followed by B can be written:

or more simply:

the superscripts in the ﬁrst notation denote named

contexts:

can be read as “patch A modiﬁes the

repository from context o to context a”.

A Darcs patch must have the property of being in-

vertible; that is, it must be possible to construct a sec-

ond patch that will reverse the effect of a patch on

the current context to get back to the previous con-

text. The inverse of a patch is deﬁned as the simplest

patch that achieves this. The inverse is written as A

−1

and by deﬁnition we know that the application of a

patch followed by its inverse will bring us back to the

original context:

−1 o

2.2.1 The Commute Operation

A fundamental operation when manipulating patches

is to be able to reverse the order of two patches. As

will be shown later, this forms the basis of the merge

operation between two repositories and the undo op-

eration for a patch which isn’t the most recent.

Darcs deﬁnes the commute operation as the re-

ordering of two patches. This can be written as:

↔

The commute operation is symmetrical in that

both sides of the equation have the same effect. Note

however that the individual patches on the right hand

side have been modiﬁed. The new patch B

is deﬁned

as having the same effect as B but is modiﬁed to allow

it to apply to the context o; similarly, A

has the same

effect as A but is modiﬁed so as to apply to the context

In Darcs, which deals with changes to text ﬁles,

the modiﬁcations needed can be quite complex since

adding and deleting lines can change where subse-

quent changes should be applied. However, in RDF,

as we will see later, no changes are ever needed when

commuting patches.

In some cases the commute operation may not be

deﬁned. For example, if one patch modiﬁes a line

within a region added by the earlier patch. In this

case there is said to be a dependency or conﬂict be-

tween the patches which prevents them from being

re-ordered. The effect of a dependency will be seen

later.

2.2.2 The Revert Operation

To revert the most recent patch from the current con-

text all that is required is to remove the patch from

the context (the head of the sequence). To update the

working copy of the document set, the inverse of the

patch can be applied to undo the changes. So, to revert

the patch C from the sequence ABC, the patch C

−1

applied giving AB.

The ability to change the order of a sequence of

patches means that it is possible to revert patches

which aren’t at the head of the sequence. To do so,

the patch is moved to the head by a series of com-

mute operations before the inverse patch is applied.

For example, to revert A from the sequence ABC the

sequence of operations is:

VERSION CONTROL FOR RDF TRIPLE STORES

ABC

C Commute A and B

Commute A

and C

−1

Apply the inverse patch of A

The resulting patch sequence

This process will fail if any commute operation

fails; that is if the patch being reverted has a de-

pendency relationship with any of the subsequent

patches. The process of reverting the patch is able to

identify which subsequent patches are dependent and

hence would have to be reverted along with the target

patch to maintain consistency.

2.2.3 The Merge Operation

Two patches which apply to the same context are said

to be parallel patches. This situation can arise when

two people work from a common base each making

changes to the documents. As long as the changes

do not conﬂict with each other, it should be possible

to combine them into a sequence which reﬂects both

changes. This is the merge operation and it can be

derived in the following manner from the inverse and

commute operations.

A Choose patch A to apply ﬁrst.

−1

Apply the inverse of A.

−1

B Apply B since we are back at the ini-

tial context.

−1

Commute A

−1

and B to put the in-

verse patch at the end

Drop the ﬁnal patch to give the de-

sired sequence of patches

It should be clear that the outcome of merging A

and B could be either of AB

or BA

depending on

which patch was applied ﬁrst and that the result of

both of these should be the same. The merge oper-

ation will succeed as long as the commute operation

between A

−1

and B succeeds, that is as long as there

is no dependency between these two. The resulting

patch B

above is a modiﬁed version of B which can

be applied to the context including the changes in-

troduced by A. This modiﬁcation is the same as that

required for the commute operation described above.

3 VERSION CONTROL FOR RDF

In the context of an RDF triple store, there are a num-

ber of ways in which to conceive of a version con-

trol system. A primary choice is the granularity of

changes that will be recorded in patches. Following

Kiryakov and Ognyanov (2002) we settle on the RDF

statement (triple) as the smallest directly manageable

element. This implies that it is not possible to add

just a resource name to a store if it doesn’t take part in

some RDF statement. Kiryakov and Ognyanov also

state that “An RDF statement cannot be changed – it

can only be added or removed”. This restriction fol-

lows from the observation that changing a statement

< a, b, c > to < a, x, c > is entirely equivalent to re-

moving the ﬁrst triple and adding the second.

While the RDF statement is the smallest unit of

change in the triple store, for the most part a recorded

patch will consist of many additions and deletions

from the store. A patch should be a meaningful

change to the store such as the addition of a person

or event; however, this is not an enforceable restric-

tion as users of the version control system must decide

what makes sense as a meaningful change. Hence, in

general, a change will involve the addition or removal

of a sub-graph from the triple store. We note here

that recent work by Auer and Herre (2006) includes

a very useful discussion of the role of blank nodes

in recording, and more importantly in communicat-

ing, changes to a triple store. They deﬁne an atomic

graph being one which can’t be split without dupli-

cating blank nodes and use that as the smallest unit

of change in a triple store. This project has not con-

sidered issues related to blank nodes because our ap-

plication doesn’t make use of them, but the work of

Auer and Herre is entirely compatible with the ideas

expressed here as will be discussed at the end of this

paper.

3.1 RDF Patches

This section deﬁnes the properties of patches used in

our version control system, shows how patches are

inverted, how conﬂicts are detected between patches

and how alternate patches are computed.

A patch to an RDF store consists of a set of

additions and deletions of sub-graphs. Unlike with

changes to text ﬁles, no context or line number is re-

quired for each addition or deletion. The main con-

cern when manipulating patches is to be able to invert

any patch and to work out whether two patches con-

ﬂict with each other.

Given these requirements, the basic form of a

patch should be two sub-graphs, one to be added and

the other to be deleted from the store. An example is

shown in Figure 1.

While this is the basic form of a patch it may be

desirable to add additional information to make iden-

tifying conﬂicts easier in some cases. For most uses

of the version control system, modiﬁcations will be

ICSOFT 2007 - International Conference on Software and Data Technologies

add:

:john foaf:knows :bob .

delete:

:john foaf:knows :mary .

Figure 1: A simple RDF patch.

made by human authors; however, it may also be use-

ful to include inferred knowledge. If this is to be done,

the additions (assuming that inference does not delete

knowledge) will be based on some of the existing con-

tents of the store. These should be recorded as support

for the new triples because their removal would cause

a conﬂict with the new patch. Hence the represen-

tation of a patch is extended to include an optional

dependency sub-graph taken from the working store.

These dependancies must be explicitly asserted, we

do not propose any mechanism for discovering de-

pendancies from the knowledge base or any inference

process. Figure 2 shows an example of such a patch.

add:

:john foaf:knows :bob .

depends:

:john foaf:works :acmecorp .

:bob foaf:works :acmecorp .

Figure 2: An example RDF patch including a dependency

sub-graph.

Patch inversion can be achieved by swapping the

add and delete sub-graphs in a patch fulﬁlling the re-

quirement that every patch should have an inverse.

This corresponds to the intuition that removing a

change can be achieved by deleting anything added

and adding anything deleted by the change. An inter-

esting question is whether a dependency on the patch

should remain after inversion. In the above example

we infer that John knows Bob based on them both

working for AcmeCorp; meaning that we should not

apply this patch if the triple store does not contain the

two dependent triples and that removing either depen-

dent triple would conﬂict with this patch. If we were

to invert this patch, so that John doesn’t know Bob,

we can do it irrespective of any dependency and still

maintain consistency. Hence the inversion of a patch

discards any dependency sub-graph in the original.

The Darcs commute operation is fundamental to

being able to undo the effect of a historical patch and

to merging changes from another repository. This op-

eration requires that there is a test for conﬂict between

two patches and that the alternate patches, to be ap-

plied in a different context, must be computed. These

two problems are considered here.

A conﬂict occurs between two RDF patches if ap-

plying them in a different order would generate a pos-

sibly inconsistent state in the triple store. The easi-

est example is when patch A depends on a statement

added by patch B: here A and B can’t be re-ordered

because A’s dependencies won’t be met in the context

before applying B and so A conﬂicts with B. When

two patches contain only add and delete sub-graphs,

a conﬂict occurs when patch A adds a statement that

is deleted by patch B or vice-versa. In this case, re-

ordering the patches would result in deleting a non-

existing statement and then adding that statement in

the second patch. Hence the state of the triple store

would be different if the patches were applied in a

different order.

Note that it would be possible to resolve this last

kind of conﬂict automatically. Consider the example

of the patches shown in Figure 3; here the result af-

ter applying the patches in the sequence AB is that

John knows James and Mary but not Bob. One way

to construct a valid sequence B

′

is to remove any

mention of Bob from each patch. The effect of B

′

would be the same as AB and consistency would be

assured. The only issue with this approach is that we

are potentially losing information since A and B may

have come from different users who should probably

be made aware of the conﬂict so that they can negoti-

ate whether or not John knows James.

A { add:

:john foaf:knows :bob .

:john foaf:knows :james . }

B { add:

:john foaf:knows :mary .

delete:

:john foaf:knows :bob . }

Figure 3: Two patches A and B which conﬂict due to adding

and deleting a shared statement.

This brings us to the question of how to construct

the alternate patch that is required when two patches

are to be reversed. Recall that when a patch sequence

AB is to be reversed, the patch B must be altered so

that it can be applied in the context prior to A and A

must be altered so that it can be applied after the mod-

iﬁed B. In text ﬁles, these changes involve changes to

line numbers in the patch. In our case, no change is

needed to patches to allow re-ordering if there is no

conﬂict between them.

VERSION CONTROL FOR RDF TRIPLE STORES

4 IMPLEMENTATION AND

EVALUATION

Our implementation goals for this project are ulti-

mately to provide both server and desktop based triple

stores and the ability to synchronise changes between

different workspaces. In the ﬁrst instance we have

built an implementation based on the Redland Beckett

(2001) system which provides a wrapper around the

existing Redland Python interface to implement ver-

sion control operations. This section discusses some

aspects of the implementation and then provides an

evaluation of the overhead cost of maintaining version

control information in this way.

The ﬁrst implementation decision relates to how

patches will be computed from the current workspace.

In text based version control the universal solution is

to compute a diff between the current and previous

versions showing which lines have been added and

removed. It would be possible to use this approach

for RDF and indeed we can refer to work published

by Berners-Lee and Connolly (2004) on how these

can be computed and stored. However, an alterna-

tive presents itself in our context which is to monitor

all changes to the RDF store via the Redland inter-

face and build the patch from the observed changes.

This should be considerably more efﬁcient when the

number of statements stored becomes large.

Our approach then is to instrument the Redland

Python interface to provide a wrapper around the op-

erations that perform additions and deletions from the

store to also record these changes in a patch. The

wrapper also provides the version control methods

record

, to ﬁnalise a patch,

revert

to undo the effects

of a patch and

merge

to merge a patch with the cur-

rent working store. These will be described in more

detail below.

4.1 Storing Patches

Since we have settled on a logging approach to gen-

erating patches, one possible approach is to use a

tabular structure to log the add, delete and depen-

dency triples as they are asserted in the working store.

Kiryakov and Ognyanov (2002) suggest that a rela-

tional database is an appropriate store for this data as

this can be stored more compactly than if it were as-

serted into the RDF store. A patch then would be a

table of add/delete operations with their parameters

(subject, predicate, object).

An alternative to this approach is to use RDF to

represent the patches with each add and delete oper-

ation forming a node in a special patch store. This

aligns with the work outlined by others Berners-Lee

and Connolly (2004); Auer and Herre (2006) who

deﬁne ontologies for RDF patches and allows for

recording metadata and reasoning about the patches

themselves, perhaps along the lines of the inferences

suggested by Watkins and Nicole (2006). Since we

need to make statements about the addition and dele-

tion of statements some kind of reiﬁcation is required.

Following the earlier work, we can also include state-

ments about the provenance of the changes and about

the relationships between patches.

Our implementation makes use of the Redland

Beckett (2001) context mechanism, which is simi-

lar to the named graph model used by Watkins and

Nicole. A context in Redland can be associated with

one or more assertions into the triple store. These are

used to group together the changes in a given patch

and since a context is identiﬁed by a resource node in

the graph, can also be used to make statements about

the patches themselves. Our implementation uses two

separate RDF stores for the working store and the

patch store although it would be only a little more

complex to use one store containing many contexts

for both of these. The main working store is main-

tained as it would normally be except that the add and

delete operations are intercepted by our wrapper. The

patch store contains just the assertions about patches,

with each patch being in its own context. One fur-

ther item of metadata must be stored in the working

store to record the sequence of patches that deﬁne the

current state. This information is needed when new

patches are made and when patches are reverted and

merged from other working stores.

Each addition or deletion from the working store

is asserted as a reiﬁed statement within the current

context (shown here using the TriG named graph syn-

tax Bizer (2005)):

:patchA {

:add1 rdf:type darcs:assert,

rdf:subject :john,

rdf:predicate foaf:knows,

rdf:object :bob .

:del1 rdf:type darcs:delete,

rdf:subject :john,

rdf:predicate foaf:knows,

rdf:object :jane .

}

Here

:patchA

is a named graph containing the

reiﬁed assertions. These assertions are collected in a

current patch context until such time as the user asks

to record or commit the patch. At this point the patch

is closed and assertions can be made about it via the

context node (in Redland, the context identiﬁer is a

ﬁrst class node in the graph). For example, the au-

ICSOFT 2007 - International Conference on Software and Data Technologies

thor and time of the patch and any commentary can

be recorded.

:patchA rdf:type darcs:patch,

dc:creator <mailto:fred@bloggs.com>,

dc:date "2005-10-20",

darcs:comment "Added new name" .

One step remains before the patch can be ﬁnalised

and this relates to the possibility that the same state-

ment might be added and deleted in the patch. If there

is a pair of add and delete nodes referring to the same

statement then they should be removed from the patch

since the net effect will be nil on the ﬁnal difference

between the current and prior version. Normalisation

produces a non-redundant set of changes required to

create the effect of a patch. This allows conﬂicts to be

detected without checking for redundant add/delete

pairs within the patches themselves. After normali-

sation, the patch context can be closed and a new one

opened for any future changes to be recorded.

4.2 The Revert Operation

The revert operation undoes the effect of a patch on

the working store, any additions are removed and any

deletions are re-added. As was shown above (Section

2.2.2), the revert operation on the most recent patch is

achieved by applying the inverse patch to the working

store. To revert a historical patch, the original patch

must be commuted to the head of the patch history

and then the inverse patch applied. Since commuting

an RDF patch is a simple re-ordering the only issue in

this operation is whether the patch being reverted con-

ﬂicts with a patch more recent in the history. For ex-

ample, if we wish to revert patch A from the sequence

ABCD and A asserts that John knows Jane while C re-

moves that statement, then the revert cannot succeed

and the user must be informed of the conﬂict. The

user may decide to revert the patch C before reverting

A to avoid this problem.

4.3 The Merge Operation

The merge operation applies changes from one work-

ing store to another. The situation arises when a

branch workspace is created, for example if Steve

takes a copy of James’ workspace to add some ma-

terial on a special topic. At the point copying the

workspace, the patch history might be ABCD, Steve

adds a number of statements and records a patch S,

meanwhile James has also made some more changes

and recorded a patch J. Now Steve sends James the

patch S and James must merge it with his workspace

ABCDJ. As shown above (Section 2.2.3), the merge

operation requires commuting S and J to give the ﬁnal

sequence ABCDJS

′

where S

′

is the modiﬁed version

of S that applies after J. For our RDF patches S = S

′

and so the merge can be performed simply by apply-

ing the patch S as long as there is no conﬂict between

S and J.

4.4 Evaluation

A brief evaluation of our implementation was carried

out to ensure that the overhead introduced by version

control was acceptable in terms of additional space

and processing time. The results showed that using

the MySQL backend for the Redland store, VC is

around four to eight times slower and needs from two

to four times as much space as the raw RDF store.

While this overhead is signiﬁcant, it is still work-

able in the kinds of applications we envisage. There

is room for optimisation here by working at a lower

level in the implementation of the RDF store.

5 DISCUSSION

As RDF is used more widely it will require the same

kind of tools as mainstream programming languages

since developers will want to collaborate on creat-

ing and maintaining an RDF store in the same way

that they currently do with source code. This paper

has presented a proposal for a version control system

for RDF based on a semi-formal theory of patches

as implemented in the Darcs version control system.

This provides the basic building blocks to build a dis-

tributed version control system for RDF which can

support collaborative editing of RDF stores. The

model is being implemented as part of an eResearch

project developing tools for collaborative annotation

of Linguistic data.

Auer and Herre (2006) recently described their

work on version control for RDF which has many

similarities with the work described here. Their fo-

cus was somewhat different to our own in that they

are concerned with ontology evolution and spend time

deriving evolutionary patterns which can be used to

classify the changes made to the RDF store. However,

they do cover a number of points that aren’t consid-

ered in our work.

Auer and Herre deﬁne an atomic graph as the

smallest sub-graph that can’t be sub-divided with-

out duplicating blank nodes; these are the smallest

units of change in their system. A change (their

name for our patch) consists one or more of these

atomic graphs added and/or deleted from the triple

store. Apart from the consideration of blank nodes,

VERSION CONTROL FOR RDF TRIPLE STORES

this formulation is entirely compatible with our no-

tion of RDF patches. Our system could be modiﬁed to

work with atomic graphs in more general RDF vocab-

ularies where blank nodes are an issue. The notion of

conﬂict between two changes is introduced as a bar-

rier to rolling back a change that occured some time

in the past. However, this notion is only described

as a relation between a change (patch) and a working

store. The idea of a conﬂict relation between patches

is not introduced and hence the manipulations that the

patch theory allows are not explored in their system.

It is clear then that this work is complimentary to

our own and that by borrowing the idea of the atomic

graph we should be able to expand our system to deal

with a broader range of RDF applications.

Our work is currently focused on the application

of our VC managed triple store to the problem of

Linguistic Annotation. Once we have successfully

demonstrated the technology in this application we

will look further into the more general problem of

VC for RDF and evaluate it in the context of ontol-

ogy evolution and other RDF authoring tasks.

REFERENCES

Auer, S. and Herre, H. (2006). A versioning and evolu-

tion framework for rdf knowledge bases. In Proceedings

of Ershov Memorial Conference, Novosibirsk, Akadem-

gorodok, Russia.

Beckett, D. (2001). The Design and Implementation of the

Redland RDF Application Framework. In Proceedings

of WWW10, Hong Kong.

Berners-Lee, T. and Connolly, D. (2004). Delta: an on-

tology for the distribution of differences between rdf

graphs. World Wide Web.

http://www.w3.org/

DesignIssues/Diff

Bird, S. and Liberman, M. (2000). A Formal Framework

for Linguistics Annotation. Speech Communication.

Bizer, C. (2005). The TriG Syntax.

http://www.wiwiss.

fu-berlin.de/suhl/bizer/TriG/

Bizer, C., Cyganiak, R., and Watkins, R. (2005). Named

Graphs for Jena (NG4J) API. In Proceedings of the Sec-

ond European Semantic Web Conference., Greece.

Broekstra, J. and Kampman, A. (2003). Inferencing and

Truth Maintenance in RDF Schema: exploring a naive

practical approach. In Workshop on Practical and Scal-

able Semantic Systems (PSSS).

Cassidy, S. and Harrington, J. (2000). Multi-level Annota-

tion in the Emu Speech Database Management System.

Speech Communication, 33:61–77.

Cederqvist, P. (2003). Version Management with CVS. Net-

work Theory.

http://www.network-theory.co.uk/

docs/cvsmanual/

Collins-Sussman, B., Fitzpatrick, B. W., and Pilato, C. M.

(2004). Version Control with Subversion. O’Reilly.

Kiryakov, A. and Ognyanov, D. (2002). Tracking Changes

in RDF(S) Repositories. In EKAW ’02: Proceed-

ings of the 13th International Conference on Knowledge

Engineering and Knowledge Management. Ontologies

and the Semantic Web, pages 373–378, London, UK.

Springer-Verlag.

Klein, M., Fensel, D., Kiryakov, A., and Ognyanov, D.

(2002). Ontology versioning and change detection on

the web. In EKAW ’02: Proceedings of the 13th In-

ternational Conference on Knowledge Engineering and

Knowledge Management. Ontologies and the Semantic

Web, pages 197–212, London, UK. Springer-Verlag.

Lord, T. (2006). How arch works. Web page.

http:

//regexps.srparish.net/www/arch-tech.htm

acc-

cesed on 13 December 2006.

R. Schroeter, J. Hunter, D. K. (2003). Vannotea - a collabo-

rative video indexing, annotation and discussion system

for broadband networks. In K-CAP 2003 Workshop on

Knowledge Markup and Semantic Annotation, Florida.

Roundy, D. (2006). The darcs revision control sys-

tem.

http://abridgegame.org/darcs/

. Accessed on

2006-05-18.

Watkins, E. R. and Nicole, D. A. (2005). Version control

in online software repositories. In Proceedings of the

2005 International Conference on Software Engineering

Research and Practice II, pages 550–556.

Watkins, E. R. and Nicole, D. A. (2006). Named graphs as

a mechanism for reasoning about provenance. In Lecture

Notes in Computer Science Frontiers of WWW Research

and Development - APWeb 2006: 8th Asia-Paciﬁc Web

Conference, pages 943–948, Harbin, China.

ICSOFT 2007 - International Conference on Software and Data Technologies