DIFFERENCING AND MERGING OF SOFTWARE DIAGRAMS

State of the Art and Challenges

Sabrina Förtsch and Bernhard Westfechtel

Bayreuth University, Applied Computer Science I

95440 Bayreuth, Germany

Keywords:

Merging, Differencing, UML Diagram.

Abstract:

For a long time, fine-grained version control for software documents has been severely neglected. Typically, software configuration management systems support the management of text or binary files. Unfortunately, text-based tools for fine-grained version control are not adequate for software documents produced in the earlier phases of the software life cycle. Frequently, these documents have a graphical syntax; therefore, we call them software diagrams. This paper discusses the current state of the art in fine-grained version control (differencing and merging) for software diagrams, with an emphasis on UML diagrams.

1 INTRODUCTION

Software engineers create a large variety of arti-

facts such as requirements deﬁnitions, software ar-

chitectures, program code, etc. All of these artifacts

are subsumed under the generic term software doc-

ument. Throughout its life cycle, a software docu-

ment evolves into multiple versions, each of which

records a snapshot of its evolution. Version control

has been studied for a long time in the discipline of

software conﬁguration management (see e.g. (Con-

radi and Westfechtel, 1998) for an overview).

This paper investigates ﬁne-grained version con-

trol for software diagrams. In the early phases of the

software life cycle, software documents with a graph-

ical syntax are used widely; consider e.g. data ﬂow di-

agrams, entity-relationship diagrams, UML diagrams

(the primary focus of this paper), etc. Traditional

software conﬁguration management provides version

control for text ﬁles or binary ﬁles. Low-level sup-

port of this kind is not sufﬁcient to compare or merge

software diagrams on a conceptual level. Therefore,

structure-based algorithms and tools are required for

differencing and merging of software diagrams.

The rest of this paper is structured as follows: Section 2 introduces some basic notions, providing the foundation for the following sections. Section 3 states general requirements for differencing and merging. Section 4 briefly reviews previous work on differencing and merging of program code. Section 5, which constitutes the core part of this paper, deals with differencing and merging of software diagrams. Section 6 gives an overview of existing tools for differencing and merging, and Section 7 concludes the paper.

2 BASIC NOTIONS

A difference is represented formally as a delta. There are two kinds of deltas: A symmetric delta of two versions $v_1$ and $v_2$ contains all elements which belong to $v_1$ but not to $v_2$ and vice versa. Using set notation loosely, the symmetric delta may be written as $\Delta(v_1, v_2) = (v_1 \setminus v_2) \cup (v_2 \setminus v_1)$. In contrast, a directed delta starts from one of the versions - say $v_1$ - and creates the other one ($v_2$) by applying a sequence of operations. Thus, a directed delta may be formalized as a sequence $\Delta = op_1 \ldots op_k$ such that $\Delta(v_1) = v_2$.
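The two kinds of deltas can be illustrated with a minimal sketch. It assumes, purely for illustration, that a version is a plain set of element names; the function names `symmetric_delta`, `directed_delta`, and `apply_delta` are hypothetical and not taken from any tool discussed here.

```python
def symmetric_delta(v1, v2):
    """Elements that belong to exactly one of the two versions."""
    return (v1 - v2) | (v2 - v1)


def directed_delta(v1, v2):
    """A sequence of operations that turns v1 into v2."""
    deletions = [("delete", e) for e in sorted(v1 - v2)]
    insertions = [("insert", e) for e in sorted(v2 - v1)]
    return deletions + insertions


def apply_delta(delta, version):
    """Apply the operations of a directed delta in order."""
    result = set(version)
    for kind, element in delta:
        if kind == "insert":
            result.add(element)
        else:
            result.discard(element)
    return result


v1 = {"Person", "Order", "Invoice"}
v2 = {"Person", "Order", "Customer"}
delta = directed_delta(v1, v2)
```

Applying `delta` to `v1` yields `v2`, whereas the symmetric delta only records which elements differ, without a direction.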

Merging denotes the process of combining $n$ alternative versions $a_1, \ldots, a_n$ into a consolidated version $m$. Usually, $n = 2$, which will be assumed in the following. Two-way merging compares two versions $a_1$ and $a_2$ with the help of a diff algorithm which calculates a symmetric delta. When a differing element is detected, the user has to decide whether the element is to be included into the merge version. Furthermore, the user may have to decide upon the relative arrangements of elements in the merge version, e.g., when two different lines occur at the same position in two text files.

Footnote: for terminology, see (Conradi and Westfechtel, 1998).

Förtsch S. and Westfechtel B. (2007).
DIFFERENCING AND MERGING OF SOFTWARE DIAGRAMS - State of the Art and Challenges.
In Proceedings of the Second International Conference on Software and Data Technologies - SE, pages 90-99
DOI: 10.5220/0001342900900099
SciTePress

In the case of two-way merging, any difference

requires a user interaction. Frequently, the alterna-

tive versions have been derived from a common base

version b, and the merge is intended to combine the

parallel changes to the base version. Thus, three-way merging compares three versions $b$, $a_1$, and $a_2$ and constructs a merge version $m$ incorporating the changes from $b$ to $a_1$ and $a_2$, respectively.

Three-way merging increases the level of automa-

tion by consulting the base version as an arbitrator in

the case of differences. For example, when a line in a

text ﬁle occurs in only one of the alternative versions,

it is inserted into the merge version if and only if it

has not been present yet in the base version. A con-

ﬂict occurs in the case of contradicting changes, e.g.,

when two lines have been inserted at the same posi-

tion. User interaction is required only in the case of

conﬂicts.
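The role of the base version as an arbitrator can be sketched as follows. This is an illustrative simplification, not the algorithm of any particular tool: versions are assumed to be mappings from a key (standing in for a position or an element) to a value, and the names `MISSING` and `three_way_merge` are hypothetical.

```python
MISSING = object()  # marks "key not present in this version"


def three_way_merge(base, a1, a2):
    """Merge two alternatives of a key/value mapping against a common base.

    Returns (merged, conflicts); conflicts lists the keys that both
    alternatives changed in different ways.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(a1) | set(a2)):
        b = base.get(key, MISSING)
        x = a1.get(key, MISSING)
        y = a2.get(key, MISSING)
        if x == y:        # both alternatives agree
            chosen = x
        elif x == b:      # only a2 changed this key
            chosen = y
        elif y == b:      # only a1 changed this key
            chosen = x
        else:             # contradicting changes: ask the user
            conflicts.append(key)
            continue
        if chosen is not MISSING:
            merged[key] = chosen
    return merged, conflicts


base = {"A": "class A", "B": "class B"}
a1 = {"A": "class A", "B": "class B", "C": "class C"}  # inserted C
a2 = {"A": "abstract class A"}                         # changed A, deleted B
merged, conflicts = three_way_merge(base, a1, a2)
```

In this example the insertion of `C`, the change to `A`, and the deletion of `B` are all accepted without user interaction; a conflict is reported only when both alternatives change the same key differently.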

The task of three-way merging may be characterized as follows: Given a base version $b$, two alternative versions $a_i$ and two directed deltas $\Delta_i = \Delta(b, a_i)$ $(i = 1, 2)$, construct a merge version $m$ and a merge delta $\Delta_m$ such that $\Delta_m(b) = m$ and $\Delta_m$ constitutes an order-preserving, complete merge of the operation sequences $\Delta_1$ and $\Delta_2$. Thus, merging builds upon differencing, but adds further complications: In general, it cannot be guaranteed that the input deltas may be merged successfully. For example, an operation $op$ from $\Delta_1$ may be overridden by an operation $op'$ from $\Delta_2$, or $op$ may not be executable any more after the execution of $op'$. In these cases, a conflict occurs because $op$ and $op'$ do not commute. But even when such conflicts are not detected, the result of the merge may not make sense at all if the merge is performed at too low a level of abstraction.
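Whether two operations commute can be checked, for a given base version, by executing them in both orders. The following sketch assumes a toy model (a mapping from element names to attribute tuples) and three hypothetical operations; it only tests commutativity on one concrete base version and is not a general conflict detection procedure.

```python
def rename(old, new):
    """Operation: rename element `old` to `new`."""
    def op(model):
        if old not in model or new in model:
            raise KeyError(old)  # operation is not executable
        result = dict(model)
        result[new] = result.pop(old)
        return result
    return op


def delete(name):
    """Operation: delete element `name`."""
    def op(model):
        if name not in model:
            raise KeyError(name)
        return {k: v for k, v in model.items() if k != name}
    return op


def set_attributes(name, attributes):
    """Operation: replace the attributes of element `name`."""
    def op(model):
        if name not in model:
            raise KeyError(name)
        result = dict(model)
        result[name] = tuple(attributes)
        return result
    return op


def commute(op1, op2, base):
    """True if both orders are executable on `base` and give the same result."""
    try:
        return op2(op1(base)) == op1(op2(base))
    except KeyError:
        return False


base = {"Person": ("name",), "Order": ("date",)}
```

Renaming `Person` commutes with deleting `Order`, but not with deleting `Person` (one order is not executable), and two different `set_attributes` operations on `Person` override each other.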

3 REQUIREMENTS

In this section, we will deﬁne potential requirements

for diff and merge tools. The attribute “potential” in-

dicates that these requirements may be posed in one

application context, but they may also be considered

irrelevant in another context. Furthermore, since re-

quirements may contradict each other, one require-

ment may have to be traded against another require-

ment.

Table 1 lists requirements for differencing tools. (R1) and (R2) refer to the quality of the result produced by the diff algorithm. (R3) and (R4) ensure reusability. (R5) enables the comparison of diagram versions which were created independently. (R6) requests a fast response, which is important when processing large volumes of data. (R7) ensures that the result is represented in a user-friendly way. (R8) is motivated by reducing the implementation effort.

Table 1: Requirements for differencing.

(R1) Accuracy: The diff tool should calculate the difference between two versions $v_1$ and $v_2$ as precisely as possible.

(R2) High conceptual level: The diff tool should report differences on a high level of abstraction, i.e., it has to operate on a logical rather than a physical level.

(R3) Domain independence: The diff tool should be applicable to a large set of diagram types.

(R4) Tool independence: The diff tool should be independent of the tools which were used to create the diagram versions to be processed.

(R5) History independence: The result produced by the diff tool should depend only on the final states of the diagram versions, but not on the history of edit operations used to create these versions.

(R6) Efficiency: The diff tool should calculate its output as fast as possible, requiring as little space as possible.

(R7) User-friendly representation: The diff tool should represent its output in a user-friendly way.

(R8) Lightweight approach: Implementation of the merge tool should require as little effort as possible.

Table 2: Requirements for merging.

(R9) Conflict detection: The merge tool should detect conflicts between the changes to be merged.

(R10) Conflict resolution: The merge tool should support the resolution of conflicts detected during the merge.

(R11) User interaction: The merge tool should offer an interactive mode, where conflicts are resolved according to user decisions rather than automatically.

(R12) Three-way merging: The merge tool should support three-way merging, which employs a common base version as an arbitrator in order to eliminate unnecessary user interactions.

(R13) Preservation of consistency: The merge tool should preserve the consistency level of the input versions as far as possible when producing the merge version.

Of course, requirements may contradict each

other. For example, efﬁciency may contradict accu-

racy, domain independence may stand in conﬂict with

operation at a high conceptual level, etc.

Please note that all of these requirements hold for

merge tools, too. Thus, Table 2 lists only those re-

quirements which are speciﬁc to merge tools. (R9)

and (R10) demand conﬂict detection and conﬂict res-

olution, respectively. (R11) calls for conﬂict resolu-

tion by user interaction. We believe that automatic

(default) decisions are too dangerous. (R12) prefers

three-way merging over two-way merging because

the amount of user interaction may be reduced by consulting a common base version. However, two-way merging may still be required if the base version is not known or the alternative versions have been developed independently. (Three-way merging partially conflicts with history independence, since it assumes a common base version. However, change logs are not necessarily assumed for three-way merging.) Finally, (R13) requires that the result of the merge should be “as consistent as possible”.

4 DIFFERENCING AND MERGING OF PROGRAM VERSIONS

For a long time, differencing and merging have been studied primarily for versions of programs, since eventually each software process has to produce an executable program (other artifacts are often considered merely as “documentation”). For a survey of program merging, see (Mens, 2002).

It is interesting to note that text-based tools still dominate the current state of practice. Text-based tools for differencing and merging have been provided as stand-alone tools; in addition, they have been implemented in both commercial and free software configuration management systems. The underlying technology is cheap, efficient, and widely applicable as it stands.

Text-based tools for differencing and merging are characterized by the following features:

• Usually, text files are treated as sequences of text lines, i.e., text lines are considered as atomic units. However, some tools operate at a more fine-grained level (sequences of characters).

• Differences are calculated a posteriori without assuming any historical information (logs of changes). Thus, differencing and merging do not rely on tools recording change logs. In particular, no information beyond the actual text (e.g., unique identifiers of text lines) is required.

• Even for text files, there is no unique formal definition of the term difference. The actual meaning of this term depends on the level of granularity (lines or characters) and on the set of edit operations which are taken into account (insert, delete, move, copy).

• For text files, exact algorithms are available which calculate the minimal difference with respect to a formally defined metric. For example, (Hunt and Szymanski, 1977) calculates the longest common subsequence (LCS), while (Tichy, 1984) additionally covers block moves.

• Text-based merging usually relies on the LCS algorithm for comparing text files line by line (e.g., the well-known Unix utility diff3). Thus, changes within one line and moves cannot be handled by such tools.
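To make the line-based view concrete, the following sketch derives an edit script from a longest common subsequence. It uses the textbook dynamic programming formulation, not the Hunt and Szymanski algorithm cited above, and the function names are chosen for illustration only.

```python
def lcs_table(a, b):
    """table[i][j] is the LCS length of the suffixes a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(a, b):
    """Edit script of (' ', line), ('-', line) and ('+', line) entries."""
    table = lcs_table(a, b)
    i = j = 0
    script = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


old = ["class A", "class B", "class C"]
new = ["class B", "class A", "class C"]
script = line_diff(old, new)
```

Because the edit operations are restricted to insertions and deletions of whole lines, moving `class A` behind `class B` is reported as one deletion plus one insertion.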

Remarkably, text-based merging can guarantee no

more than a text ﬁle as output. As a consequence, the

result of the merge may contain syntactic and seman-

tic errors. Furthermore, syntactic and semantic con-

ﬂicts may go undetected. These shortcomings have

triggered numerous research activities at the syntac-

tic or semantic level (e.g. (Buffenbarger, 1995) and

(Horwitz et al., 1989)).

So far syntactic or semantic differencing and

merging of programs have been implemented not only

in some research prototypes but also in widely used

development environments (see the Eclipse plugin Compare). On the one hand, the required technology

has proved to be much more sophisticated than for

text-based tools. On the other hand, some approaches

are severely constrained with respect to the set of pro-

grams to which they can be applied (in particular, this

statement holds true for semantic merging).

In contrast, text-based tools perform very badly

in theory, but fairly well in practice. As noted in

(Mens, 2002), empirical evaluations have shown a

very high fraction (more than 90%) of successful,

non-conﬂicting merges. Even when the merge result

is not consistent, errors injected by the merge are usu-

ally caught by the compiler or by failing regression

tests. Nevertheless, merging cannot be trusted blindly,

and has proved difﬁcult when it is performed after

a fairly long time of parallel development (Wiborg-

Weber, 1999).

ICSOFT 2007 - International Conference on Software and Data Technologies

5 DIFFERENCING AND MERGING OF SOFTWARE DIAGRAMS

Treating diagrams as texts is possible when a textual

format is deﬁned which may be used as backup or for

exchanging data between different tools. Nowadays,

all kinds of diagrams may be stored as XML docu-

ments, i.e., structured text. Effectively, this means

that documents are represented as trees, augmented

with cross-references. In contrast, the term plain text

refers to a ﬂat text, consisting of a sequence of text

lines.

Viewing diagrams as plain text is not very helpful

for differencing and merging (see (Ohst and Kelter,

2002)). Text-based tools for differencing and merg-

ing are sensitive to changes of the order in which lines

appear in a text ﬁle, and they are also sensitive to

changes in the layout, such as the applied rules of indentation. To a large extent, the order of text lines and their layout are immaterial to the diagram which

is represented by the text. Therefore, applying diff

and merge tools at the level of plain text will hardly

produce meaningful results. Instead, some suitable

structural representation has to be used.
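The sensitivity of text-based comparison to order and layout can be demonstrated with a small sketch. The XML snippets and the element name `class` are invented for illustration and do not follow XMI or any other real exchange format.

```python
import difflib
import xml.etree.ElementTree as ET

V1 = """<diagram>
  <class name="Person"/>
  <class name="Order"/>
</diagram>"""

# Same diagram, but with a different element order and indentation.
V2 = """<diagram>
    <class name="Order"/>
    <class name="Person"/>
</diagram>"""


def text_differences(a, b):
    """Changed lines reported by a line-based comparison."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def class_names(xml_text):
    """A very simple structural view: the set of class names."""
    root = ET.fromstring(xml_text)
    return {element.get("name") for element in root.iter("class")}
```

The line-based comparison reports changed lines for `V1` and `V2`, while the structural view considers both versions identical.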

Below, we will explore several design decisions

which affect functionality, user interface, and efﬁ-

ciency of tools for differencing and merging of soft-

ware diagrams. In particular, we will discuss the re-

spective trade-offs that have to be taken into account.

5.1 Delineation of the Domain

In the ﬁrst place, it has to be decided to which types

of diagrams the diff or merge tool is going to be ap-

plied. For example, the tool may operate on any kind

of UML diagrams (Kelter et al., 2005), a speciﬁc kind

of UML diagram (e.g., class diagrams (Ohst et al.,

2003; Xing and Stroulia, 2005)), any diagram pro-

cessable by a meta-CASE tool (Mehra et al., 2005),

etc. The trade-off which has to be made concerns the

requirements (R2) and (R3): A tool which is appli-

cable to a large domain can make only basic assump-

tions with respect to the contents of the diagram to be

processed.

5.2 Determination of a Document Model

After having ﬁxed the domain, the tool developer has

to design a document model deﬁning the elements, re-

lationships, and attributes to be considered. The doc-

ument model has a strong impact on the capabilities

of the diff or merge tool. Via the document model,

views are deﬁned on the diagrams to be processed. A

simple document model allows for simple (R8) and

(relatively) efﬁcient (R6) algorithms, but lowers the

conceptual level of differencing or merging (R2).

Considerably differing document models have been proposed for differencing and merging software diagrams. For example, (Engel et al., 2006; Alanen and Porres, 2003) and (VVU, 2007, Ahrens) are based on MOF and are thus applicable to MOF instances, including UML diagrams. (Kelter et al., 2005; Xing and Stroulia, 2005) rely on tool-specific document models (trees augmented with cross-references). In (Soto and Münch, 2006), diagrams are transformed into RDF. (VVU, 2007, Störrle) proposes to transform diagrams into Datalog clauses. The motivation for transforming diagrams into some generic model is to reuse generic tools and algorithms for differencing and merging. Graphs are another promising document model (VVU, 2007, Ebert et al.).

The document model has to be selected carefully.

Differencing and merging have to be performed at an

adequate level of abstraction such that a conceptual

mismatch is avoided. In particular, this issue has to be

taken into account when a document is transformed

into another representation for the purpose of differ-

encing and merging: The results of differencing and

merging have to be translated back into the “native”

document model. In the case of differencing, this

means that sets of low-level differences have to be

aggregated into high-level differences. Likewise, for

merging it has to be checked whether combinations of

low-level changes may be composed into correspond-

ing high-level changes.

5.3 Deﬁnition of Differences

After having determined the document model, the notion of difference has to be defined. A diff tool has to calculate (or at least approximate) a minimal difference between two diagrams. In the case of directed deltas, the minimal difference may be defined as a sequence $\Delta$ of operations such that $\Delta(v_1) = v_2$ and the cost $c(\Delta)$ is minimal. In the case of a symmetric delta, $v_1 \cap v_2$ has to be maximized.

Thus, defining the difference involves selecting an appropriate cost model for calculating the costs of executing a sequence of operations. What is considered “appropriate” is ultimately decided by the user. A formal notion of difference may serve as a specification against which the

implementation may be veriﬁed or tested. In addition,

a validation is required to check whether the require-

ments of the user are actually satisﬁed (i.e., whether


the user considers the calculated difference as mini-

mal).

Note that a formal deﬁnition of minimal differ-

ence introduces a metric for measuring the distance

between documents. A metric also allows us to assess

the quality of the diff algorithm quantitatively. With-

out a metric, there is no speciﬁcation of the problem

to be solved by the diff algorithm.

On the other hand, it should be noted that an eval-

uation of some diff algorithm with respect to a given

metric alone is not sufﬁcient: Even if the minimal dif-

ference with respect to that metric is always found, the

user may still complain about missing accuracy (R1).

As a simple example, assume that the set of base op-

erations does not contain a move operation. Then,

each move is simulated by deletions and insertions,

and the user will not consider the calculated differ-

ence as minimal.
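The effect of the chosen set of base operations on the "minimal" difference can be sketched as follows. The model (a mapping from element names to the name of their container) and the unit cost per operation are assumptions made for this illustration only.

```python
def edit_script(v1, v2, allow_move):
    """Derive operations turning v1 into v2.

    v1 and v2 map element names to the name of their container.
    """
    script = []
    for element in sorted(set(v1) | set(v2)):
        old, new = v1.get(element), v2.get(element)
        if old == new:
            continue
        if old is not None and new is not None and allow_move:
            script.append(("move", element, old, new))
            continue
        if old is not None:
            script.append(("delete", element, old))
        if new is not None:
            script.append(("insert", element, new))
    return script


def cost(script):
    """Unit cost per operation (an assumed cost model)."""
    return len(script)


v1 = {"Person": "pkg.model", "Order": "pkg.model"}
v2 = {"Person": "pkg.core", "Order": "pkg.model"}
```

With a move operation the difference has cost 1; without it, the same change is simulated by a deletion and an insertion with cost 2, which is minimal under that metric but unlikely to match the user's intuition.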

It is worthwhile to notice that many approaches to

differencing do not rely on a formally deﬁned metric.

In some cases, the metric may be customized by the

user. E.g., (Melnik et al., 2002; Kelter et al., 2005)

support customizable similarity functions on which

the matching decisions are based.

5.4 Reliance on Unique Identiﬁers

In order to calculate differences, a criterion of sameness is required: Which elements of different versions $v_1$, $v_2$ are considered to be the same? In the case of

text-based tools, sameness is decided solely with the

help of the contents and position of text lines. In this

way, history independence (R5) is achieved.

On the other hand, structure-based differencing

and merging turns out to be much more difﬁcult. It is

not easy to identify elements of different versions in

such a way that a minimal delta is computed. There-

fore, several diff and merge tools rely on unique iden-

tiﬁers (Ohst et al., 2003; Lindholm, 2004; Alanen and

Porres, 2003; Rho and Wu, 1998; Mehra et al., 2005;

Engel et al., 2006; Soto and Münch, 2006): When an

element is created, it is assigned a new unique identi-

ﬁer. When the containing diagram is copied, the iden-

tiﬁers of its elements are retained. In this way, differ-

ent copies of the “same” element may be located in

the versions to be processed.

To a great extent, the calculation of differences is

“for free” when unique identiﬁers are present. Thus

unique identiﬁers simplify algorithms (R8) and make

them more efﬁcient. However, unique identiﬁers

make differencing and merging dependent on the history of changes, which implies a contradiction to requirement (R5). In the extreme, it might happen that two versions $v_1$ and $v_2$ are considered to have an

empty intersection even though they are isomorphic.

This situation occurs when both versions have been

created with the same contents independently by dif-

ferent users.
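The history dependence introduced by unique identifiers can be shown with a small sketch. The element representation and the helper names (`new_element`, `copy_version`, `match_by_id`) are hypothetical; real tools manage identifiers in their own repositories or exchange formats.

```python
import itertools

_ids = itertools.count(1)


def new_element(name):
    """Creating an element assigns a fresh unique identifier."""
    return {"id": next(_ids), "name": name}


def copy_version(version):
    """Copying a diagram retains the identifiers of its elements."""
    return [dict(element) for element in version]


def match_by_id(v1, v2):
    """Identifiers of the elements considered 'the same' in both versions."""
    ids2 = {element["id"] for element in v2}
    return {element["id"] for element in v1 if element["id"] in ids2}


# Derived version: the renamed element is still matched via its identifier.
base = [new_element("Person"), new_element("Order")]
derived = copy_version(base)
derived[0]["name"] = "Customer"

# Independently created versions with the same contents share no identifier.
user1 = [new_element("Person")]
user2 = [new_element("Person")]
```

Matching is trivial for `base` and `derived`, but `user1` and `user2` have an empty intersection although their contents are equal.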

Thus, even when using a tool maintaining unique

identiﬁers, differencing and merging may not perform

accurately (R1) and even produce counter-intuitive re-

sults. It should be noted that the user usually is not

aware of unique identiﬁers and thus might experience

anomalies which violate the principle of least possible

amazement.

In addition, unique identiﬁers introduce tool de-

pendencies, contradicting (R4). In the worst case,

identiﬁer-based differencing and merging will work

only if all versions have been created with the same

tool. This situation is improved when multiple tool

vendors agree upon the management of unique iden-

tiﬁers, as it is encouraged - yet not enforced - in the

XMI standard (xmi, 2005). In (VVU, 2007, Hein

and Ritter), an approach is presented which supports

diff and merge across tool boundaries by relying on

unique identiﬁers introduced in the MOF versioning

standard (mof, 2005).

5.5 Design of Algorithms

Clearly, the previous decisions heavily inﬂuence the

algorithms for differencing and merging. Unfortu-

nately, structure-based algorithms tend to be more

complex and less efﬁcient than text-based algorithms.

In particular, this holds true without unique identi-

ﬁers: Optimal matches may be expensive to compute.

In the case of trees, computing a minimal delta is

known to be an NP-hard problem.

With respect to a formally deﬁned notion of dif-

ference, we may distinguish among exact algorithms

which are guaranteed to produce a minimal differ-

ence, approximation algorithms which may miss the

minimum only up to a deﬁned maximal distance, and

heuristic algorithms with no guarantees at all. Accu-

racy (R1) has to be balanced against efﬁciency (R6).

So far, we are aware only of heuristic algorithms

for differencing and merging. All algorithms assum-

ing unique identiﬁers fall into this class, since they

take the identiﬁcation for granted and do not search

for a better match. But those algorithms which do not

build upon unique identiﬁers are also heuristic algo-

rithms (Kelter et al., 2005; Melnik et al., 2002; Xing

and Stroulia, 2005; Chawathe and Garcia-Molina,

1997; Cobena et al., 2002). Accuracy of these al-

gorithms is typically evaluated by human judgment;

a metric is not used for this purpose. This makes it

difﬁcult to compare these algorithms with respect to

their accuracy.


Computational complexity may exclude the appli-

cation of exact algorithms. For example, let us as-

sume that documents are modeled as graphs. Com-

putation of minimal graph differences includes the

search of a graph isomorphism as a special case (if

this search is successful, the graphs would be con-

sidered to be identical). Graph isomorphism is not

known to be a tractable problem. Known algo-

rithms for testing graph isomorphism have a super-

exponential worst case behavior. However, this need

not be a “killing argument”; consider e.g. the

speed up of graph pattern matching achieved in the

PROGRES system (Schürr et al., 1999) by exploiting

additional information, e.g., from the graph schema.

5.6 Designing the User Interface

Differencing and merging of software diagrams re-

quires a well-designed user interface which in partic-

ular relates differences and conﬂicts to diagram repre-

sentations the user is familiar with (R7). In the case of

differencing, diagrams may be displayed side by side

with differences being marked graphically (e.g. by us-

ing colors). If not enough space is available, instead a

uniﬁed diagram may be constructed which shows the

common and all speciﬁc elements contained in only

one version (Kelter et al., 2005). However, this rep-

resentation may easily be overloaded (as an analogy,

consider reading a C ﬁle with extensive conditional

compilation).

Unfortunately, the requirement for a user-friendly

representation is neglected in several tools. In

(Schneider et al., 2004), (VVU, 2007, Schneider and Zündorf), a merge tool for Fujaba models reports

changes in a cryptic textual format. In (Engel et al.,

2006) differences between MOF instances are repre-

sented as trees rather than graphically. In (Xing and

Stroulia, 2005) structural changes are visualized in

a more sophisticated way as containment and inher-

itance change trees.

5.7 Conﬂict Detection and Resolution

The problem of conceptual mismatch mentioned ear-

lier has to be considered particularly for three-way

merging: The user expects that operations are com-

bined and conﬂicts are detected and resolved at a

conceptual level conforming to the document model

which (s)he has in mind. When the merge tool oper-

ates at a different (physical) level, conﬂicts reported

at that level cannot be understood by the user. Like-

wise, conﬂict resolution has to be performed on the

conceptual rather than on the physical level.

No merge tool can be blamed for a failing merge

if the changes that have been performed concurrently

by different users cannot be combined in a meaning-

ful way. While the merge tool has to strive for produc-

ing a consistent result (R13), uncoordinated changes

may cause inconsistencies. For example, two users

may have inserted a class with the same name, resulting in a name clash. Or one user may have made $c_1$ a subclass of $c_2$, while another user has defined the

inheritance relationship in the opposite direction. In

general, we cannot expect that a merge tool detects

all kinds of context-sensitive conﬂicts, reports them to

the user, and ensures consistency by rejecting changes

causing inconsistencies. Please recall that we do not

expect such a behavior when applying text-based pro-

gram merging; rather, errors introduced by the merge

can (partially) be detected by running the compiler.

Likewise, merging of diagrams may result in incon-

sistencies which are ﬁxed in a post-processing step.

However, there is one crucial difference compared

to text-based merging: As a result of merging text

ﬁles, we will get a text ﬁle which may be checked

by the compiler and which may be viewed and edited

with the help of some text editor. In contrast, merg-

ing of diagrams may result in fundamental inconsis-

tencies: The output produced by the merge may not

be processable any more because fundamental con-

straints are violated. In fact, in many approaches pre-

sented in the literature merging may produce an in-

consistent result (Lindholm, 2004; Alanen and Porres,

2003; Ohst et al., 2004; Rho and Wu, 1998; Mehra

et al., 2005; Chen et al., 2003), (VVU, 2007, Schneider and Zündorf). The merge tool may be blamed for

this problem if the merge is performed at the wrong

level of abstraction. On the other hand, the consis-

tency constraints enforced by some CASE tool may

be too tight. If inconsistencies were tolerated in a dia-

gram editor, the merge tool could create a potentially

inconsistent output, which the user can improve sub-

sequently in the editor. To make this work, the under-

lying document model has to be generalized such that

inconsistencies are tolerated (Schneider et al., 2004),

(VVU, 2007, Schneider and Zündorf).

6 DESCRIPTION AND CLASSIFICATION OF KNOWN TOOLS

In the following paragraphs, six tools are compared in

more detail: one algorithm for calculating a matching,

two algorithms for differencing without unique iden-

tiﬁers (see 6.1) and two differencing methods that rely

on unique identiﬁers (see 6.2). The latter are merging


tools too, as described in 6.3, where another merging

tool that uses logged operations as delta is presented.

6.1 Differencing without Unique Identifiers

Differencing tools that do not rely on unique identi-

ﬁers need other criteria to identify corresponding di-

agram elements that are sufﬁciently similar. In all

known concepts the matching and the computation

of differences are considered to be separate problems.

Since the computation of differences with respect to

a given matching is discussed in the next paragraph,

we regard the following three algorithms only with

respect to the heuristics they use to ﬁnd a matching.

Similarity Flooding (Melnik et al., 2002) is a gen-

eral graph matching algorithm operating on directed

labeled graphs whose nodes represent diagram ele-

ments. For the computation of the similarity of two

nodes their neighbourhood is relevant: The similarity

of two nodes increases if their adjacent nodes are sim-

ilar. The computation itself works as follows: Initial

similarity values are computed by a pairwise compar-

ison of the node names. These similarity values are

then propagated in a ﬁxpoint computation along the

edges of the similarity propagation graph. A matching

can be selected from these globally computed simi-

larity values by choosing thresholds, constraints and

selection metrics.
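The propagation idea can be illustrated by a strongly simplified sketch. It is not the published Similarity Flooding algorithm: edge labels and propagation coefficients are ignored, the initial similarity is a plain name equality test, and a fixed number of rounds replaces the convergence check. All names are chosen for illustration.

```python
from itertools import product


def similarity_propagation(nodes1, edges1, nodes2, edges2, rounds=5):
    """Propagate pairwise node similarities along the edges of both graphs."""
    # Initial similarity: 1.0 for equal node names, 0.0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes1, nodes2)}
    # Two node pairs are neighbours if both graphs connect them by an edge.
    links = [((a, b), (a2, b2)) for (a, a2), (b, b2) in product(edges1, edges2)]
    for _ in range(rounds):
        updated = dict(sim)
        for p, q in links:
            updated[p] += sim[q]
            updated[q] += sim[p]
        top = max(updated.values()) or 1.0  # avoid division by zero
        sim = {pair: value / top for pair, value in updated.items()}
    return sim


nodes1 = ["Person", "name"]
edges1 = [("Person", "name")]
nodes2 = ["Customer", "name", "Order", "date"]
edges2 = [("Customer", "name"), ("Order", "date")]

sim = similarity_propagation(nodes1, edges1, nodes2, edges2)
best = max(nodes2, key=lambda b: sim[("Person", b)])
```

Although `Person` and `Customer` have different names, their similarity increases because their adjacent nodes (`name`) are similar, so `Customer` is selected as the best match for `Person`.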

The generic algorithm SiDiff (Kelter et al., 2005)

uses an internal data model comparable with a sim-

pliﬁed UML meta-model and is conﬁgurable for var-

ious types of UML diagrams. A diagram is extracted

from an XMI ﬁle and is represented as a tree con-

sisting of the composition structure augmented with

cross-references. Assuming that model elements are

characterized by the elements they consist of, the

difference algorithm starts with a bottom-up traver-

sal at the leaves of the composition tree. All ele-

ments of the same type are compared pairwise using

a type-speciﬁc similarity function that evaluates the

weighted aggregation of type-speciﬁc criteria. For

instance, in the case of class diagrams the similar-

ity of name, attributes, methods and inheritance re-

lations are considered. If a signiﬁcant correspon-

dence of two nodes has been identiﬁed, these nodes

are matched and if they possess child nodes that have

not been matched yet, the similarity is propagated

top-down into the subtree, i.e. the similarity values

of the child nodes are computed anew with respect

to the correspondence of their parent nodes. The algorithm
ends when all nodes have been processed in the
bottom-up phase and all similarities have been propagated
downwards (see figure 1). It is important to note
that the matching in the bottom-up phase is not very
successful at the lowest levels of the tree, since many
leaf nodes are nearly identical (consider, for instance,
the frequent occurrences of the data type integer in
class diagrams). To handle this problem, the original
algorithm has been extended by a pre-phase which tries
to match nodes by comparing hash values computed
on the paths of the nodes in the composition tree. After
the matching has been found, differences are computed
and represented in a unified document.

Figure 1: SiDiff: Search for Correspondences.
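The idea of a type-specific similarity function as a weighted aggregation of criteria can be sketched as follows. This is an illustration for class elements only, not SiDiff's actual code: the weights, the threshold, the use of the Jaccard overlap for sets and the greedy selection are all assumptions, and the sketch covers a single level of the tree rather than the full bottom-up and top-down traversal.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ClassNode:
    name: str
    attributes: frozenset = frozenset()
    methods: frozenset = frozenset()
    superclasses: frozenset = frozenset()


# Hypothetical weights; in SiDiff such settings would come from a
# diagram-type-specific configuration.
WEIGHTS = {"name": 0.4, "attributes": 0.25, "methods": 0.25, "superclasses": 0.1}


def overlap(x, y):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    return len(x & y) / len(x | y) if (x or y) else 1.0


def class_similarity(a, b, weights=WEIGHTS):
    """Weighted aggregation of the per-criterion similarities."""
    scores = {
        "name": SequenceMatcher(None, a.name, b.name).ratio(),
        "attributes": overlap(a.attributes, b.attributes),
        "methods": overlap(a.methods, b.methods),
        "superclasses": overlap(a.superclasses, b.superclasses),
    }
    return sum(weights[k] * scores[k] for k in weights)


def match_classes(old, new, threshold=0.6):
    """Greedily match the most similar pairs that reach the threshold."""
    ranked = sorted(
        ((class_similarity(a, b), a, b) for a in old for b in new),
        key=lambda item: item[0],
        reverse=True,
    )
    used_old, used_new, matches = set(), set(), []
    for score, a, b in ranked:
        if score < threshold:
            break
        if a in used_old or b in used_new:
            continue
        used_old.add(a)
        used_new.add(b)
        matches.append((a.name, b.name))
    return matches
```

With such a function a renamed class can still be matched when its attributes and methods are unchanged, because the name contributes only part of the aggregated value.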

The algorithm UMLDiff in (Xing and Stroulia,

2005) operates on class diagrams which have been

reverse engineered from Java source code and thus

works on a much more ﬁne-grained level than the

two algorithms presented above. The data model

used consists of a directed graph including a span-

ning tree of containment relations. In contrast to SiD-

iff, UMLDiff starts at the root nodes and compares the

nodes of the same logical level pairwise. On the way

down only those entities in subtrees are compared

whose root nodes have been matched on a higher level

(see ﬁgure 2). If two objects have the same name, they

are identified as equal. If not, their structural similarity
is considered, which is computed from the similarity of
names and other criteria specific to the entity type
considered. In the case of methods, these are parameter
types, fields they read or write, and other methods they
call or are called by. If the computed structural similarity
of two entities exceeds a user-defined threshold and
is the maximum value among all possible matching
candidates, the entities are matched as equal. After

all the leaves of the composition tree have been pro-

cessed, the remaining objects are compared in order

to find moved objects. If there are two entities with
the same name in different subtrees, they are considered
moved. Then the algorithm tries to identify elements
that have been both moved and renamed by their
structural similarity. The top-down traversal with the
early restriction of the search space to the subtrees of
matched entities, and the strategy of matching entities
by name first, make this algorithm efficient. It presumes,
however, that the two versions being compared are not
too different, in particular that not many moves
and renamings have occurred.
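The top-down strategy can be sketched in a few lines of Python. The sketch makes several assumptions that are not taken from (Xing and Stroulia, 2005): entities carry only a name and children, sibling names are unique, the similarity metric is a stand-in based on names and child names, and the best candidate is chosen greedily per entity. The subsequent search for moved entities is left out.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Entity:
    name: str
    children: list = field(default_factory=list)


def structural_similarity(a, b):
    """Stand-in metric: name similarity plus overlap of child names."""
    name = SequenceMatcher(None, a.name, b.name).ratio()
    kids_a = {c.name for c in a.children}
    kids_b = {c.name for c in b.children}
    union = kids_a | kids_b
    if not union:  # two leaves: only the names can be compared
        return name
    return 0.5 * name + 0.5 * len(kids_a & kids_b) / len(union)


def match_top_down(old, new, threshold=0.5, prefix_old="", prefix_new="", out=None):
    """Match one level, then descend only into matched entities."""
    out = [] if out is None else out
    by_name = {e.name: e for e in new}  # assumes unique sibling names
    pairs, unmatched_old = [], []
    for a in old:  # 1. identical names are matched first
        b = by_name.pop(a.name, None)
        if b is not None:
            pairs.append((a, b))
        else:
            unmatched_old.append(a)
    candidates = list(by_name.values())
    for a in unmatched_old:  # 2. best remaining candidate above threshold
        if not candidates:
            break
        best = max(candidates, key=lambda b: structural_similarity(a, b))
        if structural_similarity(a, best) >= threshold:
            candidates.remove(best)
            pairs.append((a, best))
    for a, b in pairs:  # 3. restrict the search to matched subtrees
        path_a, path_b = prefix_old + a.name, prefix_new + b.name
        out.append((path_a, path_b))
        match_top_down(a.children, b.children, threshold,
                       path_a + ".", path_b + ".", out)
    return out
```

Because children are only compared below matched parents, a wrong or missing match near the root hides everything underneath it, which is why this strategy depends on the two versions being reasonably similar.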

Figure 2: UMLDiff: Search for Correspondences.

The methods described above seem to work satisfactorily
in terms of performance and error rate, judging
from the published results. In all three cases, however,
the differences and matchings have only been
verified by hand. Due to the lack of metrics, it is
not possible to compare the different algorithms. It is
worth mentioning that the computation in Similarity
Flooding uses a global concept even if the underlying
model is quite simple. SiDiff and UMLDiff, on the
other hand, make a sequence of local decisions.
Furthermore, SiDiff uses configuration files to adapt the
algorithm to the specific diagram types and thus offers a
compromise between (R2) and (R3), whereas Similarity
Flooding, working on a high conceptual level, cannot
use diagram type-specific information. UMLDiff
gives up generality in favour of domain-specific
optimization. None of the three tools described above
supports merging.

6.2 Differencing with Unique Identiﬁers

The tools presented in this section identify
corresponding diagram elements by comparing their
unique identifiers.

In (Ohst et al., 2003) the two diagrams to be
compared are represented as graphs containing a
spanning tree of composition relations, as in the more
recent concept presented in (Kelter et al., 2005). In
a top-down traversal of each level of the spanning
tree, the corresponding subtrees, i.e. those rooted at
nodes with identical identifiers, are found. The
corresponding nodes are compared with respect to their
attributes and relationships, and the difference
information is recorded in an object created for the unified
document representing a symmetric difference. Then
the nodes in the matched subtrees are examined
further. A move operation is realized as a composite
operation that deletes one object and inserts an object
with the same unique identifier. To identify moved
subtrees, all the subtrees that could not be matched
are stored in sets and compared once again. If nodes
with the same identifier exist in both sets, they
represent a moved node. All other nodes have either been
deleted or created.
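A minimal sketch of identifier-based differencing with move detection is given below. It assumes a simple Node type with an identifier, an attribute dictionary and a child list, and it assumes that the two roots correspond to each other; these names and the shape of the result are illustrative and not taken from (Ohst et al., 2003).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uid: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def diff_by_id(old_root, new_root):
    """Compare two composition trees whose roots correspond to each other."""
    changed = []
    unmatched_old, unmatched_new = set(), set()

    def collect(node, bucket):
        # Store every node of an unmatched subtree for the second pass.
        bucket.add(node.uid)
        for child in node.children:
            collect(child, bucket)

    def walk(a, b):
        if a.attrs != b.attrs:
            changed.append(a.uid)
        old_kids = {c.uid: c for c in a.children}
        new_kids = {c.uid: c for c in b.children}
        for uid, child in old_kids.items():
            if uid in new_kids:
                walk(child, new_kids[uid])  # same identifier: descend
            else:
                collect(child, unmatched_old)
        for uid, child in new_kids.items():
            if uid not in old_kids:
                collect(child, unmatched_new)

    walk(old_root, new_root)
    # Identifiers found in both sets belong to moved nodes. Attribute
    # changes inside moved subtrees are not reported in this sketch.
    return {
        "changed": sorted(changed),
        "moved": sorted(unmatched_old & unmatched_new),
        "deleted": sorted(unmatched_old - unmatched_new),
        "inserted": sorted(unmatched_new - unmatched_old),
    }
```

Note that every node of a moved subtree ends up in the moved list here; a fuller version would report only the subtree root and continue the comparison below it.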

In (Mehra et al., 2005) component-based plug-ins
to the meta-CASE tool Pounamu for diagram versioning,
differencing and merging are presented. Diagrams
of any type that has been defined in the meta-CASE
tool can be compared to find a directed delta. Instead
of traversing the graph-based structure, which consists
of shapes and connectors and their properties, the
diagrams are compared in two steps. First, all shapes are
matched by their identifiers. If the properties of
corresponding shapes differ, an appropriate change
operation is added to the directed delta. If a shape exists
in one diagram only, an insert or delete operation is
added. Since a connector is defined by its source and
target shapes, all attached connectors are deleted when
a shape is deleted, and the connectors must be inserted
too when a shape is inserted. In a second step, all
connectors that have not been processed yet are
compared. Moves are not detected.
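The two-step comparison can be sketched as follows. The data layout (dictionaries of shapes and connectors keyed by identifier) and the operation names are assumptions made for this illustration; they are not Pounamu's actual data model or API. The sketch further assumes that a connector identifier is never reused for a connector with different end points.

```python
def directed_delta(old, new):
    """Compute a list of edit operations that turns `old` into `new`.

    Both diagrams are dicts of the form
      {"shapes": {shape_id: properties},
       "connectors": {connector_id: (source_id, target_id, properties)}}
    """
    old_s, new_s = old["shapes"], new["shapes"]
    old_c, new_c = old["connectors"], new["connectors"]
    deleted = old_s.keys() - new_s.keys()
    inserted = new_s.keys() - old_s.keys()
    ops, handled = [], set()

    # Step 1: shapes are matched by identifier.
    for sid in sorted(old_s.keys() & new_s.keys()):
        if old_s[sid] != new_s[sid]:
            ops.append(("change_shape", sid, new_s[sid]))
    for cid in sorted(old_c):  # connectors attached to deleted shapes
        source, target, _ = old_c[cid]
        if source in deleted or target in deleted:
            ops.append(("delete_connector", cid))
            handled.add(cid)
    ops.extend(("delete_shape", sid) for sid in sorted(deleted))
    ops.extend(("insert_shape", sid, new_s[sid]) for sid in sorted(inserted))
    for cid in sorted(new_c):  # connectors attached to inserted shapes
        source, target, _ = new_c[cid]
        if source in inserted or target in inserted:
            ops.append(("insert_connector", cid, new_c[cid]))
            handled.add(cid)

    # Step 2: the remaining connectors are compared by identifier.
    for cid in sorted((old_c.keys() | new_c.keys()) - handled):
        if cid not in new_c:
            ops.append(("delete_connector", cid))
        elif cid not in old_c:
            ops.append(("insert_connector", cid, new_c[cid]))
        elif old_c[cid] != new_c[cid]:
            ops.append(("change_connector", cid, new_c[cid]))
    return ops
```

The operations are emitted in an order in which they can be applied: connectors are removed before the shapes they attach to, and inserted only after all new shapes exist.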

The sets of feasible edit operations used in these
two approaches differ. In Pounamu only insert
and delete operations are considered, whereas in
(Ohst et al., 2003) a combined delete and insert
situation is interpreted as a move operation. Furthermore,
in (Ohst et al., 2003) a symmetric difference is
computed, whereas in Pounamu a directed delta is
determined. Both tools also support merging,
as shown in the following section.

6.3 Merging

In this section, three merging tools that offer
different levels of user interaction are described.

In the CoObRA versioning framework (Schneider
et al., 2004) all edit operations that are executed on
the diagrams are logged by the tool. For this reason,
no differences need to be computed. CoObRA uses
three-way merging but gives priority to the version
that was committed first. The workflow is illustrated
in figure 3. A developer has checked a version v out
of the repository into the local workspace in order to
modify it by applying a local operation sequence ∆.
If another operation sequence ∆ has meanwhile been
applied to the version in the repository, the developer's
attempt to commit fails and the local version has to be
updated first. This means applying the repository
changes to the original version v to reach the current
version stored in the repository, and then trying to
apply the local change operations again. At this point,
conflicts may occur if one or more of the local
operations can no longer be applied after the execution
of the repository changes. The operation sequence ∆*
in the figure is a subset of the local operation sequence,
expressing that some operations might not have been
applied. Conflicts are reported to the application in a
cryptic way, and conflict resolution is not supported.
Furthermore, semantic correctness may be violated: if
classes with the same name are created in both
operation sequences, the merged version may contain
both classes unless this is prevented by constraints in
the application.

Figure 3: Commit and Update in CoObRA.
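The update step can be sketched as a replay of logged operations. The operation format, the Conflict exception and the function names below are hypothetical and only serve the illustration; they are not CoObRA's interface. The model is a plain dictionary from object identifiers to property dictionaries.

```python
import copy


class Conflict(Exception):
    """Raised when a logged operation can no longer be applied."""


def apply_op(model, op):
    """Apply one logged operation to the model (a dict: id -> properties)."""
    kind, oid, *args = op
    if kind == "create":
        if oid in model:
            raise Conflict(op)
        model[oid] = dict(args[0])
    elif kind == "delete":
        if oid not in model:
            raise Conflict(op)
        del model[oid]
    elif kind == "set":
        if oid not in model:
            raise Conflict(op)
        key, value = args
        model[oid][key] = value
    else:
        raise ValueError(f"unknown operation: {kind}")


def update_and_replay(base, repository_ops, local_ops):
    """Bring the base version up to date, then replay the local log.

    Returns the merged model, the local operations that could be applied
    and the local operations that were rejected because of a conflict.
    """
    model = copy.deepcopy(base)
    for op in repository_ops:  # repository changes take priority
        apply_op(model, op)
    applied, rejected = [], []
    for op in local_ops:
        try:
            apply_op(model, op)
            applied.append(op)
        except Conflict:
            rejected.append(op)
    return model, applied, rejected
```

If the repository log deletes an object that the local log modifies, the local operation ends up in the rejected list. Two classes created independently under different identifiers but with the same name are both kept, which mirrors the semantic problem mentioned above.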

The Pounamu meta-CASE tool described in

(Mehra et al., 2005) offers a plug-in for merging,

where the merging is realized interactively. The
edit operations in the computed directed delta are
offered to the user, who decides which changes to apply.

A difference highlighting plug-in shows the differ-

ences graphically based on the local version of the di-

agram. Additionally the edit operations are presented

in a list, where edit operations which are currently not

applicable are marked.

(Ohst et al., 2004; Ohst et al., 2003) use three-

way merging and split the merging process into three

steps. First a pre-merged document is created. In

the second step, the conﬂicts must be resolved man-

ually before the merged document is created in the

ﬁnal step. Conﬂicts occur if the same attribute has

been changed in both versions, if an object has been

modiﬁed in one version and deleted in the other ver-

sion and in all derived situations. In case of deletion-

modiﬁcation conﬂicts the user has to decide whether

the object should be deleted or modiﬁed. In the case

of change conﬂicts the user is asked which modiﬁ-

cation should be applied. The pre-merged document

is an extended uniﬁed document consisting of com-

mon parts, automatically merged parts and conﬂicts.

This pre-merged document can be modified in a tool
that supports conflict resolution and undoing decisions,
even decisions that have been made automatically.
The merged class diagram may be inconsistent;
constraints such as the uniqueness of the names of classes,
methods or attributes must be verified after merging (Ohst
et al., 2004).
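The first step, the creation of a pre-merged document with automatically merged parts and conflicts, can be sketched for a strongly simplified model. The sketch assumes that each version is a dictionary from unique identifiers to attribute dictionaries, that an identifier is never inserted independently in both versions, and that None is not a legal attribute value; the conflict tuples are an illustrative format, not the one used in (Ohst et al., 2004).

```python
def pre_merge(base, left, right):
    """Three-way merge of {uid: {attribute: value}} models.

    Returns (merged, conflicts). Conflicting attributes keep their base
    value and conflicting objects are kept until the user has decided.
    """
    merged, conflicts = {}, []
    for uid in sorted(base.keys() | left.keys() | right.keys()):
        b, l, r = base.get(uid), left.get(uid), right.get(uid)
        if b is None:
            # Inserted in one version (assumed: never in both).
            merged[uid] = dict(l if l is not None else r)
            continue
        if l is None and r is None:
            continue  # deleted in both versions
        if l is None or r is None:
            survivor = r if l is None else l
            if survivor != b:
                # Modified in one version, deleted in the other.
                conflicts.append(("delete-modify", uid))
                merged[uid] = dict(survivor)
            continue  # otherwise the deletion is applied automatically
        obj = {}
        for attr in sorted(b.keys() | l.keys() | r.keys()):
            bv, lv, rv = b.get(attr), l.get(attr), r.get(attr)
            if lv == rv:
                value = lv  # same in both versions
            elif lv == bv:
                value = rv  # changed in the right version only
            elif rv == bv:
                value = lv  # changed in the left version only
            else:
                conflicts.append(("change", uid, attr, lv, rv))
                value = bv
            if value is not None:
                obj[attr] = value
        merged[uid] = obj
    return merged, conflicts
```

Both versions are treated symmetrically here: neither side wins automatically, and everything that cannot be decided from the base version is handed to the user as a conflict. Consistency constraints such as unique class names would still have to be checked on the result.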

Only in (Ohst et al., 2004) do the two versions that
have to be merged with respect to a base document
have equal relevance. CoObRA gives priority to the
version stored in the repository, Pounamu to the local
changes. The differences used for merging are
obtained in different ways: in CoObRA there is a log
of the operations, in Pounamu the differences are
computed as a directed delta, and in (Ohst et al., 2004)
a symmetric difference is calculated. In CoObRA,
user interaction in the merging process is not
possible; modifications must be made manually on the
merging result. In the meta-CASE tool Pounamu the
developer can and must decide which change
operations are to be applied. In (Ohst et al., 2004) the
non-conflicting transformations are applied automatically,
leaving only the problematic decisions to the
user, who may nevertheless revise the decisions taken
automatically.

7 CONCLUSION

We have deﬁned requirements for algorithms and

tools for differencing and merging of software
diagrams. Furthermore, we have explored several crucial
design decisions which tool developers have to
make. We have also shown how these design decisions
have been addressed in a number of approaches
published in the literature.

The current state of the art may be characterized

as follows:

• There is a common agreement that text-based diff

and merge tools are not adequate for software di-

agrams.

• A number of commercial tools and research proto-

types provide support for differencing and merg-

ing. However, these approaches suffer from var-

ious shortcomings such as non-graphical user in-

terfaces, reliance on unique identiﬁers, or incon-

sistent merge results.

• There is no common agreement with respect to the

document model as the foundation for differenc-

ing and merging, metrics to be used for measuring

differences between versions, rules used for merg-

ing, etc.

• Published algorithms either assume unique iden-

tiﬁers or are based on heuristics. Evaluations of

these algorithms are based on human judgment,

and it is hard to compare these algorithms against

each other.

Thus, further research is needed to improve the

state of the art. However, it is difﬁcult - or even im-

possible - to meet all of the requirements deﬁned in

this paper. From the perspective of software conﬁgu-

ration management, it is important to go beyond text-

based version control. On the other hand, software

conﬁguration management systems need to support

version control for a wide variety of software docu-

ments. Moreover, they need to handle large volumes

of data. From this perspective, general approaches

based e.g. on MOF or XML are required. The expe-

riences gained with differencing and merging of pro-

gram versions indicate that accuracy and sophistica-

tion may have to be traded for generality and efﬁ-

ciency.

ICSOFT 2007 - International Conference on Software and Data Technologies

REFERENCES

(2005). MOF 2.0/XMI Mapping Speciﬁcation, v2.1. Object

Management Group, ﬁnal/05-09-01 edition.

(2005). MOF2 Versioning Final Adopted Speciﬁcation. Ob-

ject Management Group, ptc/05-08-01 edition.

(2007). Contributions to the workshop "Versionierung
und Vergleich von UML-Modellen" at the conference
Software Engineering 2007 in Hamburg.
Softwaretechnik-Trends, 27(2). (to appear).
http://pi.informatik.uni-siegen.de/gi/fg211/VVUM07/.

Alanen, M. and Porres, I. (2003). Difference and union of

models. In Stevens, P., Whittle, J., and Booch, G., ed-

itors, UML 2003 - The Uniﬁed Modeling Language,

Modeling Languages and Applications, 6th Interna-

tional Conference, LNCS 2863, pages 2–17. Springer.

Buffenbarger, J. (1995). Syntactic software merging. In

Estublier, J., editor, Software Conﬁguration Manage-

ment: Selected Papers SCM-4 and SCM-5, LNCS

1005, pages 153–172.

Chawathe, S. S. and Garcia-Molina, H. (1997). Meaning-

ful change detection in structured data. In Peckman,

J. M., editor, Proceedings ACM SIGMOD Interna-

tional Conference on Management of Data, pages 26–

37. ACM Press.

Chen, P., Critchlow, M., Garg, A., der Westhuizen, C. V.,

and van der Hoek, A. (2003). Differencing and merg-

ing within an evolving product line architecture. In

van der Linden, F., editor, Proceedings of the Fifth In-

ternational Workshop on Product Family Engineering

(PFE-5), LNCS 3014, Siena, Italy. Springer Verlag.

Cobena, G., Abiteboul, S., and Marian, A. (2002). De-

tecting changes in XML documents. In International

Conference on Data Engineering, pages 41–52. IEEE

Computer Society.

Conradi, R. and Westfechtel, B. (1998). Version models for

software conﬁguration management. ACM Computing

Surveys, 30(2):232–282.

Engel, K.-D., Paige, R. F., and Kolovos, D. S. (2006). Us-

ing a model merging language for reconciling model

versions. In Rensink, A. and Warmer, J., editors,

ECMDA-FA, volume 4066 of Lecture Notes in Com-

puter Science, pages 143–157. Springer.

Horwitz, S., Prins, J., and Reps, T. (1989). Integrating non-

interfering versions of programs. ACM Transactions

on Programming Languages and Systems, 11(3):345–

387.

Hunt, J. and Szymanski, T. (1977). A fast algorithm for

computing longest common subsequences. Commu-

nications of the ACM, 20(5):350–353.

Kelter, U., Wehren, J., and Niere, J. (2005). A generic dif-

ference algorithm for UML models. In Liggesmeyer,

P., Pohl, K., and Goedicke, M., editors, Software En-

gineering 2005, LNI 64, pages 105–116. GI.

Lindholm, T. (2004). A three-way merge for XML docu-

ments. In Munson, E. V. and Vion-Dury, J.-Y., editors,

Proceedings of the 2004 ACM Symposium on Docu-

ment Engineering, pages 1–10. ACM.

Mehra, A., Grundy, J. C., and Hosking, J. G. (2005).

A generic approach to supporting diagram differenc-

ing and merging for collaborative design. In Red-

miles, D. F., Ellman, T., and Zisman, A., editors,

20th IEEE/ACM International Conference on Auto-

mated Software Engineering (ASE 2005), pages 204–

213. ACM.

Melnik, S., Garcia-Molina, H., and Rahm, E. (2002).
Similarity flooding: A versatile graph matching algorithm
and its application to schema matching. In Proceedings
18th International Conference on Data Engineering,
pages 117–128, San Jose, CA.

Mens, T. (2002). A state-of-the-art survey on software

merging. IEEE Transactions on Software Engineer-

ing, 28(5):449–462.

Ohst, D. and Kelter, U. (2002). A ﬁne-grained version and

conﬁguration model in analysis and design. In ICSM,

pages 521–527. IEEE Computer Society.

Ohst, D., Welle, M., and Kelter, U. (2003). Differences

between versions of UML diagrams. In Proceedings

ESEC/FSE-11, pages 227–236, New York, NY, USA.

ACM Press.

Ohst, D., Welle, M., and Kelter, U. (2004). Merging UML

documents. Internal Report, University of Siegen.

Rho, J. and Wu, C. (1998). An efﬁcient version model of

software diagrams. In Asia Paciﬁc Software Engineer-

ing Conference, pages 236–243. IEEE Computer So-

ciety Press.

Schneider, C., Zündorf, A., and Niere, J. (2004). CoObRA
- a small step for development tools to collaborative
environments. In Workshop on Directions in Software
Engineering Environments; 26th International Conference
on Software Engineering, ICSE 2004, Scotland.

Schürr, A., Winter, A., and Zündorf, A. (1999). The
PROGRES approach: Language and environment. In
Ehrig, H., Engels, G., Kreowski, H.-J., and Rozenberg,
G., editors, Handbook on Graph Grammars
and Computing by Graph Transformation: Application,
Languages, and Tools, volume 2, pages 487–550.
World Scientific.

Soto, M. and Münch, J. (2006). Process model difference
analysis for supporting process evolution. In Richardson,
I., Runeson, P., and Messnarz, R., editors, Software
Process Improvement, 13th European Conference,
EuroSPI 2006, LNCS 4257, pages 123–134.
Springer.

Tichy, W. F. (1984). The string-to-string correction problem

with block moves. ACM Transactions on Computer

Systems, 2(4):309–321.

Wiborg-Weber, D. (1999). CM strategies for RAD. In Es-

tublier, J., editor, System Conﬁguration Management:

9th International Symposium (SCM-9), LNCS 1675,

pages 204–216.

Xing, Z. and Stroulia, E. (2005). UMLDiff: an algo-

rithm for object-oriented design differencing. In Red-

miles, D. F., Ellman, T., and Zisman, A., editors,

20th IEEE/ACM International Conference on Auto-

mated Software Engineering (ASE 2005), pages 54–

65. ACM.
