which contains tags and left brackets, and then extract sub-sequences. They assume that the similarity of the sub-sequences equals that of the XML structures.
In this paper, we concentrate on the problem of the pure structural similarity of UML models in XMI format, which means that the semantic information carried by tags is discarded. We present a modified edit distance method, called Level Edit Distance (LED), which calculates the similarity of two tree structures using only one primitive operation, whereas the traditional edit distance needs three primitive operations: change, insertion and deletion. Additionally, LED calculates a distance at each level and then sums these level distances with different weights to obtain the final distance between trees, whereas Wen calculates the distance only on the sub-sequences.
2 MEASURE OF STRUCTURAL SIMILARITY
2.1 Similarity Principles
A UML model can be exactly converted into an XMI document, i.e. a well-formed XML text, so that the similarity of UML models equals that of the corresponding XML texts. An XML text contains both structural information and semantic information. In this paper, however, we consider only the structural information and ignore the semantic information, because we aim to find more similar models in a repository and thereby improve the reusability of various models.
High-level and general models contain less semantic information, which allows them to be applied in different backgrounds. Additionally, they are more abstract and comply with the platform-independent philosophy. In terms of reusability, semantic information tends to obscure model similarity. In fact, the abstract structures of UML models from different application backgrounds may be very similar, or even identical; most of the differences between user requirements come from semantic narrative texts. We therefore omit a model's semantic information and retain its structural information in order to compare models at a higher, more abstract level. We believe this helps to improve the reusability of models. As a result, the method proposed in this paper concentrates on the pure structural similarity of UML models in XMI format, which is in essence the similarity of trees.
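To make "pure structure" concrete, the sketch below reduces an XML text to an unlabeled ordered tree, represented as nested Python lists (each node is the list of its children), using the standard-library xml.etree.ElementTree parser. The function names pure_structure and structure_of are hypothetical, and the sketch is an illustration under our own assumptions rather than the extraction step used in this paper; for instance, it ignores XMI-specific details such as xmi:id references.

```python
import xml.etree.ElementTree as ET


def pure_structure(elem):
    """Return the shape of an element subtree as nested lists.

    Hypothetical helper: tag names, attributes and text are discarded;
    only the ordered parent-child nesting of elements is kept.
    """
    return [pure_structure(child) for child in elem]


def structure_of(xml_text):
    """Parse an XML string and return the pure structure of its root."""
    return pure_structure(ET.fromstring(xml_text))


if __name__ == "__main__":
    doc = "<Model><Class name='A'><Attr/><Attr/></Class><Class name='B'/></Model>"
    print(structure_of(doc))  # [[[], []], []]
    # Different tags, attributes and text, same shape:
    print(structure_of("<a><b/></a>") == structure_of("<x q='1'><y>text</y></x>"))  # True
```

Because all labels are dropped, two documents that differ only in tag names or attribute values map to the same tree, which is exactly the information this section chooses to ignore.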
Fig. 1 shows four trees extracted from four XML texts. The pairwise similarities among them, however, depend on how similarity is defined: under different principles, the similarity relationships among the four trees differ.
Figure 1: Pure structure of 4 XML documents.
Based on the order in which a tree is traversed, we define two principles that determine how the structural similarities between trees are ranked.
Principle 1: Compare two trees in depth-first order, i.e. from left to right. For example, in Fig. 1, according to this principle, the similarity between trees a and b is greater than that between a and d. Likewise, the similarity between trees a and c is greater than that between a and d, i.e.
sim(a,b) > sim(a,d) and sim(a,c) > sim(a,d)
where sim(a,b) denotes the structural similarity between trees a and b.
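One possible reading of Principle 1, sketched below under our own assumptions, is to serialize each tree in depth-first (preorder, left-to-right) order as the sequence of child counts and to treat a longer common prefix as greater similarity. The trees are nested lists (each node is the list of its children), the helper names are hypothetical, and the example trees are toy trees, not those of Fig. 1.

```python
def preorder_degrees(tree):
    """Depth-first, left-to-right list of child counts.

    For ordered trees this sequence determines the shape uniquely.
    """
    seq = [len(tree)]
    for child in tree:
        seq.extend(preorder_degrees(child))
    return seq


def dfs_common_prefix(t1, t2):
    """Length of the common prefix of the two preorder sequences.

    Hypothetical similarity signal: the later the first mismatch in a
    depth-first scan, the more similar the trees under Principle 1.
    """
    n = 0
    for x, y in zip(preorder_degrees(t1), preorder_degrees(t2)):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    a = [[[], []], []]    # toy trees, not the trees of Fig. 1
    b = [[[], []], [[]]]  # differs from a only in its right-most branch
    c = [[], []]          # differs from a already in its left-most branch
    print(preorder_degrees(a))       # [2, 2, 0, 0, 0]
    print(dfs_common_prefix(a, b))   # 4
    print(dfs_common_prefix(a, c))   # 1
```

Here b is ranked closer to a than c is, because the difference between a and b lies further to the right in the depth-first scan.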
Principle 2: Compare two trees in breadth-first order, i.e. from top to bottom. This implies that differences at lower levels, i.e. levels closer to the root, contribute more to the difference between two trees. The root of a tree is level 0. According to this principle, the structural similarities are ordered as follows:
sim(a,c) > sim(a,d) > sim(a,b)
This is because trees a and c are still identical at level 2, whereas tree a differs from d at that level, and trees a and b already differ at level 1.
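The ranking in Principle 2 can be reproduced by finding the first level at which two trees differ: the later that level, the more similar the trees. The sketch below is our own illustration, not the LED algorithm; trees are nested lists (each node is the list of its children), and the helper names and toy trees are hypothetical.

```python
def levels(tree):
    """Breadth-first grouping: levels(tree)[l] lists, left to right,
    the child counts of the nodes at level l (the root is level 0)."""
    result, frontier = [], [tree]
    while frontier:
        result.append([len(node) for node in frontier])
        frontier = [child for node in frontier for child in node]
    return result


def first_differing_level(t1, t2):
    """Return the first level whose nodes differ, or None if identical.

    Equal child counts on levels 0..l-1 mean identical shapes on levels
    0..l, so a mismatch at index l first shows up at level l + 1. If all
    compared entries match, both trees end on the same all-leaf level,
    so zip() never silently drops a differing level.
    """
    for depth, (x, y) in enumerate(zip(levels(t1), levels(t2))):
        if x != y:
            return depth + 1
    return None


if __name__ == "__main__":
    a = [[[], []], []]    # toy trees, not the trees of Fig. 1
    b = [[], [], []]      # root has a different number of children
    c = [[[], []], [[]]]  # same first two levels as a
    print(levels(a))                    # [[2], [2, 0], [0, 0]]
    print(first_differing_level(a, b))  # 1
    print(first_differing_level(a, c))  # 2
```

Sorting candidate trees by this level (treating None as infinitely deep) yields an ordering of the same kind as sim(a,c) > sim(a,d) > sim(a,b) above.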
Our method follows Principle 2, so a lower level has a greater effect on the overall similarity. Specifically, a base-10 level weight is attached to each level in our method.
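As a rough illustration of combining level distances with base-10 weights, the sketch below assumes a weight of 10^(-l) for level l, so that the root level counts most. The exact weight formula and the per-level distance are not given in this section, so both are assumptions: the hypothetical function simply takes the per-level distances as input.

```python
from fractions import Fraction


def weighted_tree_distance(level_distances, base=10):
    """Combine per-level distances d_l into sum(d_l * base**(-l)).

    Assumptions (hypothetical, not taken from the paper): weight
    base**(-l) for level l, and level_distances[l] already holds the
    distance measured at level l. Fractions keep the result exact.
    """
    return sum(Fraction(d) / base**l for l, d in enumerate(level_distances))


if __name__ == "__main__":
    print(weighted_tree_distance([0, 1, 0]))  # 1/10
    print(weighted_tree_distance([0, 0, 3]))  # 3/100
```

With these weights, one unit of distance at level l equals ten units at level l+1, so a single difference near the root outweighs a few differences deeper in the tree.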
2.2 Similarity Algorithm
Wen's method is a typical traditional edit distance (ED), i.e. the minimum cost of transforming one string into another using three primitive operations: change, insertion and deletion.
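For reference, the three-operation edit distance described above is the classic Levenshtein distance, computed by dynamic programming. The sketch below assumes unit costs for change, insertion and deletion (the costs used by Wen are not stated in this section); the function name is ours, and it accepts any sequences, e.g. strings or lists of extracted tag tokens.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit-cost change, insertion and deletion.

    Keeps only the previous DP row, so memory is O(len(t)).
    """
    prev = list(range(len(t) + 1))  # distance from empty prefix of s
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # change cs into ct (free if equal)
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))                  # 3
    print(edit_distance(["<a", "<b"], ["<a", "<c", "<b"]))     # 1
```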
Given two XML documents $d_1$ and $d_2$, the ED of Wen's method is defined as follows.
ICSOFT 2012 - 7th International Conference on Software Paradigm Trends