Rule 8 The link between a structure S2 with the
primary key KEY2 nested inside another structure S1
with the primary key KEY1 is preserved by adding
a choice reference structure formed by primary keys
KEY1||KEY2 inside the nested structure.
Rule 8 belongs to the preparation part of the
RA and is applied at the end of Part 1, while sequences
are still nested. The references are placed in the inner
structure so that they borrow its operator and thereby
preserve the cardinality of the nested structure. If the
outer structure S1 contains only S2 and no additional
elements but is part of a structure S0 with primary
key KEY0, then the choice structure KEY0||KEY2 is
added inside S2.
Referring to example (d) from Figure 6, the reference
node is formed by (eidREF||pidREF), as eid and
pid are primary keys:
company (employee (eid, project (pid, description), name))
⇒ (by R8)
company (employee (eid, (eidREF||pidREF, pid, description), name))
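The transformation of example (d) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Struct` class and the `apply_rule8` function are hypothetical names that do not come from the RA, the reference choice is placed first in the nested structure to match the example above, and the S0 case is simplified to "use the nearest enclosing structure that has a primary key".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Struct:
    """Hypothetical nested structure: a name, an optional primary key, children."""
    name: str
    key: Optional[str] = None
    children: List[Union[str, "Struct"]] = field(default_factory=list)


def apply_rule8(node: Struct, ancestor_key: Optional[str] = None) -> Struct:
    """Add the choice reference 'KEY1REF||KEY2REF' inside every keyed structure
    that is nested under a structure with a primary key (sketch of Rule 8)."""
    if node.key and ancestor_key:
        # The reference is added inside the nested structure so that it
        # shares the operator (cardinality) of that structure.
        node.children.insert(0, f"{ancestor_key}REF||{node.key}REF")
    # Assumption: a structure without a key passes on the key of its ancestor.
    next_key = node.key or ancestor_key
    for child in node.children:
        if isinstance(child, Struct):
            apply_rule8(child, next_key)
    return node


# Example (d) from Figure 6: project (key pid) nested in employee (key eid).
project = Struct("project", key="pid", children=["pid", "description"])
employee = Struct("employee", key="eid", children=["eid", project, "name"])
company = Struct("company", children=[employee])

apply_rule8(company)
print(project.children)  # ['eidREF||pidREF', 'pid', 'description']
```

The employee structure itself receives no reference, because no enclosing structure has a primary key.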
We represent the choice structure for references
with a double bar, ||, because it is evaluated differently
from a regular alternative construction. At most one
element of the reference alternative structure is going
to be found in the other schema. Thus, unlike the
previously defined SimChoice, which must determine
how many alternative options from one schema are
found in the other schema, SimChoiceRef only has to
evaluate whether there is any corresponding reference
(see Equation 11). Consequently, at most one
$\varepsilon_{x_i \to y_j}$ among the arguments of the
maximum function is greater than zero, where $x_i$ and
$y_j$ are reference alternatives from XRTS1 and XRTS2,
respectively.
\[
\mathit{SimChoiceRef} = \mathrm{Max}(\varepsilon_{x_i \to y_j}) \cdot \varepsilon_{REF} \cdot 100\% \tag{11}
\]
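Equation 11 can be evaluated with a few lines of Python. The function name `sim_choice_ref` and the convention of returning 0 when no reference alternatives are compared are assumptions made for this sketch; the pairwise equivalences are expected to be computed beforehand.

```python
from typing import Iterable


def sim_choice_ref(pair_equivalences: Iterable[float], eps_ref: float) -> float:
    """Equation 11: the maximum pairwise equivalence between reference
    alternatives, scaled by eps_ref and expressed as a percentage."""
    values = list(pair_equivalences)
    if not values:
        # Assumption: with nothing to compare, the similarity is zero.
        return 0.0
    return max(values) * eps_ref * 100.0


# At most one pair of reference alternatives corresponds, so a single
# non-zero equivalence is expected among the arguments.
score = sim_choice_ref([0.0, 1.0, 0.0], eps_ref=0.6)
print(round(score, 6))  # 60.0
```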
(a) <!ELEMENT company (employee, project)>
    <!ELEMENT employee (eid, pid, name)>
    <!ELEMENT project (pid, description)>
    eid and project.pid are primary keys;
    employee.pid is a keyref to project.pid
(b) <!ELEMENT company (employee, project)>
    <!ELEMENT employee (eid, pid, name)>
    <!ELEMENT project (pid, description)>
(c) <!ELEMENT company (employee, project)>
    <!ELEMENT employee (eid, name)>
    <!ELEMENT project (pid, eid, description)>
    employee.eid and pid are primary keys;
    project.eid is a keyref to employee.eid
(d) <!ELEMENT company (employee)>
    <!ELEMENT employee (eid, project, name)>
    <!ELEMENT project (pid, description)>
    eid and project.pid are primary keys
Figure 6: Simple possible equivalent schemas.

A correct equivalence evaluation must also consider
the existence of primary keys. If XRTS3 is defined as
Employee(eid) with eid as a primary key, and XRTS4 as
Employee(eid) without a primary key, they are not 100%
equivalent. Thus, we revise the similarity formula
(Equation 10) to multiply the node equivalence by the
key equivalence for primary keys. For example, if eid
is a primary key, its equivalence is the product
$\varepsilon_{eid} \cdot \varepsilon_{KEY}$, where
$\varepsilon_{KEY} = 1$ if both nodes are primary keys
and $\varepsilon_{KEY} < 1$ if only one of them is a
primary key.
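The key-aware node equivalence can be sketched as below. The function names are hypothetical, and the sketch assumes that the key factor is 1 when neither node is a primary key, a case the text does not spell out. The value 0.7 is the setting for the key equivalence used in the examples that follow.

```python
def key_equivalence(x_is_key: bool, y_is_key: bool, eps_key_mismatch: float) -> float:
    """Return 1 when both nodes are primary keys and eps_key_mismatch (< 1)
    when only one of them is a primary key."""
    if x_is_key != y_is_key:
        return eps_key_mismatch
    # Assumption: two non-key nodes are not penalised either.
    return 1.0


def node_equivalence(eps_node: float, x_is_key: bool, y_is_key: bool,
                     eps_key_mismatch: float) -> float:
    """Node equivalence multiplied by the key equivalence."""
    return eps_node * key_equivalence(x_is_key, y_is_key, eps_key_mismatch)


# eid is a primary key in both schemas: no penalty.
print(node_equivalence(1.0, True, True, 0.7))   # 1.0
# eid is a primary key in only one schema (as with XRTS3 and XRTS4).
print(node_equivalence(1.0, True, False, 0.7))  # 0.7
```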
The examples from Figure 6 are reduced to the
structures detailed in Figure 7, with the suffix KEY
for primary keys and REF for references. By defining
$\varepsilon_{KEY} = 0.7$ and $\varepsilon_{REF} = 0.6$
and using the operator equivalences defined in Table 2,
schema (a) is similar to the rest of the schemas in the
proportions presented in Figure 8, where x represents a
generic schema and $\varepsilon_{eid}$ is the short form
for $\varepsilon_{(a)eid \to (x)eid}$.
6 EXAMPLE
Applying the RA to the examples from Figures 1 and 2
generates the following output.
XRTS1: company1((eid | sin, name, address, (pidREF* | task_nameREF)+)+, (pidKEY, description, budget | manager | location)*, (task_nameKEY, date)+)
XRTN1: company1(eid+ | sin+, name+, address+, pidREF* | task_nameREF+, pidKEY*, description*, budget* | manager* | location*, task_nameKEY+, date+)
XRTS2: company2(eidKEY, sin, name, address*, dateOfBirth?, (pidKEY, description?, manager | location, eidREF || pidREF, (task, date)+)*)+
XRTN2: company2(eidKEY+, sin+, name+, address*, dateOfBirth*, pidKEY*, description*, manager* | location*, eidREF* || pidREF*, task*, date*)
We start by comparing the XRTN trees to determine
whether they are from the same domain, using the
operator equivalences detailed in Table 2. The node
similarity between the two schemas is determined using
Equation 2.
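\[
\begin{aligned}
Sim_{XRTN1 \to XRTN2} = \Big( & \frac{\mathrm{Max}(\varepsilon_{eid+ \to +},\ \varepsilon_{sin+ \to +})}{2} + \varepsilon_{name+ \to +} + \varepsilon_{address+ \to *} \\
& + (\varepsilon_{pid* \to *} + \varepsilon_{task\_name+ \to task*})/2 + \varepsilon_{pid* \to *} + \varepsilon_{description* \to *} \\
& + (\varepsilon_{budget* \to \phi} + \varepsilon_{manager* \to *} + \varepsilon_{location* \to *})/3 \\
& + \varepsilon_{task\_name+ \to task*} + \varepsilon_{date+ \to *} \Big)/9 \cdot 100\% = 96.20\%
\end{aligned}
\]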
(a) company(eidKEY, pidREF, name, pidKEY, description)
(b) company(eid, pid, name, pid, description)
(c) company(eidKEY, name, pidKEY, eidREF, description)
(d) company(eidKEY, eidREF||pidREF, pid, description, name)
Figure 7: Reduced schemas.
ICEIS 2007 - International Conference on Enterprise Information Systems