
2) Percentage Indirect Child Nodes (rel_cI%):
the number of nodes in the subgraph rooted at a specific node, as a percentage of the number of nodes in the ontology. The higher the value of the measure for a node, the greater its interestingness. For example, node ooo95 (value=42.86%) is more interesting than oo185 (value=14.29%), since the subgraph rooted at ooo95 contains more nodes. Likewise, node ooo40 (value=64.29%) is more interesting than oo250 (value=14.29%). Note that in an ontology with a deep concept structure, very general concepts (owl:Thing in the extreme case) would get a higher value than very specific ones. One could argue that a specific concept is the interesting one, not the fact that something is an owl:Thing. However, we consider interesting parts to be those rooted at interesting concepts. Thus, according to rel_cI%, large subgraphs are more interesting than smaller ones, whether the smaller ones are disjoint from them or are subgraphs of them. Note that, at the implementation level, interesting concepts are searched for only among concepts whose level is greater than a threshold t, i.e., l(i)>t, which keeps the topmost concepts, such as owl:Thing, out of consideration.
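A minimal Python sketch of rel_cI% together with the level-threshold filter follows. It assumes the ontology is given as a list of (father, child) edges and that the node at the root of the subgraph is not itself counted (the text leaves this open). The toy graph, the level map and the names `descendants`, `rel_ci_percent` and `rank_by_rel_ci` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical toy ontology as (father, child) edges; not the paper's ontology.
EDGES = [("Thing", "A"), ("Thing", "B"), ("A", "C"), ("A", "D"),
         ("B", "D"), ("B", "E"), ("D", "F"), ("E", "G"), ("E", "H")]
# Hypothetical levels l(i), with the root at level 0.
LEVEL = {"Thing": 0, "A": 1, "B": 1, "C": 2, "D": 2, "E": 2,
         "F": 3, "G": 3, "H": 3}


def descendants(node, edges):
    """All nodes reachable from `node` by following father -> child edges."""
    children = defaultdict(set)
    for father, child in edges:
        children[father].add(child)
    seen, stack = set(), list(children[node])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    return seen


def rel_ci_percent(node, edges):
    """rel_cI%: size of the subgraph below `node` as a % of all nodes.
    ASSUMPTION: `node` itself is not counted."""
    nodes = {n for edge in edges for n in edge}
    return 100.0 * len(descendants(node, edges)) / len(nodes)


def rank_by_rel_ci(edges, level, t):
    """Candidates with l(i) > t, highest rel_cI% first (ties broken by name)."""
    candidates = [n for n in level if level[n] > t]
    return sorted(candidates, key=lambda n: (-rel_ci_percent(n, edges), n))


print(rank_by_rel_ci(EDGES, LEVEL, t=0))
# ['B', 'A', 'E', 'D', 'C', 'F', 'G', 'H']
```

With t=0 the root "Thing", which would otherwise score highest, is excluded, and "B" ranks first because its subgraph is the largest among the remaining nodes.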
3) Percentage Brother Nodes (rel_b%): the number of Direct Child Nodes of the father node(s), i.e., of the immediate ancestor(s), of a specific node, as a percentage of the number of nodes in the ontology. The higher the value of the measure for a node, the greater its interestingness. For example, node oo250 (value=7.14%) is more interesting than nodes oo180 (value=3.57%) and ooo90 (value=3.57%), since it has more brother nodes, with both oo175 and oo170 as its father nodes.
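The sketch below computes rel_b% on a hypothetical toy ontology. It assumes that the Brother Nodes of a node are the union of the direct children of all of its fathers, excluding the node itself, so a child shared by two fathers is counted once; this reading, the edges and the function names are illustrative assumptions.

```python
# Hypothetical toy ontology as (father, child) edges; not the paper's ontology.
EDGES = [("Thing", "A"), ("Thing", "B"), ("A", "C"), ("A", "D"),
         ("B", "D"), ("B", "E"), ("D", "F"), ("E", "G"), ("E", "H")]


def brothers(node, edges):
    """Direct children of any father of `node`, excluding `node` itself.
    ASSUMPTION: a brother shared by several fathers is counted once."""
    fathers = {f for f, c in edges if c == node}
    return {c for f, c in edges if f in fathers and c != node}


def rel_b_percent(node, edges):
    """rel_b%: number of Brother Nodes as a % of all nodes in the ontology."""
    nodes = {n for edge in edges for n in edge}
    return 100.0 * len(brothers(node, edges)) / len(nodes)


for n in ("D", "C", "F"):
    print(n, sorted(brothers(n, EDGES)), round(rel_b_percent(n, EDGES), 2))
```

Node "D", which has two fathers ("A" and "B"), collects the most brothers, analogous to oo250 in the example above.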
4) Mean Distance of Brother Nodes (mdisbr): the mean distance of a specific node from its Brother Nodes. The distance d(x,y) between two nodes is calculated using the dissimilarity measure presented in (Boutsinas and Papastergiou, 2008). The dissimilarity between any two attribute values is represented by the distance between the corresponding nodes of the tree structure, as defined by the following formula:

$$d(X,Y) = \frac{1}{fl(X,Y)} \cdot \mathrm{Average}\left(\frac{l(X)-fl(X,Y)}{\max(p(X))}, \frac{l(Y)-fl(X,Y)}{\max(p(Y))}\right) \cdot \frac{p(X,Y)}{\max(p(X))+\max(p(Y))}$$

where X and Y are any two nodes; fl(X,Y) is the level of the nearest common father node of X and Y, i.e., the level of their nearest common predecessor; l(X) is the level of node X, i.e., the depth of the node; max(p(X)) is the length of the maximum path that starts at the root, ends at a leaf and contains node X; and p(X,Y) is the length (number of edges) of the directed path connecting X and Y, with p(X,X)=0. If there is no path connecting X and Y, then p(X,Y)=p(X,fl(X,Y))+p(Y,fl(X,Y)).
Mean Distance of Brother Nodes of a node j is calculated by the following algorithm:
set count=0, dsum=0
for each Brother Node i of node j
    calculate d(i,j)
    set count+=1, dsum+=d(i,j)
return dsum/count (undefined if j has no Brother Nodes)
Note that the lower the value of d(X,Y), the greater the interestingness. For example, node oo110 (value=0.003%) is more interesting than node oo610 (value=0.0115%), since its father node is located deeper in the ontology. Finally, note that the similarity (1-mdisbr) could be used instead of the dissimilarity, for compatibility with the rest of the measures, for which higher values indicate greater interestingness.
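The following Python sketch implements d(X,Y) and the mdisbr loop on a small hypothetical ontology. Several details are assumptions not fixed by the text: the root is placed at level 1, so that 1/fl(X,Y) stays defined when the nearest common father is the root; l(X) and max(p(X)) are computed from longest paths; p(X,Y) uses the shortest directed path; and ties among equally deep common predecessors are broken by name. All names (`path_len`, `nearest_common_father`, and so on) are illustrative.

```python
from collections import defaultdict, deque
from functools import lru_cache

# Hypothetical toy ontology as (father, child) edges; not the paper's ontology.
EDGES = [("Thing", "A"), ("Thing", "B"), ("A", "C"), ("A", "D"),
         ("B", "D"), ("B", "E"), ("D", "F"), ("E", "G"), ("E", "H")]
CH, PA = defaultdict(set), defaultdict(set)  # children / fathers of each node
for father, child in EDGES:
    CH[father].add(child)
    PA[child].add(father)


@lru_cache(maxsize=None)
def up(x):    # edges on the longest path root -> x
    return max((up(f) + 1 for f in PA[x]), default=0)


@lru_cache(maxsize=None)
def down(x):  # edges on the longest path x -> leaf
    return max((down(c) + 1 for c in CH[x]), default=0)


def level(x):  # ASSUMPTION: the root is at level 1, so 1/fl(X,Y) is defined
    return up(x) + 1


def max_p(x):  # max(p(X)): longest root-to-leaf path that contains x
    return up(x) + down(x)


def dist_from(src):  # shortest directed path lengths from src downwards (BFS)
    dist, queue = {src: 0}, deque([src])
    while queue:
        n = queue.popleft()
        for c in CH[n]:
            if c not in dist:
                dist[c] = dist[n] + 1
                queue.append(c)
    return dist


def ancestors(x):  # x together with all of its predecessors
    seen, stack = {x}, [x]
    while stack:
        for f in PA[stack.pop()]:
            if f not in seen:
                seen.add(f)
                stack.append(f)
    return seen


def nearest_common_father(x, y):  # deepest common predecessor, ties by name
    return max(ancestors(x) & ancestors(y), key=lambda a: (level(a), a))


def path_len(x, y):  # p(X,Y); falls back to the route via the common father
    dx, dy = dist_from(x), dist_from(y)
    if y in dx:
        return dx[y]
    if x in dy:
        return dy[x]
    df = dist_from(nearest_common_father(x, y))
    return df[x] + df[y]


def d(x, y):  # the dissimilarity formula above, term by term
    fl = level(nearest_common_father(x, y))
    avg = ((level(x) - fl) / max_p(x) + (level(y) - fl) / max_p(y)) / 2
    return (1 / fl) * avg * path_len(x, y) / (max_p(x) + max_p(y))


def mdisbr(j):  # mean distance of j from its Brother Nodes
    count, dsum = 0, 0.0
    for i in sorted({c for f in PA[j] for c in CH[f]} - {j}):
        dsum += d(i, j)
        count += 1
    return dsum / count if count else None  # undefined without brothers


print(mdisbr("D"), mdisbr("G"))
```

Under these assumptions mdisbr("G") equals 1/27 and mdisbr("D") equals 5/72: G's father "E" lies deeper than D's fathers "A" and "B", so G gets the lower dissimilarity, which matches the intuition described above.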
5) Network Density of range k (nden(k)): the Network Density of range k of a specific node i is the number of nodes that are connected to i, or can be reached from i, via a path of length at most k that does not include direction changes. Note that we have implemented the calculation of nden(k) dynamically. The higher the value of the measure for a node, the greater its interestingness. For example, for node oo610, nden(2)=4, since there are two ancestor nodes (ooo40, ooo50) and two successor nodes (oo650, oo700). Thus, it is more interesting than node oo110 (nden(2)=2).
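A sketch of nden(k) follows, assuming that "no direction changes" means a counted path either climbs only through father links or descends only through child links, so the result is the union of the ancestors and the successors within k steps. The toy edges and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy ontology as (father, child) edges; not the paper's ontology.
EDGES = [("Thing", "A"), ("Thing", "B"), ("A", "C"), ("A", "D"),
         ("B", "D"), ("B", "E"), ("D", "F"), ("E", "G"), ("E", "H")]


def within_k(start, neighbours, k):
    """Nodes reached from `start` in 1..k steps along one edge direction."""
    found, frontier = set(), {start}
    for _ in range(k):
        frontier = {m for n in frontier for m in neighbours[n]} - found - {start}
        found |= frontier
    return found


def nden(node, edges, k):
    up, down = defaultdict(set), defaultdict(set)
    for father, child in edges:
        down[father].add(child)
        up[child].add(father)
    # Ancestors via upward-only paths, successors via downward-only paths,
    # so no counted path changes direction.
    return len(within_k(node, up, k) | within_k(node, down, k))


for n in ("D", "C"):
    print(n, nden(n, EDGES, 2))
# D 4
# C 2
```

Node "D" reaches "A", "B" and "Thing" upwards and "F" downwards within two steps, while the leaf "C" only reaches its two ancestors.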
6) Percentage Incoming Paths (in%): the indegree $d_{in}(i)$ of a node i, i.e., the number of edges that have i as their end node, as a percentage of the total number of incoming and outgoing edges, i.e., of the indegree plus the outdegree of i. The higher the value of the measure for a node, the greater its interestingness. For example, node ooo90 (value=66.67%) is more interesting than node oo250 (value=40%).
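As a minimal illustration, in% can be computed directly from a (father, child) edge list; the toy edges are hypothetical, and returning 0.0 for a node without any edges is an assumption, since the text does not define that case.

```python
# Hypothetical toy ontology as (father, child) edges; not the paper's ontology.
EDGES = [("Thing", "A"), ("Thing", "B"), ("A", "C"), ("A", "D"),
         ("B", "D"), ("B", "E"), ("D", "F"), ("E", "G"), ("E", "H")]


def in_percent(node, edges):
    """in% = d_in(i) / (d_in(i) + d_out(i)) * 100.
    ASSUMPTION: 0.0 for a node with no edges at all."""
    d_in = sum(1 for _, child in edges if child == node)
    d_out = sum(1 for father, _ in edges if father == node)
    total = d_in + d_out
    return 100.0 * d_in / total if total else 0.0


print(round(in_percent("D", EDGES), 2), round(in_percent("E", EDGES), 2))
# 66.67 33.33
```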
7) Percentage Outcoming Paths (out%): the outdegree $d_{out}(i)$ of a node i, i.e., the number of edges that have i as their start node, as a percentage of the total number of incoming and outgoing edges, i.e., of the indegree plus the outdegree of i. The higher the value of the measure for a node, the greater its interestingness. For example, node oo250 (value=60%) is more interesting than node ooo90 (value=33.33%).
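The sketch below computes out% for every node at once with `collections.Counter`, again on hypothetical toy edges. Only nodes with at least one edge are included, so the denominator is never zero; for each such node out% equals 100 minus in%, as the example values above also show.

```python
from collections import Counter

# Hypothetical toy ontology as (father, child) edges; not the paper's ontology.
EDGES = [("Thing", "A"), ("Thing", "B"), ("A", "C"), ("A", "D"),
         ("B", "D"), ("B", "E"), ("D", "F"), ("E", "G"), ("E", "H")]


def out_percent_all(edges):
    """out% = d_out(i) / (d_in(i) + d_out(i)) * 100 for every node with an edge."""
    d_out = Counter(father for father, _ in edges)
    d_in = Counter(child for _, child in edges)
    return {n: 100.0 * d_out[n] / (d_out[n] + d_in[n])
            for n in set(d_out) | set(d_in)}


scores = out_percent_all(EDGES)
# Highest out% first, ties broken by name.
print(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```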
8) Percentage Level distribution (n_l(i)%): the Level distribution of a specific node i is the number of nodes belonging to the level l(i) of node i (i.e., the length, in number of edges, of the maximum path from the root to node i), as a percentage of the number of nodes in the ontology. The higher the value of the measure for a node, the greater its interestingness. For example, node oo100 (value=14.3%), whose level contains 4 nodes, is more interesting than node oo250 (value=10.7%), whose level contains 3 nodes.
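Because l(i) is defined by the maximum path from the root, levels in a graph with multiple inheritance can be computed by a longest-path pass in topological order (Kahn's algorithm). The sketch assumes the ontology is acyclic and given as hypothetical (father, child) edges; the function names are illustrative.

```python
from collections import Counter, defaultdict, deque

# Hypothetical toy ontology as (father, child) edges; not the paper's ontology.
EDGES = [("Thing", "A"), ("Thing", "B"), ("A", "C"), ("A", "D"),
         ("B", "D"), ("B", "E"), ("D", "F"), ("E", "G"), ("E", "H")]


def longest_path_levels(edges):
    """l(i) = edges on the longest root-to-i path. ASSUMPTION: the graph is a DAG."""
    children, indeg, nodes = defaultdict(list), Counter(), set()
    for father, child in edges:
        children[father].append(child)
        indeg[child] += 1
        nodes.update((father, child))
    level = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)  # root(s)
    while queue:
        n = queue.popleft()
        for c in children[n]:
            level[c] = max(level[c], level[n] + 1)
            indeg[c] -= 1
            if indeg[c] == 0:  # all fathers processed: level is final
                queue.append(c)
    return level


def level_distribution_percent(node, edges):
    """n_l(i)%: nodes on the level of `node` as a % of all nodes."""
    level = longest_path_levels(edges)
    per_level = Counter(level.values())
    return 100.0 * per_level[level[node]] / len(level)


print(round(level_distribution_percent("D", EDGES), 2))
# 33.33
```

Using the maximum rather than the shortest path matters for nodes with several fathers: adding a hypothetical edge ("Thing", "D") would leave D at level 2, since its longest route from the root still passes through "A" or "B".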
ON ALIGNING INTERESTING PARTS OF ONTOLOGIES