Edit distance (Levenshtein, 1966) is adopted here
because it is an efficient way to measure the
similarity between two words.
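As an illustration, the following minimal sketch computes the
Levenshtein distance with the standard dynamic programming
recurrence. The helper name_similarity and its normalisation by
the longer string length are assumptions made for this example,
not a definition taken from the approach itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical normalisation to [0, 1]; 1.0 means identical names."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

The two-row formulation keeps memory proportional to the shorter
word, which suits short identifiers such as table and column names.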
3 ONTOLOGY SIMILARITY
The goal of ontology similarity measurement is to
find pairs of concepts from two ontologies that
have the same meaning but are described in different
ways. Here, we put forward a novel approach for
computing the similarity between ontologies based on
edit distance.
3.1 Ontology Extraction
Ontology extraction is the first step in rapid data
integration: it generates ontologies from different
databases and distributed network nodes. The
extraction methods and steps are as follows.
(1) Class and subclass construction.
Classes are an important element in ontology
construction. In accordance with the storage
characteristics of a database, one class is
generated from each data table, and the table name
is used directly as the class name.
Now consider the structure of a data table. A
subclass relation exists when one field of the table
refers to the table's primary key as a foreign key.
The corresponding subclass is generated whenever the
data table satisfies this condition. For example,
suppose the field column1 in Table1 is a foreign key
that refers to the primary key Table1_id. If there is
a record in which column1 has the value column1_value
and Table1_id has the value Table1_id_value, then
Table1_id_value is created as a subclass of
column1_value, and the two class names are
Table1_id_value and column1_value respectively.
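The rule above can be sketched as follows. This is a minimal
illustration that assumes the rows are available as dictionaries
and that the foreign-key column is known in advance; the function
name extract_classes and its parameters are hypothetical.

```python
def extract_classes(table_name, rows, key_field, parent_field):
    """Return (classes, subclass_of) for one table.

    rows is a list of dicts; parent_field is the column that refers to
    key_field as a foreign key (both names are supplied by the caller).
    """
    classes = {table_name}   # one class per data table
    subclass_of = []         # (subclass, superclass) pairs
    for row in rows:
        child, parent = row[key_field], row[parent_field]
        if parent is None:   # no reference, so no subclass relation
            continue
        classes.update({child, parent})
        subclass_of.append((child, parent))
    return classes, subclass_of


rows = [
    {"Table1_id": "Table1_id_value", "column1": "column1_value"},
    {"Table1_id": "column1_value", "column1": None},
]
classes, subclass_of = extract_classes("Table1", rows, "Table1_id", "column1")
```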
In addition, the platform lets users define
subclasses themselves, so that different demands
for subclass construction can be met. For example,
for the data table Table1 (column1, column2, column3,
column4), the main class Table1 has already been
generated, and the user can then partition it on a
chosen field. If the user divides column3 into low,
medium and high, the table yields three subclasses:
Table1.low, Table1.medium and Table1.high.
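A user-defined partition of this kind might be sketched as below.
The cut points in thresholds are an assumption introduced for the
example, since the text does not say how the user specifies the
boundaries between low, medium and high; both function names are
hypothetical.

```python
from bisect import bisect_right


def partition_subclasses(table_name, labels):
    """Name one subclass per user-defined partition label."""
    return [f"{table_name}.{label}" for label in labels]


def assign_subclass(table_name, value, thresholds, labels):
    """Pick the subclass whose interval contains value.

    thresholds holds the assumed cut points between consecutive labels,
    so len(labels) must equal len(thresholds) + 1. A value equal to a
    cut point falls into the higher interval.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return f"{table_name}.{labels[bisect_right(thresholds, value)]}"


labels = ["low", "medium", "high"]
print(partition_subclasses("Table1", labels))
print(assign_subclass("Table1", 42, [10, 100], labels))
```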
(2) Property construction.
Object property: If a field in one table (T1) depends
on a second table (T2), an object property of the class
corresponding to T2 should be created. The
property’s domain and range should also be created
according to the dependencies of such object properties.
Datatype property: A datatype property can be created
from the type of each field. The creation of datatype
properties can be combined with individual
construction.
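The sketch below illustrates one possible reading of these rules.
It assumes that the domain of every property is the class of the
table that owns the field, that the range of an object property is
the class of the referenced table, and that the range of a datatype
property is the field's declared type. The Column and Property
types are hypothetical helpers for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    references: Optional[str] = None  # referenced table if a foreign key


@dataclass(frozen=True)
class Property:
    name: str
    kind: str    # "object" or "datatype"
    domain: str  # class that owns the property (assumed: the table)
    range: str   # referenced class, or the field's data type


def extract_properties(table: str, columns: List[Column]) -> List[Property]:
    """Create one property per column of the given table."""
    props = []
    for col in columns:
        if col.references is not None:
            # The field depends on another table: object property.
            props.append(Property(col.name, "object", table, col.references))
        else:
            # Plain field: datatype property typed by the column type.
            props.append(Property(col.name, "datatype", table, col.sql_type))
    return props


props = extract_properties("T1", [
    Column("t2_id", "INTEGER", references="T2"),
    Column("title", "VARCHAR"),
])
```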
Sub-properties supplement property
construction. Since sub-properties cannot be produced
directly from a data table, two ways are provided to
create them: 1) manually defining sub-properties and
2) automatically extracting the property
hierarchy from user-defined property tables. In the
former, the user forms a sub-property
by selecting one property as the sub-property of
another. In the latter, sub-properties are generated
according to the rules described in the user-defined
property tables.
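The layout of a user-defined property table is not specified
here, so the following sketch assumes the simplest possible form:
each row names a sub-property and its direct super-property. The
property names and both functions are hypothetical.

```python
def build_property_hierarchy(rule_rows):
    """Map each sub-property to its direct super-property.

    rule_rows is an iterable of (sub_property, super_property) pairs,
    an assumed layout for the user-defined property table.
    """
    parent = {}
    for sub, sup in rule_rows:
        if sub == sup:
            raise ValueError(f"{sub} cannot be its own sub-property")
        parent[sub] = sup
    return parent


def ancestors(parent, prop):
    """List the super-properties of prop, nearest first."""
    chain = []
    while prop in parent:
        prop = parent[prop]
        if prop in chain:  # guard against circular rules
            raise ValueError("cycle in property hierarchy")
        chain.append(prop)
    return chain


rules = [("hasMobilePhone", "hasPhone"), ("hasPhone", "hasContact")]
hierarchy = build_property_hierarchy(rules)
```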
(3) Individual construction.
Each record in the data table corresponds to one
individual in ontology construction. Generally, it is
feasible to conduct ontology mapping by generating
the corresponding individuals for all records in the
table.
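A minimal sketch of this step is given below. The identifier
scheme (table name joined to the key value) and the dictionary
layout of an individual are assumptions made for illustration.

```python
def extract_individuals(table, rows, key_field):
    """Create one individual per record of the table."""
    individuals = []
    for row in rows:
        individuals.append({
            "id": f"{table}_{row[key_field]}",  # assumed naming scheme
            "type": table,                      # class generated from the table
            # Remaining fields become datatype property values.
            "values": {k: v for k, v in row.items() if k != key_field},
        })
    return individuals


people = extract_individuals(
    "Person",
    [{"person_id": 1, "name": "Ann"}, {"person_id": 2, "name": "Bob"}],
    "person_id",
)
```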
(4) Domain and Range.
The domain and the range of a datatype property
correspond to the data types of the fields in the
data table. If a field refers to another data table, the
property is regarded as an object property.
(5) Other construction.
To improve the accuracy of ontology
mapping, some auxiliary information is also added to
the computation, such as complex classes, property
features and property restrictions.
3.2 Similarity Calculation
Rapid data integration must consider several aspects
of the data sources, such as table names and column
names, and existing ontology concept similarity
algorithms cannot meet these integration
requirements. To solve this problem, multiple
features of an ontology are involved in the
calculation; that is, several similarities are
calculated for one ontology during ontology
similarity analysis.
(1) Similarity between classes or subclasses.
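$$CSi = \beta \times S_1 + (1 - \beta)\left(\frac{\sum_{i=1}^{n_o} n \times S_2 / sum_{op} + \sum_{i=1}^{n_d} n \times S_3 / sum_{dp}}{2}\right) \qquad (1)$$

where $S_1$ is computed from $S(CN_{ij})$, $D_{\max}$ and $D_{\min}$,
$S_2 = \frac{S(OCN_{ij}) + S_{oc}}{2}$ and $S_3 = S(DCN_{ij})$.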
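As a rough illustration only, the sketch below assumes that
Equation (1) blends a class-name similarity S1 with the mean of an
object-property term and a datatype-property term, using the
weight β. The two property terms are taken as already aggregated
inputs, and the function name and the default β are hypothetical.

```python
def class_similarity(s1, object_term, datatype_term, beta=0.5):
    """Weighted blend of name similarity and property similarity.

    s1            : similarity of the two class names
    object_term   : aggregated object-property similarity (assumed input)
    datatype_term : aggregated datatype-property similarity (assumed input)
    beta          : weight of the name similarity, between 0 and 1
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    property_part = (object_term + datatype_term) / 2.0
    return beta * s1 + (1.0 - beta) * property_part


score = class_similarity(1.0, 0.5, 0.5, beta=0.5)
```

With β close to 1 the result is driven by the class names, while
β close to 0 lets the property terms dominate.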
DATA 2012 - International Conference on Data Technologies and Applications