EFFICIENTLY LOCATING WEB SERVICES USING A
SEQUENCE-BASED SCHEMA MATCHING APPROACH
Alsayed Algergawy, Eike Schallehn and Gunter Saake
Computer Science Department, Magdeburg University, 39106 Magdeburg, Germany
Keywords:
Web service, WSDL, XML, Level matching, Schema matching, Prüfer sequence.
Abstract:
Locating desired Web services has become a challenging research problem due to the vast number of
Web services available within organizations and on the Web. This calls for flexible,
effective, and efficient Web service discovery frameworks. To this end, both the semantic description and
the structural information of Web services should be exploited efficiently. This paper presents a
flexible and efficient service discovery approach based on the Prüfer encoding method, which
constructs a one-to-one correspondence between Web services and sequence representations. We
describe and experimentally evaluate this Web service discovery approach.
1 INTRODUCTION
Web services are well-defined, reusable software components that perform specific,
encapsulated tasks via standardized Web-oriented mechanisms. They can be discovered,
invoked, and composed. The research community has identified two major areas of
interest: Web service discovery and Web service composition (Ma et al., 2008). This
paper addresses the issue of locating Web services efficiently. As the number of Web
services increases, locating services of interest in a large pool becomes a challenging
research problem (Wang and Stroulia, 2003; Hao and Zhang, 2007; Bose et al., 2008).
Several simple search engines have been developed to address this problem. However,
these engines provide only keyword search over Web service descriptions. More recently,
traditional attribute-based matchmaking algorithms have been proposed. In the Web
service discovery context, keyword search and attribute-based mechanisms turn out to be
insufficient: they do not capture the underlying semantics of Web services, and/or they
only partially satisfy the user's search needs, because keywords are usually expressed
in natural language. As a result, the number of services retrieved for a set of keywords
is huge, and/or the retrieved services may be irrelevant to the needs of their consumers
(Ma et al., 2008). This issue has sparked new research into the Semantic Web, where
ontologies are used to annotate the elements of Web services (Atkinson et al., 2007;
Nayak and Lee, 2007). Nevertheless, integrating different ontologies may be difficult,
and creating and maintaining ontologies may require a huge amount of human effort.
To address these challenges, we propose a new technique for effectively and efficiently
locating Web services on the Web. We start by analyzing Web service specifications
described in WSDL and representing them as service trees. Then, we identify and extract
the concrete and abstract parts of each service tree. We observe that the concrete parts
of different WSDL documents have the same hierarchical structure but may use different
names. Therefore, we propose a level matching approach that linguistically compares only
elements at the same level. The abstract parts of different WSDL documents, however,
differ in both structure and semantics. For these, we represent the abstract parts
(operations) of service trees as sequences using the Prüfer encoding method (Prüfer, 1918),
and then apply our sequence-based schema matching approach to the sequence
representation. To validate the proposed approach, we conducted a set of experiments
using real-world data sets.
Algergawy A., Schallehn E. and Saake G. (2009).
EFFICIENTLY LOCATING WEB SERVICES USING A SEQUENCE-BASED SCHEMA MATCHING APPROACH.
In Proceedings of the 11th International Conference on Enterprise Information Systems - Databases and Information Systems Integration, pages
287-290
DOI: 10.5220/0001969702870290
Copyright © SciTePress
2 OVERVIEW OF THE PROPOSED APPROACH
Our approach exploits the structural and semantic information in WSDL documents. The
objective is to develop an efficient approach that measures the similarity between Web
services; the measured similarity then guides the search for the desired Web service.
To realize this goal, we first analyze WSDL documents and represent them as service
trees, using the Java APIs for WSDL (JWSDL) and a SAX parser for the contents of the
XML schema (the types element). Then, each service tree is examined to extract its
concrete part and its abstract parts. We develop a level matching method to measure the
similarity between the concrete parts of different service trees, and a sequence-based
matching algorithm to measure the similarity between abstract parts. Figure 1 outlines
the proposed approach.
Figure 1: Web services similarity measure framework.
3 SIMILARITY MEASURING
3.1 Level Matching
Once the concrete part of each service tree has been obtained, we apply the level
matching approach to every pair of concrete parts. The approach linguistically compares
only nodes at the same level, as shown in Figure 2(a), and considers only the semantic
information of concrete elements. It measures the similarity of the elements (tag names)
by comparing each pair of elements at the same level based on their names, assuming that
equal names carry the same meaning.
To compute the name similarity between two element names represented as strings, we
first break each string into a set of tokens, $T_1$ and $T_2$, using a customizable
tokenizer that splits at punctuation, upper case, special symbols, and digits, e.g.,
getDataService → {get, Data, Service}. We then determine the name similarity between
the two token sets $T_1$ and $T_2$ as the average best similarity of each token with a
token in the other set. The output of this stage for any two service trees ST1 and ST2
is three $n \times m$ name similarity matrices, NSimM, one per level, where $n$ is the
number of concrete elements of ST1 and $m$ is the number of concrete elements of ST2 at
that level.
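The tokenization and averaging steps described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, digits are treated purely as separators, and the token-level measure (here `difflib.SequenceMatcher`) is an assumed stand-in, since the text does not say which token similarity is used.

```python
import re
from difflib import SequenceMatcher


def tokenize(name):
    """Split an element name at punctuation, digits, symbols, and case changes."""
    tokens = []
    # Anything that is not a letter (punctuation, digits, symbols) acts as a separator.
    for part in re.split(r"[^A-Za-z]+", name):
        # Inside a run of letters, split at case changes and keep acronyms together.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return tokens


def token_similarity(a, b):
    """Assumed token measure: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def name_similarity(t1, t2):
    """Average, over the tokens of both sets, of each token's best match in the other set."""
    if not t1 or not t2:
        return 0.0
    best_1 = sum(max(token_similarity(a, b) for b in t2) for a in t1)
    best_2 = sum(max(token_similarity(b, a) for a in t1) for b in t2)
    return (best_1 + best_2) / (len(t1) + len(t2))


print(tokenize("getDataService"))
print(name_similarity(tokenize("getDataService"), tokenize("get_data")))
```

Filling one NSimM matrix then amounts to calling `name_similarity` for every pair of element names at the same level of the two concrete parts.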
3.2 Schema Matching
To compute the similarity between abstract parts (operations), we exploit both the
semantic and the structural information of service trees. To this end, we propose a
sequence-based matching approach consisting of two stages: Prüfer Sequence Construction
and Matching Algorithms.¹
3.2.1 Prüfer Sequence Construction
In this stage, we represent the abstract parts of service trees as sequences using the
Prüfer sequence method, which constructs a one-to-one correspondence between service
trees and sequences. We capture the semantic information of service trees in Label
Prüfer Sequences (LPSs) and their structural information in Number Prüfer Sequences
(NPSs). Together, the two sequences form a so-called Consolidated Prüfer Sequence
(CPS = (NPS, LPS)) (Tatikonda et al., 2007). They are constructed by a post-order
traversal that tags each node in the abstract part of the service tree, as shown in
Figure 2(b), with a unique traversal number. The NPS is then constructed iteratively by
removing the node with the smallest traversal number and appending its parent's node
number to the partial sequence built so far. The LPS is constructed similarly, but by
taking the labels of the deleted nodes instead of their parents' node numbers.
Therefore, the NPS, which is constructed from unique post-order traversal numbers,
carries the structural information of the tree, and the LPS carries its semantic
information.
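The construction can be illustrated with a short Python sketch. It rests on two assumptions that the text does not spell out: deletion stops once only the root remains, so both sequences have one entry per non-root node, and the `Node` class, the function name, and the example operation tree are hypothetical. Because nodes are numbered in post-order, the node with the smallest remaining number is always a leaf, so the deletion loop reduces to listing each node's parent number in numbering order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def consolidated_prufer_sequence(root):
    """Return (NPS, LPS) for the tree rooted at `root`."""
    # numbered[i] holds [label, parent_number] of the node with post-order number i + 1.
    numbered = []

    def visit(node):
        child_numbers = [visit(child) for child in node.children]
        numbered.append([node.label, None])
        number = len(numbered)  # post-order number of this node
        for child_number in child_numbers:
            numbered[child_number - 1][1] = number
        return number

    visit(root)
    # The root is numbered last and has no parent, so it contributes no entry.
    nps = [parent for _, parent in numbered[:-1]]
    lps = [label for label, _ in numbered[:-1]]
    return nps, lps


# Hypothetical operation tree: post-order numbers are
# city=1, country=2, input=3, temperature=4, output=5, getWeather=6.
tree = Node("getWeather", [
    Node("input", [Node("city"), Node("country")]),
    Node("output", [Node("temperature")]),
])
nps, lps = consolidated_prufer_sequence(tree)
print(nps)  # [3, 3, 6, 5, 6]
print(lps)  # ['city', 'country', 'input', 'temperature', 'output']
```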
3.2.2 Matching Algorithms
In this stage, we compute the similarity between the abstract parts (operations) of
service trees. The task can be stated as follows. Consider two Web service document
specifications, WSDL1 and WSDL2, each containing a set of operations.
$OpSet_1 = \{op_{11}, op_{12}, \ldots, op_{1k}\}$ is the operation set of WSDL1, and
$OpSet_2 = \{op_{21}, op_{22}, \ldots, op_{2k}\}$ is the operation set of WSDL2. The
task at hand is to construct the $k \times k$ operation similarity matrix, OpSimM, in
which each entry represents the similarity between operation $op_{1i}$ from the first
set and operation $op_{2j}$ from the second. The proposed matching algorithm operates on
the sequence representations of service tree operations and consists of three steps.

¹ For more details about our sequence-based schema matching approach, see (Algergawy et al., 2008).

ICEIS 2009 - International Conference on Enterprise Information Systems

(a) Concrete parts of ST1 & ST2. (b) Abstract parts of ST1 & ST2.
Figure 2: Concrete & abstract parts of ST1 & ST2.
A linguistic matcher is used to compute the degree of linguistic similarity between
element pairs. It follows the same steps as level matching and also makes use of other
element properties, such as the data type of elements. The output of this phase is
$k \times k$ linguistic similarity matrices, LSimM, one per operation pair, where the
first $k$ is the number of operations in ST1 and the second is the number of operations
in ST2. Eq. 1 gives the entries of such a matrix, where DataType is a similarity
function that computes the type/data type similarity between nodes, and
$\mathit{combine}_l$ is an aggregation function, such as an average or a weighted sum,
that combines the name and data type similarities.
$$\mathit{LSimM}[i, j] = \mathit{combine}_l\big(\mathit{Nsim}(T_i, T_j),\ \mathit{DataType}(n_i, n_j)\big) \qquad (1)$$
We then use a structural matcher to compute the structural similarity between the
elements of abstract parts. This matcher is based on the node context, which is
reflected by a node's ancestors and descendants. The descendants of an element include
both its immediate children, which reflect its basic structure, and the leaves of the
subtrees rooted at the element, which reflect its content. In this paper, we consider
three kinds of node context: child, leaf, and ancestor context. To measure the
structural similarity between two nodes, we compute the similarity of their child,
ancestor, and leaf contexts, utilizing the structural properties carried by the sequence
representations of service trees. The output of this phase is $k \times k$ structural
similarity matrices, SSimM. Eq. 2 gives the entries of such a matrix, where child, leaf,
and ancestor are similarity functions that compute the child, leaf, and ancestor context
similarity between nodes, respectively, and $\mathit{combine}_s$ is an aggregation
function that combines these similarities.
$$\mathit{SSimM}[i, j] = \mathit{combine}_s\big(\mathit{child}(n_i, n_j),\ \mathit{leaf}(n_i, n_j),\ \mathit{ancestor}(n_i, n_j)\big) \qquad (2)$$
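To make the three node contexts concrete, the sketch below reads them directly from a number sequence built over post-order numbers, where entry `nps[v - 1]` is the parent number of node `v` and the root is numbered last. The paper does not state how the context similarities or the aggregation are computed, so the Jaccard overlap of labels and the plain average used here are assumptions chosen only for illustration; all function names, the `labels` lists (one label per node number, root included), and the two example trees are hypothetical.

```python
def children(nps, v):
    """Child context: nodes whose parent number is v."""
    return [i + 1 for i, parent in enumerate(nps) if parent == v]


def ancestors(nps, v):
    """Ancestor context: parent chain from v up to the root (numbered last)."""
    root = len(nps) + 1
    path = []
    while v != root:
        v = nps[v - 1]
        path.append(v)
    return path


def leaves(nps, v):
    """Leaf context: descendants of v that never occur as a parent number."""
    internal = set(nps)
    n = len(nps) + 1
    return [u for u in range(1, n + 1)
            if u not in internal and v in ancestors(nps, u)]


def jaccard(a, b):
    """Assumed context measure: overlap of two label sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty contexts count as identical
    return len(a & b) / len(a | b)


def structural_similarity(tree_a, v_a, tree_b, v_b):
    """Average (assumed aggregation) of child, leaf, and ancestor context similarity."""
    (nps_a, labels_a), (nps_b, labels_b) = tree_a, tree_b
    sims = []
    for context in (children, leaves, ancestors):
        labels_of_a = [labels_a[u - 1] for u in context(nps_a, v_a)]
        labels_of_b = [labels_b[u - 1] for u in context(nps_b, v_b)]
        sims.append(jaccard(labels_of_a, labels_of_b))
    return sum(sims) / len(sims)


# Two hypothetical operation trees given as (number sequence, labels by node number).
tree_a = ([3, 3, 6, 5, 6],
          ["city", "country", "input", "temperature", "output", "getWeather"])
tree_b = ([2, 4, 4], ["city", "input", "forecast", "getForecast"])

# Compare the two "input" nodes: node 3 of tree_a and node 2 of tree_b.
print(structural_similarity(tree_a, 3, tree_b, 2))
```

In this example the child and leaf contexts each overlap by one half and the ancestor contexts do not overlap, so the averaged score is one third.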
After computing both the linguistic and the structural similarities between the
elements of every operation pair, we combine them. The output of this phase is
$k \times k$ total similarity matrices, TSimM. Eq. 3 gives the entries of such a
matrix, where combine is an aggregation function that combines these similarities.
$$\mathit{TSimM}[i, j] = \mathit{combine}\big(\mathit{LSimM}[i, j],\ \mathit{SSimM}[i, j]\big) \qquad (3)$$
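One possible instance of the aggregation in Eq. 3 is a weighted sum applied entry by entry. The sketch below assumes this choice; the function name, the weight parameter, and the example values are hypothetical, and the paper leaves the aggregation function open.

```python
def combine_matrices(lsim, ssim, w_linguistic=0.5):
    """Entry-wise weighted sum of a linguistic and a structural similarity matrix."""
    if len(lsim) != len(ssim) or any(len(a) != len(b) for a, b in zip(lsim, ssim)):
        raise ValueError("LSimM and SSimM must have the same shape")
    w = w_linguistic
    return [[w * l + (1.0 - w) * s for l, s in zip(l_row, s_row)]
            for l_row, s_row in zip(lsim, ssim)]


# Hypothetical similarity values for two elements of each operation.
lsim = [[1.0, 0.2], [0.4, 0.6]]
ssim = [[0.5, 0.0], [0.8, 0.6]]
tsim = combine_matrices(lsim, ssim)
print(tsim)
```

Raising `w_linguistic` favours name and data type agreement, while lowering it favours agreement of the node contexts.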
Web Service Operation Similarity Matrix. We use the $k \times k$ total similarity
matrices to construct the operation similarity matrix, OpSimM. We compute the total
similarity between every operation pair by ranking, for each element, the element
similarities in the pair's total similarity matrix, selecting the best one, and
averaging the selected similarities. Each computed value is an entry of the matrix,
where OpSimM[i, j] is the similarity between operation $op_{1i}$ from the first set and
operation $op_{2j}$ from the second set.
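The selection-and-averaging step can be sketched as follows. The sketch assumes that "per element" refers to the elements of the first operation, that is, to the rows of the total similarity matrix; the function names, the dictionary layout, and the example values are hypothetical.

```python
def operation_similarity(tsim):
    """Average of the best similarity found for each element (row) of the matrix."""
    if not tsim or not tsim[0]:
        return 0.0
    best_per_element = [max(row) for row in tsim]
    return sum(best_per_element) / len(best_per_element)


def operation_similarity_matrix(tsim_by_pair, rows, cols):
    """Build OpSimM from one total similarity matrix per operation pair (i, j)."""
    return [[operation_similarity(tsim_by_pair[(i, j)]) for j in range(cols)]
            for i in range(rows)]


# Hypothetical total similarity matrices for a 1 x 2 grid of operation pairs.
tsim_by_pair = {
    (0, 0): [[0.9, 0.1], [0.2, 0.5]],
    (0, 1): [[0.3, 0.3], [0.1, 0.2]],
}
op_sim = operation_similarity_matrix(tsim_by_pair, rows=1, cols=2)
print(op_sim)
```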
4 EXPERIMENTAL EVALUATION
We now describe a set of experiments that validate the performance of our proposed
algorithms. We used a collection of Web services published by XMethods² and the QWS
data set (Al-Masri and Mahmoud, 2007). We selected 43 WSDL documents from six different
categories. All experiments share the same design: each service in the collection was
used as the basis for the desired service, and this desired service was then matched
against the complete set to identify the best target service. To evaluate the
effectiveness of our approach, we use precision, recall, and F-measure.
² http://www.xmethods.net
(a) Quality measures (abstract parts only). (b) Quality measures comparison.
Figure 3: Quality measures.
4.1 Experimental Results
The first way to match Web services is to measure the similarity between their
operations. In the first set of experiments, we matched the abstract parts of each
service tree from each category against the abstract parts of all other service trees
from all categories. Precision, recall, and F-measure were calculated and are shown in
Figure 3(a). Our framework identifies the desired Web service with a recall (R) of 100%
across all tested categories and a precision (P) ranging from 64% to 87%. The resulting
F-measure ranges from 78% to 93%, which indicates that the framework is reasonably
accurate.
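The three quality measures can be computed as in the sketch below, which assumes the standard definitions and the balanced F-measure F = 2PR / (P + R); the paper does not state its formula, and the function names are hypothetical. Under this assumption the reported endpoints are mutually consistent: with R = 100%, P = 64% gives F of about 78% and P = 87% gives F of about 93%.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(retrieved, relevant):
    """Precision, recall, and F-measure for a set of retrieved services."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, f_measure(precision, recall)


# Check the reported ranges: recall 100%, precision 64% and 87%.
print(round(f_measure(0.64, 1.0), 2))
print(round(f_measure(0.87, 1.0), 2))
```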
The second way to match Web services is to exploit the similarity between their
concrete parts as well as the similarity between their operations. In this set of
experiments, we matched the whole service tree (both abstract and concrete parts) of
each service against all other service trees from all categories. We computed
precision, recall, and F-measure for this case and compared them against the results of
the first set of experiments. Figure 3(b) reports the results, which show that
exploiting the whole WSDL document specification improves the discovery quality.
5 CONCLUSIONS
In this paper, we described a new approach to Web service discovery based on schema
matching techniques. The approach makes use of the whole WSDL document specification
and divides its elements into a concrete part and abstract parts. We devised a level
matching approach for concrete parts and a sequence-based schema matching approach to
compute the similarity between abstract parts. We conducted a set of experiments to
evaluate our approach, and the initial results are encouraging. Further work will
investigate extending the approach to integrate more semantic information and to
exploit the full WSDL syntax in order to improve its performance.
REFERENCES
Al-Masri, E. and Mahmoud, Q. (2007). QoS-based discovery and ranking of web services. In ICCCN 2007, pages 529–534.
Algergawy, A., Schallehn, E., and Saake, G. (2008). A Prüfer sequence-based approach for schema matching. In BalticDB&IS 2008. Estonia.
Atkinson, C., Bostan, P., Hummel, O., and Stoll, D. (2007). A practical approach to web service discovery and retrieval. In ICWS 2007, pages 241–248.
Bose, A., Nayak, R., and Bruza, P. (2008). Improving web service discovery by using semantic models. In WISE 2008, pages 366–380. New Zealand.
Hao, Y. and Zhang, Y. (2007). Web services discovery based on schema matching. In ACSC 2007, pages 107–113. Australia.
Ma, J., Zhang, Y., and He, J. (2008). Efficiently finding web services using a clustering semantic approach. In CSSSIA 2008, page 5. China.
Nayak, R. and Lee, B. (2007). Web service discovery with additional semantics and clustering. In IEEE/WIC/ACM International Conference on Web Intelligence, WI 2007, pages 555–558.
Prüfer, H. (1918). Neuer Beweis eines Satzes über Permutationen. Archiv für Mathematik und Physik, 27:142–144.
Tatikonda, S., Parthasarathy, S., and Goyder, M. (2007). LCS-TRIM: Dynamic programming meets XML indexing and querying. In VLDB'07, pages 63–74.
Wang, Y. and Stroulia, E. (2003). Flexible interface matching for web-service discovery. In WISE 2003, pages 147–156. Italy.