USING SEMANTIC ANNOTATIONS OF WEB SERVICES FOR ANALYZING INFORMATION DIFFUSION IN THE DEEP WEB

Shahab Mokarizadeh, Peep Küngas and Mihhail Matskin

Royal Institute of Technology, Stockholm, Sweden

University of Tartu, Tartu, Estonia

Keywords:

Semantic Web Services, Information Diffusion, Deep Web, Linked Services, Web Services Network Analysis.

Abstract:

Since Web services represent a fragment of the Deep Web, Web service interface descriptions reflect the content types available in the Deep Web. Therefore, semantic annotations of these Web service interfaces, once used to link services into service networks, allow analysis of the structure of the Deep Web. In this work, we investigate information diffusion, one of the highlighted Deep Web research directions, among networks of Web services. We present a model for analyzing information diffusion both between individual service providers and between entire service industries. The proposed model is evaluated on a set of public Web service interface descriptions harvested from public registries. The results indicate the high potential of the proposed method for understanding the hidden structure of the Deep Web and the interactions between individual service providers or service industries.

1 INTRODUCTION

Web services represent a fragment of the Deep Web (Bergman, 2001) since they facilitate access to data that is neither visible to search engines nor directly explorable. Semantic annotations of Web service interfaces not only make the services searchable by their thematic content but also, once the annotations are used to link the services into service networks, allow analysis of the underlying Deep Web content. In this work, we investigate information diffusion, one of the highlighted Deep Web research directions (Geller et al., 2008), among networks of Web services. Information diffusion is defined as the communication of knowledge over time among members of a social system (Shi et al., 2009). Studies of information diffusion (Cha et al., 2009; Teng et al., 2009; Shi et al., 2009) in the blogosphere, online social networks and publication citation networks have proven useful for revealing intrinsic properties of particular real-world phenomena. Similarly, services published on the Web not only offer capabilities but also indirectly exploit the content and data published by other Web services. This creates a kind of conceptual ecology of knowledge in which information is shared and flows along the input and output parameters of service operations. Our hypothesis is that analysis of information diffusion in Web service networks can reveal intrinsic properties of the underlying Web services. An example of such a property is the hidden reality of how Web services in different service commodities have been designed from an information exchange perspective.

This paper presents a model for analyzing information diffusion among commodities of Web services, given a network of Web services. The proposed approach relies on a set of semantically annotated and categorized Web services to first construct a Web services network, then transform it into a category (commodity) network, and finally compute a diffusion matrix. The diffusion matrix captures the volume of potential information flow between Web service categories, which reflects the collaboration between different service industries. The proposed approach is evaluated on a set of public Web services (described by WSDL interfaces) exposed by major service industries. From the semantic Deep Web perspective, and in the terminology of Geller et al. (Geller et al., 2008), our work follows the deep web service annotation approach to accessing Deep Web content, and it addresses the deep web data fusion issue.

The rest of this paper is organized as follows. In Section 2 we introduce the foundations of Web service categorization, semantic annotation and network formation. In Section 3 we outline our model for analyzing information diffusion in Web service networks. Section 4 describes our experimental settings and analyzes the results. Finally, conclusions and future work are presented in Section 5.

Mokarizadeh S., Küngas P. and Matskin M.

USING SEMANTIC ANNOTATIONS OF WEB SERVICES FOR ANALYZING INFORMATION DIFFUSION IN THE DEEP WEB.

DOI: 10.5220/0003931801100115

In Proceedings of the 8th International Conference on Web Information Systems and Technologies (WEBIST-2012), pages 110-115

ISBN: 978-989-8565-08-2

© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

2 PRELIMINARIES

We define information diffusion in terms of information flow from the output parameter(s) of a Web service operation to the input parameter(s) of other Web service operations in a Web services network. To this end, we first categorize and semantically annotate the Web services under examination. Web service matchmaking is the next step, which leads to the construction of a Web services network. Finally, we apply our information diffusion discovery model to estimate the information flow in the network.

2.1 Web Service Categorization

In the Web service categorization step, we assign each individual Web service to its corresponding categories. A category describes the general kind of service that is provided, for example “banking service” or “weather service” (Heß and Kushmerick, 2003). In the context of this paper we are only interested in categorizing Web services at higher category levels (e.g. “E-Commerce”, “Weather”) rather than at lower levels (e.g. “search for a flight”, “get temperature”). For instance, the Logistics category in our categorization scheme includes any Web service whose operations are related in some way to transportation or postal services, such as DHL Service and Fedex Notification Service. In this regard, our categorization scheme is similar to the approaches of Heß and Kushmerick (Heß and Kushmerick, 2003) and Crasso et al. (Crasso et al., 2008). We assume that there exists a set $D = \{D_1, \ldots, D_n\}$ of Web service categories, where no structural relationship (e.g. taxonomic) is assumed among the members of $D$. Note that a Web service can be associated with multiple categories.

2.2 Semantic Annotation and Web Service Matching

In this work we only require annotation of the basic elements of Web service operation input and output parameters. These element names are either WSDL message part names or XML schema leaf element names, because the actual pieces of information exchanged between services are encoded with these basic elements. The extracted terms are the input to our previously developed ontology learning component (Mokarizadeh et al., 2010), which generates a reference domain ontology. The reference ontology is formally presented as $C = \{c_1, c_2, \ldots\}$, where each $c_i$ represents an element of the reference ontology. In our reference ontology, concepts are inter-related through additional ontological relations (Mokarizadeh et al., 2010).

Semantic annotations of Web services are exploited to find semantic matches between the inputs and outputs of services. As the annotated elements (i.e. terms) are in fact instances in the generated reference ontology, an instance matching process is used to find ontological relationships between those instances. We employ a rule-based instance matching method that has already been described and evaluated in our previous work (Mokarizadeh et al., 2011). The matching component takes as input a pair of instances and produces a correspondence element, which indicates whether a semantic relation holds between the two given instances according to a particular matching rule. The presence of such a semantic relation means that the underlying output and input elements of the Web service operation parameters can be matched. The implicit assumption here is that matching is only performed between pairs of elements where one element represents an output element of a Web service operation and the other represents an input element of another Web service operation. The results of the matching process are exploited in Web service network formation, which is discussed in the next sections.
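The pairing constraint described above can be sketched in Python. This is a minimal illustration, not the rule-based matcher of (Mokarizadeh et al., 2011): the `Element` record, the `same_concept_rule` function and the example service names are hypothetical, and the single rule used here (both elements carry the same ontology annotation) is only a stand-in for the actual matching rules.

```python
from itertools import product
from typing import Callable, List, NamedTuple, Optional, Tuple


class Element(NamedTuple):
    """A basic input/output element of a Web service operation (hypothetical record)."""
    service: str
    operation: str
    name: str               # WSDL message part name or XSD leaf element name
    concept: Optional[str]  # annotation from the reference ontology, None if unannotated
    direction: str          # "in" or "out"


def same_concept_rule(out_el: Element, in_el: Element) -> bool:
    """Stand-in matching rule: a match holds if both elements share an annotation."""
    return out_el.concept is not None and out_el.concept == in_el.concept


def match(elements: List[Element],
          rule: Callable[[Element, Element], bool] = same_concept_rule
          ) -> List[Tuple[Element, Element]]:
    """Return correspondences (output element, input element) that satisfy `rule`.

    Only output -> input pairs are considered, and the two elements must
    belong to different operations.
    """
    outputs = [e for e in elements if e.direction == "out"]
    inputs = [e for e in elements if e.direction == "in"]
    return [(o, i) for o, i in product(outputs, inputs)
            if (o.service, o.operation) != (i.service, i.operation) and rule(o, i)]


elements = [
    Element("GeoService", "LookupCity", "city", "City", "in"),
    Element("GeoService", "LookupCity", "zip", "PostalCode", "out"),
    Element("WeatherService", "GetForecast", "zipCode", "PostalCode", "in"),
    Element("WeatherService", "GetForecast", "temperature", "Temperature", "out"),
]
for out_el, in_el in match(elements):
    print(f"{out_el.service}.{out_el.name} -> {in_el.service}.{in_el.name}")
```

Passing the rule as a parameter keeps the output-to-input pairing logic separate from the matching criterion, so a different rule can be plugged in without touching the pair generation.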

2.3 Web Services Network Models

We distinguish Annotated, Semantic and Category

representations of Web service networks derived from

semantically annotated Web services.

Annotated Web Service Model. This network captures the main elements of WSDL descriptions as the nodes and edges of a directed graph. The graph is further enriched with references to ontology elements and category labels. A node $P_i$ in this model refers to an input or output parameter (i.e. a WSDL message part name or an XSD schema leaf element name) of a Web service operation. Every node is annotated with 1) a semantic label $C_i$ that points to an element of the reference ontology $C$, and 2) a category label $D_k$ that refers to the affiliated category in the category list $D$. Finally, nodes are connected by the respective Web service operations, represented as directed edges from the nodes representing input elements to the nodes representing output elements. In fact, an instance of this network model is nothing more than a collection of discrete graphs, constructed to facilitate understanding of the subsequent network transformation mechanisms.

An illustrative example of this network model is shown on the left side of Figure 1. The network is formed by two Web services ($WS_1$ and $WS_2$), each of which consists of one operation ($OP_1$ and $OP_2$, respectively). The services are classified under category labels $D_1$ and $D_2$. Basic elements are denoted by nodes $P_i$ and annotated with concepts $C_i$. Moreover, the category assigned to each Web service WSDL description is propagated to its WSDL elements (not shown in Figure 1 for the sake of readability).
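A possible in-memory representation of the annotated model is sketched below. The class and field names are hypothetical, and the two-service example only mirrors the general shape described above (one operation per service, here with one input and one output each); the parameter and concept identifiers are illustrative. The sketch shows the category labels of a service being propagated to every one of its elements, and each operation contributing directed edges from its input nodes to its output nodes.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Set, Tuple


@dataclass
class ParamNode:
    pid: str                                           # parameter node id, e.g. "P1"
    concept: str                                       # semantic label from ontology C
    categories: Set[str] = field(default_factory=set)  # category labels from D


@dataclass
class AnnotatedNetwork:
    nodes: Dict[str, ParamNode] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (input, output, operation)

    def add_operation(self, operation: str, categories: Set[str],
                      inputs: List[Tuple[str, str]],
                      outputs: List[Tuple[str, str]]) -> None:
        """Add one operation; inputs and outputs are (pid, concept) pairs."""
        for pid, concept in inputs + outputs:
            node = self.nodes.setdefault(pid, ParamNode(pid, concept))
            node.categories |= categories  # propagate the service's categories to its elements
        for (in_pid, _), (out_pid, _) in product(inputs, outputs):
            self.edges.append((in_pid, out_pid, operation))  # directed edge: input -> output


net = AnnotatedNetwork()
net.add_operation("OP1", {"D1"}, inputs=[("P1", "C1")], outputs=[("P2", "C2")])
net.add_operation("OP2", {"D2"}, inputs=[("P3", "C3")], outputs=[("P4", "C4")])
print(net.edges)  # [('P1', 'P2', 'OP1'), ('P3', 'P4', 'OP2')]
```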

Semantic Network Model. A semantic network is a loop-free directed graph and is the semantically unified representation of the underlying annotated Web service network. A directed edge in this graph shows a direct dependency between source and target nodes, such that the concept represented by the target node is produced by a service operation only if the required concepts, represented by the source nodes, are given. A semantic node $C_i$ in this model refers to a semantic concept, which may represent the unification of one or several ontological concepts in ontology $C$. Every semantic node is further associated with a category vector $\vec{Q}_i$ denoting the weight of the semantic node in different categories with respect to its relative occurrence in those categories.

Category Network Model. This model is a directed graph used to capture the category view of the underlying Web services network. A node in this model represents an individual Web service category, while edges denote inter-category relationships (e.g. the direction of information flow). Moreover, edges are labeled with weights expressing the volume of information flowing from the source to the target node. Unlike the previous models, self-loops are permitted in this model.

3 INFORMATION DIFFUSION IN WEB SERVICES NETWORKS

3.1 Web Services Network Formation

Applying the proposed semantic annotation and matching methods yields the respective annotated Web services network. This network is the main input for constructing the other two types of networks: semantic and category networks. The transformation mechanisms that create instances of semantic and category networks are elaborated in the rest of this section.

3.1.1 Semantic Network Formation

Transformation of an annotated Web services network into a semantic network starts by replacing the nodes with their corresponding ontological concepts. Let us consider again the example of the annotated network in Figure 1. In this transformation process, the input parameters are replaced with the ontological concepts $C_1$ and $C_3$, while $C_2$ and $C_4$ substitute the output parameters in a similar manner. Part (b) in Figure 1 shows the transformed network. Next, we exploit the results of the matchmaking process to unify the concepts representing matched output and input elements. This potentially results in new nodes with unified concept labels. Every emerging semantic node also inherits the incoming and outgoing edges of the parent matching nodes. Let us consider the pairs $\langle C_1, C_3 \rangle$ and $\langle C_2, C_4 \rangle$ as the only possible matches in the previous example. As a result of unification, we obtain a graph with source node $C_{1,3}$, target node $C_{2,4}$, and three directed edges from $C_{1,3}$ to $C_{2,4}$. Next, redundant edges are eliminated so that only one edge connects any two nodes. Part (c) of Figure 1 illustrates the result of this transformation.

Meanwhile, the categories associated with the nodes in the annotated network are propagated to the corresponding semantic nodes, so each node in the semantic network may be associated with several categories. We model the affiliated categories of semantic node $C_i$ as a normalized category vector $\vec{Q}_i = \{q_{i,1}, \ldots, q_{i,n}\}$, where every item $q_{i,k}$ represents the weight of concept $C_i$ in category $D_k \in D$. The concept weights are calculated as follows:

$$q_{i,k} = \frac{\text{frequency of } C_i \text{ in } D_k}{\sum_{j=1}^{n} \text{frequency of } C_i \text{ in } D_j} \qquad (1)$$

where $n$ is the size of the category set $D$. Returning to the network presented in part (c) of Figure 1, both semantic nodes $C_{1,3}$ and $C_{2,4}$ are associated with both $D_1$ and $D_2$ as a result of weight propagation. According to (1), the normalized category vector for $C_{1,3}$ is $\vec{Q}_{1,3} = \{q_{(1,3),1} = 0.5,\ q_{(1,3),2} = 0.5\}$, and for semantic node $C_{2,4}$ it is $\vec{Q}_{2,4} = \{q_{(2,4),1} = 0.67,\ q_{(2,4),2} = 0.33\}$.

3.1.2 Category Network Formation

Transformation of a semantic network into a category network starts with replacing semantic nodes with their affiliated category labels. Meanwhile, we propagate the category weights from semantic nodes to the corresponding edges. The category propagation mechanism works as follows. Let us assume that there exists a directed edge $(C_u, C_v)$ in the semantic network. In addition, let us assume that $C_u$ is affiliated with category $D_s$ with weight $q_{u,s}$ and, similarly, $C_v$ is associated with category $D_t$ with weight $q_{v,t}$. By replacing the semantic nodes with their respective categories, we obtain the partial category weight for the directed edge $(D_s, D_t)$ as follows:

$$\omega_{u,v}(D_s, D_t) = q_{u,s} \cdot q_{v,t} \qquad (2)$$

We refer to $\omega_{u,v}(D_s, D_t)$ as a partial weight since the transformation may produce multiple edges between the same pair of category nodes. In the last step, we restructure the network so that identical nodes (i.e. nodes with the same labels) are unified. Consequently, the category weights of identical edges (i.e. those having the same source and target nodes) are aggregated into one single representative edge and weight. In other words, for every directed edge $(D_s, D_t)$ in the category network, the actual weight is computed as follows:

$$W(D_s, D_t) = \sum_{\forall \text{ edge } (u,v) \text{ in the semantic network}} \omega_{u,v}(D_s, D_t) \qquad (3)$$

Figure 1: Transformation of annotated Web services to category network. (a): Annotated network, (c): Semantic network, (e): Category network, (b) and (d): Intermediate networks.

To illustrate the preceding steps, let us consider again the semantic network in part (c) of Figure 1. The result of the first stage of the transformation is depicted in part (d) of Figure 1: the semantic nodes $C_{1,3}$ and $C_{2,4}$ are replaced with their associated category labels $D_1$ and $D_2$. As the category weights of the semantic nodes are available in $\vec{Q}_{1,3}$ and $\vec{Q}_{2,4}$, we apply (2), which results in the following edge weights: $\omega(D_1, D_1) = 0.5 \cdot 0.67$, $\omega(D_1, D_2) = 0.5 \cdot 0.33$, $\omega(D_2, D_1) = 0.5 \cdot 0.67$ and $\omega(D_2, D_2) = 0.5 \cdot 0.33$. Next, by unifying the identical edges and aggregating the category weights, the final category network presented in part (e) of Figure 1 is constructed. Since the transformation produced only one instance of each category edge, the actual weight of each edge equals its partial weight.
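Equations (2) and (3) can be sketched as a single pass over the semantic edges, taking the category vectors of Eq. (1) as input. The function name and the dictionary layout are assumptions; the example reuses the rounded vectors of the running example (0.5/0.5 and 0.67/0.33), so it reproduces the four edge weights listed above.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple


def build_category_network(semantic_edges: Iterable[Tuple[str, str]],
                           vectors: Dict[str, List[float]],
                           categories: List[str]) -> Dict[Tuple[str, str], float]:
    """Return W(D_s, D_t) for every category edge with non-zero weight."""
    W: Dict[Tuple[str, str], float] = defaultdict(float)
    for u, v in semantic_edges:
        for (d_s, q_us), (d_t, q_vt) in product(zip(categories, vectors[u]),
                                                zip(categories, vectors[v])):
            if q_us > 0 and q_vt > 0:
                # Eq. (2): partial weight omega_{u,v}(D_s, D_t) = q_{u,s} * q_{v,t};
                # Eq. (3): partial weights of identical category edges are summed.
                W[(d_s, d_t)] += q_us * q_vt  # self-loops (d_s == d_t) are allowed
    return dict(W)


W = build_category_network([("C1+C3", "C2+C4")],
                           {"C1+C3": [0.5, 0.5], "C2+C4": [0.67, 0.33]},
                           ["D1", "D2"])
for edge, weight in sorted(W.items()):
    print(edge, round(weight, 3))
```

Because the category vectors are normalized, the weights produced by one semantic edge sum to 1, so $W$ summed over all category edges equals the number of semantic edges.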

3.2 Measuring Information Diffusion

To measure the density of information flow between different Web service categories, we adopt the approach used by Shi et al. (Shi et al., 2009) for analyzing information diffusion in citation networks. We regard category weights as the volume of information diffused from source toward target category nodes. To put the information flows between different categories on one scale and make them comparable, we follow Z-score normalization principles. To this end, we first compute, for each pair of categories, the sum of the weights of all edges from the first category to the second, and populate a matrix $A$ with these values. We then compare the volume (i.e. sum) of weighted edges between any pair of nodes with the volume we would expect if weights were distributed independently of category, in proportion to the total outgoing weight of the source and the total incoming weight of the target.

Let us assume that $W_{ij} = W(D_i, D_j)$ is the actual weight of edge $(D_i, D_j)$ obtained using (3), $W_{i*} = \sum_{j} W(D_i, D_j)$ is the sum of the weights of all links from category $i$, $W_{*j} = \sum_{i} W(D_i, D_j)$ is the sum of the weights of all links to category $j$, and $W = \sum_{i,j} W(D_i, D_j)$ is the sum of the weights of all links in matrix $A$. Then the expected volume of weights from category $i$ to category $j$, assuming indifference between links to one's own category and links to others, is $E[W_{ij}] = W_{i*} \times W_{*j} / W$.

We define the category weight as a Z-score that measures standard deviations with respect to the expected value $E[W_{ij}]$. Since $W \gg W_{i*}$ and $W \gg W_{*j}$, we approximate the standard deviation by $\sqrt{E[W_{ij}]}$. In this way, for every entry in matrix $A$ we obtain a normalized value, which we refer to as the diffusion weight ($\phi$):

$$\phi_{ij} = \frac{W_{ij} - W_{i*} \times W_{*j} / W}{\sqrt{W_{i*} \times W_{*j} / W}} \qquad (4)$$

Since edges run from input concepts to output concepts, a high diffusion weight $\phi_{ij}$ between categories $i$ and $j$ reveals a strong tendency for semantic concepts associated with category $j$ to result from the invocation of services that take semantic input concepts associated with category $i$.
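Equation (4) can be sketched as follows over a dictionary of category-edge weights $W(D_i, D_j)$ as produced by (3). The three-category weights in the example are invented for illustration and are not taken from the experiments; they are also far too few for the approximation $W \gg W_{i*}$ to hold, so the example only exercises the arithmetic. Pairs whose expected volume is zero (a category with no outgoing or no incoming weight) are left undefined, which is our assumption since (4) does not cover that case.

```python
import math
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]


def diffusion_weights(W: Weights, categories: List[str]) -> Weights:
    """Z-score diffusion weight phi_ij of Eq. (4) for each category pair."""
    w_out = {i: sum(W.get((i, j), 0.0) for j in categories) for i in categories}  # W_{i*}
    w_in = {j: sum(W.get((i, j), 0.0) for i in categories) for j in categories}   # W_{*j}
    total = sum(w_out.values())                                                    # W
    phi: Weights = {}
    for i in categories:
        for j in categories:
            expected = w_out[i] * w_in[j] / total if total > 0 else 0.0  # E[W_ij]
            if expected > 0:
                # Standard deviation approximated by sqrt(E[W_ij]).
                phi[(i, j)] = (W.get((i, j), 0.0) - expected) / math.sqrt(expected)
    return phi


W = {("A", "A"): 8.0, ("A", "B"): 1.0, ("B", "B"): 6.0,
     ("B", "C"): 2.0, ("C", "A"): 1.0, ("C", "C"): 2.0}
phi = diffusion_weights(W, ["A", "B", "C"])
print(round(phi[("A", "A")], 2), round(phi[("A", "B")], 2))
```

Positive entries mark category pairs that exchange more weight than expected under category indifference; negative entries mark pairs that exchange less.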

4 EXPERIMENTAL SETTINGS AND RESULTS

4.1 Data

We evaluated the proposed approach for measuring information flow on a collection of public Web services from different categories. This collection consists of around 30000 Web service descriptions in WSDL, harvested from different public repositories during the period 2005–2011. From this set of descriptions, we manually identified the categories of 1107 Web services according to the classification made by the SOA Trader website. We acknowledge that we have not evaluated the accuracy of this categorization. The extracted categories (26 items), together with the number of Web services in each category, are summarized in Table 1. Additionally, each category is associated with an identifier, which locates the category in the computed information flow matrix presented in Figure 2.

http://www.soatrader.com/web-services/

To facilitate the creation of semantic networks, we extracted the 30000 most recurrent terms (XSD schema leaf element names or WSDL message part names) from all WSDL documents in our dataset. This limit was mainly set to reduce the computational resources needed to perform the experiments and to make the evaluation process a manageable task for a human expert. This collection of most frequent terms was first syntactically normalized and processed. Next, a reference ontology was automatically generated using the mechanism explained in Section 2.2. The generated ontology was then used to semantically annotate the input and output parameters of Web service operations. The ontology embodies 11610 ontological concepts and annotates around 66% of all targeted WSDL elements. Next, we exploited the matchmaking mechanism to automatically discover matching Web service elements. Based on previous evaluation results (Mokarizadeh et al., 2011), our annotation and matching mechanism achieves an accuracy of around 27% in terms of the F-measure, which is defined as the weighted harmonic mean of precision and recall.
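For reference, the F-measure can be computed as in the sketch below. It uses the common $F_\beta$ form, in which $\beta$ weights recall relative to precision and $\beta = 1$ gives the plain harmonic mean ($F_1$); the text does not state which $\beta$ the cited evaluation used, and the inputs in the example are illustrative values, not the precision and recall reported in (Mokarizadeh et al., 2011).

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F_beta)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # both components are zero; avoid division by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_measure(0.5, 0.5), 3))            # 0.5
print(round(f_measure(1.0, 0.5), 3))            # 0.667
print(round(f_measure(1.0, 0.5, beta=2.0), 3))  # 0.556
```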

The result of Web service matchmaking (i.e. the correspondence elements) provides the ingredients for semantic network and category network formation. The general characteristics of all three types of networks (annotated, semantic and category) are shown in Table 2.

Table 1: The number of global Web services in each category.

Index  Category          #Services   Index  Category          #Services
1      Travel            46          14     Weather           125
2      B2B               21          15     Business          8
3      E-Health          1           16     Finance           159
4      Statistics        4           17     Interoperability  3
5      Communication     154         18     Location          33
6      Human Resources   5           19     Science           4
7      News              74          20     E-Commerce        113
8      Utilities         21          21     Security          1
9      Data              5           22     Logistics         19
10     Test              11          23     Bioinformatics    227
11     Dictionaries      6           24     GIS               16
12     Contacts          6           25     Internet          19
13     Entertainment     5           26     Industry          4

Table 2: General characteristics of the exploited networks.

Network Type   #Nodes   #Edges   Average Out-Degree
Annotated      8062     302065   37.47
Semantic       4050     157411   38.87
Category       26       588      22.62
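The last column of Table 2 agrees with the usual definition of average out-degree in a directed graph: every edge leaves exactly one node, so the mean out-degree is the number of edges divided by the number of nodes. A quick check of that arithmetic on the table's values:

```python
networks = {
    "Annotated": (8062, 302065),  # (#nodes, #edges) from Table 2
    "Semantic": (4050, 157411),
    "Category": (26, 588),
}


def average_out_degree(nodes: int, edges: int) -> float:
    """In a directed graph every edge has one source, so mean out-degree = |E| / |V|."""
    return edges / nodes


for name, (n, m) in networks.items():
    print(f"{name}: {average_out_degree(n, m):.2f}")
```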

4.2 Results

By applying (4) to the resulting category network, we obtain the diffusion weight matrix visualized in Figure 2. The row and column numbers of the matrix are indexes that locate the corresponding category names in Table 1. The accumulated density along the diagonal of the matrix reveals that some communities in this collection mainly provide input for their own services and mostly consume the information provided by the same community. This is because Web services in these communities frequently exploit domain-specific concepts as input and output parameters. We refer to this behavioral model as the self-referential pattern. However, only a small number of communities, namely B2B, Communication, Business, Finance and Location, exhibit noticeable self-referential behavior.

Figure 2: Visualization of the matrix of category weights between different communities of Web services. Each entry is shaded according to a normalized Z-score representing whether the density of information flow is higher or lower than expected at random. Darker shading indicates higher Z-scores. The diagonal represents information flow within the same category.

Based on the entries in the matrix, the smallest information flow volume belongs to the E-Commerce community. This is because the concepts representing the output parameters of services in this community rarely appear as input parameters of services operating in other communities. Moreover, this community also follows the self-referential pattern. Hence, E-Commerce can be regarded as the most isolated community, as it receives and delivers the least amount of information compared to other communities. From another perspective, isolated communities are potential candidates for developing new value-added services, that is, services that bridge the isolated communities and the rest of the world, provided that developing such a service is logically meaningful and brings business value to either party. These heuristics are compatible with the analytical rules suggested by Cui et al. (Cui et al., 2009) for pinpointing service composition opportunities in a large-scale Web services network.
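One way to operationalize “receives and delivers the least amount of information” is to rank categories by the weight they exchange with other categories. The sketch below is our assumption rather than the procedure used to read Figure 2: it sums each category's outgoing and incoming off-diagonal weights $W(D_i, D_j)$ with $i \neq j$, and sorts in ascending order so that the most isolated category comes first. The weights and category names are hypothetical.

```python
from typing import Dict, List, Tuple


def isolation_ranking(W: Dict[Tuple[str, str], float], categories: List[str]) -> List[str]:
    """Sort categories by off-diagonal (inter-category) weight exchanged, ascending."""
    exchange = {c: 0.0 for c in categories}
    for (src, dst), w in W.items():
        if src != dst:          # self-loops stay inside the community
            exchange[src] += w  # delivered to another category
            exchange[dst] += w  # received from another category
    return sorted(categories, key=lambda c: exchange[c])


W = {("A", "A"): 8.0, ("A", "B"): 1.0, ("B", "B"): 6.0,
     ("B", "C"): 2.0, ("C", "A"): 1.0, ("C", "C"): 2.0}
print(isolation_ranking(W, ["A", "B", "C"]))  # ['A', 'B', 'C']
```

In this toy matrix, category A combines the strongest self-loop with the least exchange with the others, which is the combination the text describes for an isolated, self-referential community.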

The implicit assumption in the above analysis is that the annotation and matchmaking scheme determines (sufficiently) accurate semantics of parameters and performs precise matching. Imperfection or bias in the annotation scheme or the matchmaking approach could lead to significant deviations from the actual results, which could even invalidate our current results.

5 CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a model for using semantic annotations of Web service interface descriptions to measure information diffusion among categories of Web services. The experimental results demonstrate that the proposed model can be effectively used to reason about information diffusion patterns between categories of Deep Web resources, more specifically between public Web services. The main priority of our future work is to increase the quality (of both semantic annotation and categorization) of the evaluated dataset in order to further analyze the identified patterns.

REFERENCES

Bergman, M. K. (2001). The deep web: Surfacing hidden value. World Wide Web Internet And Web Information Systems, 7(1):1–17.

Cha, M., Mislove, A., and Gummadi, K. P. (2009). A measurement-driven analysis of information propagation in the Flickr social network. In Proceedings of the 18th international conference on World Wide Web, WWW ’09, pages 721–730, USA. ACM.

Crasso, M., Zunino, A., and Campo, M. (2008). AWSC: An approach to web service classification based on machine learning techniques. Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial, 12(37):25–36.

Cui, L. Y., Kumara, S., Yoo, J. J.-W., and Cavdur, F. (2009). Large-scale network decomposition and mathematical programming based web service composition. In Proceedings of the 2009 IEEE Conference on Commerce and Enterprise Computing, pages 511–514.

Geller, J., Chun, S. A., and Jung, Y. (2008). Toward the semantic deep web. Computer, 41(9):95–97.

Heß, A. and Kushmerick, N. (2003). Learning to attach semantic metadata to web services. In ISWC2003, pages 258–273. Springer.

Mokarizadeh, S., Küngas, P., and Matskin, M. (2010). Ontology learning for cost-effective large-scale semantic annotation of web service interfaces. In EKAW, pages 401–410.

Mokarizadeh, S., Küngas, P., and Matskin, M. (2011). Evaluation of a semi-automated semantic annotation approach for bootstrapping the analysis of large-scale web service networks. In Web Intelligence and Intelligent Agent Technology, pages 388–395. IEEE/WIC/ACM.

Shi, X., Tseng, B. L., and Adamic, L. A. (2009). Information diffusion in computer science citation networks. CoRR, abs/0905.2636.

Teng, W.-G., Pai, W.-M., and Chen, K.-C. (2009). Exploring information diffusion patterns with social relationships in the blogosphere. In ICCI ’09, pages 422–427.

USINGSEMANTICANNOTATIONSOFWEBSERVICESFORANALYZINGINFORMATIONDIFFUSIONINTHE

DEEPWEB

115