SEMANTIC WEB BASED PROACTIVE SEARCH FOR
ENTERPRISE
Li Li, Feng Liu and Wu Chou
Avaya Inc. 233 Mt. Airy Road, Basking Ridge, NJ 07920, U.S.A.
Keywords: Semantic web, REST web service, Proactive search, Text annotation, Software agent, Implicit social
network, Expert finder.
Abstract: This paper presents an approach and a software architecture based on agent and web service technologies to support proactive search to enrich enterprise communication and collaboration. In particular, we combine software agents and REST web services to deliver relevant information found in RDF databases to users without interrupting their workflows. The relevant information includes text annotations, implicit social networks, and recommended experts. We discuss how service composition can be used to efficiently combine results from distributed functions to support independent and scalable semantic web development. Initial experimental results indicate that the proposed approach is feasible and efficient.
1 INTRODUCTION
Enterprise communication and collaboration typically involve email, meeting schedules, voice, video, chat, wikis, blogs, and various forms of documents, such as design documents and bug reports related to products and service offerings. As different organizations adopt technologies at different paces, these rich sets of digital content often exist in different formats managed by different systems that are often not connected to each other. For example, emails between group members are stored in special formats on local disks or email servers, product design documents are managed by a proprietary relational database, and project progress is tracked on a separate group wiki site. As a consequence, a lot of valuable information is buried in disparate computers and is not readily accessible.
As the number of media types and the amount of digital content increase, this fragmentation imposes significant overhead and reduces productivity, because users have to spend extra effort searching for relevant information. For instance, when scheduling a project planning meeting between several groups, people typically receive email invitations that contain a subject, time and location, a short description, and some attachments and links. If people need to find out more background information about the participants, the previous history of contacts on this subject, or new products related to this project, they have to run searches in several specialized applications. As every participant repeats almost identical searches to become informed, the productivity of the enterprise is reduced as the number of participants increases.
To address this problem, we present a knowledge agent based approach for enterprises, derived from two related technologies: semantic web and proactive search.
The semantic web in our approach refers to a set of technologies based on the web architecture and knowledge representation languages, including RDF (RDF) and OWL (OWL 2004). These technologies offer a set of solutions to the heterogeneous data problem in enterprises: URIs provide a uniform identification mechanism for enterprise data, and RDF provides a uniform representation language for the relations between the identified data. In addition, HTTP provides a uniform protocol to access the distributed data, and SPARQL provides a declarative way to query it. Moreover, ontologies offer a way to integrate RDF graphs from different sources. Because semantic web technologies are grounded in Description Logic (RDF 2004), they also offer a framework for reasoning and inference over the data.
Despite these advantages, a challenge in adopting the semantic web for enterprises is how to transform raw enterprise data into RDF. It would be ideal but
unrealistic to force all enterprise systems and
applications to expose their data according to a
predefined ontology. Instead, we need to allow
organizations to evolve their semantic web
incrementally and independently. To support this
path, we adopt the REST (Fielding 2000, Richardson 2007) web service paradigm as our semantic web infrastructure, because REST is optimized for distributed hypertext systems of this kind. Unlike conventional
approaches to semantic web that aim to support
linking and querying of raw RDF triples (Linked
Data), we focus on developing knowledge based
web services that can enhance enterprise
communications. In particular, we investigate how
to develop a scalable and robust REST architecture
that can share and compose distributed knowledge
web services across organizations.
Proactive search pushes relevant information to users without the user asking for it explicitly. It departs from the current interactive search paradigm in several aspects. In an interactive search, a user composes a specific query, enters it into a search box, selects results, and integrates them into his application manually. Although quite flexible, interactive search has several disadvantages and limitations. Firstly, the interactive search mode usually forces a user to leave his current activity and work on a separate search activity. Secondly, the query does not carry the context from which the search is launched. Thirdly, the results of interactive search depend on the quality and accuracy of the user's query. Fourthly, integrating the search results back into the user's workflow and context typically requires manual operation by the user.
In proactive search, a user's communication activity is treated as the query to the search engine, thereby providing the necessary context for more accurate results. Instead of asking a user to select the results, proactive search integrates relevant information directly into the communication activity in a nonintrusive way. As a result, the user can focus on his business activities without taking detours to seek relevant information. For example, an incoming or outgoing email can be treated as a query to the proactive search engine, and the relevant information found about the topics, people, or products mentioned in the email is integrated into it as hyperlinks. The disadvantage of proactive search is that the input is limited to the user's current activity and context. The second challenge in adopting semantic web technologies is therefore how to determine what is relevant given a context, as a context is significantly more complex than most queries. In open domain search, these are very difficult problems. However, as enterprises have more organized and predictable activities and workflows than individual users, we can use the patterns in enterprise data to help tackle these tough problems.
Proactive search can be supported by a client-server architecture, just as interactive search is. However, the clients in proactive search assume more responsibility than those in interactive search. In proactive search, the clients are software agents that monitor user activities and invoke the corresponding knowledge base web services to obtain the right information at the right time.
The rest of the paper is organized as follows.
Section 2 presents the overall architecture of our
semantic web based system. Section 3 briefly
discusses the knowledge transformation process.
Section 4 presents some functions and services built
into our approach and architecture. Section 5
discusses the agents and applications based on the
described functions and services. Section 6 is
dedicated to implementation and experimental
results. Section 7 reviews some related work, and we
conclude this paper with Section 8.
2 OVERALL ARCHITECTURE
To support semantic web based proactive search, we
need to provide customized semantic web based
functions that are targeted to different business
environments. For example, in a call center, we need
functions that classify emails, annotate important
concepts in emails, and suggest relevant responses.
In a group collaboration application, we need
functions that bring up contact history on a subject
and show common interest between participants.
However, due to the limitations of SPARQL, many of these functions cannot be implemented as SPARQL queries to RDF databases. For this reason, we decided to expose these functions as REST services that are sharable and reusable across organizations. REST services encourage distributed and independent development, which is one of our design goals. Besides connecting different applications, our REST composition approach allows us to distribute a semantic function that is too large for one machine across multiple machines, and to use service composition to aggregate the distributed logic.
On the client side, our software agents are embedded in users' communication and collaboration applications. These agents monitor user activities, retrieve relevant information from the REST services, and inject that information into the collaboration environment.
Figure 1 illustrates the high-level components of our REST composition architecture.
Figure 1: High level components of REST architecture.
By driving the agent states with hypertext from the REST services, this REST architecture offers the following degrees of freedom for accommodating independent changes. On the server side, a new service can be deployed without having to reconfigure all the agents, and an existing service can be upgraded without breaking the agents that use it. On the client side, agents can acquire different “skills” required for different environments by following hyperlinks to different REST services. Although the agents in this architecture do not need to communicate with each other directly, they can still collaborate indirectly by sharing their states through the servers. Agents and servers can also use content negotiation to find the best representation for a given situation.
Each server in our architecture builds the REST
services from the knowledge base in layers as
depicted in Figure 2.
Figure 2: Web Server Stack.
The bottom layer is the raw RDF triples collected from various sources of enterprise data, including relational databases and web pages. On top of this is a layer of knowledge derived from the RDF triples; some of this knowledge is computed off-line, and some is derived dynamically in response to queries. A function takes an input query and produces outputs based on the derived knowledge. The basic REST services expose these functions as resources, and the composed REST services invoke basic or composed services at local and remote servers to carry out a task.
Service composition is a process that implements a service by combining the outputs of other services. This process can be used to break a large semantic database into small ones and distribute a related function over a set of servers that form a tree structure. The servers at the leaf nodes offer the basic services, the servers at the interior nodes offer partially composed services, and, as a result, the server at the root node offers the completely composed service. Because the services are stateless, a composed service invokes its children services in parallel and merges their results for its parent. This process is illustrated in Figure 3, where “local” means some local services are involved in the compositions.
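To make the composition concrete, the following SWI-Prolog sketch (SWI-Prolog is our implementation language; see Section 6) shows the parent side of a composed service. The invoke_rest/3 helper is hypothetical and stands in for the actual HTTP round trip to a child service, and the set-union merge shown is only the simplest of the function-specific merges described later.

:- use_module(library(thread)).   % concurrent_maplist/3

% Sketch of a stateless composed service: query all children in
% parallel, tolerate faults, and merge the results for the parent.
compose(ChildURLs, Query, Merged) :-
    concurrent_maplist(call_child(Query), ChildURLs, Results),
    merge_results(Results, Merged).

% A child that faults contributes an empty result.
call_child(Query, ChildURL, Result) :-
    catch(invoke_rest(ChildURL, Query, Result), _Fault, Result = []).

% invoke_rest/3 is a hypothetical placeholder: it would POST Query to
% ChildURL and read back the child's result, e.g. as a Prolog term.
invoke_rest(_ChildURL, _Query, _Result) :-
    throw(not_implemented).

% The merge is function-specific: set union for social relations,
% covered-phrase removal for annotations, score summation for experts.
% Set union (via sort/2, which removes duplicates) is shown here.
merge_results(Results, Merged) :-
    append(Results, All),
    sort(All, Merged).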
Notice that this composition architecture is different from a conventional computer cluster architecture, which has a fixed entry point and a specified topology. Instead, in our case, each distributed function may have an entry point and topology of its own. For example, Figure 3 illustrates two distributed functions with entry points Server 1 and Server 3, respectively. Server 1 is the entry point to the server tree consisting of Servers 1, 2, 4, and 6, and Server 3 is the entry point to the server tree consisting of Servers 2, 3, 4, and 5.
Figure 3: Service compositions of two distributed
functions with solid and dotted lines respectively.
3 RDF TRANSFORMATION
As enterprise data often exist in different forms and formats, they first have to be transformed for the semantic web. This transformation involves two steps. First, the data are transformed into web resources with unique URIs. Second, the metadata are extracted and transformed into RDF triples, often with the help of public and private ontologies. For
structured data, such as relational databases, this transformation is straightforward, as outlined in (RDB2RDF). In our study, we transformed a relational database about documents with 160,699 records into 3,182,721 triples following the Dublin Core (Dublin Core) ontology. The subject of every triple is the document URI. The following are some sample triples, with sensitive information replaced by generic strings:
<uri_1> <http://purl.org/dc/elements/1.1/title> "Title 1" .
<uri_1> <http://purl.org/dc/elements/1.1/creator> "Author 1" .
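The second step can be sketched in SWI-Prolog as follows, assuming the current semweb API. The row/3 fact and the example.com URI scheme are hypothetical stand-ins for a record of the document database and the enterprise naming scheme:

:- use_module(library(semweb/rdf_db)).

:- rdf_register_prefix(dc, 'http://purl.org/dc/elements/1.1/').

% Hypothetical stand-in for one record of the relational database.
row(1, 'Title 1', 'Author 1').

record_to_rdf(Id) :-
    row(Id, Title, Author),
    % Step 1: mint a unique URI for the record (scheme is illustrative).
    atomic_list_concat(['http://example.com/doc/', Id], URI),
    % Step 2: extract the metadata as Dublin Core triples.
    rdf_assert(URI, dc:title, literal(Title)),
    rdf_assert(URI, dc:creator, literal(Author)).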
Our experience exposed several problems, even for such well-structured data. First, many important relations (predicates) are not in Dublin Core or the other ontologies we know of. We created private ontologies to cover them, but these cannot interoperate easily outside this domain. Second, many data fields, such as author names, are not entered consistently in the database. Many names have variations that make matching and cross-referencing from other RDF databases difficult.
4 FUNCTIONS AND SERVICES
Within the proposed REST architecture, we developed several functions and services for semantic web based proactive search.
4.1 Entity Annotation
The entity annotation function takes an incoming text and produces a set of annotations for the entities in the text based on the current knowledge. An annotation is a 4-tuple (phrase, start, length, link) that identifies the phrase being annotated, its starting position and length (both in characters), and the concept related to the phrase. When clicked, the link opens a web page showing the detailed information.
To support this function, we first index the RDF triples on selected predicates. This dramatically reduces the indexing and search space from all literals to the selected ones. Another technique to save memory and improve efficiency is to avoid creating a separate index. Instead, we pre-process the RDF triples by tokenizing the literals into phrases so that they match the tokenized input. The outcome of this process is a many-to-many mapping from indexed phrases to concepts.
The annotation algorithm is a modification of the left-to-right maximum tokenization algorithm (Guo 1997) developed for Chinese language processing. The algorithm finds, from left to right, the longest token sequences that match an indexed phrase in the RDF triples and records the corresponding concepts. If more than one concept is found for a phrase, the server creates a link representing the list of matches. Unlike the traditional tokenization algorithm, which covers the entire text, our algorithm skips unmatched tokens. The high-level components of this function are illustrated in Figure 4.
Figure 4: Components of the annotation function.
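The following is a minimal SWI-Prolog sketch of this greedy longest-match scan. The phrase_concept/2 facts are a hypothetical fragment of the phrase-to-concept mapping, positions are counted in tokens rather than characters for brevity, and only the first concept of a phrase is kept (a full version would collect the list of matches):

:- use_module(library(lists)).    % max_member/2

% Hypothetical phrase index: tokenized phrase -> concept link.
phrase_concept([semantic, web], 'http://example.com/concept/semantic_web').
phrase_concept([rdf], 'http://example.com/concept/rdf').

% annotate(+Tokens, +Pos, -Annotations): greedy left-to-right scan that
% records the longest indexed phrase at each position and skips
% unmatched tokens.
annotate([], _, []).
annotate(Tokens, Pos, [ann(Match, Pos, Len, Link)|Anns]) :-
    longest_match(Tokens, Match, Link), !,
    length(Match, Len),
    append(Match, Rest, Tokens),
    Pos1 is Pos + Len,
    annotate(Rest, Pos1, Anns).
annotate([_|Rest], Pos, Anns) :-   % skip an unmatched token
    Pos1 is Pos + 1,
    annotate(Rest, Pos1, Anns).

% The longest non-empty prefix of Tokens that is an indexed phrase.
longest_match(Tokens, Match, Link) :-
    findall(Len-M,
            ( append(M, _, Tokens), M \= [],
              phrase_concept(M, _), length(M, Len) ),
            Pairs),
    Pairs \= [],
    max_member(_-Match, Pairs),
    phrase_concept(Match, Link).

% ?- annotate([the, semantic, web, of, rdf], 0, A).
% A = [ann([semantic, web], 1, 2, ...), ann([rdf], 4, 1, ...)]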
This function is exposed as a REST service on a
designated resource. There are two ways to invoke
the service: HTTP GET and HTTP POST. GET is
used for short text, and POST is used for long text.
The REST service returns related annotations in the
following formats: 1) JSON for agents in web
browsers; 2) HTML for direct rendering in web
browsers; 3) HTML tables for embedding and
debugging in web browsers; and 4) Prolog terms for
efficient service composition.
When this function is distributed over a tree of servers (Figure 3), the parent server sends a copy of the text to all its children in parallel, and the children return their annotation results (or faults). Once the parent server has received all the results (within a timeout interval), it merges what it has received so far into a coherent annotation and sends it to its own parent, or to the client if it is the root server. To maintain the longest-phrase condition, the merge process removes all covered phrases. In other words, if one server returns an annotation for a phrase x and another server returns a phrase that contains x as a substring, then the first annotation is removed from the merged annotation. The merged annotation therefore contains only the longest phrases.
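This merge step can be sketched as follows, reusing the illustrative ann(Phrase, Start, Length, Link) terms from the annotation sketch above, but with character offsets:

% covered(+A, +B): annotation A lies strictly inside annotation B.
covered(ann(_, S1, L1, _), ann(_, S2, L2, _)) :-
    L1 < L2,
    S1 >= S2,
    S1 + L1 =< S2 + L2.

% Merge the children's annotation lists, dropping covered phrases.
merge_annotations(ChildResults, Merged) :-
    append(ChildResults, All),
    findall(A,
            ( member(A, All),
              \+ ( member(B, All), covered(A, B) ) ),
            Merged).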
4.2 Implicit Social Network
In an enterprise, people communicate and collaborate on a daily basis. These activities form a social network
that is dynamic and implicit, with rich relations associated with social content, e.g. email, IM, co-authorship, etc. This implicit social network can be discovered by inspecting the artefacts of these activities, such as email exchanges and the authorship of project documents. Because there are very rich relations between people in an organization, our semantic web approach is well suited to representing and discovering such implicit social networks. In our study, we found the following relationships to be important for a person in an implicit social network (Table 1).
Table 1: Relations in implicit social network.
Relation Comment
collaborators Collaborators of this person
followers People interested in this person
citations Artefacts that cite this person
products Artefacts made by this person
expertise Expertise of this person
To cope with the dynamic nature of the implicit social network and to save memory, these relations are derived by rules in response to incoming queries. The input of this function is a URI identifying a person, and the output is the relations about that person found in the knowledge base. Because this information is consumed by human users in our system, the current output supports only HTML, in which the relations are represented as hypertext. The high-level components of this function are illustrated in Figure 5.
Figure 5: Implicit Social Network function.
When calculating these relations, a relation with more recent and frequent activities is valued more than one with infrequent activities in the past, because people in an organization can take different roles over time. The activities in a relation are therefore weighted by an exponential decay function to reflect
this time-based relevance value. Assume that a
relation R is derived from a set of n activities with timestamps T_i. Then the time value of this relation, with respect to the current timestamp T, is calculated as follows, where C is a normalizing factor and λ is a scaling factor:

tv(T, C, λ, R) = C Σ_{i=1}^{n} e^(-λ(T - T_i)),  R = {T_i | 1 ≤ i ≤ n}
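A small SWI-Prolog sketch of this computation, assuming timestamps are numbers on a common scale (e.g. Unix seconds):

:- use_module(library(apply)).    % foldl/4

% time_value(+T, +C, +Lambda, +Timestamps, -TV): exponentially decayed
% time value of a relation derived from activities at Timestamps.
time_value(T, C, Lambda, Timestamps, TV) :-
    foldl(decay(T, Lambda), Timestamps, 0.0, Sum),
    TV is C * Sum.

decay(T, Lambda, Ti, Acc0, Acc) :-
    Acc is Acc0 + exp(-Lambda * (T - Ti)).

% Recent activities dominate older ones, e.g.:
% ?- time_value(1000.0, 1.0, 0.01, [900.0, 990.0], TV).
% TV = 1.2727...   (exp(-1.0) + exp(-0.1))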
Similar to the annotation function, when this
function is distributed, the parent server sends a
copy of the URI to its children servers in parallel,
which return the relations (or faults). The parent
server then merges the results using set unions.
4.3 Expert Finder
In many enterprise systems, there is a need to find
experts with certain skills, so that a problem can be
directed to the most qualified persons. In an
organizational environment, since we value team
work and influence as much as individual skills, we
need to find experts who not only have required
expertise, but also have high reputation and
authority.
The expertise of a person can be evaluated based on the products he produces or contributes to. For example, if a person designed several web servers, it is reasonable to assume he is an expert in that area. In the current system, the expertise of a person is represented as a vector where each dimension corresponds to a skill and the value indicates the strength of that skill. To speed up the matching process, a training process is used to compute and save the expertise vectors for each person in the knowledge base. For a person p with an expertise vector e, the relevance of p with respect to a given problem description vector x can be calculated in the vector space based semantic model as the cosine of the angle between x and e:
relevance(p) = cos(x, e)
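The following sketch computes this measure over plain list vectors; it assumes the skill dimensions of the two vectors are aligned:

% relevance(+X, +E, -Rel): cosine between a problem vector X and an
% expertise vector E, both lists of numbers over the same dimensions.
relevance(X, E, Rel) :-
    dot(X, E, XE),
    dot(X, X, XX),
    dot(E, E, EE),
    Rel is XE / (sqrt(XX) * sqrt(EE)).

dot([], [], 0.0).
dot([A|As], [B|Bs], D) :-
    dot(As, Bs, D0),
    D is D0 + A * B.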
The reputation of a person can be evaluated
based on how other people evaluate his work. This is
calculated with bounded recursion and loop
detection based on the implicit social network as
follows, where rating(p) reflects the total evaluation
a person received:
reputation(p) = rating(p)+reputation(followers(p))
The authority of a person captures how much
power a person has in an organization. It can also be
calculated with bounded recursion and loop
detection based on implicit social network as
follows, where level(p) corresponds to the power
level a person has in an organizational hierarchy:
authority(p)=level(p)+authority(collaborators(p))
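Both recursions follow the same pattern, sketched below for reputation in SWI-Prolog. The rating/2 and follower/2 facts are hypothetical stand-ins for the implicit social network, the depth bound of 3 is an arbitrary illustrative choice, and loop detection uses a list of persons already seen on the current path:

:- use_module(library(lists)).    % sum_list/2, memberchk/2

% Hypothetical stand-ins for the implicit social network.
rating(alice, 5).  rating(bob, 3).  rating(carol, 2).
follower(alice, bob).
follower(alice, carol).
follower(bob, alice).             % a loop back to alice

reputation(P, Rep) :-
    reputation(P, 3, [P], Rep).   % bound the recursion depth

% reputation(+P, +Depth, +Seen, -Rep): P's rating plus the reputation
% of followers not yet visited on this path, down to the depth bound.
reputation(P, Depth, Seen, Rep) :-
    rating(P, R),
    Depth1 is Depth - 1,
    findall(FRep,
            ( Depth1 >= 0,
              follower(P, F),
              \+ memberchk(F, Seen),
              reputation(F, Depth1, [F|Seen], FRep) ),
            FReps),
    sum_list(FReps, FSum),
    Rep is R + FSum.

% ?- reputation(alice, Rep).      % Rep = 10: 5 + 3 (bob) + 2 (carol)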
The input to the expert finder function is a text
describing a problem, and the output is a ranked list
of experts. The input text is first converted to a
vector to search for persons whose expertise is
above a threshold. The candidates in the list are then
re-ranked by averaging the normalized relevance,
reputation and authority scores. The high-level
components of this function are illustrated in Figure
6.
Figure 6: Expert finder function.
When this function is distributed, a parent server sums up the scores returned by its children servers for the same person. The returned experts are then merged and re-ranked before being passed back up to the parent server or the agent.
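A sketch of this parent-side merge, assuming each child returns its experts as a list of Person-Score pairs:

:- use_module(library(pairs)).    % group_pairs_by_key/2
:- use_module(library(lists)).    % sum_list/2

% merge_experts(+ChildResults, -Ranked): sum the scores each child
% returned for the same person, then rank by total score, highest first.
merge_experts(ChildResults, Ranked) :-
    append(ChildResults, AllPairs),
    keysort(AllPairs, Sorted),
    group_pairs_by_key(Sorted, Grouped),     % Person-[S1,S2,...] groups
    findall(Total-P,
            ( member(P-Scores, Grouped),
              sum_list(Scores, Total) ),
            Scored),
    sort(0, @>=, Scored, Ranked).            % descending, keep duplicates

% ?- merge_experts([[alice-0.9, bob-0.4], [bob-0.7]], R).
% R = [1.1-bob, 0.9-alice]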
5 SOFTWARE AGENTS AND
APPLICATIONS
The proposed architecture, functions, and services were applied to several enterprise communication and collaboration systems. All the applications use the same REST services, but they differ in how the agents behave.
5.1 Browser Agent
The browser agent in our study is a Firefox
extension that monitors and annotates the web page
a user is viewing. A user can activate and deactivate
the agent from the browser menus and ask the agent
to annotate the current page or restore the original
page. The user can also configure the agent to use a
different REST service by entering a different URI.
Figure 7 is a screenshot of the interface to our
browser agent in front of the Firefox browser
window.
5.2 Call Center Agent
This software agent assists human agents in call centers by finding relevant information in incoming contacts (e.g. emails), saving human agents from searching for it. Each incoming email is intercepted by the software agent, which identifies the important concepts and recommends experts related to the email using our REST services. The email is then enriched by embedding the found information as hyperlinks, and forwarded to the system. When a human agent receives this enriched email, he can click the links to obtain the detailed information.
Figure 7: A software agent embedded in Firefox browser.
Figure 8 is a screenshot of an annotated email in our prototype call center system as seen by a human agent. For privacy, all personal identification information is blanked out.
Figure 8: An annotated email in call center client.
5.3 Google Wave Agent
We developed a special software agent for Google Wave using the Google Robot API (Google Wave) to monitor and annotate multi-party group chats in real time. To enable this agent, a user just needs to invite our software agent to the current Wave conversation. When a user clicks a button to finish
his chat, the agent invokes the REST services and annotates the chat text with hyperlinks that point to relevant concepts and information, bridging the semantic gap in collaboration. These hyperlink annotations are propagated to all participants of the Wave session in near real time. Any user can click a link in the chat window to visit it.
6 IMPLEMENTATION AND
EXPERIMENTS
We implemented a prototype REST server system using SWI-Prolog (SWI-Prolog). The HTTP servers were Prolog images (executable programs) compiled for different machines. Our current RDF database contains triples from three sources, as listed in Table 2.
Table 2: Size of knowledge sources.
Source Triples
Wikipedia (2006/03/26) 47,054,407
database 3,182,721
product 75
total 50,237,203
To simulate distributed REST services that contain different knowledge bases, the Wikipedia RDF file was broken up at random into 10 smaller sets, each with up to 5 million triples. Together with the database and product sources, this resulted in 12 text files in N-Triples format. These 12 files were loaded into SWI-Prolog, indexed, and converted into the binary SWI-Prolog RDF database format for efficient loading. Table 3 compares the size (KB) of the text files with the size (KB) of the binary databases; for most large files, the binary format achieves an over 80% size reduction.
Table 3: Comparison of size reductions.
Name      Text    Binary  Ratio
WP_0      753243  148159  19.67%
WP_1      753268  149143  19.80%
WP_2      753986  147896  19.62%
WP_3      753976  147493  19.56%
WP_4      754396  149513  19.82%
WP_5      753430  148539  19.72%
WP_6      753364  148594  19.72%
WP_7      753890  149200  19.79%
WP_8      753979  149268  19.80%
WP_9      309799  67885   21.91%
database  380004  73624   19.37%
product   10      7       70.00%
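The conversion itself is short in SWI-Prolog. The sketch below assumes the semweb API of current SWI-Prolog releases (rdf_load/1 and rdf_save_db/1), which may differ in detail from the version we used; the file names are illustrative:

:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/turtle)).   % N-Triples/Turtle reader

% Load an N-Triples split (format inferred from the file extension)
% and save the whole store in SWI-Prolog's binary triple format.
convert(NTriplesFile, BinaryFile) :-
    rdf_load(NTriplesFile),
    rdf_save_db(BinaryFile).

% ?- convert('wp_0.nt', 'wp_0.trp').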
These binary databases were loaded into the memory of different machines according to their capacity. This distributed configuration allows us to recruit different numbers of non-dedicated machines, ranging from powerful servers to notebook computers. The smallest system that provides satisfactory performance for the entire 50+ million triples consisted of two Linux machines (3.0 GHz CPU/4 GB RAM and 1.6 GHz CPU/4 GB RAM), each holding about 25+ million triples.
To test the performance of the distributed servers, we selected 3 Wikipedia binary databases, WP_{0,1,2}, and distributed them over three server trees in a LAN environment. The first server tree had one root node containing all 15 million triples; the second had one root with 5 million triples and one child with 10 million; the third had one root and two children, each with 5 million triples. In all these trees, the root server was a Windows 2003 Server machine with a dual-core CPU (3.0 GHz and 2.99 GHz) and 2 GB RAM, and the child servers were the two Linux machines mentioned above. To test the performance of these trees, a test text of 1142 characters was sent 10 times to the root server, which returned 30 annotations. The average service execution time was recorded using the Prolog time/1 predicate on the root server and is summarized in Table 4 with standard deviations. The execution time includes the time for the local function, service composition, and logging.
Table 4: Performance of three server trees.
Server tree Avg. Time (second)
1 node 0.406 (0.0003)
2 nodes 0.390 (0.0179)
3 nodes 0.401 (0.0321)
Our results showed that a distributed function may outperform its local version when it is distributed over faster machines. When the two Linux machines with more RAM were used, the average service execution time on the root improved slightly (trees 2 and 3 vs. tree 1). The test also showed that parallel distribution of a function to two nodes created only a small overhead compared with distribution to one node (tree 3 vs. tree 2).
7 RELATED WORK
There has been active research on processing large-scale RDF databases (Cai 2004, Urbani 2009, Ianni 2009, Husain 2009, Large Triple Stores). But
the focus of these efforts is different from ours. First, they are limited to efficient storage and retrieval of large datasets, whereas we aim to support general computing and inference over the datasets. Second, those approaches are based on a homogeneous architecture, where a set of computers either uses a single protocol or forms a fixed topology, whereas our REST composition service based approach does not assume any single protocol or topology.
There has been some work on using RDF databases to annotate text (Schönhofen 2008, Ferragina 2010). But these systems are special cases of the proactive search that we propose, and they do not offer a general architecture to support distributed functions.
Our REST service architecture also differs from the conventional 3-tier web architecture consisting of data, business logic, and presentation. In our architecture, the presentation is consumed not by end users but by agents. And unlike business logic that accesses local data, our logic can access distributed functions through service composition.
8 CONCLUSIONS
The contributions of this paper are summarized below:
- We proposed a software architecture that combines software agents and REST web services to support distributed and scalable semantic web development;
- We demonstrated that this architecture can effectively support proactive search to enrich enterprise communication and collaboration;
- We demonstrated that service composition is a feasible approach to efficiently combining distributed functions;
- We implemented several agents for different use cases and a prototype system with over 50 million RDF triples.
Future work will focus on collecting more RDF data and developing more advanced algorithms, functions, and services.
ACKNOWLEDGEMENTS
We would like to thank Mr. Jan Wielemaker for
answering our technical questions about SWI-
Prolog. We also thank Mr. Jack Barnard for
providing access to a large document database.
REFERENCES
RDF. Resource Description Framework (RDF), http://www.w3.org/RDF/, last accessed 10-Feb-11.
OWL 2004. OWL Web Ontology Language Overview,
W3C Recommendation, 10 February 2004,
http://www.w3.org/TR/owl-features/, last accessed 10-
Feb-11.
RDF 2004. RDF Semantics, W3C Recommendation, 10
February 2004, http://www.w3.org/TR/rdf-mt/, last
accessed 10-Feb-11.
Fielding, Roy T., Architectural Styles and the Design of Network-Based Software Architectures, Ph.D. Dissertation, 2000, http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm, last accessed 10-Feb-11.
Richardson, L.; Ruby S., Restful Web Services, O’Reilly,
2007.
Linked Data, http://linkeddata.org/, last accessed 10-Feb-
11.
RDB2RDF, Use Cases and Requirements for Mapping
Relational Databases to RDF, W3C Working Draft, 8
June 2010, http://www.w3.org/TR/rdb2rdf-ucr/, last
accessed 10-Feb-11.
Dublin Core, Dublin Core Metadata Initiative,
http://dublincore.org/, last accessed 10-Feb-11.
Guo, J., Longest Tokenization, Computational Linguistics
and Chinese Language Processing, Vol. 2, No. 2,
August 1997, pp 25-46.
Google Wave, http://wave.google.com/about.html, last
accessed 10-Feb-11.
SWI-Prolog, http://www.swi-prolog.org/, last accessed 10-
Feb-11.
Cai, M.; Frank, M., RDFPeers: A Scalable Distributed RDF Repository based on A Structured Peer-to-Peer Network, WWW 2004: Proceedings of the 13th International Conference on WWW, pp 650-657.
Urbani, J. et al, Scalable Distributed Reasoning using MapReduce, ISWC 2009, Vol. 5823, pp 634-649.
Ianni, G. et al, Dynamic Querying of Mass-Storage RDF
Data with Rule-Based Entailment Regimes, ISWC
2009, pp 310-327.
Husain, M. F. et al, Storage and Retrieval of Large RDF
Graph Using Hadoop and MapReduce, Lecture Notes
in Computer Science, 2009, Vol. 5931, pp 680-686.
Large Triple Stores, http://www.w3.org/wiki/LargeTripleStores, last accessed 10-Feb-11.
Schönhofen, P., Annotating documents by Wikipedia
concepts, 2008 IEEE/WIC/ACM International
Conference on Web Intelligence and Intelligent Agent
Technology.
Ferragina, P.; Scaiella, U., TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities), CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp 1625-1628.