A DISTRIBUTED SOFTWARE ENVIRONMENT FOR
COLLABORATIVE WEB COMPUTING
Antonio Pintus, Raffaella Sanna and Stefano Sanna
CRS4 - Center for Advanced Studies, Research and Development in Sardinia, Italy
Keywords: Distributed, DHT, DART, Web, mobile, collaboration, Web services.
Abstract: This paper describes an extensible core software element of a distributed, peer-to-peer system, which
provides several facilities in order to help the implementation of collaborative, Web-based, distributed
information storing and retrieval applications based on a decentralized P2P model. Moreover, after an
architectural introduction of the core distributed software module, the Core Node, this paper describes a real
application, named DART Node, based on it and designed and implemented within the DART (Distributed
Agent-based Retrieval Tools) project, which carries out the idea of the design and implementation of a
distributed, semantic and collaborative Web search engine, including mobile devices integration use cases.
1 INTRODUCTION
The Internet is evolving in many directions and what
is usually called “Web 2.0” summarizes only some
of them. While data providers have been
decentralized (users superseded traditional
publishers), infrastructure is still centralized, held
and controlled by a few companies.
Managing large amounts of data and supporting
collaborative participation at infrastructure level, are
two of the key concepts on which are been focused
the studies and investigations conducted during the
DART research project (Distributed Agent-based
Retrieval Tools, http://www.dart-project.org)
(Angioni et al., 2007).
The main goal of DART is to realize a flexible,
P2P, collaborative, scalable, fault-tolerant and self-
organized system, which achieves a collaborative
storage and retrieval of large volumes of resources,
for the implementation of a distributed, semantic and
collaborative search engine prototype.
This paper presents and describes the main
software components of this P2P distributed system.
2 THE CORE NODE GENERAL
ARCHITECTURE
The DART system can be viewed as a federation of
nodes called Core Nodes, whose modular
architecture is described in this section.
2.1 DHT Layer and DHT Abstraction
Layer (DAL)
Distributed Hash Tables (DHT) are considered state-
of-art approach to massively distributed and storage-
oriented systems (Balakrishnan, 2003). By means of
DHTs it's possible to realize networks of
cooperating nodes with a deterministic resource
localization and an efficient requests' routing.
The Core Node is mainly a DHT node, based on
and extending the PAST framework (Peer-to-Peer
Archival Storage, http://freepastry.org) (Druschel,
2001), a large-scale persistent and global storage
system based on the Pastry routing algorithm
(Rowstron et al., 2001), so it basically supports data
insert and lookup operations. Moreover, being a
Pastry node, it is also able to route messages with a
generic payload.
The DHT Abstraction Layer of the Core Node is
a Java API layer (Figure 1) that wraps all the low-
level DHT network operations and API provided by
PAST and FreePastry frameworks. It exposes an
interface which simplifies DHT operations and
263
Pintus A., Sanna R. and Sanna S. (2008).
A DISTRIBUTED SOFTWARE ENVIRONMENT FOR COLLABORATIVE WEB COMPUTING.
In Proceedings of the Tenth International Conference on Enterprise Information Systems - SAIC, pages 263-266
DOI: 10.5220/0001720902630266
Copyright
c
SciTePress
message sending over the network. All the higher
level layers in the Core Node architecture rely on
this API.
Figure 1: General Architecture of the Core Node.
2.2 Web Container
The Core Node is equipped with a full configurable
embedded Java Servlet Container, which allows to
fulfil HTTP requests for both static and dynamic
contents and applications, like Web Services. This
way, it is possible to create and deploy Web
applications in the node.
2.3 Services
The node API (Section 2.1) can be exploited in order
to design and implement more domain-focused
distributed software applications, as in the DART
scenario (Section 3). Anyway, in order to grant the
interoperability between the node and external
heterogeneous software systems, a services layer, or
another standard mechanism, become necessary.
Two types of services are provided in the node:
RESTful Services and Web Services (SOAP), which
wrap and use the underlying DHT Abstraction
Layer.
2.3.1 REST Services and Web Services
The Node provides RESTful Services, which expose
an interface for all the basic system functionalities,
like data storing and retrieval requests, or messages
sending.
REST interface is fundamental to allows an
access to DART network to mobile devices and
embedded systems. Moreover, REST interface
simplifies the design and implementation of RIA
using AJAX and standard web browsers.
Although REST can be successfully adopted for
fast integration of simple components, it is not
suitable for complex architectures. REST lacks in
formal descriptions of services interfaces and in
embedded security management. So, the Core Node,
also provides a more formal interface, using
standards like WSDL, SOAP and XML Schema, for
exposed services which are equivalent to RESTful
services mentioned in the previous section.
3 A SEARCH ENGINE
APPLICATION: THE DART
NODE
The DART research project is focused on studying,
developing and testing patterns and integrated tools
to improve the quality of search engines results with
the main objective to satisfy user needs. Among the
others, interesting research fields such semantic-
based indexing, P2P crawling, public Web resources
indexing, location-aware information retrieval and
virtual assistance, are exploited and merged
(Angioni et al., 2007).
Studies and investigations conducted in DART,
have led to the development of a software
application prototype: the DART Node.
3.1 The DART Node Architecture
The DART Node is based on the Core Node,
inheriting all the basic functionalities and extending
them. At run-time, the DART Node automatically
discovers other nodes and collaborates with them in
the P2P network, performing Web crawling tasks
and storing a portion of the content crawled by all
the nodes.
Semantic issues are faced by a Semantic Module
(Figure 2), which works on crawled data and
performs a semantic and geographical
categorization. To achieve this goal, a semantic
analysis process on structured and unstructured parts
of documents is performed (Angioni et al., 2007).
3.2 Collaborative Crawling System
The collaboration between nodes in crawling
activities, helps to crawl the Web in a more effective
manner, reducing network traffic and avoiding
duplication of tasks and nodes overload.
The DART Node adopts a simple policy for
distributed crawling, called “partition by URL”,
where the partitioning scheme is determined by the
ICEIS 2008 - International Conference on Enterprise Information Systems
264
way URLs are published into the DHT, hashing the
entire URL. Each node is responsible for crawling
the URLs published in its partition of the DHT. (Loo
et al., 2004). Crawling distribution is achieved
through a special messages exchange between
nodes.
Future work in DART collaborative crawling
system may consider the adoption of a topic oriented
collaborative crawling (Chung et al., 2002).
3.3 Indexer Module
This module (Figure 2) is capable to collect data
coming from (potentially) several data providers, for
example from Web crawlers, and to store them in
the DHT through the DHT Abstraction Layer.
The Indexer works using a queue with the
adoption of a producer-consumer paradigm. For
textual data types, the Indexer can use the Semantic
Module in order to perform a classification of data
before the storing step.
Figure 2: General Architecture of the DART Node.
3.4 DART Node Services
The DART Node exposes RESTful and Web
Services providing the following macro-
functionalities to potential remote clients:
Query Services: provide an interface for full
life-cycle management of queries submitted to the
search engine;
Semantic Services: by means of the Semantic
Module they provide operations to perform semantic
classifications of textual resources;
Event Services: provide access to
functionalities related to the DART event delivering
and notification system, still under development (see
Section 4.1).
Services are implemented using JAX-WS 2.1
and Java Servlet frameworks.
3.5 Mobile Devices Applications
The network of DART nodes is able to store and
asynchronously retrieve any kind of data, taking
advantage of systematic and redundant distribution
provided by DHT. RESTful services have been
designed to be accessible through mobile phones and
embedded systems. These devices have two key
roles: data provisioner and data provider. As data
provisioner, a mobile device performs queries on the
DART network, to retrieve data and display it to the
user. As data provider, a device collects data using
sensors and readers and stores such information to
be afterwards searched and retrieved by data
provisioner. Consumer appliances, like cellular
phones, act as provisioner; same devices and
embedded systems act as data providers (the latter
are intended to collect and automatically publish
data through the DART Node without human
actions).
Although mobile browsers have been enhanced
to access seamlessly standard web sites, they are not
suitable to perform asynchronous background
operations, access local peripherals and storage.
Asynchronous access to services is crucial to
improve user experience and to avoid continuous
network operations over expensive cellular
networks. At the same time, the ability to read data
coming from sensors and surrounding appliances
(such as RFID readers, GPS and accelerometers) is
mandatory to implement mobile data providers.
Mobile DART Node is a stand alone application
for Java ME enabled mobile phones that connects to
one or more DART Nodes, submits queries, checks
for results and retries them asynchronously,
basically adopting a pull mechanism.
Mobile DART Node does not replace embedded
browser: it runs as a bridge between the DART
network and the browser, caching and sorting
results, performing auto-updates on queries. Once
results have been collected, it provides a summary of
them: when the user selects a result, its URL is
passed to the web browser for rendering.
Mobile phones equipped with RFID reader and
GPS can run Mobile DART Node Data Provider
(DP) variant, which allows to publish data through
the DART Node REST interface. The Mobile DART
Node DP populates the DART network with
association about objects (identified by radio tags)
and places.
A DISTRIBUTED SOFTWARE ENVIRONMENT FOR COLLABORATIVE WEB COMPUTING
265
4 WORK IN PROGRESS
4.1 Distributed Event Delivery System
The Core Node, thanks to its architecture can be
used in a profitable way to build a distributed,
collaborative and failure-resistant Event Delivery
System. At the moment, a so described system, is
under design and development, adopting a
publish/subscribe model and involving mobile
sensor-equipped devices.
4.2 The Node in a Service Oriented
Architecture (SOA) Context
The service layer of the Core Node, in particular the
exposed Web Services, points out the chance of an
inclusion of the Node and its derived applications in
a SOA context.
Moreover, the distributed system itself can also set
up a redundant Web Service Registry (also including
semantic issues) which can be used for service
publication aims.
5 CONCLUSIONS
The Internet has evolved to a collaborative basis,
where information is collected from multiple sources
and assembled by users. Collaboration at
infrastructure level is still to come. The DART
Project aims to propose and realize a flexible,
distributed, collaborative, scalable, fault tolerant and
self-organized system for a semantic search engine.
The proposed software architecture realizes the
abstraction layer to DHT framework and exposes
storage and retrieval functionalities through SOAP
and REST web services. DART network is suitable
for both text documents, multimedia content and
environmental data coming from distributed sensors.
Mobile integration interfaces are core parts of basic
architecture and extended prototypes are being
developed and tested in real environment.
ACKNOWLEDGEMENTS
The architecture and the prototypes described in this
paper belongs to the DART - Distributed Agent-
based Retrieval Tools Project at CRS4, partially
funded by the Italian Ministry of University and
Scientific Research (Contract grant n. 11582).
REFERENCES
Angioni, M., Demontis, R., Deriu, M., De Vita, E., Lai,
C., Marcialis, I., Paddeu, G., Pintus, A., Piras, A.,
Sanna, R., Soro, A., Tuveri, F., 2007. A Collaborative,
Semantic and Context-Aware Search Engine. In Proc.
of ICEIS 2007 – 9th International Conference on
Enterprise Information Systems.
Angioni, M., Demontis, R., Deriu, M., De Vita, E., Lai,
C., Marcialis, I., Pintus, A., Piras, A., Soro, A., Tuveri,
F., 2007. DART: The Distributed Agent-Based
Retrieval Toolkit. In Proc. of 2007 WSEAS
International Conference on Computer Engineering
and Applications (CEA07).
Angioni, M., Demontis, R., Deriu, M., De Vita, E., Lai,
C., Marcialis, I., Pintus, A., Piras, A., Soro, A., Tuveri,
F., 2007. User Oriented Information Retrieval in a
Collaborative and Context Aware Search Engine. In
WSEAS Transactions on Computer Research Journal.
Rowstron, A., Druschel, P., 2001. Pastry: Scalable,
distributed object location and routing for large-scale
peer-to-peer systems. In Proc. of IFIP/ACM
International Conference on Distributed Systems
Platforms (Middleware).
Druschel, P., Rowstron, A., 2001. PAST: A large-scale,
persistent peer-to-peer storage utility. In Proc. of The
8th Workshop on Hot Topics in Operating Systems
(HotOS-VIII).
Angioni, M., Demontis, R., Tuveri, F., 2007. Enriching
WordNet to Index and Retrieve Semantic Information.
In Proc. of 2nd International Conference on Metadata
and Semantics Research.
Loo, B. T., Cooper, O., Krishnamurthy, S., 2004.
Distributed Web Crawling over DHTs. In UCB/CSD-
04-1305, EECS Department, University of California,
Berkeley.
Chung, C., Clarke, C. L. A., 2002. Topic Oriented
Collaborative Crawling. In Proc. of CIKM’02,
Conference on Information and Knowledge
Management.
Balakrishnan, H., Kaashoek, M.F., Karger, D., Morris, R.,
Stoica, I., 2003. Looking up data in P2P systems. In
Communications of the ACM, February 2003.
ICEIS 2008 - International Conference on Enterprise Information Systems
266