tion, renaming, deletion, etc. It relies on the use
of custom-made or customized authoring tools, i.e.,
referential integrity is not provided transparently to
the user or developer. It allows little parallelism and
requires manual reconciliation of data. Thus, it is
aimed more at helping web developers than at preserving
referential integrity on a wide scale. It does not
attempt to reclaim the storage space occupied by useless,
i.e., unreachable, documents.
Hyper-G (Kappe, 1995) is a “second-generation”
hypermedia system that aims to correct web deficiencies
and provide a rich set of new services and features.
With regard to referential integrity, it is enforced
by means of a propagation algorithm that is described
as scalable. However, Hyper-G is proposed as an
alternative to the WWW, whereas our system is integrated
within, and mostly transparent to, the current WWW
architecture.
The W3Objects (Ingham et al., 1996) approach is
also based on applying a distributed garbage collector
to the World Wide Web. However, it imposes a new
object-based model extending the WWW, and therefore
it also lacks transparency. Moreover, it is not complete,
i.e., it is not able to reclaim distributed cycles of garbage.
In (Moreau and Gray, 1998), a community of agents
is used to maintain link integrity on the web. As
in our work, they do not attempt to replace the web
but extend it with new behavior. Agents cooperate to
provide versioning of documents and to maintain links
by means of a distributed garbage collector; each
site manages tables recording the import and export
of documents. However, this approach does not address
the semantic issues raised when preserving dynamic content.
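The import/export-table bookkeeping just described can be sketched as follows. This is an illustrative, acyclic-only approximation under assumed names (`Site`, `register_links`, `unreachable`), not the cited system's actual implementation:

```python
# Hypothetical sketch: each site keeps an "imports" table of remote
# documents its own pages link to, and an "exports" table mapping each
# local document to the set of sites that import it. A local document
# with no importing site is a reclamation candidate. Note that, as with
# any per-reference scheme, this alone cannot reclaim distributed
# cycles of garbage.

class Site:
    def __init__(self, name):
        self.name = name
        self.documents = {}   # local url -> set of outgoing links
        self.imports = set()  # remote urls referenced by local pages
        self.exports = {}     # local url -> set of importing site names

    def add_document(self, url, links=()):
        self.documents[url] = set(links)

    def register_links(self, sites):
        """Populate import/export tables from the link structure."""
        for url, links in self.documents.items():
            for target in links:
                for other in sites:
                    if other is not self and target in other.documents:
                        self.imports.add(target)
                        other.exports.setdefault(target, set()).add(self.name)

    def unreachable(self):
        """Local documents that no remote site imports."""
        return {u for u in self.documents if not self.exports.get(u)}

# Toy scenario: site a links to b/page.html; b/orphan.html is never
# referenced, so it shows up as a reclamation candidate on site b.
# (Root documents such as a/index.html would be pinned in practice.)
a, b = Site("a"), Site("b")
a.add_document("a/index.html", ["b/page.html"])
b.add_document("b/page.html")
b.add_document("b/orphan.html")
for s in (a, b):
    s.register_links([a, b])
print(sorted(b.unreachable()))  # ['b/orphan.html']
```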
Thus, existing solutions to referential integrity either
do not aim at recycling unreachable documents, are
not correct with respect to dynamic content, or are
not integrated with the standard web.
7 CONCLUSIONS AND FUTURE
WORK
In this paper we presented a new way of enforcing referential
integrity in the WWW, including dynamically
generated web content. The fundamental aspects of
the system are the following: i) it prevents storage
waste and memory leaks by deleting any resources no
longer reachable, notably addressing the collection
of distributed cycles of unreachable web content; ii)
it is safe with respect to dynamically generated web
content; iii) it does not require the use of any specific
authoring tools; iv) it integrates with the hierarchical
structure of today's web proxies and caching; v) it is
mostly transparent to user browsers and web servers.
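To illustrate aspect i), reclaiming cycles requires reachability from root documents rather than per-reference counting alone. The toy mark phase below, over a hypothetical link graph, is an illustrative sketch and not the system's actual algorithm:

```python
# Hedged sketch: a document is garbage iff no root document can reach
# it, which correctly classifies unreachable cycles. The graph and the
# root set are hypothetical stand-ins for the web's link structure.

def reachable(links, roots):
    """Mark phase: every document reachable from the roots."""
    seen, stack = set(), list(roots)
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        stack.extend(links.get(url, ()))
    return seen

def garbage(links, roots):
    """Sweep candidates: documents no root reaches, cycles included."""
    return set(links) - reachable(links, roots)

links = {
    "home": ["a"],
    "a": ["b"],
    "b": ["a"],   # cycle a <-> b, but reachable from home: kept
    "x": ["y"],
    "y": ["x"],   # unreachable cycle: reclaimed
}
print(sorted(garbage(links, roots={"home"})))  # ['x', 'y']
```

Per-site reference counting would keep `x` and `y` alive forever, since each holds a reference to the other; reachability from roots is what makes the collection complete.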
Concerning future research directions, we intend
to further address the fault tolerance of our system,
i.e., to determine which design decisions must be taken
so that it remains safe, live, and complete in spite of
process crashes and permanent communication failures.
REFERENCES
Bergman, M. K. (2001). The deep web: Surfacing hidden
value. The Journal of Electronic Publishing, 7(1).
Bush, V. (1945). As we may think. The Atlantic Monthly,
(July).
Chiang, C.-Y., Liu, M. T., and Muller, M. E. (1999).
Caching neighborhood protocol: a foundation for
building dynamic web caching hierarchies with proxy
servers. In International Conference on Parallel Pro-
cessing.
Creech, M. L. (1996). Author-oriented link management.
In Fifth International WWW Conference, France.
HostPulse (2002). Broken-link checker. http://www.hostpulse.com/.
Ingham, D., Caughey, S., and Little, M. (1996). Fix-
ing the “Broken-Link’’ problem: the W3Objects ap-
proach. Computer Networks and ISDN Systems, 28(7–
11):1255–1268.
Kappe, F. (1995). A Scalable Architecture for Maintaining
Referential Integrity in Distributed Information Sys-
tems. Journal of Universal Computer Science, 1(2).
Lamport, L. (1978). Time, clocks, and the ordering of
events in a distributed system. Communications of the
ACM, 21(7):558–565.
Lawrence, S., Pennock, D. M., Flake, G. W., Krovetz, R.,
Coetzee, F. M., Glover, E., Nielsen, F. A., Kruger, A.,
and Giles, C. L. (2001). Persistence of web references
in scientific research. IEEE Computer, 34(2).
LinkAlarm (1998). Linkalarm, http://www.linkalarm.com/.
Moreau, L. and Gray, N. (1998). A community of agents
maintaining link integrity in the world wide web. In
Proceedings of the 3rd International Conference on
the Practical Applications of Agents and Multi-Agent
Systems (PAAM-98), London, UK.
Nelson, T. H. (1972). As we will think. In On-line 72 Con-
ference.
O’Neill, E. T., Lavoie, B. F., and Bennett, R. (2003). Trends
in the evolution of the public web 1998 - 2002. D-Lib
Magazine, 9(4).
Reich, V. and Rosenthal, D. (2001). LOCKSS: A permanent
web publishing and access system. D-Lib Magazine, 7.
Richer, N. and Shapiro, M. (2000). The memory behavior
of the WWW, or the WWW considered as a persistent
store. In POS 2000, pages 161–176.
Shapiro, M., Dickman, P., and Plainfossé, D. (1992). Robust,
distributed references and acyclic garbage collection.
In Symposium on Principles of Distributed Computing,
pages 135–146, Vancouver, Canada. ACM.
Swaminathan, N. and Raghavan, S. (2000). Intelligent
prefetch in WWW using client behavior characterization.
In 8th International Symposium on Modeling,
Analysis and Simulation of Computer and Telecommunication
Systems, pages 13–19.
Veiga, L. and Ferreira, P. (2003). Complete distributed
garbage collection: an experience with Rotor. IEE
Proceedings - Software, 150(5).
XenuLink (1997). Xenu's Link Sleuth. http://home.snafu.de/tilman/.
TURNING THE WEB INTO AN EFFECTIVE KNOWLEDGE REPOSITORY