7 CONCLUSIONS
We have presented the design of a decentralized
system for hosting large-scale wiki web sites such as
Wikipedia using a collaborative approach. Publicly
available statistics from Wikipedia show that
centralized page management is the source of its
scalability issues. Our system decentralizes this
functionality by distributing the pages across a network
of computers provided by individuals and organizations
willing to contribute their resources to help host the
wiki site. This is done by finding a placement of pages
in which the capacity of no node is exceeded and the
load is balanced, and by subsequently routing page
requests to the appropriate node.
To solve the page placement problem, we use a
gossiping protocol to construct an overlay network
that resembles a random graph. This provides each
node with a sample of peers with which to communicate.
On top of this overlay, we balance the load on the
nodes by executing an optimization algorithm that
moves pages so as to minimize a cost function
measuring the quality of the page placement. Requests
are routed to the nodes hosting the pages by means of
a Distributed Hash Table built from the participating
nodes. We further refine the system by replicating
pages so that failures can be tolerated.
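The page-moving step summarized above can be illustrated with a minimal sketch. It is not the optimization algorithm of the system itself: the cost function (sum of squared node loads), the function names, and the uniform random choice of a peer (standing in for the peer sample supplied by the gossip overlay) are all assumptions made for illustration.

```python
import random


def node_load(placement, page_load, node):
    """Total load of the pages currently hosted on `node`."""
    return sum(page_load[p] for p in placement[node])


def cost(placement, page_load):
    """Assumed cost: sum of squared node loads (lower is better balanced)."""
    return sum(node_load(placement, page_load, n) ** 2 for n in placement)


def balance_pair(placement, page_load, capacity, a, b):
    """Move pages from the heavier to the lighter of two nodes.

    A page of load w is moved only if the move strictly lowers the
    assumed cost, i.e. 0 < w < (heavier load - lighter load), and the
    receiving node stays within its capacity. Returns the move count.
    """
    moves = 0
    improved = True
    while improved:
        improved = False
        la = node_load(placement, page_load, a)
        lb = node_load(placement, page_load, b)
        src, dst = (a, b) if la > lb else (b, a)
        gap = abs(la - lb)
        dst_load = min(la, lb)
        # Try the heaviest pages first; sorted() copies the set, so
        # mutating placement[src] below is safe.
        for page in sorted(placement[src], key=page_load.get, reverse=True):
            w = page_load[page]
            if 0 < w < gap and dst_load + w <= capacity[dst]:
                placement[src].remove(page)
                placement[dst].add(page)
                moves += 1
                improved = True
                break
    return moves


def gossip_round(placement, page_load, capacity, rng):
    """Each node balances with one peer picked at random (assumption:
    uniform choice approximates the peer sample from the overlay)."""
    nodes = list(placement)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        balance_pair(placement, page_load, capacity, node, peer)
```

Because every accepted move strictly lowers the assumed cost, repeated rounds cannot oscillate; whether they reach a good placement in practice depends on the actual cost function and on the load distribution of the pages.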
We have also outlined our security strategy, which
relies on a central certification authority and on
digital signatures for all operations in the system, in
order to protect it from attacks by untrusted nodes
acting maliciously.
In the future, we plan to evaluate our architecture
by performing simulations using real-world traces.
We will also carry out a thorough study of its security
aspects and work on providing incentives that
motivate potential collaborators to participate.