INFORMATION UNIQUENESS IN WIKIPEDIA ARTICLES
Nikos Kirtsis, Sofia Stamou¹, Paraskevi Tzekou and Nikos Zotos
Computer Engineering and Informatics Department, Patras University, 26500, Greece
¹ Department of Archives and Library Science, Ionian University, 49100, Greece
Keywords: Wikipedia, Information Uniqueness, Content Duplication, Quality Assessment.
Abstract: Wikipedia is one of the most successful worldwide collaborative efforts to put together user generated con-
tent in a meaningfully organized and intuitive manner. Currently, Wikipedia hosts millions of articles on a
variety of topics, supplied by thousands of contributors. A critical factor in Wikipedia’s success is its open
nature, which enables everyone to edit, revise and/or question (via talk pages) the article contents. Consider-
ing the phenomenal growth of Wikipedia and the lack of a peer review process for its contents, it becomes
evident that both editors and administrators have difficulty in validating its quality on a systematic and co-
ordinated basis. This difficulty has motivated several research works on how to assess the quality of
Wikipedia articles. In this paper, we propose the exploitation of a novel indicator for the Wikipedia articles’
quality, namely information uniqueness. In this respect, we describe a method that captures the information
duplication across the article contents in an attempt to infer the amount of distinct information every article
communicates. Our approach relies on the intuition that an article offering unique information about its sub-
ject is of better quality compared to an article that discusses issues already addressed in several other
Wikipedia articles.
1 INTRODUCTION
Wikipedia is one of the most popular social media
web sites that allows users to create content in a col-
laborative manner. As of October 2009, the English
Wikipedia hosts over 3 million articles
(http://en.wikipedia.org/wiki/Special:Statistics) and it keeps
growing as new material is being supplied by nu-
merous editors who work on a volunteer basis. The
open participation that Wikipedia offers to editors
has led to its remarkable growth but at the same
time it has raised doubts about the quality of its con-
tents, considering that anyone can freely create new
or modify existing content.
In light of its success, the question of how to
assess the Wikipedia articles’ quality is of paramount
importance. To that end, several methods have been
proposed for measuring Wikipedia’s quality (Stvilia,
et al., 2005a; Blumenstock, 2008a), most of which
rely on the examination of the articles’ internal char-
acteristics, such as their contextual elements (Stvilia,
et al., 2005b), their linkage in the Wikipedia graph
(Kamps and Koolen, 2009), their length (Blumen-
stock, 2008b), their factual accuracy (Giles, 2005),
the formality of their language (Emigh and Herring,
2005) and many more.
Additionally, Wikipedia has launched detailed
editing and linking guidelines for prospective editors
(http://en.wikipedia.org/wiki/Wikipedia:How_to_edit_a_page)
and has also set precise specifications about
what should and what should not be part of an arti-
cle. Existing quality assessment methods combined
with Wikipedia editing instructions could be inte-
grated into a generalized validation mechanism that
would assist administrators in detecting articles that
need to be repaired, enriched or modified to reach good
quality levels.
One issue that has not been explicitly addressed
in existing Wikipedia quality assessment techniques
is the uniqueness of information that Wikipedia arti-
cles communicate for their subjects. Until now, an
article’s quality is determined in isolation from other
article contents and as such it is prone to mis-
leading quality judgments. To tackle this, we pro-
pose a novel indicator of quality for the article con-
tents, namely information uniqueness. To capture
information uniqueness, we present a method that
quantifies the information shared across Wikipedia
entries and discriminates articles of unique informa-
tion about their discussed topics from articles that
duplicate information contained in other Wikipedia
entries of relevant topics. The innovation of our ap-
proach is that we assess an article’s quality in con-
text with other topically relevant articles, based on
the intuition that a high-quality article should discuss
new information that is not part of other articles,
rather than reproduce the contents of other entries.
In the course of our study, we make the follow-
ing contributions:
- We quantify information uniqueness in the
contents of Wikipedia articles. In this respect, we
rely on articles pertaining to related topics, as these
are determined by Wikipedia editors, and we exam-
ine the degree to which the articles’ body is con-
tained (i.e., duplicated) in the body of other topi-
cally-relevant articles. Articles’ textual containment
is quantified based on the articles’ overlapping lexi-
cal and semantic elements. Our computations help
us infer the articles’ contextual completeness and
quality and may prove useful towards establishing
validation policies for the article contents.
- We estimate information uniqueness between
Wikipedia articles and the external (non-wiki)
sources to which they point. In particular, we com-
pute the amount of new content external resources
add to the articles’ body. Our computations help us
infer the appropriateness of external links for com-
plementing their corresponding article contents and
may prove useful towards establishing the Wikipe-
dia external links' update policy.
The remainder of the paper is organized as fol-
lows. We start our discussion with an overview on
related work. In Section 3, we describe how infor-
mation uniqueness can serve as an implicit indicator of
quality for the Wikipedia article contents. In Section
4, we present an experiment we carried out in which
we assessed the extent to which Wikipedia articles
communicate distinct and complete information
about their discussed subjects. Experimental results
shed light on the article features to be considered in
future Wikipedia quality assessment efforts. We
conclude the paper in Section 5.
2 RELATED WORK
There has been a large body of work on how to as-
sess the quality of Wikipedia articles. In this respect,
researchers have suggested a number of article fea-
tures that signify quality, such as their survival pe-
riod (Cross, 2006), the number and frequency of
their edits (Wilkinson and Huberman, 2007), their
revision history (Adler and de Alfaro, 2007), the
amount of outbound citations to 'trusted' material,
e.g. scientific publications (Nielsen, 2007), the dedi-
cation of their editors (Riehle, 2005) and so forth. A
complete analysis of the Wikipedia features is pro-
vided in (Voss, 2005).
Besides studying the article contextual elements
that signal quality, researchers have also explored
the properties and evolution of the Wikipedia link
graph for assessing the impact of global and local
link topology structure on information retrieval tasks
(Buriol et al., 2006; Koolen and Kamps, 2009). The
above studies reveal two interesting trends regarding
Wikipedia’s link structure: first that Wikipedia links
are good indicators of relevance for the article topics
and second that there is a strong correlation between
the link degree and the document length in the
Wikipedia collection.
Although our work relates to the above studies
that investigate the Wikipedia article contents and
links for inferring their quality, it is different in two
significant ways. First, unlike previous works that
estimate the quality of every article individually, we
study the quality of an article in context with other
Wikipedia articles of relevant topics. Moreover, in
contrast to existing works that confine their investi-
gations to the articles’ contextual elements, we also
examine the impact of external links on their corre-
sponding articles’ quality. In our investigations, we
estimate the information uniqueness in the articles’
referential resources, in order to assess the value
external links add to the article contents. Overall, we
believe that our study complements existing works
on measuring the quality of Wikipedia articles.
3 INFORMATION UNIQUENESS
In this section, we present our approach for captur-
ing the contextual and referential uniqueness across
topically-related Wikipedia articles. We begin our
discussion by justifying the motivation for our study
and we continue with the description of the methods
we propose for measuring information distinctive-
ness in Wikipedia articles.
3.1 Motivation
Wikipedia is the largest free-content online encyclo-
paedia. Although there are no specialized qualifica-
tions set forth in order for someone to become an
editor, there exist precise editing instructions with re-
spect to what should or could be part of an article,
how to organize the article contents and how to pro-
vide links from Wikipedia articles to external (out-
side Wikipedia) resources. The goal of establishing
structural and contextual editing guidelines is to help
Wikipedia contributors supply content that is reli-
able, neutral, verifiable via cited sources, complete
and useful for the covered subjects. Above all, it is
critical that articles communicate encyclopaedic
knowledge that would distinguish Wikipedia as an
encyclopaedia from other information sources.
However, Wikipedia’s open nature, lacking co-
ordinated editing, is susceptible to transforming the
articles it hosts from information sources into infor-
mation loops. This is mainly because editors might
rely on common resources for editing articles on
similar topics, or they might link articles to external
resources that have borrowed content from Wikipe-
dia. In the former situation, different articles might
contain identical textual fragments, which if re-
produced across several articles might result in se-
vere content duplication, which in turn would lead
users to read the same text for different topics, thus
entering information loops. Considering
the findings of (Buriol et al., 2006) that topically-
relevant Wikipedia articles are densely linked, the
above scenario is likely to occur as Wikipedia con-
tent and links proliferate. In the second situation, an
article might reproduce verbatim the body of its
linked external sources and vice versa. In this case,
readers visiting referenced material for acquiring sup-
plemental information for the article topics would
end up re-reading the same text (or pieces of it).
Evidently, both situations hinder users from ob-
taining unique information in the contents of differ-
ent articles and harm the overall Wikipedia quality.
Driven by the desire to capture information unique-
ness across Wikipedia articles, we carried out the
present study. The aim of our work is to explore the
information loops in Wikipedia articles in order to
deduce the amount of unique information in their
contents. We believe that the findings of our study
will give useful insights regarding the articles’ qual-
ity and will assist Wikipedia administrators deter-
mine effective article revision and maintenance poli-
cies. In the following paragraphs, we discuss the
details of our approach for quantifying information
uniqueness in Wikipedia articles.
3.2 Information Uniqueness in Article
Contents
To capture information uniqueness in the contents of
Wikipedia articles, we rely on articles dealing with
related topics and we compute the degree to which
different articles duplicate the same informational
extracts in their body. Our speculation is that the
degree of the articles’ information uniqueness is
inversely proportional to the degree of their content
duplication, in the sense that the more content two
articles share in common the less unique information
they communicate.
The method we employ for estimating informa-
tion uniqueness in the Wikipedia collection operates
upon topically-related articles. The reason for focus-
ing on topically related articles is that informa-
tion duplication is most pronounced for documents
dealing with common or similar subjects (Davison,
2000). Therefore, trying to capture content duplica-
tion across the entire Wikipedia collection would
significantly increase the overhead and the computa-
tional complexity of our measurements without add-
ing much value to the delivered results.
To identify topically-related articles, we rely on
the Wikipedia categories to which every article has
been assigned by its editors and we deem articles to
be topically-related if they share at least one com-
mon category. By deducing the articles’ topical re-
latedness based on their assigned categories, we en-
sure both consistency and accuracy in their topical
descriptions as the latter are collectively supplied by
humans and we obviate the need to re-categorize the
articles.
Based on the article categories, we organize the articles
into topical clusters and we process the documents
in every cluster in order to identify duplicate content
in their body. Having organized Wikipedia articles
into topical clusters, we download the contents of
every cluster, we parse them to remove mark-up and
apply tokenization to identify word tokens in the
articles’ textual body. Based on the tokenized arti-
cles in every topic, we estimate the articles’ lexical
and semantic elements duplication from which we
subsequently derive the amount of the articles’ in-
formation uniqueness.
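As an illustration, the clustering and tokenization steps could be sketched in Python roughly as follows. This is a minimal sketch under the assumption that the articles are already available as plain text together with their category names; the function and variable names are ours, purely illustrative.

import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters; a minimal
    # stand-in for the mark-up removal and tokenization step described above.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def cluster_by_category(articles):
    # articles: dict mapping article title -> (plain text, set of category names).
    # Articles sharing at least one category end up in the same cluster.
    clusters = defaultdict(list)
    for title, (text, categories) in articles.items():
        tokens = tokenize(text)
        for category in categories:
            clusters[category].append((title, tokens))
    return clusters

Note that an article appears in one cluster per assigned category, which mirrors the "at least one common category" criterion used above.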
To identify lexical content duplication across the
articles’ body, we estimate Containment within the
articles’ text. For our measurements, we firstly lexi-
cally analyze the articles’ textual body into canoni-
cal sequences of tokens and represent them as con-
tiguous sub-sequences of w tokens, called shingles
and computed via the w-shingling technique (Broder
et al., 1997). Then, we eliminate identical shingles
from every article and we compute the containment
of an article’s text in the body of the remaining arti-
cles clustered in the same topic. Containment be-
tween article pairs is determined as the ratio of an
article’s shingles that are contained in the shingles of
another article, given by:

Containment(a_i, a_j) = \frac{|S(a_i) \cap S(a_j)|}{|S(a_i)|}    (1)

Where S(a_i) denotes the shingles of article a_i and S(a_j) denotes the shingles of article a_j. Containment values range between 0 and 1; with 0 indicating that the two articles share no text in common and 1 indicating that the textual body of an article is contained (i.e., appears identical) in the body of another article. Based on the above, we compute the amount of text in every article that is contained in the body of other articles, as given by:

Containment(a_i) = \frac{1}{N} \sum_{j=1}^{N} Containment(a_i, a_j), \quad a_j \in \{a_1, ..., a_N\}    (2)
Where N indicates the number of articles considered.
Eventually, we derive the lexical uniqueness of an
article based on the amount of the article shingles
that are not contained in other articles, i.e., they are
unique among the shingles generated for the articles
in a topic. Formally, the lexical uniqueness LU of an
article is quantified as:
LU(a_i) = 1 - Containment(a_i)    (3)
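The shingling and containment computations of equations (1)-(3) could be sketched as follows, reusing the tokenized articles of the previous sketch and w = 10 as in our experiments; again, the helper names are illustrative, and the averaging in lexical_uniqueness runs over the other articles of the topical cluster.

def shingles(tokens, w=10):
    # Contiguous w-token subsequences (w-shingling); returning a set collapses
    # duplicates, matching the elimination of identical shingles.
    return {tuple(tokens[i:i + w]) for i in range(max(len(tokens) - w + 1, 1))}

def containment(shingles_i, shingles_j):
    # Eq. (1): fraction of article i's shingles that also occur in article j.
    if not shingles_i:
        return 0.0
    return len(shingles_i & shingles_j) / len(shingles_i)

def lexical_uniqueness(article_shingles, other_articles_shingles):
    # Eq. (2): mean containment of the article in the other articles of its
    # cluster; Eq. (3): lexical uniqueness is the complement of that mean.
    if not other_articles_shingles:
        return 1.0
    mean_containment = sum(containment(article_shingles, s)
                           for s in other_articles_shingles) / len(other_articles_shingles)
    return 1.0 - mean_containment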
Based on the above, we compute uniqueness in the
articles’ lexical contents, so that the less text of an
article is contained in the body of other articles, the
higher its lexical uniqueness. However, lexical
uniqueness alone does not suffice for capturing in-
formation uniqueness given that articles might use
distinct surface terms to verbalize the same informa-
tion. To tackle such cases, we need to capture the
semantic uniqueness in the article contents. In other
words, we need to estimate the degree to which dif-
ferent articles contain semantically equivalent ele-
ments in their body and then infer their semantic
uniqueness by relying on the amount of their un-
shared semantic information.
To estimate the semantic uniqueness of an arti-
cle’s content, we firstly need to annotate every word
token of an article with a suitable sense. For assign-
ing sense labels to the articles’ words, we utilize
WordNet (Fellbaum, 1998) as our sense repository
and operate as follows. We map all words of an arti-
cle to their corresponding WordNet synsets. Word
tokens matching a single synset are annotated with
that synset’s gloss (i.e., sense), whereas words
matching several synsets are labelled with the sense
that exhibits the maximum average similarity to the
senses identified for the remaining article words. In
our work, we estimate semantic similarity based on
the Wu and Palmer (1994) metric. Having annotated
every word of an article with an appropriate sense,
we quantify the semantic equivalence between pairs
of articles as the fraction of senses they share in
common, i.e.:
Equivalence(a_i, a_j) = \frac{|Senses(a_i) \cap Senses(a_j)|}{|Senses(a_i) \cup Senses(a_j)|}    (4)
Where Senses(a_i) denotes the senses assigned to the word tokens of article a_i and Senses(a_j) denotes the senses assigned to the word tokens of article a_j.
Equivalence values range between 0 and 1; with 0
indicating that the semantic content of the two arti-
cles is different and 1 indicating that the content of
the two articles is semantically equivalent. Based on
the above, we compute for every article the amount
of semantic content it shares with other articles as:
Equivalence(a_i) = \frac{1}{N} \sum_{j=1}^{N} Equivalence(a_i, a_j), \quad a_j \in \{a_1, ..., a_N\}    (5)
Then, we estimate for every article the semantic
uniqueness of its contents, based on the amount of
the article semantics that are not shared in other arti-
cles. Formally, the semantic uniqueness SU of an
article is quantified as:
SU(a_i) = 1 - Equivalence(a_i)    (6)
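The sense annotation and the semantic comparison of equations (4)-(6) could be sketched with NLTK's WordNet interface roughly as follows. This is a simplified variant of the procedure described above: ambiguous words are disambiguated against the synsets of the article's unambiguous words only, and synsets (rather than their glosses) serve as sense identifiers; the sketch assumes nltk with the wordnet corpus installed.

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def annotate_senses(tokens):
    # Assign one WordNet synset per word, when available. Words with a single
    # candidate synset keep it; ambiguous words get the candidate with the
    # highest average Wu-Palmer similarity to the unambiguous words' synsets.
    candidates = {t: wn.synsets(t) for t in set(tokens)}
    context = [syns[0] for syns in candidates.values() if len(syns) == 1]
    senses = set()
    for word, syns in candidates.items():
        if not syns:
            continue  # word not covered by WordNet
        if len(syns) == 1 or not context:
            senses.add(syns[0])
            continue
        def avg_sim(synset):
            sims = [synset.wup_similarity(c) or 0.0 for c in context]
            return sum(sims) / len(sims)
        senses.add(max(syns, key=avg_sim))
    return senses

def semantic_uniqueness(senses_i, other_articles_senses):
    # Eq. (4): overlap of sense sets over their union; Eq. (5): mean
    # equivalence over the cluster; Eq. (6): the complement of that mean.
    def equivalence(a, b):
        return len(a & b) / len(a | b) if (a | b) else 0.0
    if not other_articles_senses:
        return 1.0
    mean_eq = sum(equivalence(senses_i, s)
                  for s in other_articles_senses) / len(other_articles_senses)
    return 1.0 - mean_eq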
Finally, we estimate the article’s overall informa-
tion uniqueness (IU) with respect to other topically-
related articles based on the combination of the arti-
cle’s lexical and semantic uniqueness, given by:
IU(a_i) = a \cdot LU(a_i) + (1 - a) \cdot SU(a_i)    (7)
Where a is a weighting factor that determines the
impact of the article’s lexical and semantic elements
on its overall information uniqueness. Setting
a = 0.5 weights the article’s lexical and semantic
properties equally, whereas lowering a (i.e., a < 0.5)
promotes semantic uniqueness as an indicator of the
article’s content distinctiveness and increasing a
(i.e., a > 0.5) promotes lexical uniqueness as such
an indicator. In any case, the weighting factor
ensures that information uniqueness values are nor-
malized and range between 0 and 1; with 0 indicat-
ing total information duplication (lack of unique
content) and 1 indicating total information unique-
ness (lack of duplicate content).
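For completeness, the combination of equation (7) amounts to a one-line weighted average (a sketch, with the weighting factor exposed as a parameter):

def information_uniqueness(lu, su, alpha=0.5):
    # Eq. (7): weighted combination of lexical (LU) and semantic (SU)
    # uniqueness; any alpha in [0, 1] keeps the result in [0, 1].
    return alpha * lu + (1.0 - alpha) * su

# Example: an article that is 88% lexically unique and 74% semantically
# unique, weighted equally, obtains IU = 0.5 * 0.88 + 0.5 * 0.74 = 0.81.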
Based on the above, we measure how distinct the
information that an article communicates about its
underlying topic is compared to the information con-
tained in the body of other articles dealing with the
same topic. Recall that decreased levels of informa-
tion uniqueness increase the probability of encoun-
tering information loops in the Wikipedia collection.
Therefore, the more unique information Wikipedia
articles contain the better quality Wikipedia bears.
3.3 Information Uniqueness in Article
Referential Sources
Besides capturing information uniqueness across the
Wikipedia articles, we also estimate how unique the
information contained in every article is compared
to the information contained in its referential exter-
nal resources. Considering that Wikipedia articles
provide links (refer the reader) to non-wiki sources,
the information of which should be relevant and
complementary to the article contents, it is useful to
assess whether the information contained in the arti-
cle references is part of the article contents or not. In
the former case, reading the contents of the article
references would lead the user into information loops
and would diminish the references’ added value for
the article contents. In the latter case, reading the
contents of the article references would point the
user to new (not part of the article) information and
would empower the references’ positive contribution
towards complementing the article contents.
To estimate how distinct the information of an
article is with respect to the contents of its referenced
sources, we start by downloading the textual body of
the article’s external links, which we then parse and
tokenize. Afterwards, we merge together the token-
ized contents of all the pages pointed by an article to
formulate a super-document that we lexically ana-
lyse and represent into shingles. Having represented
both the article contents and the contents of all the
documents pointed by the article as shingles, we
compute their lexical overlap by applying the Con-
tainment measure (eqn.1). Based on the amount of
overlapping lexical content between an article and
its referential sources, we deduce the article’s Lexi-
cal Uniqueness (eqn.3). In addition, by semantically
annotating the words contained in an article’s refer-
ential sources we can estimate the fraction of senses
shared between the article and its referenced sources
and deduce their semantic Equivalence (eqn.5) upon
which we rely for quantifying the article’s Semantic
Uniqueness (eqn.6) with respect to its external re-
sources. Finally, we combine the two measurements
and compute the article’s Information Uniqueness
(eqn.7) compared to the contents of its referential
sources.
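Reusing the helpers sketched in Section 3.2, the referential comparison could look roughly as follows; fetch_text stands for a hypothetical routine that downloads an external page and strips its mark-up, which we do not show here.

def referential_uniqueness(article_tokens, external_urls, fetch_text, w=10, alpha=0.5):
    # Merge the tokenized contents of all pages the article points to into a
    # single "super-document", then reuse Eqs. (1)-(7) against that document.
    super_tokens = []
    for url in external_urls:
        super_tokens.extend(tokenize(fetch_text(url)))
    lu = lexical_uniqueness(shingles(article_tokens, w),
                            [shingles(super_tokens, w)])
    su = semantic_uniqueness(annotate_senses(article_tokens),
                             [annotate_senses(super_tokens)])
    return information_uniqueness(lu, su, alpha)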
4 EXPERIMENTS
To capture information uniqueness in the Wikipedia
collection, we relied on the October 2009 dump of
the English Wikipedia, a 24.5 GB XML corpus, which
we processed in order to extract the articles it con-
tained and for every article extract its associated
Wikipedia categories and the URLs of its external
resources. Having processed the Wikipedia corpus,
we sampled 100,000 random articles associated with
common Wikipedia categories and used them as the
dataset against which we would apply our informa-
tion uniqueness measurements.
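A rough sketch of this preprocessing step is given below: it streams the dump with xml.etree.ElementTree and pulls category tags and external URLs out of the wikitext with regular expressions. Element names are matched by suffix because the export namespace varies across dump versions, and the regular expressions only approximate MediaWiki link syntax; this is our illustration, not the exact pipeline.

import re
import xml.etree.ElementTree as ET

CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)", re.IGNORECASE)
EXTERNAL_RE = re.compile(r"https?://[^\s\]|}<]+")

def iter_articles(dump_path):
    # Stream <page> elements from a MediaWiki XML dump and yield, for each
    # article, its title, category names and external URLs found in the wikitext.
    for _, elem in ET.iterparse(dump_path, events=("end",)):
        if not elem.tag.endswith("page"):
            continue
        title, text = "", ""
        for node in elem.iter():
            if node.tag.endswith("title"):
                title = node.text or ""
            elif node.tag.endswith("text"):
                text = node.text or ""
        yield title, CATEGORY_RE.findall(text), EXTERNAL_RE.findall(text)
        elem.clear()  # free memory while streaming a multi-GB dump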
Based on the identified article categories, we or-
ganized the sampled articles into clusters, with every
cluster grouping together articles of a common cate-
gory. Then, we processed the contents of every arti-
cle in each cluster to represent the articles’ textual
elements as shingles, i.e., contiguous sequences of w
tokens (with w= 10). Based on the comparative lexi-
cal and semantic analysis of the article shingles, we
quantified the information uniqueness of every arti-
cle with respect to other articles organized in the
same cluster. Results are presented in Section 4.1.
For our sampled articles we also downloaded the
contents of their external links, which we processed
as previously described. Then, we comparatively ana-
lyzed the contents of every article and its corre-
sponding external resources in order to quantify the
amount of distinct information the article contains
compared to the information contained in its referen-
tial sources. Obtained results are discussed in Sec-
tion 4.2.
4.1 Unique Information in Wikipedia
Figure 1 plots the information uniqueness in the ex-
amined Wikipedia articles. The x-axis illustrates the
amount of articles and the y-axis shows the amount
of unique information in the articles’ body, esti-
mated by setting the value of a = 0.5.
[Figure 1: Unique information across sampled articles. x-axis: number of articles; y-axis: information uniqueness.]
As the figure shows, most of the examined articles
contain unique information, i.e., their textual and
semantic elements are not duplicated across other
articles. According to the results, on average 72.8% of
the information in the body of the examined articles is
unique and 27.2% of the information is duplicated in
the body of other articles.
Although not schematically illustrated due to
space constraints, a closer look at the obtained results
reveals that on average 88.55% of the articles’ lexi-
cal elements are unique, i.e., they are not contained
in the body of other articles and 74.51% of the arti-
cles’ semantic content is unique. Results suggest that
the information contained in most of the examined
articles is unique and as such we may deduce the
good overall quality of the Wikipedia corpus.
4.2 New Information brought by
Wikipedia
Figure 2 plots the information uniqueness between
the examined Wikipedia articles and the body of
their respective linked-to external resources. The x-
axis shows the amount of articles and the y-axis
shows the amount of unique information the articles
contain compared to the contents of their external
resources.
[Figure 2: Unique information between the sampled articles and the contents of their external resources. x-axis: number of articles; y-axis: information uniqueness.]
As the figure shows, the information contained in
most Wikipedia articles differs from the information
delivered in their respective external resources. Re-
sults show that on average 89.78% of the informa-
tion in all examined articles is unique compared to
the information contained in the body of their linked
non-wiki sources. Results suggest that the informa-
tion offered to Wikipedia readers via external links
adds value to the article contents as it mainly con-
cerns new information that is not part of the article.
Overall, we may conclude that the vast majority
of Wikipedia articles comply with the requirement
that they should provide useful content that can be
verified via cited external resources.
5 CONCLUDING REMARKS
In this paper, we have proposed the exploitation of
information uniqueness as a novel indicator of qual-
ity for the Wikipedia articles. The application of our
approach to a subset of the Wikipedia corpus re-
vealed that most articles contain unique information
about their subjects, thus signifying the overall good
quality of Wikipedia. Our measurements, especially
those indicating articles of high information duplica-
tion, could help Wikipedia administrators set better
linking instructions between distinct articles so that
editors provide links between articles instead of re-
peating their contents. Moreover, our findings could
assist Wikipedia administrators in implementing con-
textual filtering mechanisms that would enable editors
to determine whether external links contain supplemen-
tal or superfluous information for the article con-
tents. In the future, we plan to extend our prelimi-
nary study and capture information uniqueness
across the entire Wikipedia collection in order to
quantify its overall quality.
REFERENCES
Adler, B.T., de Alfaro, L., 2007. A content-driven reputation system for the Wikipedia. In Proceedings of the 16th International World Wide Web Conference.
Blumenstock, J.E., 2008a. Automatically assessing the quality of Wikipedia articles. UC Berkeley iSchool Report 2008-021.
Blumenstock, J.E., 2008b. Size matters: word count as a measure of quality in Wikipedia. In Proceedings of the 17th Intl. WWW Conference.
Buriol, L., Castillo, C., Donato, D., Leonardi, S., Millozzi, S., 2006. Temporal evolution of the wiki graph. In Proceedings of the Web Intelligence Conference.
Broder, A.Z., Glassman, S.C., Manasse, M.S., Zweig, G., 1997. Syntactic clustering of the web. In Proceedings of the 6th Intl. WWW Conference, pp. 391-404.
Cross, T., 2006. Puppy smoothies: improving the reliability of open, collaborative wikis. First Monday, 11(9).
Davison, B.D., 2000. Topical locality on the web. In Proceedings of the 23rd Intl. SIGIR Conference.
Emigh, W., Herring, S., 2005. Collaborative authoring on the web: a genre analysis of online encyclopedias. In Proceedings of the HICSS Conference.
Fellbaum, Ch., 1998. WordNet: An Electronic Lexical Database. MIT Press.
Giles, J., 2005. Internet encyclopaedias go head to head. In Nature, 438:900-901.
Kamps, J., Koolen, M., 2009. Is Wikipedia link structure different? In Proceedings of the 2nd Intl. WSDM Conference.
Koolen, M., Kamps, J., 2009. What’s in a link? From document importance to topical relevance. In Proceedings of the 2nd International Conference on Theory of Information Retrieval, pp. 313-321.
Nielsen, F.A., 2007. Scientific citations in Wikipedia. In Computing Research Repository.
Riehle, D., 2005. How and why Wikipedia works: an interview. In Proceedings of the ACM WikiSym.
Stvilia, B., Twidale, M.B., Smith, L.C., Gasser, L., 2005a. Assessing information quality of a community-based encyclopaedia. In Proceedings of the International Conference on Information Quality.
Stvilia, B., Twidale, M.B., Gasser, L., Smith, L.C., 2005b. Information quality discussions in Wikipedia. In Proceedings of the International Conference on Knowledge Management.
Voss, J., 2005. Measuring Wikipedia. In Proceedings of the 10th International Conference of the International Society for Scientometrics and Informetrics.
Wilkinson, D.M., Huberman, B.A., 2007. Assessing the value of cooperation in Wikipedia. First Monday, 12(4).
Wu, Z., Palmer, M., 1994. Verb semantics and lexical selection. In Proceedings of the 32nd ACL Conference, pp. 133-138.