extracted from Wikipedia that reflects recorded activ-
ity of its editors.
The goal is to experimentally study the dependence
between editors’ versatility, as defined in Section 2,
and the quality of the articles they co-edit. In the
reported experiments, article quality is modelled
using information available in the data. More
precisely, we use two kinds of quality labels:
some articles are marked as featured and,
independently, some as good. We treat these labels
as the ground truth in our experiments.
3.1 Data
The data covers a sample of 2714 contributors to the
German-language edition of Wikipedia in 2013. We used the
Wikipedia API for retrieving the list of contributors
and their activity logs, and database dumps for the
page (article) list and category graph.
Regarding the categories mentioned in Section 2,
we utilise the fact that each Wikipedia article
can be mapped to one of the eight main content
categories: Art & Culture, Economy, History,
Knowledge, Religion, Society, Sport, Technology.
Technically, an article was mapped to those main
categories that were encountered by an algorithm
traversing the category graph, using the given
article as the root node and iterating over
neighbors up to 1000 times. If the article was
mapped to more than one category, its contribution
size was split equally among them, so that the
totals remain valid after per-user aggregation.
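The mapping and the equal split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the breadth-first order, the reading of "up to 1000 times" as a cap on visited nodes, and the names `main_categories`, `split_contribution` and `graph` are all assumptions.

```python
from collections import deque

MAIN_CATEGORIES = {
    "Art & Culture", "Economy", "History", "Knowledge",
    "Religion", "Society", "Sport", "Technology",
}

def main_categories(article, graph, max_steps=1000):
    """Walk the category graph from `article` (assumed breadth-first)
    and return the main categories reached within `max_steps` visits.

    `graph` is a hypothetical dict mapping a node to its neighbors.
    """
    found = set()
    seen = {article}
    queue = deque([article])
    steps = 0
    while queue and steps < max_steps:
        node = queue.popleft()
        steps += 1
        if node in MAIN_CATEGORIES:
            found.add(node)
            continue  # assumption: do not expand past a main category
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return found

def split_contribution(size, categories):
    """Split a contribution of `size` bytes equally among `categories`."""
    if not categories:
        return {}
    share = size / len(categories)
    return {category: share for category in categories}
```

Because each contribution is divided by the number of mapped categories, the per-category shares of an edit always sum to the original edit size, which is what keeps the per-user totals valid.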
3.2 Experimental Results
We analysed four groups of editors, N, G, F and GF,
which denote editors who co-edited, respectively: no
good or featured article, at least one good article,
at least one featured article, and at least one
article that is both good and featured. Notice that
the four groups represent a graded “hierarchy” of
high-quality editors, with GF representing, in a
sense, the highest-quality editors. For each of the
four groups we computed summary statistics of the
versatility measure V() (Equation 1), including the
mean, median and quartiles. The results are shown in
Figure 1, where one can observe a noticeable
regularity that indicates a clear positive association
between editors’ versatility and the quality of
their work. More precisely, the aggregated
versatility statistics for the groups N, G, F, GF are
strictly increasing.
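The per-group comparison can be sketched as below. This is an illustrative sketch only: the function names and the input layout (a dict from group label to a list of versatility values) are assumptions, and the quartile method is Python's default, which may differ from the one used for Figure 1.

```python
import statistics

def summarise(values):
    """Return mean, median and quartiles of a list of versatility values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
    }

def strictly_increasing(sequence):
    """True if every element is larger than the one before it."""
    items = list(sequence)
    return all(a < b for a, b in zip(items, items[1:]))

def statistic_increases(groups, stat="median", order=("N", "G", "F", "GF")):
    """Check whether `stat` grows strictly along the group hierarchy.

    `groups` is a hypothetical dict: group label -> list of values.
    """
    return strictly_increasing(summarise(groups[g])[stat] for g in order)
```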
Furthermore, we observed that the distribution of
user versatility has a negative skew (Figure 2), with
a median value of 2.29 bits (out of a 3-bit maximum).
Users co-authoring at least one featured article have
a mean versatility of 2.31, compared to 2.00 for
those who co-authored only non-featured articles.
Figure 2: Distribution of Editor’s Versatility.
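The 3-bit maximum is what one would expect if V() is the Shannon entropy of an editor's contribution shares over the eight main categories, since log2(8) = 3. Equation 1 is not reproduced in this section, so the following sketch rests on that assumption; the function name and the input layout are hypothetical.

```python
import math

def versatility(contributions):
    """Shannon entropy (in bits) of an editor's contribution shares.

    `contributions` is a hypothetical dict: category -> bytes contributed.
    Assumes Equation 1 is the entropy of the per-category shares.
    """
    total = sum(contributions.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for size in contributions.values():
        if size > 0:
            share = size / total
            entropy -= share * math.log2(share)
    return entropy
```

Under this reading, an editor who contributes equally to all eight categories reaches the maximum of 3 bits, and an editor active in a single category scores 0.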
3.3 Versatility, Quality and Productivity
We also computed each editor’s productivity,
defined as the total amount of text (in bytes)
committed to the articles they co-edited. We divided
the editors into two groups, F (at least one co-edited
featured article) and X \ F, and made scatterplots of
versatility vs. productivity for both groups (see
Figure 3). Again, the authors of featured articles
are noticeably more versatile than the others.
Since the results in Figure 3 might suggest that
versatility and productivity are correlated, we
repeated the experiment reported in Section 3.2,
this time comparing article quality with editors’
productivity (Figure 4).
Finally, since the results of this experiment also
seem to indicate a positive influence of productivity
on quality, we decided to compare the influence of
versatility and productivity on quality in a more
quantitative way. To this end, we built a logistic
model with versatility and productivity as
explanatory variables. Table 1 shows no significant role
Table 1: Explaining quality with logistic model.
                Estimate     Std. Error   z value   Pr(>|z|)
(Intercept)     -3.566e+00   2.720e-01    -13.111   < 2e-16 *
versatility      1.434e+00   1.214e-01     11.820   < 2e-16 *
productivity     4.822e-07   6.017e-07      0.801   0.423
vers. * prod.    5.474e-07   2.865e-07      1.911   0.056
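To make the estimates in Table 1 easier to read, the sketch below plugs them into the standard logistic form. It assumes that the response is a binary quality indicator for an editor (for example, membership in group F), that productivity is measured in bytes, and that "vers. * prod." is the product of the two variables; none of this is spelled out in the table itself, and the function name is hypothetical.

```python
import math

# Point estimates copied from Table 1.
COEFFICIENTS = {
    "intercept": -3.566,
    "versatility": 1.434,
    "productivity": 4.822e-07,
    "interaction": 5.474e-07,
}

def predicted_probability(versatility, productivity):
    """Evaluate the logistic model at the Table 1 point estimates.

    Assumes a binary response and productivity given in bytes.
    """
    linear = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["versatility"] * versatility
        + COEFFICIENTS["productivity"] * productivity
        + COEFFICIENTS["interaction"] * versatility * productivity
    )
    return 1.0 / (1.0 + math.exp(-linear))
```

With productivity held at zero, each additional bit of versatility adds 1.434 to the log-odds, whereas the productivity coefficient is of the order of 1e-07 per byte.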
KDIR 2014 - International Conference on Knowledge Discovery and Information Retrieval