THE TOP-TEN WIKIPEDIAS - A Quantitative Analysis Using WikiXRay

Felipe Ortega, Jesus M. Gonzalez-Barahona, Gregorio Robles

2007

Abstract

In just a few years, Wikipedia has become one of the information systems with the largest audience (both producers and consumers) on the Internet. Its system and information architecture is relatively simple, but it has proven capable of supporting the largest and most diverse community of collaborative authorship worldwide. In this paper, we analyze this community in detail, along with the content it produces. Using a quantitative methodology based on the analysis of the public Wikipedia databases, we describe the main characteristics of the 10 largest language editions and of the authors who work on them. The methodology, which is almost completely automated, is generic enough to be applied to the remaining editions, providing a convenient framework for a complete quantitative analysis of Wikipedia. Among other parameters, we study the evolution of the number of contributions and articles, their size, and the differences in contributions among authors, inferring some relationships between contribution patterns and content. These relationships reflect (and in part explain) the evolution of the different language editions so far, as well as their future trends.


Paper Citation


in Harvard Style

Ortega F., Gonzalez-Barahona J. M. and Robles G. (2007). THE TOP-TEN WIKIPEDIAS - A Quantitative Analysis Using WikiXRay. In Proceedings of the Second International Conference on Software and Data Technologies - Volume 3: ICSOFT, ISBN 978-989-8111-07-4, pages 46-53. DOI: 10.5220/0001330100460053


in Bibtex Style

@conference{icsoft07,
author={Felipe Ortega and Jesus M. Gonzalez-Barahona and Gregorio Robles},
title={THE TOP-TEN WIKIPEDIAS - A Quantitative Analysis Using WikiXRay},
booktitle={Proceedings of the Second International Conference on Software and Data Technologies - Volume 3: ICSOFT},
year={2007},
pages={46-53},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001330100460053},
isbn={978-989-8111-07-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Software and Data Technologies - Volume 3: ICSOFT
TI - THE TOP-TEN WIKIPEDIAS - A Quantitative Analysis Using WikiXRay
SN - 978-989-8111-07-4
AU - Ortega F.
AU - Gonzalez-Barahona J. M.
AU - Robles G.
PY - 2007
SP - 46
EP - 53
DO - 10.5220/0001330100460053