Schema Evolution in Wikipedia - Toward a Web Information System Benchmark

Carlo A. Curino, Hyun J. Moon, Letizia Tanca, Carlo Zaniolo

2008

Abstract

Evolving the database at the core of an information system is a difficult maintenance problem that has so far been studied only in the context of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. In this paper, we therefore present an in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built on the open-source MediaWiki software. Our study is based on (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools that automate the analysis. This framework allowed us to dissect and analyze 4.5 years of Wikipedia history, a period that was short but intense in terms of growth and evolution. Beyond confirming our initial hunch about the severity of the problem, the analysis points to the need for better methods and tools to support graceful schema evolution. We therefore briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.
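To make the idea of describing schema changes with operators more concrete, here is a minimal sketch, assuming a schema version can be modeled as a mapping from table names to sets of column names. The `SMO` class, the `diff_to_smos` and `summarize` helpers, and the operator names used (CREATE TABLE, DROP TABLE, ADD COLUMN, DROP COLUMN, borrowed from SQL DDL vocabulary) are hypothetical illustrations; they are not the paper's operator set or its analysis tools. The sketch derives table- and column-level operators from two consecutive schema versions and tallies them, the kind of per-version statistic an automated analysis of a schema history might collect.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical model: a schema version maps each table name to its column names.
Schema = Dict[str, Set[str]]


@dataclass(frozen=True)
class SMO:
    """A single schema modification operator (illustrative, not the paper's exact set)."""
    op: str           # e.g. "CREATE TABLE", "ADD COLUMN"
    table: str
    column: str = ""  # empty for table-level operators

    def __str__(self) -> str:
        target = f"{self.table}.{self.column}" if self.column else self.table
        return f"{self.op} {target}"


def diff_to_smos(old: Schema, new: Schema) -> List[SMO]:
    """Derive table- and column-level SMOs that turn `old` into `new`."""
    smos: List[SMO] = []
    # Tables that appear only in the new version were created.
    for table in sorted(new.keys() - old.keys()):
        smos.append(SMO("CREATE TABLE", table))
    # Tables that disappear in the new version were dropped.
    for table in sorted(old.keys() - new.keys()):
        smos.append(SMO("DROP TABLE", table))
    # For tables present in both versions, compare their column sets.
    for table in sorted(old.keys() & new.keys()):
        for column in sorted(new[table] - old[table]):
            smos.append(SMO("ADD COLUMN", table, column))
        for column in sorted(old[table] - new[table]):
            smos.append(SMO("DROP COLUMN", table, column))
    return smos


def summarize(smos: List[SMO]) -> Counter:
    """Count how often each operator occurs, e.g. for one schema version."""
    return Counter(smo.op for smo in smos)


if __name__ == "__main__":
    # Two hypothetical schema versions; table and column names are made up.
    v1 = {"article": {"id", "title"}, "history": {"id", "text"}}
    v2 = {"article": {"id", "title", "length"}, "revision": {"id", "article_id"}}
    changes = diff_to_smos(v1, v2)
    # Prints, one per line: CREATE TABLE revision, DROP TABLE history, ADD COLUMN article.length
    for smo in changes:
        print(smo)
    print(summarize(changes))
```

A purely structural diff like this cannot tell a rename from a drop followed by an add: renaming column `x` to `y` in table `a` comes out as `ADD COLUMN a.y` plus `DROP COLUMN a.x`, which loses the fact that the existing data should be carried over. Recording changes as explicit, richer operators instead of inferring them from raw schema differences is one way to preserve that intent.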



Paper Citation


In Harvard style

Curino C. A., Moon H. J., Tanca L. and Zaniolo C. (2008). SCHEMA EVOLUTION IN WIKIPEDIA - Toward a Web Information System Benchmark. In Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8111-36-4, pages 323-332. DOI: 10.5220/0001713003230332


In BibTeX style

@conference{iceis08,
author={Carlo A. Curino and Hyun J. Moon and Letizia Tanca and Carlo Zaniolo},
title={SCHEMA EVOLUTION IN WIKIPEDIA - Toward a Web Information System Benchmark},
booktitle={Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2008},
pages={323-332},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001713003230332},
isbn={978-989-8111-36-4},
}


In EndNote (RIS) style

TY  - CONF
JO  - Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - SCHEMA EVOLUTION IN WIKIPEDIA - Toward a Web Information System Benchmark
SN  - 978-989-8111-36-4
AU  - Curino, C. A.
AU  - Moon, H. J.
AU  - Tanca, L.
AU  - Zaniolo, C.
PY  - 2008
SP  - 323
EP  - 332
DO  - 10.5220/0001713003230332
ER  - 