Scalable Versioning for Key-Value Stores

Martin Haeusler

2016

Abstract

Versioning of database content is rapidly gaining importance in modern applications, driven by the need for reliable auditing and data history analysis, or by the fact that temporal information is inherent to the problem domain. At the same time, data volume and complexity are growing, which demands a high level of scalability. However, such implementations are rarely found in practice. Existing solutions treat versioning as an add-on rather than a first-class citizen and therefore fail to take full advantage of its benefits. They also often trade performance against the age of an entry, with newer entries being considerably faster to retrieve than older ones. This paper makes three core contributions. First, we provide a formal model that captures the properties of the temporal indexing problem in an intuitive way. Second, we discuss in depth the unique benefits for transaction control that arise from treating versioning as a first-class citizen in a data store, as opposed to an add-on feature of a non-versioned system. We also introduce an index model that offers equally fast access to all entries, regardless of their age. Third, we present an open-source implementation of the formalism in the form of a versioned key-value store, which serves as a proof-of-concept prototype. An evaluation of this prototype demonstrates the scalability of our approach.
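The abstract does not spell out the index model itself. As a rough illustration only, the following Python sketch shows one common way to support reads of any past version in a sorted key-value index: each write is stored under a composite key (key, timestamp), and a read "as of" time t is a floor lookup for the greatest composite key not exceeding (key, t). The class and method names (`VersionedKVStore`, `put`, `get`) and the integer logical clock are hypothetical assumptions, and this is not claimed to be the paper's implementation.

```python
import bisect
from typing import Any, Optional


class VersionedKVStore:
    """Hypothetical in-memory versioned key-value store (illustrative sketch).

    Every write is stored under the composite key (key, timestamp) in one
    sorted list, so a read at any timestamp is a single binary search,
    whether the requested version is new or old.
    """

    def __init__(self) -> None:
        self._keys: list[tuple[str, int]] = []  # sorted composite keys
        self._values: list[Any] = []            # values aligned with _keys
        self._now = 0                           # logical commit clock (assumption)

    def put(self, key: str, value: Any) -> int:
        """Write a new version of `key` and return its commit timestamp."""
        self._now += 1
        composite = (key, self._now)
        # bisect finds the position that keeps the composite keys sorted.
        pos = bisect.bisect_right(self._keys, composite)
        self._keys.insert(pos, composite)
        self._values.insert(pos, value)
        return self._now

    def get(self, key: str, timestamp: Optional[int] = None) -> Optional[Any]:
        """Return the value of `key` as of `timestamp` (default: latest)."""
        if timestamp is None:
            timestamp = self._now
        # Floor lookup: index of the last composite key <= (key, timestamp).
        pos = bisect.bisect_right(self._keys, (key, timestamp)) - 1
        # The floor entry may belong to a smaller key; then no version exists.
        if pos >= 0 and self._keys[pos][0] == key:
            return self._values[pos]
        return None


store = VersionedKVStore()
t1 = store.put("color", "red")
t2 = store.put("color", "blue")
print(store.get("color", t1), store.get("color"))
```

Because all versions of a key sit side by side in one sorted structure, the lookup cost depends on the total number of entries rather than on how old a version is, which mirrors the property the abstract claims for its index model. A production store would replace the Python list (whose `insert` is linear) with a B-tree or similar on-disk structure, and would likely represent deletions as tombstone versions.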


Paper Citation


in Harvard Style

Haeusler M. (2016). Scalable Versioning for Key-Value Stores. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 79-86. DOI: 10.5220/0005938700790086


in Bibtex Style

@conference{data16,
author={Martin Haeusler},
title={Scalable Versioning for Key-Value Stores},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2016},
pages={79-86},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005938700790086},
isbn={978-989-758-193-9},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI  - Scalable Versioning for Key-Value Stores
SN  - 978-989-758-193-9
AU  - Haeusler M.
PY  - 2016
SP  - 79
EP  - 86
DO  - 10.5220/0005938700790086
ER  - 