VCODEX: A DATA COMPRESSION PLATFORM

Kiem-Phong Vo

2007

Abstract

Vcodex is a platform for compressing and transforming data. It defines a standard interface, the data transform, to represent any algorithm or technique for encoding data. Although geared primarily toward data compression, a data transform can perform any type of processing, including encryption, portability encoding, and others. Vcodex provides a core set of data transforms implementing a wide variety of compression algorithms, ranging from general-purpose ones such as Huffman or Lempel-Ziv to structure-driven ones such as reordering fields and columns in relational data tables. Such transforms can be reused and composed to build more complex compressors. This paper presents an overview of the software and data architecture of Vcodex. Examples and experimental results show how customizing transform compositions based on data semantics can achieve compression performance beyond that of traditional approaches.
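
The idea of composing reusable transforms can be illustrated with a minimal sketch. This is not the Vcodex API (Vcodex itself is a C library); the names Transform, Transpose, MoveToFront, Deflate, and Compose are hypothetical and chosen only for this illustration. Each transform is assumed to be an invertible mapping on byte strings, and a composition applies its stages in order when encoding and in reverse order when decoding.

```python
import zlib
from typing import Protocol


class Transform(Protocol):
    """Hypothetical stand-in for a data transform: an invertible byte mapping."""

    def encode(self, data: bytes) -> bytes: ...
    def decode(self, data: bytes) -> bytes: ...


class Transpose:
    """Structure-driven stage: store a table of fixed-length rows column by column."""

    def __init__(self, ncols: int) -> None:
        if ncols <= 0:
            raise ValueError("ncols must be positive")
        self.ncols = ncols

    def encode(self, data: bytes) -> bytes:
        if len(data) % self.ncols:
            raise ValueError("data length must be a multiple of the row length")
        # Column c consists of every ncols-th byte starting at offset c.
        return b"".join(data[c::self.ncols] for c in range(self.ncols))

    def decode(self, data: bytes) -> bytes:
        nrows = len(data) // self.ncols
        # Row r consists of every nrows-th byte starting at offset r.
        return b"".join(data[r::nrows] for r in range(nrows))


class MoveToFront:
    """Move-to-front coding: recently seen byte values get small indices."""

    def encode(self, data: bytes) -> bytes:
        table = list(range(256))
        out = bytearray()
        for b in data:
            i = table.index(b)
            out.append(i)
            table.insert(0, table.pop(i))
        return bytes(out)

    def decode(self, data: bytes) -> bytes:
        table = list(range(256))
        out = bytearray()
        for i in data:
            b = table.pop(i)
            out.append(b)
            table.insert(0, b)
        return bytes(out)


class Deflate:
    """General-purpose stage backed by Python's zlib module."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data, 9)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class Compose:
    """Chain stages: encode left to right, decode right to left."""

    def __init__(self, *stages: Transform) -> None:
        self.stages = stages

    def encode(self, data: bytes) -> bytes:
        for stage in self.stages:
            data = stage.encode(data)
        return data

    def decode(self, data: bytes) -> bytes:
        for stage in reversed(self.stages):
            data = stage.decode(data)
        return data


# A table of 6-byte records: a 4-digit counter followed by a constant tag.
rows = b"".join(b"%04d" % i + b"AB" for i in range(50))
pipeline = Compose(Transpose(6), MoveToFront(), Deflate())
packed = pipeline.encode(rows)
restored = pipeline.decode(packed)
```

Because every stage exposes the same encode/decode pair, a stage can be swapped or reordered without touching the others. Whether a given composition actually compresses better than a single general-purpose stage depends on the data and has to be measured.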


Paper Citation


in Harvard Style

Vo K. (2007). VCODEX: A DATA COMPRESSION PLATFORM. In Proceedings of the Second International Conference on Software and Data Technologies - Volume 2: ICSOFT, ISBN 978-989-8111-06-7, pages 81-89. DOI: 10.5220/0001344600810089


in Bibtex Style

@conference{icsoft07,
author={Kiem-Phong Vo},
title={VCODEX: A DATA COMPRESSION PLATFORM},
booktitle={Proceedings of the Second International Conference on Software and Data Technologies - Volume 2: ICSOFT},
year={2007},
pages={81-89},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001344600810089},
isbn={978-989-8111-06-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Software and Data Technologies - Volume 2: ICSOFT
TI - VCODEX: A DATA COMPRESSION PLATFORM
SN - 978-989-8111-06-7
AU - Vo K.
PY - 2007
SP - 81
EP - 89
DO - 10.5220/0001344600810089