BSBC: TOWARDS A SUCCINCT DATA FORMAT FOR XML STREAMS

Stefan Böttcher, Rita Hartel, Christian Heinzemann

Abstract

XML data compression is an important feature in XML data exchange, particularly when the data size may cause bottlenecks or when limits on bandwidth and energy consumption require reducing the amount of exchanged XML data. However, applications based on XML data streams also require efficient path query processing on the structure of compressed XML data streams. We present a succinct representation of XML data streams, called Bit-Stream-Based-Compression (BSBC), that fulfills these requirements and additionally provides a compression ratio that is significantly better than that of other queriable XML compression techniques, namely XGrind and DTD subtraction, and that of non-queriable compression techniques like gzip. Finally, we present an empirical evaluation that compares BSBC with these compression techniques and with XMill and demonstrates the benefits of BSBC.

References

  1. R. J. Bayardo, D. Gruhl, V. Josifovski, and J. Myllymaki, 2004. An evaluation of binary XML encoding optimizations for fast stream based XML processing. In Proc. of the 13th international conference on World Wide Web.
  2. S. Böttcher, R. Steinmetz, N. Klein, 2007. XML Index Compression by DTD Subtraction. International Conference on Enterprise Information Systems (ICEIS).
  3. S. Böttcher and R. Steinmetz, 2007. Data Management for Mobile Ajax Web 2.0 Applications. DEXA.
  4. P. Buneman, M. Grohe, Ch. Koch, 2003. Path Queries on Compressed XML. VLDB.
  5. M. Burrows and D. Wheeler, 1994. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation.
  6. G. Busatto, M. Lohrey, and S. Maneth, 2005. Efficient Memory Representation of XML Documents. DBPL.
  7. K. Selçuk Candan, Wang-Pin Hsiung, Songting Chen, Jun'ichi Tatemura, Divyakant Agrawal, 2006. AFilter: Adaptable XML Filtering with Prefix-Caching and Suffix-Clustering. VLDB.
  8. J. Cheney, 2001. Compressing XML with multiplexed hierarchical models. In Proceedings of the 2001 IEEE Data Compression Conference (DCC 2001).
  9. J. Cheng, W. Ng, 2004. XQzip: Querying Compressed XML Using Structural Indexing. EDBT.
  10. P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan, 2006. Compressing and Searching XML Data Via Two Zips. In Proceedings of the Fifteenth International World Wide Web Conference.
  11. M. Girardot and N. Sundaresan, 2000. Millau: An Encoding Format for Efficient Representation and Exchange of XML over the Web. Proceedings of the 9th International WWW Conference.
  12. D.A. Huffman, 1952. A method for the construction of minimum-redundancy codes. In: Proc. of the I.R.E.
  13. H. Liefke and D. Suciu, 2000. XMill: An Efficient Compressor for XML Data, Proc. of ACM SIGMOD.
  14. J. K. Min, M. J. Park, C. W. Chung, 2003. XPRESS: A Queriable Compression for XML Data. In Proceedings of SIGMOD.
  15. W. Ng, W. Y. Lam, P. T. Wood, M. Levene, 2006. XCQ: A queriable XML compression system. Knowledge and Information Systems.
  16. D. Olteanu, H. Meuss, T. Furche, F. Bry, 2002. XPath: Looking Forward. EDBT Workshops.
  17. A. Schmidt, F. Waas, M. Kersten, M. Carey, I. Manolescu, and R. Busse, 2002. XMark: A benchmark for XML data management. Hong Kong, China.
  18. P. M. Tolani and J. R. Haritsa, 2002. XGRIND: A query-friendly XML compressor. In Proc. ICDE.
  19. B. B. Yao and M. T. Özsu, 2002. XBench - A family of benchmarks for XML DBMS.
  20. N. Zhang, V. Kacholia, M. T. Özsu, 2004. A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML. ICDE.

Paper Citation


in Harvard Style

Böttcher S., Hartel R. and Heinzemann C. (2008). BSBC: TOWARDS A SUCCINCT DATA FORMAT FOR XML STREAMS. In Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-26-5, pages 13-21. DOI: 10.5220/0001518000130021


in Bibtex Style

@conference{webist08,
author={Stefan Böttcher and Rita Hartel and Christian Heinzemann},
title={BSBC: TOWARDS A SUCCINCT DATA FORMAT FOR XML STREAMS},
booktitle={Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2008},
pages={13-21},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001518000130021},
isbn={978-989-8111-26-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - BSBC: TOWARDS A SUCCINCT DATA FORMAT FOR XML STREAMS
SN - 978-989-8111-26-5
AU - Böttcher S.
AU - Hartel R.
AU - Heinzemann C.
PY - 2008
SP - 13
EP - 21
DO - 10.5220/0001518000130021