Schema-based Parallel Compression and Decompression of XML Data

Stefan Böttcher, Matthias Feldotto, Rita Hartel

Abstract

Whenever huge amounts of XML data have to be transferred from a web server to multiple clients, the transferred data volume can be reduced significantly by sending compressed XML instead of plain XML. Whenever applications need to query a compressed XML format and XML compression or decompression time is a bottleneck, parallel XML compression and parallel decompression may be of significant advantage. We choose the XML compressor XSDS as the starting point for our new approach to parallel compression and parallel decompression of XML documents for two reasons. First, XSDS generally achieves better compression ratios than other compressors such as gzip, bzip2, and XMill. Second, in contrast to these compressors, XSDS not only supports XPath queries on compressed XML data, but XPath queries can even be evaluated faster on XSDS-compressed data than on uncompressed XML. We propose a string-search-based parsing approach to parallelize XML compression with XSDS, and we show that we can speed up the compression of XML documents by a factor of 1.4 and decompression by a factor of up to 7 on a quad-core processor.

Paper Citation


in Harvard Style

Böttcher S., Feldotto M. and Hartel R. (2013). Schema-based Parallel Compression and Decompression of XML Data. In Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8565-54-9, pages 77-86. DOI: 10.5220/0004366300770086


in Bibtex Style

@conference{webist13,
author={Stefan Böttcher and Matthias Feldotto and Rita Hartel},
title={Schema-based Parallel Compression and Decompression of XML Data},
booktitle={Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2013},
pages={77-86},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004366300770086},
isbn={978-989-8565-54-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Schema-based Parallel Compression and Decompression of XML Data
SN - 978-989-8565-54-9
AU - Böttcher S.
AU - Feldotto M.
AU - Hartel R.
PY - 2013
SP - 77
EP - 86
DO - 10.5220/0004366300770086