Stefan Böttcher, Rita Steinmetz, Niklas Klein


Whenever XML is used as a format to exchange large amounts of data, or even for data streams, the verbosity of XML is one of the bottlenecks. While compressing XML data seems to be a way out, it is essential for a variety of applications that the compressed result can be queried efficiently. Furthermore, efficient path query evaluation calls for an index, which usually requires an additional data structure. For this purpose, we have developed a compression technique that uses structure information found in the DTD to perform a structure-preserving compression of XML data, and that also compresses an index allowing efficient search in the compressed data. Our evaluation shows that compression factors close to those of gzip are possible, while the structural part of XML files can be compressed even better.



Paper Citation

in Harvard Style

Böttcher S., Steinmetz R. and Klein N. (2007). XML INDEX COMPRESSION BY DTD SUBTRACTION. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-88-7, pages 86-94. DOI: 10.5220/0002365900860094

in Bibtex Style

author={Stefan Böttcher and Rita Steinmetz and Niklas Klein},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS},

in EndNote Style

JO - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS
SN - 978-972-8865-88-7
AU - Böttcher S.
AU - Steinmetz R.
AU - Klein N.
PY - 2007
SP - 86
EP - 94
DO - 10.5220/0002365900860094