XML INDEX COMPRESSION BY DTD SUBTRACTION
Stefan Böttcher, Rita Steinmetz, Niklas Klein
2007
Abstract
Whenever XML is used as a format to exchange large amounts of data, or even for data streams, the verbosity of XML is one of the bottlenecks. While compressing XML data seems to be a way out, a variety of applications require that the compressed result can be queried efficiently. Furthermore, efficient path query evaluation calls for an index, which usually means an additional data structure. For this purpose, we have developed a compression technique that uses structure information found in the DTD to perform a structure-preserving compression of XML data, and that also compresses an index so that the compressed data can be searched efficiently. Our evaluation shows that compression factors close to those of gzip are achievable, and that the structural part of XML files can be compressed even better.
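The abstract only states that structure information from the DTD is used to compress the document structure. The Python sketch below illustrates one simple reading of that idea, under clearly stated assumptions. Everything the DTD already fixes (the tags of a sequence) is omitted, and only the decisions the DTD leaves open are stored: the index of the alternative taken in a choice `(a|b)` and the repetition count of a `*` group. The classes `Name`, `Seq`, `Choice`, `Star`, `Node` and the functions `encode_node`/`decode_node` are hypothetical. The model ignores text, attributes, `?`, `+` and mixed content, and it restricts choices and stars to single element names. It is not the authors' encoding or index format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Union

# Hypothetical, simplified DTD content model (an assumption, not the paper's format).
@dataclass
class Name:
    tag: str

@dataclass
class Seq:
    parts: list

@dataclass
class Choice:
    alts: list  # restricted here to Name alternatives

@dataclass
class Star:
    body: "Name"  # restricted here to a single repeated element

Model = Union[Name, Seq, Choice, Star]
DTD = Dict[str, Optional[Model]]  # None stands for an EMPTY element

@dataclass
class Node:  # structure-only document tree (no text, no attributes)
    tag: str
    children: List["Node"] = field(default_factory=list)

def encode_node(node: Node, dtd: DTD, out: List[int]) -> None:
    """Append only the choices the DTD leaves open for node's subtree."""
    model = dtd[node.tag]
    if model is None:
        assert not node.children, f"{node.tag} is declared EMPTY"
        return
    pos = _encode_model(model, node.children, 0, dtd, out)
    assert pos == len(node.children), f"unexpected children under {node.tag}"

def _encode_model(model: Model, children: List[Node], pos: int,
                  dtd: DTD, out: List[int]) -> int:
    if isinstance(model, Name):
        # The tag is fixed by the DTD, so nothing is stored for it.
        assert pos < len(children) and children[pos].tag == model.tag
        encode_node(children[pos], dtd, out)
        return pos + 1
    if isinstance(model, Seq):
        for part in model.parts:
            pos = _encode_model(part, children, pos, dtd, out)
        return pos
    if isinstance(model, Choice):
        # Store only which alternative was taken.
        i = [alt.tag for alt in model.alts].index(children[pos].tag)
        out.append(i)
        encode_node(children[pos], dtd, out)
        return pos + 1
    if isinstance(model, Star):
        # Store the repetition count first, then encode the repeated subtrees.
        n = 0
        while pos + n < len(children) and children[pos + n].tag == model.body.tag:
            n += 1
        out.append(n)
        for child in children[pos:pos + n]:
            encode_node(child, dtd, out)
        return pos + n
    raise TypeError(f"unsupported content model: {model!r}")

def decode_node(tag: str, dtd: DTD, stream: Iterator[int]) -> Node:
    """Rebuild the subtree of `tag` from the DTD plus the stored decisions."""
    node = Node(tag)
    model = dtd[tag]
    if model is not None:
        _decode_model(model, dtd, stream, node.children)
    return node

def _decode_model(model: Model, dtd: DTD, stream: Iterator[int],
                  children: List[Node]) -> None:
    if isinstance(model, Name):
        children.append(decode_node(model.tag, dtd, stream))
    elif isinstance(model, Seq):
        for part in model.parts:
            _decode_model(part, dtd, stream, children)
    elif isinstance(model, Choice):
        _decode_model(model.alts[next(stream)], dtd, stream, children)
    elif isinstance(model, Star):
        for _ in range(next(stream)):
            _decode_model(model.body, dtd, stream, children)

# Usage: <!ELEMENT book (title, author*, (isbn|issn))>, all leaves EMPTY.
dtd: DTD = {
    "book": Seq([Name("title"), Star(Name("author")),
                 Choice([Name("isbn"), Name("issn")])]),
    "title": None, "author": None, "isbn": None, "issn": None,
}
doc = Node("book", [Node("title"), Node("author"), Node("author"), Node("issn")])
bits: List[int] = []
encode_node(doc, dtd, bits)
print(bits)  # [2, 1]
assert decode_node("book", dtd, iter(bits)) == doc
```

In this toy example the four child tags of `book` shrink to two integers: `title` is implied by the sequence, so only the number of `author` elements and the `isbn`/`issn` choice remain. These small numbers could be packed further into a few bits.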
Paper Citation
in Harvard Style
Böttcher S., Steinmetz R. and Klein N. (2007). XML INDEX COMPRESSION BY DTD SUBTRACTION. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-88-7, pages 86-94. DOI: 10.5220/0002365900860094
in Bibtex Style
@conference{iceis07,
author={Stefan Böttcher and Rita Steinmetz and Niklas Klein},
title={XML INDEX COMPRESSION BY DTD SUBTRACTION},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2007},
pages={86-94},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002365900860094},
isbn={978-972-8865-88-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - XML INDEX COMPRESSION BY DTD SUBTRACTION
SN - 978-972-8865-88-7
AU - Böttcher S.
AU - Steinmetz R.
AU - Klein N.
PY - 2007
SP - 86
EP - 94
DO - 10.5220/0002365900860094