Authors:
Stefan Böttcher¹, Rita Steinmetz¹ and Niklas Klein²
Affiliations:
¹ Computer Science, University of Paderborn, Germany
² ComTec, University of Kassel, Germany
Keyword(s):
XML, Compression, Index, Data Streams.
Related Ontology Subjects/Areas/Topics:
Databases and Information Systems Integration; Enterprise Information Systems; Web Databases
Abstract:
Whenever XML is used as a format to exchange large amounts of data, or even data streams, the verbosity of XML is one of the bottlenecks. While compressing XML data seems to be a way out, many applications require that the compressed result can be queried efficiently. Furthermore, efficient path query evaluation calls for an index, which usually requires an additional data structure. For this purpose, we have developed a compression technique that uses structure information found in the DTD to perform a structure-preserving compression of XML data and that also compresses an index, allowing efficient search in the compressed data. Our evaluation shows that compression factors close to those of gzip are possible, while the structural part of XML files can be compressed even better.
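To illustrate the general idea of exploiting DTD structure information (this is not the authors' published encoding), the following Python sketch assumes a hypothetical, simplified DTD model in which each element's content is either a fixed sequence, a choice, or a repetition (`*`) of child elements. Only the decisions the DTD leaves open are written out: nothing for a sequence, the index of the chosen alternative for a choice, and a count for a repetition. Text content and the index are omitted, and the names `Seq`, `Choice`, `Star`, `Empty`, `encode` and `decode` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Union

# Hypothetical simplified DTD content models (an assumption for illustration).
@dataclass
class Seq:
    children: List[str]   # fixed sequence of child element names

@dataclass
class Choice:
    options: List[str]    # exactly one of these child elements

@dataclass
class Star:
    child: str            # zero or more repetitions of one child element

@dataclass
class Empty:
    pass                  # no child elements (e.g. EMPTY or #PCDATA)

Model = Union[Seq, Choice, Star, Empty]

@dataclass
class Node:
    tag: str
    children: List["Node"]

def encode(node: Node, dtd: Dict[str, Model], out: List[int]) -> List[int]:
    """Append, in document order, only the structural decisions the DTD leaves open."""
    model = dtd[node.tag]
    if isinstance(model, Seq):
        # Child names and order are fully determined by the DTD: emit nothing.
        assert [c.tag for c in node.children] == model.children
    elif isinstance(model, Choice):
        # Emit the index of the chosen alternative.
        (child,) = node.children
        out.append(model.options.index(child.tag))
    elif isinstance(model, Star):
        # Emit only the number of repetitions.
        assert all(c.tag == model.child for c in node.children)
        out.append(len(node.children))
    else:
        assert not node.children
    for child in node.children:
        encode(child, dtd, out)
    return out

def decode(tag: str, dtd: Dict[str, Model], it: Iterator[int]) -> Node:
    """Rebuild the element tree from the DTD and the decision stream."""
    model = dtd[tag]
    if isinstance(model, Seq):
        names = model.children
    elif isinstance(model, Choice):
        names = [model.options[next(it)]]
    elif isinstance(model, Star):
        names = [model.child] * next(it)
    else:
        names = []
    return Node(tag, [decode(name, dtd, it) for name in names])

dtd = {
    "order": Seq(["customer", "items"]),
    "customer": Choice(["person", "company"]),
    "person": Empty(),
    "company": Empty(),
    "items": Star("item"),
    "item": Empty(),
}
doc = Node("order", [
    Node("customer", [Node("company", [])]),
    Node("items", [Node("item", []), Node("item", []), Node("item", [])]),
])

stream = encode(doc, dtd, [])
print(stream)  # [1, 3]
assert decode("order", dtd, iter(stream)) == doc
```

Because `decode` rebuilds the identical tree from the DTD and the two-number stream, the element names of the whole document need not be stored at all; a practical compressor would additionally pack such decisions into a compact bit representation.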