
Authors: Stefan Böttcher, Rita Hartel and Christoph Krislin

Affiliation: University of Paderborn, Computer Science, Germany

Keyword(s): XML Compression, Grammar-based Compression, XML Sub-tree Clustering.

Related Ontology Subjects/Areas/Topics: Databases and Information Systems Integration; e-Business; Enterprise Information Systems; Middleware Integration; Middleware Platforms; Technology Platforms; Web Databases

Abstract: XML has become the de facto standard for data exchange in enterprise information systems. However, whenever XML data is stored or processed, e.g., in the form of a DOM tree representation, the XML markup causes a huge blow-up in memory consumption compared to the data contained in the XML document, i.e., its text and attribute values. In this paper, we present CluX, an XML compression approach based on clustering XML sub-trees. CluX uses a grammar to share similar substructures within the XML tree structure and a cluster-based heuristic to greedily select the best compression options in the grammar. CluX thereby allows XML data to be stored and exchanged in a space-efficient yet still queryable way. We evaluate different strategies for XML structure sharing, and we show that CluX often compresses better than XMill, Gzip, and Bzip2, which makes CluX a promising technique for XML data exchange whenever the exchanged data volume is a bottleneck in enterprise information systems.
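To make the idea of grammar-based structure sharing more concrete, the following is a minimal sketch and not the CluX algorithm itself. It assumes the simplest possible case: only exactly identical sub-trees are shared, each distinct sub-tree shape becomes one grammar rule, and text and attribute values are ignored. The abstract describes CluX as sharing similar sub-trees and choosing among options with a cluster-based heuristic, which this sketch does not attempt. The names `share_subtrees` and `count_elements` are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def share_subtrees(root):
    """Assign one rule id to every distinct sub-tree shape (tag + child rules).

    Simplifying assumption: only element structure is considered; text and
    attribute values are ignored, and only identical sub-trees are shared.
    """
    rules = {}  # (tag, tuple of child rule ids) -> rule id

    def visit(node):
        # Children are processed first, so a rule refers only to earlier rules.
        key = (node.tag, tuple(visit(child) for child in node))
        if key not in rules:
            rules[key] = len(rules)
        return rules[key]

    start = visit(root)
    return start, rules


def count_elements(root):
    """Number of element nodes in the uncompressed tree."""
    return sum(1 for _ in root.iter())


doc = ET.fromstring(
    "<lib>"
    "<book><title/><author/></book>"
    "<book><title/><author/></book>"
    "<book><title/><author/><author/></book>"
    "</lib>"
)

start, rules = share_subtrees(doc)
print("element nodes:", count_elements(doc))
print("grammar rules:", len(rules))
for (tag, children), rule_id in sorted(rules.items(), key=lambda kv: kv[1]):
    print(f"R{rule_id} -> {tag}({', '.join(f'R{c}' for c in children)})")
```

In this toy document the two structurally identical `book` elements map to the same rule, so the grammar needs fewer rules than the tree has element nodes. Sharing sub-trees that are merely similar, as the abstract describes, would require rules with parameters or a comparable mechanism, which is beyond this sketch.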



Paper citation in several formats:
Böttcher, S., Hartel, R. and Krislin, C. (2010). CLUX - Clustering XML Sub-trees. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS; ISBN 978-989-8425-04-1; ISSN 2184-4992, SciTePress, pages 142-150. DOI: 10.5220/0002877901420150

author={Stefan Böttcher and Rita Hartel and Christoph Krislin},
title={CLUX - Clustering XML Sub-trees},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS},


JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS
TI - CLUX - Clustering XML Sub-trees
SN - 978-989-8425-04-1
IS - 2184-4992
AU - Böttcher, S.
AU - Hartel, R.
AU - Krislin, C.
PY - 2010
SP - 142
EP - 150
DO - 10.5220/0002877901420150
PB - SciTePress