Authors: Daniel Rocco (1); James Caverlee (2) and Ling Liu (2)
Affiliations: (1) University of West Georgia, United States; (2) College of Computing, Georgia Institute of Technology, United States
Keyword(s): XML compression, path queries.
Related Ontology Subjects/Areas/Topics: Internet Technology; Protocols and Standards; Web Information Systems and Technologies; Web Services and Web Engineering; XML and Data Management
Abstract:
XML is an increasingly popular format for data storage and exchange. Its popularity can be attributed to its self-describing syntax, its acceptance as a data transmission and archival standard, its strong internationalization support, and a wide range of supporting tools and technologies. However, XML's verbose, repetitive, text-oriented document specification syntax is a liability for many emerging applications, such as mobile computing and distributed document dissemination. This paper presents XPack, an efficient XML document compression system that exploits information inherent in the document structure to improve compression quality. Because XPack's design builds on XML structure features, it should also provide valuable support for structure-aware queries over compressed documents. Taken together, the techniques employed in the XPack compression scheme provide a foundation for efficiently storing, transmitting, and operating over Web documents. Initial experimental results demonstrate that XPack can reduce the storage requirements for Web documents by up to 20% relative to previous XML compression techniques. More significantly, XPack can support operations directly over the compressed documents, improving performance by up to two orders of magnitude for certain document operations compared with equivalent operations on unencoded XML documents.
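The abstract does not describe XPack's actual encoding. As a generic illustration only, the Python sketch below shows one well-known way document structure can aid compression: separating markup from text, the approach used by earlier XML compressors such as XMill. Everything here is an assumption for illustration, not XPack's method: the one-byte tag IDs, the `END` marker, the NUL-separated content stream, and the helper names `split_streams` and `compressed_sizes` are all hypothetical, and attributes, tail text, and namespaces are ignored.

```python
import zlib
import xml.etree.ElementTree as ET

END = 0  # hypothetical marker byte meaning "close the current element"


def split_streams(xml_text: str):
    """Split XML into a tag dictionary, a structure stream, and a content stream.

    Illustrative sketch only: attributes, tail text, and namespaces are ignored,
    and at most 255 distinct tag names are supported (one byte per tag ID).
    """
    root = ET.fromstring(xml_text)
    tag_ids: dict[str, int] = {}
    structure = bytearray()
    content: list[str] = []

    def walk(elem: ET.Element) -> None:
        # Assign each distinct tag name a small integer ID (1, 2, 3, ...).
        tag_id = tag_ids.setdefault(elem.tag, len(tag_ids) + 1)
        if tag_id > 255:
            raise ValueError("too many distinct tags for a one-byte ID")
        structure.append(tag_id)  # element open
        text = (elem.text or "").strip()
        if text:
            content.append(text)  # text values go to a separate stream
        for child in elem:
            walk(child)
        structure.append(END)  # element close

    walk(root)
    return tag_ids, bytes(structure), "\x00".join(content).encode("utf-8")


def compressed_sizes(xml_text: str):
    """Return (zlib size of the raw XML, total zlib size of the separated streams)."""
    tag_ids, structure, content = split_streams(xml_text)
    dictionary = "\x00".join(tag_ids).encode("utf-8")  # tag names in ID order
    raw = len(zlib.compress(xml_text.encode("utf-8"), 9))
    separated = sum(len(zlib.compress(s, 9)) for s in (dictionary, structure, content))
    return raw, separated


# Usage: a small, highly repetitive catalog document.
doc = "<catalog>" + "".join(
    f"<book><title>Title {i}</title><price>{i % 50}.99</price></book>"
    for i in range(200)
) + "</catalog>"
tag_ids, structure, content = split_streams(doc)
raw, separated = compressed_sizes(doc)
print(tag_ids)  # {'catalog': 1, 'book': 2, 'title': 3, 'price': 4}
print(raw, separated)
```

Whether the separated streams compress better than the raw text depends on the document, and this sketch makes no claim about XPack's reported results. The design point it illustrates is that, in principle, a query over element nesting can scan the compact structure stream without decompressing the text values.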