specific algorithms have both been provided. Besides
the formal and algorithmic approaches, experiments
showed that the compressibility of the annotated tree,
without using any backend compressors, is high:
approximately 0.4 on average. Finally, a general
analysis was provided, together with test results for
compression of entire XML documents with single
instances of vanilla compressors, compression of the
annotated tree over markup density, and compression
of the annotated transform over compression of the
XML data, showing the usefulness of the annotated
tree approach.
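As a rough illustration of the baseline mentioned above (compressing an entire XML document with a single instance of a vanilla compressor), the following minimal sketch measures compressed size relative to original size. The function name `compression_ratios`, the sample document, and the definition of the ratio as compressed bytes divided by original bytes are assumptions made for this example; they are not taken from the experiments, and the compressibility measure used there may be defined differently. Only DEFLATE (the algorithm behind gzip) and bzip2 are covered, since both ship with the Python standard library.

```python
import bz2
import zlib


def compression_ratios(data: bytes) -> dict:
    """Return compressed_size / original_size for each vanilla compressor.

    Smaller values mean better compression. The ratio definition is an
    assumption of this sketch, not a measure taken from the paper.
    """
    if not data:
        raise ValueError("data must be non-empty")
    compressors = {
        "deflate": zlib.compress,  # the algorithm used by gzip
        "bzip2": bz2.compress,
    }
    return {name: len(fn(data)) / len(data) for name, fn in compressors.items()}


# Hypothetical, highly repetitive XML document used only for illustration.
sample_xml = b"<root>" + b"<item><name>x</name></item>" * 200 + b"</root>"

if __name__ == "__main__":
    for name, ratio in compression_ratios(sample_xml).items():
        print(f"{name}: {ratio:.3f}")
```

Real documents with more text content and less markup will generally compress less well than this artificial sample.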
Simple queries, such as finding all children of a
given node, can be efficiently evaluated using the
annotated trees. Our future work will extend queries to
the subset of XPath expressions known as Core XPath,
as defined in (Gottlob et al., 2005), as well as to
more sophisticated navigational queries, e.g., asking
for the j-th level-ancestor of a node u.
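To make the two navigational queries mentioned above concrete, here is a minimal sketch on a plain pointer-based ordered tree. It does not use the annotated tree representation; the `Node` class, the helper names `children_of` and `level_ancestor`, and the reading of "j-th level-ancestor of u" as the ancestor j levels above u are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Plain ordered-tree node (hypothetical; not the annotated-tree layout)."""
    label: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add_child(self, label: str) -> "Node":
        """Append a new child with the given label and return it."""
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def children_of(u: Node) -> List[Node]:
    """Return the children of u in document order."""
    return list(u.children)


def level_ancestor(u: Node, j: int) -> Optional[Node]:
    """Return the ancestor j levels above u (j = 0 yields u itself).

    Returns None when the walk would go past the root. This naive version
    follows j parent pointers, so it takes O(j) time.
    """
    if j < 0:
        raise ValueError("j must be non-negative")
    node: Optional[Node] = u
    for _ in range(j):
        node = node.parent
        if node is None:
            return None
    return node
```

For example, after `root = Node("a")`, `b = root.add_child("b")`, and `c = b.add_child("c")`, the call `level_ancestor(c, 2)` returns `root`. Succinct tree representations typically aim to answer such queries without explicit parent pointers, which this sketch does not attempt.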
ACKNOWLEDGEMENTS
The work of the first and third authors is partially
supported by an NSERC RGPIN grant and an NSERC
CGS-M (Canada Graduate Scholarship-Masters) grant,
respectively.
REFERENCES
Arion, A., Bonifati, A., Manolescu, I., and Pugliese, A.
(2007). XQueC: a query-conscious compressed XML
database. ACM Transactions on Internet Technology,
7(2).
Baseball.xml (2013). baseball.xml, retrieved October 2013
from http://rassyndrome.webs.com/cc/baseball.xml.
Benoit, D., Demaine, E., Munro, J., and Raman, V. (1999).
Representing Trees of Higher Degree. In Dehne, F.,
Sack, J., Gupta, A., and Tamassia, R., editors, Algo-
rithms and Data Structures, volume 1663 of Lecture
Notes in Computer Science, pages 169–180. Springer
Berlin Heidelberg.
Bille, P., Gortz, I., Weimann, O., and Landau, G. M. (2013).
Tree Compression with Top Trees. In Proceedings
of the 40th International Colloquium on Automata,
Languages, and Programming.
Burrows, M. and Wheeler, D. (1994). A block-sorting loss-
less data compression algorithm. Technical Report,
Digital Equipment Corporation.
Busatto, G., Lohrey, M., and Maneth, S. (2005). Efficient
Memory Representation of XML Documents. In Bier-
man, G. and Koch, C., editors, Database Program-
ming Languages, volume 3774 of Lecture Notes in
Computer Science, pages 199–216. Springer Berlin
Heidelberg.
Busatto, G., Lohrey, M., and Maneth, S. (2008). Efficient
memory representation of XML document trees. Inf.
Syst., 33(4-5):456–474.
bzip2 (2013). bzip2 compression, retrieved October 2013
from http://www.bzip.org/.
Chen, S. and Reif, J. (1996). Efficient Lossless
Compression of Trees and Graphs. In IEEE Data
Compression Conference (DCC).
Consortium, T. U. (2013). Update on activities at
the Universal Protein Resource (UniProt) in 2013.
http://dx.doi.org/10.1093/nar/gks1068. Retrieved on
June 20, 2013.
Corbin, T., Müldner, T., and Miziołek, J. (2013). Pre-order
Compression Schemes for XML in the Real Time
Environment. In The Ninth International Conference on
Web Information Systems and Technologies, Aachen,
Germany. WEBIST.
Corpus, W. (2013). Wratislavia XML cor-
pus, retrieved October 2013 from
http://www.ii.uni.wroc.pl/ inikep/research/wratislavia/.
Ferragina, P., Luccio, F., Manzini, G., and Muthukrishnan,
S. (2009). Compressing and indexing labeled trees,
with applications. J. ACM, 57(1):4:1–4:33.
Gottlob, G., Koch, C., and Pichler, R. (2005). Efficient
algorithms for processing XPath queries. ACM Trans.
Database Syst., 30(2):444–491.
GZIP (2013). The gzip home page, retrieved October 2013
from http://www.gzip.org.
Jacobson, G. (1989). Space-efficient static trees and graphs.
In Proceedings of the 30th Annual Symposium on
Foundations of Computer Science, SFCS ’89, pages
549–554, Washington, DC, USA. IEEE Computer So-
ciety.
Mahoney, M. (2012). Large Text Compression
Benchmark, retrieved October 2013 from
http://mattmahoney.net/dc/zpaq.html.
Müldner, T., Corbin, T., Miziołek, J., and Fry, C. (2012).
Design and Implementation of an Online XML
Compressor for Large XML Files. International Journal
On Advances in Internet Technology, 5(3):115–118.
Müldner, T., Fry, C., Miziołek, J., and Durno, S. (2009).
XSAQCT: XML queryable compressor. In Balisage:
The Markup Conference 2009, Montreal, Canada.
XML (2013). Extensible markup language (XML)
1.0 (Fifth edition), retrieved October 2013 from
http://www.w3.org/tr/rec-xml/.
xmlgen (2013). The benchmark data generator,
retrieved October 2013 from
http://www.xml-benchmark.org/generator.html.
Ziv, J. and Lempel, A. (2006). A universal algorithm for
sequential data compression. IEEE Trans. Inf. Theor.,
23(3):337–343.
ZPAQ (2013). Zpaq, retrieved October 2013 from
http://www.w3.org/tr/rec-xml/.