Pre-order Compression Schemes for XML in the Real Time Environment

Tyler Corbin, Tomasz Müldner, Jan Krzysztof Miziołek

Abstract

The advantages of XML come at a cost, especially on networks and small mobile devices. This paper presents the design and implementation of four online XML compression algorithms that exploit local structural redundancies in pre-order traversals of an XML tree, and that focus on reducing the overhead of sending packets and on balancing load between the sender and the receiver. For testing, we designed a suite of 11 XML files with various characteristics. Ten encoding techniques were compared, based on GZIP, EXI, Treechop, XSAQCT and its improvement, and our algorithms. Experiments indicate that our new algorithms perform as well as or better than other online algorithms, and are outperformed only by EXI, for files larger than 1 GB.


Paper Citation


in Harvard Style

Corbin T., Müldner T. and Miziołek J. K. (2013). Pre-order Compression Schemes for XML in the Real Time Environment. In Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8565-54-9, pages 5-15. DOI: 10.5220/0004365100050015


in Bibtex Style

@conference{webist13,
author={Tyler Corbin and Tomasz Müldner and Jan Krzysztof Miziołek},
title={Pre-order Compression Schemes for XML in the Real Time Environment},
booktitle={Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2013},
pages={5-15},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004365100050015},
isbn={978-989-8565-54-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Pre-order Compression Schemes for XML in the Real Time Environment
SN - 978-989-8565-54-9
AU - Corbin T.
AU - Müldner T.
AU - Miziołek J. K.
PY - 2013
SP - 5
EP - 15
DO - 10.5220/0004365100050015