QUERYABLE SEPA MESSAGE COMPRESSION BY XML SCHEMA SUBTRACTION

Stefan Böttcher, Rita Hartel, Christian Messinger

2010

Abstract

To standardize electronic payments within and between the member states of the European Union, SEPA (Single Euro Payments Area), an XML-based standard format, was introduced. Because financial institutions have to store and process huge amounts of SEPA data each day, the verbose structure of XML becomes a bottleneck. In this paper, we propose a compressed format for SEPA data that removes from a SEPA document the data that is already defined by the given SEPA schema. The compressed format allows all operations that have to be performed on SEPA data to be executed directly on the compressed data, i.e., without prior decompression. Moreover, the queries used in our evaluation can be processed on compressed SEPA data at a speed comparable to that of ADSL2+, the fastest ADSL standard. In addition, our tests show that the compressed format reduces the data size to 11% of the original SEPA messages on average, i.e., it compresses SEPA data 3 times more strongly than other compressors such as gzip, bzip2 or XMill, even though these compressors do not allow queries to be processed directly on the compressed data.
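
The idea of schema subtraction can be illustrated with a minimal sketch. It is not the authors' implementation: the toy schema, the element names (Pmt, Dbtr, Tx, Amt, Rmt) and the functions compress and decompress are hypothetical and chosen only for illustration. The sketch assumes that both sides know the schema, so the compact form stores only what the schema leaves open: text values, repetition counts, and whether an optional element is present.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema, NOT the real SEPA schema. Node kinds:
#   ("elem", name, [children])  element with a fixed sequence of children
#   ("text", name)              element that holds only text
#   ("rep", node)               node repeated zero or more times
#   ("opt", node)               node that may be absent
# Assumption: "rep" and "opt" directly wrap a named ("elem"/"text") node.
SCHEMA = ("elem", "Pmt", [
    ("text", "Dbtr"),
    ("rep", ("elem", "Tx", [
        ("text", "Amt"),
        ("opt", ("text", "Rmt")),
    ])),
])


def compress(node, elems, out):
    """Consume matching elements from the front of `elems`; append to `out`
    only the information that the schema does not already determine."""
    kind = node[0]
    if kind == "text":
        out.append(elems.pop(0).text or "")
    elif kind == "elem":
        children = list(elems.pop(0))
        for child in node[2]:
            compress(child, children, out)
    elif kind == "opt":
        present = bool(elems) and elems[0].tag == node[1][1]
        out.append(present)              # one flag instead of tag names
        if present:
            compress(node[1], elems, out)
    elif kind == "rep":
        count = 0
        while count < len(elems) and elems[count].tag == node[1][1]:
            count += 1
        out.append(count)                # one counter instead of tag names
        for _ in range(count):
            compress(node[1], elems, out)


def decompress(node, stream, parent):
    """Rebuild the XML below `parent` from the schema and the compact stream."""
    kind = node[0]
    if kind == "text":
        ET.SubElement(parent, node[1]).text = next(stream)
    elif kind == "elem":
        elem = ET.SubElement(parent, node[1])
        for child in node[2]:
            decompress(child, stream, elem)
    elif kind == "opt":
        if next(stream):
            decompress(node[1], stream, parent)
    elif kind == "rep":
        for _ in range(next(stream)):
            decompress(node[1], stream, parent)


xml_text = ("<Pmt><Dbtr>Alice</Dbtr>"
            "<Tx><Amt>10.00</Amt><Rmt>Invoice 7</Rmt></Tx>"
            "<Tx><Amt>5.50</Amt></Tx></Pmt>")

compact = []
compress(SCHEMA, [ET.fromstring(xml_text)], compact)

holder = ET.Element("holder")            # temporary parent for the rebuilt root
decompress(SCHEMA, iter(compact), holder)
restored = ET.tostring(holder[0], encoding="unicode")
```

In this toy example, compact holds seven entries and no tag names, and restored equals xml_text. The sketch covers only the subtraction step; it does not show how queries are evaluated on the compact form, nor how the remaining values are encoded.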



Paper Citation


in Harvard Style

Böttcher S., Hartel R. and Messinger C. (2010). QUERYABLE SEPA MESSAGE COMPRESSION BY XML SCHEMA SUBTRACTION. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 4: ICEIS, ISBN 978-989-8425-07-2, pages 23-29. DOI: 10.5220/0002889000230029


in Bibtex Style

@conference{iceis10,
author={Stefan Böttcher and Rita Hartel and Christian Messinger},
title={QUERYABLE SEPA MESSAGE COMPRESSION BY XML SCHEMA SUBTRACTION},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 4: ICEIS},
year={2010},
pages={23-29},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002889000230029},
isbn={978-989-8425-07-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 4: ICEIS
TI - QUERYABLE SEPA MESSAGE COMPRESSION BY XML SCHEMA SUBTRACTION
SN - 978-989-8425-07-2
AU - Böttcher S.
AU - Hartel R.
AU - Messinger C.
PY - 2010
SP - 23
EP - 29
DO - 10.5220/0002889000230029