QUERYABLE SEPA MESSAGE COMPRESSION

BY XML SCHEMA SUBTRACTION

Stefan Böttcher, Rita Hartel and Christian Messinger

Univeristy of Paderborn, Computer Science, Fürstenallee 11, 33102 Paderborn, Germany

Keywords: SEPA-XML Message Compression, SEPA Data, Exchange, Efficient Query Processing on Compressed

SEPA Data.

Abstract: In order to standardize the electronic payments within and between the member states of the European Un-

ion, SEPA (Single Euro Payments Area) – an XML based standard format – was introduced. As the finan-

cial institutes have to store and process huge amounts of SEPA data each day, the verbose structure of XML

leads to a bottleneck. In this paper, we propose a compressed format for SEPA data that removes that data

from a SEPA document that is already defined by the given SEPA schema. The compressed format allows

all operations that have to be performed on SEPA data to be executed on the compressed data directly, i.e.,

without prior decompression. Even more, the queries being used in our evaluation can be processed on

compressed SEPA data with a speed that is comparable to ADSL2+, the fastest ADSL standard. In addition,

our tests show that the compressed format reduces the data size to 11% of the original SEPA messages on

average, i.e., it compresses SEPA data 3 times stronger than other compressors like gzip, bzip2 or XMill –

although these compressors do not allow the direct query processing of the compressed data.

1 INTRODUCTION

1.1 Motivation

In order to simplify and to standardize the inner-

European financial infrastructure, the European

Payment Council (EPC) applied the XML-based

standard SEPA (Single Euro Payments Area) that

defines a format for financial transactions amongst

the member states, i.e., amongst all members of the

European Union plus Liechtenstein, Iceland, Nor-

way, Monaco and Switzerland. For example, since

January 2008 it is possible to execute money trans-

fers in SEPA format and by the end of 2012, all SE-

PA members will have to replace all their national

payment systems by SEPA payment systems.

As the SEPA format specifies customer-to-bank

money transactions (“pain” messages) as well as

inter-bank money transactions (“pacs” messages)

each bank has to process and store huge amounts of

XML data.

1.2 Contributions

In this paper we present an approach to XML com-

pression – called XML Schema subtraction (XSDS)

– that allows compressing the XML structure of SE-

PA messages into a data format that is 9 times

smaller than the original message size. Furthermore,

XSDS allows to process the compressed messages in

a similar way as the original messages – e.g. by eva-

luating XPath queries – without prior decompres-

sion.

Thus, using XSDS-compressed SEPA messages

instead of original SEPA messages as internal for-

mat within a bank institute allows on the one hand to

save storage costs while archiving the data and on

the other hand to reduce the amount of data to be

processed.

1.3 Paper Organization

The remainder of this paper is organized as follows.

Section 2 describes the basic concept XSDS, i.e.

how schema information can be removed from an

XML document. Section 3 gives an overview of how

the compressed data can be processed directly, i.e.,

without prior decompression. Section 4 evaluates the

Böttcher S., Hartel R. and Messinger C. (2010).

QUERYABLE SEPA MESSAGE COMPRESSION BY XML SCHEMA SUBTRACTION.

In Proceedings of the 12th International Conference on Enterprise Information Systems - Software Agents and Internet Computing, pages 23-29

DOI: 10.5220/0002889000230029

 SciTePress

compression ratio of XSDS and query processing on

XSDS compressed data based on SEPA data. Sec-

tion 5 compares XSDS to related work. Finally, Sec-

tion 6 summarizes our contributions.

2 THE CONCEPT

2.1 The Basic Idea

SEPA is a standard that defines the format of elec-

tronic payment within the member states of the EU.

Each electronic payment is processed and stored in

form of an XML document, the format of which is

defined by a set of XML schemata (XSD) by SEPA.

Some parts of each SEPA file are strictly deter-

mined by the SEPA standard, e.g., that each pay-

ment message starts with the tag <Document> or

that the first two child nodes of the element

<GrpHdr> (group header) are the elements <MsgId>

(message ID) and <CreDtTm> (date and time of

message creation). Other parts are variable and vary

from document to document (e.g. whether the Deb-

tor (<Dbtr>) has a postal address (<PstAdr>) or not).

The main compression principle of XML schema

subtraction (XSDS) is the following. XSDS removes

all information that is strictly defined by the XML

schema information from a given XML document,

and, in the compressed format, XSDS encodes only

those parts of the XML document that can vary ac-

cording to the XML schema. The compression prin-

ciple of XSDS is similar to the compression prin-

ciples of XCQ (Ng et al., 2006) and DTD subtrac-

tion (Böttcher, Steinmetz, and Klein, 2007) which

are able to remove information provided by a DTD

from a given XML document. However, in contrast

to these approaches, XSDS removes information

given by an arbitrary XML schema, which is signifi-

cantly more complex than just considering DTDs.

The current paper reports about XSDS, but focuses

on the advantages of applying XSDS to SEPA as an

application standard which is significant for finan-

cial transactions in the EU member states.

2.2 This Paper’s Example

As the whole SEPA standard is too huge to be dis-

cussed within this paper, we only have a detailed

look on a small excerpt. Each payment document

contains (amongst others) a Debtor. The information

on the Debtor is stored as an element with label

<Dbtr> that contains a name (label <Nm>) and zero

or one ID (label <Id>) followed by zero or one post-

al address (label <PstAdr>). The ID contains either a

privat ID (label <PrvtId>) or an organization ID (la-

bel <OrgId>). The postal address consists of a city

(label <City>) and zero to two address lines (label

<AdrLine>). Figure 1 shows a graphical visualiza-

tion of the element <Dbtr> and its definition.

Figure 1: Excerpt of the SEPA schema.

2.3 Removing Schema Information

from the Document Structure in

XSDS

Within the structure of an XML document, i.e.,

within the element tags, there exist only three differ-

ent concepts that allow for variant parts within an

XML document defined by a given schema: First,

the XSD requires the choice of one out of different

given alternatives. Second the XSD element ‘all’

requires the occurrence of all elements declared by

children of the ‘all’ element, but they can occur in

any order. Third, when the XSD requires a repetition

of elements, this usually allows for a varying num-

ber of elements (including all its descendant ele-

ments).

The compression of these variant parts within an

XML document works as follows. Each compression

step assumes that we consider one current position

in the XML document at a time for which the XSD

allows variant parts. For each current position in the

XML document for which the XSD allows a choice,

we only store the alternative chosen at the current

position. (This requires log(n) bits, if there are n

possible alternatives). For each XSD element ‘all’,

we only encode the order of the elements required

by the children of the ‘all’ element in the XSD. Fi-

nally, for each repetition of elements starting at a

given position within an XML document, we only

store the number of occurrences of this element

found at the current position of the XML document.

(If the number of children per node is e.g. limited by

2^32 (MAXINT), this requires 1-5 bytes per repeti-

tion node, depending on the concrete number of re-

petitions).

ICEIS 2010 - 12th International Conference on Enterprise Information Systems

The compression of the non-variant parts of an XML

document, i.e. of the nodes that are fixed by the

XSD for the current position in the XML document,

is even much simpler. We can omit these nodes from

our compressed format, as these nodes can be recon-

structed for the given position from the XSD.

When applying the compression by removing

schema information from the SEPA excerpt shown

in Figure 1, we do the following for the variant parts

of a given XML document fragment matching this

schema excerpt. We store a single bit for the repeti-

tion nodes with label ‘0..1’ stating whether or not

there is an <Id> element and whether or not there is

an <PstAdr> element found in the current XML

document position. We store a single bit for the

choice node with label ‘|’ stating whether there is a

<PrvtId> or a <OrgId> element found in the current

XML document position. Finally, we store two bits

for the repetition node with label ‘0..2’ stating

whether there are 0, 1, or 2 elements <AdrLine>.

The remaining parts of an XML fragment for this

SEPA excerpt are fixed by the SEPA excerpt. This

includes all element names found in the XML frag-

ment. For example, not only the element name

<Dbtr> of the fragment is root is fixed, but also the

element name <Nm> of the first child of the <Dbtr>

element is fixed. Furthermore, it is not necessary to

include the element names for optional parts like

<Id> or <PstAdr> - when the option has been cho-

sen, the element name is fixed by the XSD. Similar-

ly, for repetitions, e.g. occurrences of the <AdrLine>

element, it is sufficient to store the number of repeti-

tions in the compressed data format. The element

name for each repeated element is fixed. Finally, the

<PrvtId> element must occur whenever the first al-

ternative of the choice is taken.

Therefore, we need (at most) 5 bits to store the

structure of each possible XML fragment matching

the structure of a <Dbtr>-element in the given SEPA

excerpt. Requiring only 5 bits is optimal, as there

exist 24 different ‘shapes’ of the <Dbtr>-element

and its descendants.

2.4 Compressing the Textual Data

Beneath the structure, a SEPA document contains

textual data. Whereas large parts of the structure are

defined by the schema, less information is given on

the textual data. Nevertheless, compression of tex-

tual data and query evaluation on compressed data

can be improved by grouping together textual data

that is included by the same parent elements.

For these purposes, for each parent element of

textual data, a single container is provided that stores

the textual data in document order. Storing the tex-

tual data in different containers provides two advan-

tages:

• When processing the SEPA documents, different

queries have to be evaluated, as e.g. whether the

payment creditor is on an embargo list. In many

cases, this can be answered by simply searching

in a few containers.

• As each container contains data of the same do-

main (e.g., names, zip codes …) compressing

each container separately from the other contain-

ers yields a stronger compression ratio than

compressing all the textual data of one document

together.

XSDS mainly differs between three different types

of textual data: String data, Integer data and data

enumerations that only allow a value of a given

enumeration of possible values (e.g., the address

type of a postal address of an invoicee can be one

value of the following list: ADDR, PBOX, HOME,

BIZZ, MLTO, DLVY).

In our implementation, each container that con-

tains String data is compressed via the generic text

compressor gzip. Each entry of an Integer container

is stored via a variable-length Integer encoding that

stores 1-5 bytes per Integer value depending on the

concrete value. Finally, each value stored within an

enumeration container is stored analogously to a

choice within the structure: If an enumeration con-

tainer allows n different values, each value is

represented by an encoding with size log(n) bits, that

defines the position of the current value within the

alternative.

3 QUERY PROCESSING

In contrast to other compressors like XMill (Liefke

and Suciu, 2000), gzip or bzip2 that are mainly used

for archiving, XSDS is able to evaluate queries on

the compressed data directly, i.e., without prior de-

compression.

This makes XSDS not only useful for archiving

data, but also for compressing data that is still

processed or exchanged among partners. For exam-

ple, for a bank institute, this means that the bank

institute compresses SEPA data that it receives, if it

is not already compressed. Then, the bank institute

can process the compressed SEPA data and archive

it without any need of decompression or recompres-

sion between multiple processing steps. Only if the

bank institute sends the data to a customer or another

institute that requires uncompressed SEPA data as

QUERYABLE SEPA MESSAGE COMPRESSION BY XML SCHEMA SUBTRACTION

input, a decompression into the uncompressed SEPA

format might be needed.

For query evaluation on compressed SEPA data,

we use the looking forward approach (Olteanu, et.

al., 2002) followed by a query rewriting system that

reduces queries to using only the axes child, descen-

dant-or-self, following-sibling and a self axis using

filters (citation omitted to avoid self reference). Ad-

ditionally, we have implemented a generic query

processing engine that further reduces XPath queries

to queries on the basic axes first-child, next-sibling,

and parent, and the operations getXMLNodeType

and getLabel on the data compressed by our XSDS

compression approach.

In order to determine first-child, next-sibling or

label of a current context node, we simultaneously

parse through the XML schema and the compressed

document. Similar to keeping track of the current

context node which describes the actual parsing po-

sition in the XML document, simultaneously parsing

keeps track of a current XML schema node which

describes the actual parsing position in the XML

schema. Whenever the XML schema allows for va-

riant parts, as there is a repetition, a choice or an

‘all’ element in the schema, the concrete choice se-

lected for this variant part is looked up in the com-

pressed data.

Concerning the textual data, only those contain-

ers have to be decompressed that contain data that is

addressed by the query: either data that is needed for

evaluating a predicate filter or data that is part of the

output result. Whenever the compressor of a data

container allows partial decompression of the com-

pressed data and value comparisons directly in the

compressed data, no needless decompression of tex-

tual data is performed at all. Only those text values

that are required for query processing are decom-

pressed and read.

4 EVALUATIONS

4.1 Compression Ratio

In order to evaluate our approach, we have collected

9 SEPA example files provided by different bank

institutes from the internet. We have compared our

approach to 3 different compression approaches:

First, Gzip – a generic text compressor based on

LZ77 and Huffman, second XMill (Liefke and Su-

ciu, 2000) – a non-queryable XML compressor, and

third, bzip2 – a generic text compressor based on

Burrows-Wheeler Transform (Burrows and Wheeler,

1994), Move-to-Front and Huffman.

The results of our evaluation are shown in Figure 2.

Although all other tested compressors do not allow

query evaluation on the compressed data, i.e., they

require a prior decompression when processing the

data, our approach additionally outperforms them in

terms of the reached compression ratio, i.e. the size

of compressed document divided by the size of orig-

inal document.

Figure 2: Compression ratio reached for SEPA documents.

While bzip2 reaches compression ratios from 8% to

51% (37% on average), gzip reaches compression

ratios from 10% to 49% (30% on average) and

XMill reaches compression ratios of 10% to 54%

(32%) on average, XSDS reaches compression ratios

of 5% to 15% (11% on average). In other words, on

average XSDS compresses 3 times stronger than all

other evaluated compressors. To the best of our

knowledge, XSDS is the compressor that reaches the

strongest compression on SEPA documents.

4.2 Query Performance

In order to test the query performance, we have gen-

erated a set of example SEPA documents with the

same structure, but increasing size (while the smal-

lest document, D12, has a size of 17 kB and contains

2 SEPA messages, the largest document, D1, has a

size of 193 MB and contains 25,000 SEPA messag-

es).

We have evaluated the following set of queries

that ask for debtor names, currency of the payment,

a complete SEPA message, or the amounts of the

payments on the documents:

Q1=/sepade/Msg/Document/pain.001.001.02

/PmtInf/Dbtr/Nm

Q2= //Dbtr/Nm

Q3= //InstdAmt[@Ccy]

Q4= /sepade/Msg

Q5=//Amt

ICEIS 2010 - 12th International Conference on Enterprise Information Systems

Figure 3: Throughput reached for document D2.

Figure 3 shows the results of our evaluation on doc-

ument D2 (77 MB, 10,000 SEPA messages). For this

document, our query evaluation on the compressed

document reaches an average throughput rate which

is equivalent to 26 Megabits/s regarding the uncom-

pressed original SEPA document. In other words,

our query evaluation is faster than the ADSL2+ -

currently the fastest ADSL standard – that reaches

download rates of 24 Mbit/s.

Figure 4: Query scalability.

In order to test the scalability of the query evaluation

on the XSDS compressed data, we have evaluated

the queries Q1 to Q5 on the documents D1 to D12.

As shown in Figure 4, query evaluation on XSDS

compressed data scales excellently, as the through-

put rate even increases with increasing the original

file size up to about 100 MB. One reason for the

increase of the throughput rate up to an original file

size of 100 MB is that the reached compression ratio

is stronger, the bigger the files are, and thus the data

volume to be processed during query evaluation

decreases in relation to the size of the original file.

For an original file size of more than 100 MB, the

average query evaluation throughput decreases at

least for query Q4 (that requires a total decompres-

sion of all text zipped text containers).The reason for

the final decrease of the query evaluation throughput

lies within the size of the text containers. The bigger

the zipped text containers get, the less efficient they

can be accessed. Because of that, we propose to split

the text containers into several text containers as

soon as a certain threshold (e.g. 10,000 SEPA mes-

sages) is exceeded. As each SEPA message

represents a single payment transaction, no queries

have to be evaluated that span several SEPA mes-

sages. Therefore, splitting a text container when its

size exceeds a given threshold will not lead to any

restriction.

5 RELATED WORK

There exist several approaches to XML structure

compression, which can be mainly divided into three

categories: encoding-based compressors, grammar-

based compressors, and schema-based compressors.

Many compressors do not generate compressed

data that supports evaluating queries, i.e., any query

processing on the compressed data needs prior de-

compression.

The encoding-based compressors allow for a

faster compression speed than the other ones, as only

local data has to be considered in the compression as

opposed to considering different sub-trees as in

grammar-based compressors.

The XMill algorithm (Liefke and Suciu, 2000) is

an example of the first category. The structure is

compressed, by assigning each tag name a unique

and short ID. Each end-tag is encoded by the symbol

‘/’. This approach does not allow querying the com-

pressed data.

XGrind (Tolani and Hartisa, 2002), XPRESS

(Min, Park, and Chung, 2003) and XQueC (Arion et

al., n.d.) are extensions of the XMill approach. Each

of these approaches compresses the tag information

using dictionaries and Huffman-encoding (Huffman,

1952) and replaces the end-tags by either a

‘/’symbol or by parentheses. All three approaches al-

low querying the compressed data. However, as all

of them result in a weaker compression than XMill,

XSDS compresses stronger than all of them.

The encoding-based compression approaches

(Bayardo et al., 2004), (Cheney, 2001), and (Girar-

dot and Sunderesan, 2000) use tokenization. (Che-

ney, 2001) replaces each attribute and element name

by a token, where each token is defined when it is

being used for the first time. (Bayardo et al., 2004)

and (Girardot and Sunderesan, 2000) use tokeniza-

tion as well, but they enrich the data by additional

information that allows for a fast navigation (e.g.,

number of children, pointer to next-sibling, exis-

tence of content and attributes). All three approaches

QUERYABLE SEPA MESSAGE COMPRESSION BY XML SCHEMA SUBTRACTION

use a reserved byte for encoding the end-tag of an

element. They all allow querying the compressed

data.

The encoding-based compression approach in

(Zhang, Kacholia, and Özsu, 2004) defines a suc-

cinct representation of XML that stores the start-tags

in form of tokens and the end-tag in form of a spe-

cial token (e.g. ‘)’). They enrich their compressed

XML representation by some additional index data

that allows a more efficient query evaluation. This

approach allows querying of compressed data.

XQzip (Cheng and Ng, 2004) and the approaches

presented in (Adiego, Navarro, and de la Fuente)

and (Buneman, Grohe, and Koch, 2003) belong to

grammar-based compression. They compress the

data structure of an XML document by combining

identical sub-trees. Afterwards, the data nodes are

attached to the leaf nodes, i.e., one leaf node may

point to several data nodes. The data is compressed

by an arbitrary compression approach. These ap-

proaches allow querying compressed data.

An extension of (Buneman, Grohe, and Koch,

2003) and (Cheng and Ng, 2004) is the BPLEX al-

go-rithm (Busatti, Lohrey, and Maneth, 2005). This

approach does not only combine identical sub-trees,

but recognizes similar patterns within the XML, and

therefore allows a higher degree of compression. It

allows querying of compressed data.

Schema-based compression comprises such ap-

proaches as XCQ (Ng et al., 2006), XAUST (Sub-

ramanian, and Shankar, 2005), Xenia (Werner et al.,

2006) and DTD subtraction (Böttcher, Steinmetz,

and Klein, 2007). They subtract the given schema

information from the structural information. Instead

of a complete XML structure stream or tree, they

only generate and output information not al-ready

contained in the schema information (e.g., the cho-

sen alternative for a choice-operator or the num-ber

of repetitions for a *-operator within the DTD).

These approaches are queryable and applicable to

XML streams, but they can only be used if schema

information is available.

XSDS follows the same basic idea to delete in-

formation which is redundant because of a given

schema. In contrast to XCQ, XAUST and DTD sub-

traction that can only remove schema information

given by a DTD, XSDS works on XML schema

which is significantly more complex than DTDs.

Furthermore, XSDS uses a counting schema for re-

petitions that compresses stronger than e.g. the ones

used in XCQ or Xenia.

The approach in (Ferragina et al., 2006) does not

belong to any of the three categories. It is based on

Burrows-Wheeler Transform (Burrows and Wheeler,

1994), i.e., the XML data is rearranged in such a

way that compression techniques such as gzip

achieve higher compression ratios. This approach

allows querying the compressed data only if it is

enriched with additional index information.

In comparison to all other approaches, XSDS is

the only approach that combines the following ad-

vantageous properties: XSDS removes XML data

nodes that are fixed by the given XML schema, it

encodes choices, repetitions, and ‘all’-groups in an

efficient manner, and it allows for efficient query

processing on the compressed XML data.

To the best of our knowledge, no other XML

compression technique combines such a compres-

sion performance for SEPA data with such query

processing speed on compressed data.

6 CONCLUSIONS

We have presented XSDS (XML schema subtrac-

tion) – an XML compressor that performs especially

well for electronic payment data in SEPA format.

XSDS removes all data that can be inferred from

the given schema information of the XML docu-

ment. Thereby, XSDS provides two major advantag-

es: First, XSDS generates a strongly compressed

document representation which may save costs and

energy by saving bandwidth for data transfer and by

saving main memory required to process data and by

saving secondary storage needed to archive com-

pressed XML data. Second, XSDS supports fast

query evaluation on the compressed document with-

out prior decompression.

Our experiments have shown that XSDS com-

presses SEPA messages down to a size of 11% of

the original SEPA document size on average, which

outperforms the other compressors, i.e. gzip, XMill

and bzip2, by a factor of 3. Furthermore, query eval-

uation directly on the compressed SEPA data is not

only possible, but in our experiments, query

processing reaches throughput rates that are higher

than those of ADSL2+. Therefore, we consider the

XSDS compression technique to be highly beneficial

in all SEPA applications for which the data volume

is a bottleneck.

REFERENCES

J. Adiego, G. Navarro, P. de la Fuente: Lempel-Ziv Com-

pression of Structured Text. Data Compression Confe-

rence 2004

ICEIS 2010 - 12th International Conference on Enterprise Information Systems

A. Arion, A. Bonifati, I. Manolescu, A. Pugliese. XQueC:

A Query-Conscious Compressed XML Database, to

appear in ACM Transactions on Internet Technology.

R. J. Bayardo, D. Gruhl, V. Josifovski, and J. Myllymaki.,

2004. An evaluation of binary xml encoding optimiza-

tions for fast stream based XML processing. In Proc.

of the 13th international conference on World Wide

Web.

S. Böttcher, R. Steinmetz, N. Klein, 2007. XML Index

Compression by DTD Subtraction. International Con-

ference on Enterprise Information Systems (ICEIS).

P. Buneman, M. Grohe, Ch. Koch, 2003. Path Queries on

Compressed XML. VLDB.

M. Burrows and D. Wheeler, 1994. A block sorting loss-

less data compression algorithm. Technical Report

124, Digital Equipment Corporation.

G. Busatto, M. Lohrey, and S. Maneth, 2005. Efficient

Mem¬ory Representation of XML Dokuments, DBPL.

K. S. Candan, W.-P. Hsiung, S. Chen, J. Tatemura, D.

Agrawal, 2006. AFilter: Adaptable XML Filtering

with Prefix-Caching and Suffix-Clustering. VLDB.

J. Cheney, 2001. Compressing XML with multiplexed

hierarchical models. In Proceedings of the 2001 IEEE

Data Compression Conference (DCC 2001).

J. Cheng, W. Ng: XQzip, 2004. Querying Compressed

XML Using Structural Indexing. EDBT.

P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrish-

nan, 2006. Compressing and Searching XML Data Via

Two Zips. In Proceedings of the Fifteenth Inter-

na¬tional World Wide Web Conference.

M. Girardot and N. Sundaresan. Millau, 2000. An Encod-

ing Format for Efficient Representation and Exchange

of XML over the Web. Proceedings of the 9th Interna-

tional WWW Conference.

D.A. Huffman, 1952. A method for the construction of

minimum-redundancy codes. In: Proc. of the I.R.E.

J. Ziv and A. Lempel: A Universal Algorithm for Sequen-

tial Data Compression, 1977. In IEEE Transactions on

In¬formation Theory, No. 3, Volume 23, 337-343

H. Liefke and D. Suciu, 2000. XMill: An Efficient Com-

pres¬sor for XML Data, Proc. of ACM SIGMOD.

J. K. Min, M. J. Park, C. W. Chung, 2003. XPRESS: A

Queriable Compression for XML Data. In Proceedings

of SIGMOD.

W. Ng, W. Y. Lam, P. T. Wood, M. Levene, 2006: XCQ:

A queriable XML compression system. Knowledge

and Information Systems.

D. Olteanu, H. Meuss, T. Furche, F. Bry, 2002: XPath:

Looking Forward. EDBT Workshops.

H. Subramanian, P. Shankar: Compressing XML Docu-

ments Using Recursive Finite State Automata. CIAA

2005

P. M. Tolani and J. R. Hartisa, 2002. XGRIND: A query-

friendly XML compressor. In Proc. ICDE.

Ch. Werner, C. Buschmann, Y. Brandt, S. Fischer: Com-

pressing SOAP Messages by using Pushdown Auto-

mata. ICWS 2006

N. Zhang, V. Kacholia, M. T. Özsu, 2004. A Succinct

Physical Storage Scheme for Efficient Evaluation of

Path Queries in XML. ICDE

QUERYABLE SEPA MESSAGE COMPRESSION BY XML SCHEMA SUBTRACTION