variance in data sets that most affects compression.
In the CRM domain, the structure is non-repeating,
meaning that the data set consists of many different
elements without significant repetition. In the road
maintenance domain, the structure is repeating, mean-
ing that repetition of the same elements takes up a
significant portion of the data set. The reason for the
repeating structure in the road maintenance domain is
that the route of the maintenance crew is recorded as
a long list of coordinates.
Sample forms and data sets (purged of confiden-
tial information) were given for analysis to the Fuego
Core research team at the Helsinki Institute for Infor-
mation Technology, who specialize in efficient XML
processing for mobile devices. They concluded that
by taking advantage of the known document structure,
advanced bit-efficient XML representations can pro-
vide significant advantages when compared to generic
compression algorithms. The rest of this paper covers
the Fuego Core analysis in more detail.
We begin with a more detailed problem statement in
Section 2. Section 3 gives a brief overview of the main
features of Xebu, the binary XML format that we used.
Its purpose is to introduce Xebu well enough to follow
the rest of this paper, so technical details are omitted.
Section 4 considers the scenario at a high level,
examining the features of Xebu to determine how best
to use it in the scenario. Section 5 provides sample
measurements of the effectiveness of Xebu in the
problem domain. Section 6 draws conclusions on
feasibility based on the preceding sections. Finally,
Section 7 outlines future work that could make Xebu a
better fit for a variety of applications, especially
this one.
2 PROBLEM STATEMENT
XForms (W3C, 2006) is a useful language for specifying
interactive XML-based applications for distributed
computing. However, on small mobile devices, which
stand to benefit from a standardized, truly
user-interface-agnostic language, the use of XML raises
some questions. Chief among these is XML’s verbosity,
which consumes a large amount of network bandwidth
in communication.
In an XForms application, there are two kinds of
documents. The form document follows the XForms
schema, and contains the data model and presentation
logic of the application. The data model also includes
the specification of what data the user is expected to
provide, and a document template for submitting the
data.
After a user has filled in the requisite information on
the form, the application needs to send that
information to the server. This is done by inserting
the user-provided information into the template provided
in the form, and sending the resulting XML document
over whatever protocol the application uses to
communicate.
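
The submission step can be illustrated with a minimal sketch. The element names (`visit`, `customer`, `notes`) and the `fill_template` helper are hypothetical and are not taken from the forms analysed in this paper; the sketch only assumes Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance template, as it might appear in a form's data model.
TEMPLATE = "<visit><customer/><notes/></visit>"


def fill_template(template: str, values: dict) -> bytes:
    """Return the template serialized as XML with the user's values filled in."""
    root = ET.fromstring(template)
    for name, text in values.items():
        element = root.find(name)
        if element is None:
            raise KeyError(f"template has no element named {name!r}")
        element.text = text
    return ET.tostring(root, encoding="utf-8")


# The resulting bytes are what the application would send to the server.
document = fill_template(TEMPLATE, {"customer": "ACME", "notes": "Call back"})
```

Note that the submitted document repeats all of the template's markup around a small amount of user data, which is where the verbosity discussed above comes from.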
To mitigate the effect of XML’s verbosity, the obvious
solution is to compress the document before sending it
over the network. Common existing protocols such as
HTTP (Fielding et al., 1999) already support indicating
the use of generic compression like gzip (Deutsch, 1996).
Since XML is text, and highly redundant text at that,
generic compression algorithms usually perform
acceptably well.
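
As a rough sketch of how a client might indicate gzip compression over HTTP, the following builds (but does not send) a request whose body is compressed and labelled with the `Content-Encoding` header. The URL, the payload, and the `build_gzip_post` helper are hypothetical, and whether a given server accepts compressed request bodies is an assumption that has to be checked per deployment.

```python
import gzip
import urllib.request


def build_gzip_post(url: str, xml_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST whose body is gzip-compressed XML."""
    return urllib.request.Request(
        url,
        data=gzip.compress(xml_bytes),
        headers={
            "Content-Type": "application/xml",
            # Tells the receiver that the body must be gunzipped first.
            "Content-Encoding": "gzip",
        },
        method="POST",
    )


# Hypothetical endpoint and payload, for illustration only.
payload = b"<visit><customer>ACME</customer></visit>"
request = build_gzip_post("http://example.invalid/submit", payload)
```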
There are, however, two potential issues with
generic compression over XML when applied to
XForms applications. One is that compression takes
time, and when this is added to the already significant
time needed to process XML, the total processing cost
may become prohibitive. The larger problem is that the
amount of data is in many cases quite small, so generic
compression, which relies on redundancy in the data,
will not perform very well.
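
The effect of document size and repetition on generic compression can be checked with a small experiment. This is only a sketch: both documents below are made up for illustration (they are not the data sets analysed in this paper), and `gzip_ratio` is a hypothetical helper built on Python's standard `gzip` module.

```python
import gzip

# Hypothetical small, non-repeating document (made up for illustration).
small_doc = b"<visit><customer>ACME</customer><notes>Call back</notes></visit>"

# Hypothetical repeating document: a long list of coordinates (values made up).
points = "".join(
    f"<pt lat='60.{i:04d}' lon='24.{i:04d}'/>" for i in range(500)
)
repeating_doc = f"<route>{points}</route>".encode("utf-8")


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data)) / len(data)


print(f"small:     {len(small_doc)} bytes, ratio {gzip_ratio(small_doc):.2f}")
print(f"repeating: {len(repeating_doc)} bytes, ratio {gzip_ratio(repeating_doc):.2f}")
```

The short document offers little redundancy to exploit, and the fixed gzip header and trailer are a noticeable fraction of its size, whereas the coordinate list compresses to a much smaller fraction of its original size.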
For these and other reasons, there have been several
proposals for binary XML formats (Pericas-Geertsen,
2003; W3C, 2003; W3C, 2005). Such a format is a
replacement for XML that is usually intended to be
both more compact and more efficient to process. Two
requirements for a general binary XML format are that
it be able to represent any XML and that it be able to
use available schema information (usually in the form
of XML Schema (W3C, 2001a; W3C, 2001b)) to improve
its compression ratio.
A large number of binary XML formats are already in
use. Well-known general-purpose formats include Fast
Infoset (Sandoz et al., 2004) and XBIS (Sosnoski, 2003),
but these can use only a limited amount of schema
information; in particular, they cannot take advantage
of the structure information present in a schema.
Formats such as ASN.1 (ITU, 2004), BiM (Niedermeier
et al., 2002), and Xenia (Werner et al., 2006) make
better use of schema information, but they have the
drawback that all documents must be schema-valid, and
good results are usually achieved only when the schema
describes everything very precisely. The EXI format
currently being developed at the W3C (W3C, 2007) can
serialize any XML while also using all the information
available in a schema to improve its compression
ICEIS 2008 - International Conference on Enterprise Information Systems