must provide the root node of every locality of the
working set (denoted by the node’s XPath expression)
as well as the breadth and depth of each locality. The
following request shows the markup for requesting
the working set highlighted in Figure 2:
<VDOMRequest xmlns="http://vdom.org/request/">
  <LocalityRoot id="2">
    <LeftBorder id="3"/>
    <Breadth size="2"/>
  </LocalityRoot>
  <!-- Possibly more locality requests -->
</VDOMRequest>
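As a minimal sketch, a client might assemble such a request programmatically as follows. The element names, attributes, and namespace are taken from the example request; the function name build_vdom_request and the tuple layout of its argument are hypothetical and not part of the VDOM protocol.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example request in the text.
VDOM_NS = "http://vdom.org/request/"


def build_vdom_request(localities):
    """Build a VDOMRequest element.

    `localities` is a list of (root_id, left_border_id, breadth) tuples,
    one per locality of the working set (hypothetical layout).
    """
    # Serialize the VDOM namespace as the default namespace (no prefix).
    ET.register_namespace("", VDOM_NS)
    request = ET.Element(f"{{{VDOM_NS}}}VDOMRequest")
    for root_id, left_border_id, breadth in localities:
        root = ET.SubElement(
            request, f"{{{VDOM_NS}}}LocalityRoot", id=str(root_id)
        )
        ET.SubElement(root, f"{{{VDOM_NS}}}LeftBorder", id=str(left_border_id))
        ET.SubElement(root, f"{{{VDOM_NS}}}Breadth", size=str(breadth))
    return request


# Request the single locality used in the example above.
request = build_vdom_request([(2, 3, 2)])
xml_text = ET.tostring(request, encoding="unicode")
```

Additional localities can be requested by passing further tuples in the list; each one yields another LocalityRoot element in the same request.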
The VDOM protocol also allows error conditions to be
reported (e.g., when the client requests a node with
an invalid XPath expression). The VDOM protocol must
be mapped to some transport mechanism. Since we use
XML to represent the VDOM PDUs, Web Services seem to
be a natural choice, although other transport
mechanisms such as CORBA or plain TCP connections are
also possible.
The working sets are identified depending on the
application at the client side and the XML document
at the server side. For now we use only static
heuristics to determine the working sets, but the
VDOM client could also draw on other parameters to
infer suitable working sets and thereby minimize
communication overhead. The schema of an XML document
can be used to infer the working sets (e.g., the
multiplicity of an element can indicate the size of a
working set). The application's access pattern can
also be used: different working sets will be
delivered to the client, and in a different order,
depending on whether it prefers a breadth-first or a
depth-first search. Finally, the usage history can
help in choosing the working sets.
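To illustrate how such hints could be combined, the following sketch derives a locality's breadth and depth from an element's schema multiplicity and the application's preferred traversal order. It is an assumption-laden illustration, not the heuristic used by VDOM: the function name, the limits max_breadth and max_depth, and the mapping from traversal order to shape are all hypothetical.

```python
def suggest_locality(max_occurs, traversal, max_breadth=50, max_depth=5):
    """Suggest the breadth and depth of a locality (illustrative only).

    max_occurs  -- schema multiplicity of the element as a string,
                   e.g. "1", "20", or "unbounded"
    traversal   -- "breadth-first" or "depth-first"
    max_breadth -- hypothetical cap on the number of siblings fetched
    max_depth   -- hypothetical cap on the number of levels fetched
    """
    # An unbounded element may have many siblings, so use the cap;
    # otherwise clamp the declared multiplicity to [1, max_breadth].
    if max_occurs == "unbounded":
        breadth = max_breadth
    else:
        breadth = max(1, min(int(max_occurs), max_breadth))

    if traversal == "breadth-first":
        # Assumed shape: many siblings, a single level.
        return {"breadth": breadth, "depth": 1}
    if traversal == "depth-first":
        # Assumed shape: a single sibling, several levels.
        return {"breadth": 1, "depth": max_depth}
    raise ValueError(f"unknown traversal order: {traversal!r}")
```

Under these assumptions, an element declared with an unbounded multiplicity and traversed breadth-first yields a wide, shallow locality, whereas a depth-first traversal yields a narrow, deep one.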
There are two possible strategies for delivering
working sets from the VDOM server to the VDOM client.
The first consists in delivering working sets only
when requested by the VDOM client. Every time the
application reaches a portion of the tree that is not
locally available, the VDOM client automatically
forms a request to the VDOM server for a working set
containing the needed portion. The second strategy
consists in estimating the needs of the application
and delivering potential working sets before they are
required. Of the two strategies, the first makes
requests only when new portions are required by the
application, so the response time may increase. The
second estimates suitable working sets in advance. It
is more efficient if the estimates are mostly
correct; otherwise, pre-fetching several working sets
that are never used may lower performance.
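The trade-off between the two strategies can be sketched with a toy client-side cache. This is a simplified model under stated assumptions, not the VDOM implementation: working sets are identified by consecutive integers, the pre-fetching strategy guesses that the sets following the current one will be needed next, and the names WorkingSetClient, fetch, and run are hypothetical.

```python
class WorkingSetClient:
    """Toy client cache; `fetch` stands in for a server round trip."""

    def __init__(self, fetch, prefetch=0):
        self.fetch = fetch        # callable: working-set id -> payload
        self.prefetch = prefetch  # number of following sets fetched ahead
        self.cache = {}
        self.misses = 0           # accesses that had to wait for the server
        self.transferred = 0      # working sets delivered in total

    def _load(self, ws_id):
        # Transfer a working set unless it is already available locally.
        if ws_id not in self.cache:
            self.cache[ws_id] = self.fetch(ws_id)
            self.transferred += 1

    def get(self, ws_id):
        # Strategy 1: request the working set only when it is needed.
        if ws_id not in self.cache:
            self.misses += 1
            self._load(ws_id)
        # Strategy 2: additionally fetch the sets assumed to be needed next.
        for ahead in range(1, self.prefetch + 1):
            self._load(ws_id + ahead)
        return self.cache[ws_id]


def run(access_pattern, prefetch):
    """Replay an access pattern; return (misses, transferred)."""
    client = WorkingSetClient(fetch=lambda ws_id: f"ws{ws_id}", prefetch=prefetch)
    for ws_id in access_pattern:
        client.get(ws_id)
    return client.misses, client.transferred
```

In this model, a sequential pattern such as run(range(6), prefetch=2) waits for the server only once but transfers eight working sets, whereas run(range(6), prefetch=0) waits six times and transfers six. For a scattered pattern such as [0, 10, 20], pre-fetching with prefetch=2 still waits three times yet transfers nine working sets instead of three, which mirrors the case where the estimate is wrong.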
4 CONCLUSION AND OUTLOOK
In this paper, we introduced the VDOM architecture
that allows applications to transparently access large
XML documents through a DOM API. In the VDOM
architecture, an XML document is partitioned into
working sets that are transferred individually to the
client. A protocol has been proposed to specify the
request and response PDUs of working sets. DOM API
wrappers are defined to make the whole architecture
transparent to the user application. Server wrappers
have also been defined so that the server can connect
to different kinds of XML document data sources. We
are working on a prototype implementation that uses
JDOM as the client-side DOM API and MySQL on the
server side.
Apart from validating our ideas by running
benchmarks, we plan to generalize some internal
processes of the VDOM architecture. In particular,
determining the size of the working set needs further
investigation. We currently use only static
(compile-time) heuristics to determine the size of the
requested working set. One obvious extension would be
to observe the application’s behavior (i.e., the way
the application traverses the DOM tree) and adapt the
size of the working set at runtime. Other extensions
of the work presented in this paper are read/write
access to the server and generalizing the
client/server model to a peer-to-peer model in which
the XML document is distributed among different peers.
REFERENCES
JDOM (2004). Java DOM-API. http://www.jdom.org/.
Lowe, P. (1977). An approximating polynomial for the
computation of saturation vapor pressure. Journal of
Applied Meteorology, 16:100–103.
San Francisco State University (2003). NetBEAMS - Net-
worked Bay Environmental Assessment Monitoring
System. http://www.netbeams.org/.
SAX Project (2004). Simple API for XML (SAX).
http://www.saxproject.org/.
Tanenbaum, A. and Woodhull, A. (2006). Operating Sys-
tems Design and Implementation. Prentice Hall, third
edition.
W3C (2004). Document Object Model (DOM).
http://www.w3.org/DOM/.
W3C (2006a). eXtensible Markup Language (XML).
http://www.w3.org/XML/.
W3C (2006b). XML Path Language 2.0.
http://www.w3.org/TR/xpath/.
Zambrano, B. and Puder, A. (2006). A flexible system
for real-time oceanographic monitoring. Extended ab-
stract, San Francisco State University.