XQuery or XSLT or XPath version, total lengths and
offsets to each body section (Novoselsky, 2008).
The body contains sections for the byte-code itself
(referred to as XVM virtual instructions), strings,
numbers (as strings), string-tables, types, patterns,
pattern tables, external function tables, etc. All
byte-code references are in the form of
relative offsets (as a number of units) from the
beginning of the corresponding section or offsets
from the current address. Each virtual instruction has
an op-code, operands and flags. The flags carry
information about the operand type, XPath step
modes, sequence type occurrence, etc.
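As a minimal sketch of such an encoding, the following
Python fragment packs one instruction as op-code, flags
and a single operand, and resolves the operand as an
offset relative to the start of a string section. The
field widths, the flag value, the NUL-terminated string
layout and all names are assumptions made for
illustration; the actual XVM format is not given here.

```python
import struct

# Hypothetical layout, for illustration only: the text does not
# specify field widths, flag values, or string encoding.
FLAG_OPERAND_STRING = 0x01     # assumed flag: operand indexes strings
INSTR = struct.Struct("<BBH")  # op-code, flags, one 16-bit operand


def encode(opcode, flags, operand):
    """Pack one virtual instruction into a fixed-size unit."""
    return INSTR.pack(opcode, flags, operand)


def decode(code, pc):
    """Unpack the instruction at unit offset pc in the code section."""
    return INSTR.unpack_from(code, pc * INSTR.size)


def read_string(strings, offset):
    """Resolve a section-relative offset to a NUL-terminated string."""
    end = strings.index(b"\0", offset)
    return strings[offset:end].decode("utf-8")


strings = b"hello\0world\0"
code = encode(0x10, FLAG_OPERAND_STRING, 6)  # offset 6 -> "world"

opcode, flags, operand = decode(code, 0)
if flags & FLAG_OPERAND_STRING:
    value = read_string(strings, operand)
```

Because the operand is relative to its section, the
section can be placed anywhere in memory without
rewriting the instruction.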
XCompiler implements both static and dynamic
(default) module linking. If the static linking mode
is set, all dependent modules are compiled together
with the main module, all external references are
resolved and one composite executable byte-code
module is generated. The resulting byte-code contains
all dependent modules with no demarcation
boundary between them. If dynamic mode is set, all
modules are compiled separately and their byte-code
has an extended header containing tables for
imported and exported entities. The exported entities
are top-level functions and variables, while the
imported ones are all external references that refer to
entities in other modules, plus type references. When
the compiler compiles an import statement, it reads the
imported module’s header and adds all of its exported
entities to the symbol-table. If the compiler
encounters a reference to an imported entity, it adds it
to the current module’s import-table. All external
references are resolved by name and module id,
much like references in Java classes. At run time,
when XVM executes an instruction that refers to
an unresolved imported entity, it checks whether the
corresponding module is loaded. If the module is not
loaded, the XVM loads it and allocates a table for
the module’s external references. As noted
earlier, the external references are resolved lazily, on
demand. Dynamic module linking is
better suited to larger applications that use module
libraries, and it has a smaller run-time memory
footprint. Static linking, on the other hand, has a
minor performance advantage over dynamic linking.
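The lazy resolution described above can be sketched as
follows. This is an illustration under stated
assumptions, not the XVM implementation: the Module and
Linker classes, the table shapes and the loader callback
are all hypothetical. Only the behaviour (resolve by
module id and name on first use, load the module if
needed, cache the result) follows the text.

```python
# Hypothetical sketch: module ids, table shapes, and the loader
# callback are assumptions, not the XVM's actual structures.
class Module:
    def __init__(self, module_id, exports, imports):
        self.module_id = module_id
        self.exports = exports    # export table: name -> entity
        self.imports = imports    # import table: (module_id, name)
        # Table of resolved external references, filled on demand.
        self.resolved = [None] * len(imports)


class Linker:
    def __init__(self, load_module):
        self.load_module = load_module  # module_id -> Module
        self.loaded = {}
        self.load_count = 0

    def resolve(self, module, index):
        """Resolve import slot `index` of `module` on first use."""
        if module.resolved[index] is None:
            target_id, name = module.imports[index]
            if target_id not in self.loaded:
                self.loaded[target_id] = self.load_module(target_id)
                self.load_count += 1
            target = self.loaded[target_id]
            module.resolved[index] = target.exports[name]
        return module.resolved[index]


library = Module("lib", {"double": lambda x: 2 * x}, [])
main = Module("main", {}, [("lib", "double")])
linker = Linker(lambda module_id: {"lib": library}[module_id])

double = linker.resolve(main, 0)  # loads "lib", caches the entity
again = linker.resolve(main, 0)   # served from the resolved table
```

A module that is imported but never referenced at run
time is never loaded in this scheme, which is consistent
with the smaller memory footprint claimed for dynamic
linking.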
When XIPE works with a “heavy” XDR (usually
an XML DB), some CISC instructions can be
executed directly by the XDR host processor. To
accomplish this, XCompiler provides a Query
Push-Down Interface, which allows the host XDR to plug
in a host query optimizer. The host query optimizer
has detailed knowledge of the underlying XML
data indexing, so it can decide which parts of the input
program will be compiled into CISC instructions. At
run time the corresponding host iterator-executor
is invoked by the XVM as an external function
and performs the iterator tree execution. The
returned iterator data object does not materialize the
result sequence. Instead, it provides an iterator
interface (open-fetch-close) to its consumer.
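A minimal sketch of an open-fetch-close iterator is
shown below. The class and method names are
hypothetical, and a plain integer range stands in for
a host query result; only the protocol itself comes
from the text.

```python
# Hypothetical sketch: class and method names are assumptions; only
# the open-fetch-close protocol itself comes from the text.
class RangeIterator:
    """Produces items one at a time instead of building a sequence."""

    def __init__(self, start, stop):
        self.start, self.stop = start, stop
        self.current = None  # None means "not open"

    def open(self):
        self.current = self.start

    def fetch(self):
        """Return the next item, or None when exhausted."""
        if self.current is None:
            raise RuntimeError("iterator is not open")
        if self.current >= self.stop:
            return None
        item = self.current
        self.current += 1
        return item

    def close(self):
        self.current = None


def consume(iterator, limit):
    """Pull at most `limit` items; the rest are never computed."""
    items = []
    iterator.open()
    try:
        while len(items) < limit:
            item = iterator.fetch()
            if item is None:
                break
            items.append(item)
    finally:
        iterator.close()
    return items


first_three = consume(RangeIterator(0, 10**9), 3)
```

The consumer stops after three items, so the remaining
items of the very large range are never produced.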
2.2.2 XML Virtual Machine (XVM)
All XML languages referred to here use the XQDM
data model (Fernandez, 2007), which treats data
objects as sequences of items. The item type can be a
basic built-in type, such as a number, string or date,
or an XML node reference. The size of a sequence
is dynamic, and so is the size of a string.
XQuery/UF/FT, XSLT and XPath are functional
languages. They do not allow side effects and their
variables are single-assigned only. That means that a
single stack can be used to hold their intermediate
results during execution. This is why XVM uses
stacks to hold XQDM instances in order to minimize
dynamic memory allocations. Intermediate results
that have a fixed data size, such as numbers and dates,
are loaded into the system main-stack. The
content of the intermediate results with a dynamic
data size, such as strings and sequences, is stored
in complementary content stacks, with an object
descriptor in the main-stack. That way, since
intermediate results are transient, the majority of the
intermediate computation results do not require
additional memory allocation. Only when a stack
is full does XVM grow that stack dynamically with an
additional segment. Also, all single-assigned
variables, function and template parameters, and
function and template call-frames reside in the main
stack. Logically, XVM does not need more than one
stack, but because run-time objects have a dynamic
size, the loop variables cannot be held in pre-allocated
slots in the main stack. This is why XVM uses a
second stack (the context-stack) for loop and context
variables. Both the main-stack and the context-stack
have corresponding item, string and node stacks
for dynamically sized objects.
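The split between fixed-size slots and a companion
content stack can be sketched as follows. The slot
layout and all names are assumptions made for
illustration; only the idea (fixed-size values inline,
dynamically sized content in a separate stack referenced
by a descriptor) is taken from the text.

```python
# Hypothetical sketch: slot layout and names are assumptions. Only
# the idea (fixed-size values inline, dynamic content in a companion
# stack referenced by a descriptor) is taken from the text.
class Stacks:
    def __init__(self):
        self.main = []              # fixed-size slots
        self.strings = bytearray()  # content stack for string data

    def push_number(self, value):
        self.main.append(("num", value))

    def push_string(self, text):
        data = text.encode("utf-8")
        # Descriptor: (tag, offset, length) into the content stack.
        self.main.append(("str", len(self.strings), len(data)))
        self.strings += data

    def pop(self):
        slot = self.main.pop()
        if slot[0] == "num":
            return slot[1]
        _, offset, length = slot
        data = self.strings[offset:offset + length]
        del self.strings[offset:]   # release content by truncation
        return data.decode("utf-8")


stacks = Stacks()
stacks.push_number(42)
stacks.push_string("abc")
stacks.push_string("defgh")
top = stacks.pop()
used_after_pop = len(stacks.strings)
```

Popping a string only truncates the content stack, so
releasing a transient result needs no allocator call.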
However, XQuery/UF/SE introduces some
sequential (non-functional) constructs, such as
multi-assigned variables and DOM updates. That means
that dynamically sized multi-assigned variable
values can no longer be stored in the complementary
stacks. To hold such variable values, XVM uses
non-stack-based dynamic memory (a heap). Garbage
collection techniques are applied to free the memory
when the results are no longer needed.
The XVM execution architecture is quite simple.
There is a set of functions, one for each instruction,