storage space and, in turn, document I/O. As a
consequence, log space and log I/O may be greatly
reduced, too. The combined use of a so-called path
synopsis (Goldman and Widom, 1997) storing only
path classes and SPLIDs as node labels makes it
possible to virtualize the entire structure part and to
reconstruct it or selected paths completely on
demand.
In an elementless layout of an XML document,
only its content nodes are stored in document order –
in the same way as for the complete document. The stored
node format is of variable length and is composed of
entries of the form (SPLID, PCR, value) where PCR
(path class reference) refers to a node in the path
synopsis and enables the reconstruction of its entire
path to the root. Compared to the sample of
complete storage in Figure , the elementless XML
fragment exemplified in Figure saves enormous
space, but can, nevertheless, be reconstructed without
any loss.
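
To make the reconstruction step concrete, the following minimal Python sketch rebuilds the full path of one stored content node from its (SPLID, PCR, value) entry. It is an illustration only, not the XDBMS implementation: the element names, the PCR numbering, and the helper names (`PathClass`, `path_to_root`, `ancestor_labels`, `reconstruct`) are hypothetical, and the sketch assumes that every odd SPLID division closes one tree level while even divisions are overflow values that add no level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathClass:
    """One node of the path synopsis (hypothetical layout)."""
    pcr: int               # path class reference
    name: str              # element name or a marker such as "#text"
    parent: Optional[int]  # PCR of the parent path class, None for the root


def path_to_root(synopsis, pcr):
    """Follow parent links in the path synopsis; return names root-first."""
    names = []
    current = pcr
    while current is not None:
        node = synopsis[current]
        names.append(node.name)
        current = node.parent
    return list(reversed(names))


def ancestor_labels(splid):
    """Derive the labels of all ancestors (and the node itself) from a SPLID.

    Assumption: a label prefix identifies a node iff it ends in an odd division.
    """
    divisions = [int(d) for d in splid.split(".")]
    return [
        ".".join(str(d) for d in divisions[: i + 1])
        for i, d in enumerate(divisions)
        if d % 2 == 1
    ]


def reconstruct(synopsis, splid, pcr, value):
    """Rebuild the virtualized structure part for one stored content node."""
    names = path_to_root(synopsis, pcr)
    labels = ancestor_labels(splid)
    if len(names) != len(labels):
        raise ValueError("SPLID depth and path class depth disagree")
    return list(zip(labels, names)), value


# Hypothetical path synopsis for /bib/book/title/#text
SYNOPSIS = {
    1: PathClass(1, "bib", None),
    2: PathClass(2, "book", 1),
    3: PathClass(3, "title", 2),
    4: PathClass(4, "#text", 3),
}

path, value = reconstruct(SYNOPSIS, "1.3.3.3", 4, "TCP/IP")
```

Under these assumptions, the single stored entry is enough to recover every inner node on the path together with its label, which is why the inner nodes do not have to be stored.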
3.2 Logging and Recovery
Minimizing log I/O is important for transaction
optimization. Again, we could just consider the nodes
of an XML document as records and “blindly” apply
standard logging techniques, e.g., using
physiological logging as a salient method (Gray and
Reuter, 1993). Then, we would have to write Undo/Redo
log entries for all modifications in the structure
and content part.
Instead, our XDBMS adheres to a three-level
recovery providing hierarchically dependent DB
consistency qualities: block consistency, DML-operation
consistency, and transaction consistency. As a very
expensive method, a block-consistent state can be
guaranteed by reserving a block in each container
file for before-image logging. When propagating a
modified block back to disk, a copy of it is first
written to the before-image block. Because either the
new or the old block is available, recovery can
always rely on a block-consistent DB state. A more
optimistic attitude would not apply such an
overcautious method but, if a corrupted block is
detected in extremely rare cases, would enforce
archive recovery. Of course, such failure cases imply
longer processing delays, but substantial log-related
I/O is saved in normal processing mode.
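
The block-consistency protocol can be sketched as follows in Python. This is a simplified model under explicit assumptions, not the XDBMS code: the container file is an in-memory list, slot 0 plays the role of the reserved block, a CRC32 prefix stands in for whatever corruption check the system uses, and the names `ContainerFile`, `propagate`, and `recover` are hypothetical. The sketch follows the text literally in that the copy of the modified block is written to the reserved block before the home location is overwritten.

```python
import zlib

RESERVED = 0  # hypothetical: slot 0 is the block reserved for before-image logging


def seal(payload: bytes) -> bytes:
    """Prefix a payload with its CRC32 so that a torn write can be detected."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def is_intact(block: bytes) -> bool:
    """Check the CRC32 prefix written by seal()."""
    return len(block) >= 4 and zlib.crc32(block[4:]).to_bytes(4, "big") == block[:4]


class ContainerFile:
    def __init__(self, n_blocks: int):
        self.blocks = [seal(b"") for _ in range(n_blocks)]

    def propagate(self, block_no: int, payload: bytes, tear_home_write: bool = False):
        # Step 1: write a copy, tagged with its target block number,
        # to the reserved block.
        self.blocks[RESERVED] = seal(block_no.to_bytes(4, "big") + payload)
        # Step 2: overwrite the home location (optionally simulating a crash
        # in the middle of this write).
        if tear_home_write:
            self.blocks[block_no] = b"\xff\xff\xff"
        else:
            self.blocks[block_no] = seal(payload)

    def recover(self, block_no: int) -> bytes:
        home = self.blocks[block_no]
        if is_intact(home):
            return home[4:]
        spare = self.blocks[RESERVED]
        if is_intact(spare) and int.from_bytes(spare[4:8], "big") == block_no:
            payload = spare[8:]
            self.blocks[block_no] = seal(payload)  # repair the home location
            return payload
        # The optimistic variant described above would always end up here.
        raise RuntimeError("corrupted block: archive recovery required")


f = ContainerFile(4)
f.propagate(2, b"old")
f.propagate(2, b"new", tear_home_write=True)
repaired = f.recover(2)
```

The cost of the method is visible in `propagate`: every block write becomes two writes, which is the log-related I/O the optimistic variant saves.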
At the propagation level, the buffer manager
applies entry logging, for which each DML operation is
decomposed into so-called elementary operations
whose effects are each limited to a single block. Using
log sequence numbers (LSNs), the log entries can be
uniquely related to the blocks modified by these
operations, and the attached LSNs enable the
decision whether or not the log entries have to be
applied to the related blocks during recovery. Hence,
restart can reconstruct, in a kind of forward recovery
(repeating history), a DML-operation-consistent DB
state and, for winner transactions, even a
transaction-consistent DB state. Finally, the
transaction manager records the transaction boundaries
and all inverse DML operations (logical DML operation
logging saves space and, thus, log I/O) to be prepared
to roll back all loser transactions, thereby executing
the inverse DML operations on the reconstructed
operation-consistent DB state.
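
The LSN-based redo decision can be illustrated with a short Python sketch. It assumes, as is common but not stated in the text, that each block records the LSN of the last log entry applied to it and that an entry must be redone exactly when its LSN is greater than the block's LSN. The types `Block` and `LogEntry` and the function `redo` are hypothetical names, and the rollback of loser transactions via inverse DML operations is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    lsn: int = 0                                  # LSN of the last applied entry
    entries: dict = field(default_factory=dict)   # key (e.g. a SPLID) -> value


@dataclass(frozen=True)
class LogEntry:
    lsn: int
    block_no: int
    key: str
    after: object  # new value; None means "delete the key"


def redo(blocks, log):
    """Repeat history: apply every entry that its block has not yet seen.

    Returns the LSNs that actually had to be applied.
    """
    applied = []
    for entry in sorted(log, key=lambda e: e.lsn):
        block = blocks.setdefault(entry.block_no, Block())
        if block.lsn >= entry.lsn:
            continue  # the block on disk already reflects this entry
        if entry.after is None:
            block.entries.pop(entry.key, None)
        else:
            block.entries[entry.key] = entry.after
        block.lsn = entry.lsn
        applied.append(entry.lsn)
    return applied


# Block 1 reached disk after LSN 2; block 2 was never propagated.
blocks = {1: Block(lsn=2, entries={"1.3.3.3": "TCP/IP"})}
log = [
    LogEntry(1, 1, "1.3.3.3", "TCP"),
    LogEntry(2, 1, "1.3.3.3", "TCP/IP"),
    LogEntry(3, 1, "1.3.5.3", "1994"),
    LogEntry(4, 2, "1.5.3", "Stevens"),
]
first_pass = redo(blocks, log)
second_pass = redo(blocks, log)
```

Because each elementary operation touches a single block, the comparison of two LSNs is sufficient to decide per entry, and running the redo pass twice applies nothing the second time.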
3.3 Various Optimizations
The combined use of entry and DML operation
logging already seems to require minimal log I/O in
normal situations. Therefore, we focussed on
operation-specific situations and improvement of
related components to gain further optimization
potential.
Reduced logging: For initially storing a
document, stepwise rollback is not needed and complete
rollback using entry logging is overly expensive.
Therefore, logging of the block numbers involved is
sufficient to empty the affected container pages.
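
As a rough illustration of reduced logging, the Python sketch below records only block numbers while a document is stored initially and uses them to empty the affected pages on rollback. The `Container` class and both function names are hypothetical, and the page allocation scheme is an assumption made for the example.

```python
class Container:
    """Toy container: maps block numbers to page contents."""

    def __init__(self):
        self.pages = {}
        self.next_free = 0

    def allocate(self, payload: bytes) -> int:
        block_no = self.next_free
        self.next_free += 1
        self.pages[block_no] = payload
        return block_no


def store_document(container, chunks, log):
    """Initial store: log one block number per page instead of entry logs."""
    for chunk in chunks:
        log.append(container.allocate(chunk))


def rollback_initial_store(container, log):
    """Complete rollback: empty every page named in the log."""
    while log:
        container.pages.pop(log.pop(), None)


c = Container()
c.allocate(b"page of another document")
log = []
store_document(c, [b"chunk-a", b"chunk-b"], log)
logged = list(log)
rollback_initial_store(c, log)
```

The log volume grows with the number of pages rather than with the number of nodes written, which is the saving this optimization aims at.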
Administration of Fix indicators: Blocks
currently accessed by transactions have to be pinned in
their buffer frames to avoid replacement. So-called Fix
marks set by the requesting transactions indicate to
the buffer manager that a block is not eligible for
replacement. In the initial solution, these Fix marks
were kept in the lock table, where checking
performed very poorly. Because the search for
replacement candidates is an extremely frequent
task, a specialized structure recording the Fix state
of all frames was added to the buffer manager.
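
One possible shape of such a specialized structure is a per-frame array of Fix counters held by the buffer manager, sketched below in Python. The text does not describe the actual structure or the replacement policy, so the counter array, the LRU-style victim choice, and the names `BufferPool`, `fix`, `unfix`, and `victim` are assumptions for illustration.

```python
class BufferPool:
    def __init__(self, n_frames: int):
        self.fix_count = [0] * n_frames   # Fix state of every frame
        self.last_used = [0] * n_frames   # logical time of the last fix
        self.clock = 0

    def fix(self, frame: int) -> None:
        """Pin a frame on behalf of a requesting transaction."""
        self.fix_count[frame] += 1
        self.clock += 1
        self.last_used[frame] = self.clock

    def unfix(self, frame: int) -> None:
        if self.fix_count[frame] == 0:
            raise ValueError("frame is not fixed")
        self.fix_count[frame] -= 1

    def victim(self):
        """Return the least recently used unfixed frame, or None.

        Only the local arrays are consulted; no lock table lookup is needed.
        """
        candidates = [f for f, count in enumerate(self.fix_count) if count == 0]
        if not candidates:
            return None
        return min(candidates, key=lambda f: self.last_used[f])


pool = BufferPool(3)
for frame in (0, 1, 2):
    pool.fix(frame)
pool.unfix(1)
pool.unfix(2)
first_victim = pool.victim()
```

Keeping the counters next to the frames turns the eligibility check into an array access, which matters because the search for replacement candidates runs so frequently.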
Improved lock table management: Reimplementing
the lock manager avoided static lock table
allocation and large lock granules on the lock table
itself. Using a pool of predefined lock request blocks
Figure 5: An XML document without elements stored. (Figure labels: document index, document container, SPLIDs, PCR, content; compression not shown.)
ICEIS 2008 - International Conference on Enterprise Information Systems