the initial size of intervals in interval-based labelling
schemes which leave space for insertions. Whereas
constructing labels in multiplicative labelling schemes
which can easily cope with insertions and hybrid
labelling schemes are computationally expensive and
complex (Haw and Lee, 2011). For these reasons,
prefix based labelling approaches appear to be more
suitable for dynamic XML data (Sans and Laurent,
2008). Therefore, this research concerns prefix
labelling schemes where the labelling summarizes both
the position of the node in the tree and also maintains
the document order during updating.
The first prefix labelling scheme which
considered document order was introduced by
(Tatarinov et al., 2002) and is called Dewey labelling
scheme. It assigns integer labels based on the Dewey
decimal classification system for libraries. Although
this scheme is the most widely used (He, 2015) in
XML query processing since it easily identifies the
structural relationship between XML nodes, it does
not support node insertion.
Recently many prefix-based XML labelling
schemes have been proposed in the literature to support
node insertions amongst them the SCOOTER labelling
scheme (O’Connor and Roantree, 2012). Unlike
Dewey, SCOOTER labels are based on quaternary
strings and represent node order lexicographically
rather than numerically. However, like all dynamic
labelling schemes, SCOOTER suffers from what is
called the overflow problem in certain circumstances
(Ghaleb and Mohammed, 2013).
3 ENCODING METHODS
A key factor for all dynamic XML labelling schemes
is how their labels are physically encoded, decoded
and stored in a computer. In the logical representation
of prefix labelling schemes there is always a delimiter
“.” but this delimiter is encoded and stored separately
from the label value (Li et al., 2008). Therefore, the
logical interpretation of a label in the computer
immediately affects the label size on disk as well the
computational cost of encoding/decoding between
the logical and physical representations (O’Connor
and Roantree, 2013).
All existing dynamic labels storage schemes can
be categorised into four classes: length fields, control
tokens, separators, and prefix-free codes.
3.1 Length Field
Concept of a length field is a field to store the length
of a node label (as a fixed length bit number) directly
before the node label value. The length of labels can
vary widely depending on the node’s position within
the XML tree. Since XML trees are arbitrarily wide
and arbitrarily deep there restriction on the number of
nodes might be inserted later, as a consequence in a
dynamic XML the number of node insertions is
limited to the capacity of the fixed length field
yielding to the overflow problem.
3.2 Control Tokens
Control tokens are tokens used to indicate the position
of a label value within a specific-level interval and
these tokens are used to determine how the
subsequent bit sequence of the label value is
interpreted. An example of control tokens is UTF-8
(Yergeau, 2003) which is employed by the Dewey
labelling scheme to encode Dewey labels, where each
component of Dewey path is encoded in UTF-8 and
then concatenated together in the same path order
(Tatarinov et al., 2002). However, this encoding
method causes overflow when a code value goes
beyond 2
31
.
3.3 Separator
In prefix based labelling schemes a separator “.” is
usually encoded and stored separately from the label
itself. In a separator storage scheme a predefined bit
sequence is reserved as a delimiter and not a part of
the label value. For instance, the quaternary encoding
QED (Li and Ling, 2005) and SCOOTER (O’Connor
and Roantree, 2012) employed their own separator
storage scheme in which the digit “0” is used only for
separators and therefore the separator code size
remain constant no matter how big the label size
might become. This approach results in slow bit-by-
bit or byte-by-byte comparison operation during
decoding because of the process needed to recognize
bit “0” or “00” as a separator rather than the binary
representation of the code itself. Consequently, it
degrades query performance.
3.4 Prefix-free Codes
Prefix-free codes are based on the (Elias, 1975)
proposition that a prefix set S is said to be a prefix
code if and only if no member of S is the beginning
of another. A prefix-free code approach often requires
fewer bits to represent a label than a control token
scheme since the prefix-free codes can be adjusted
according to the number of members within a prefix
set (Härder et al., 2007). An example of a dynamic
labelling scheme that uses prefix-free codes is