each label is presented as integer numbers joined by the
delimiter “.”; the delimiter is encoded and stored
separately from the label value (Li et al. 2008). Each
node is labelled as a combination of its parent's
label and a postfix integer number $x$. If $u$ is the $x$-th
child of $v$ in the XML tree, then the label of $u$, $label(u)$, is the
concatenation of the label of $v$ and $x$, which is written
as $label(u) = label(v).x$, where $v$ is the parent of $u$. For example,
if the label of element $v$ is 2.5.3 and $x = 4$, then the
child label $label(u)$
is 2.5.3.4. If an element's label is 5.1.3.1, then its
parent's label is 5.1.3 and its grandparent's label is 5.1. The
advantage of this method is that, for any element label,
we can easily extract the labels of its ancestors and
determine the relationship between nodes. However,
the Dewey scheme is not well suited to
dynamic XML data: inserting a new sibling node into
an XML tree labelled with the Dewey scheme requires
relabelling all of its right siblings along with their
descendants, which is costly and also leads to large label sizes
at the cost of extra storage.
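The ancestor extraction and relationship tests described above can be sketched as follows. This is a minimal illustration that assumes labels are held as dot-separated strings; the function names are hypothetical and are not taken from the cited work.

```python
def parse(label):
    """Split a Dewey label such as '2.5.3.4' into a tuple of integers."""
    return tuple(int(part) for part in label.split("."))


def ancestors(label):
    """Return the labels of all ancestors, nearest ancestor first."""
    parts = label.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_ancestor(a, d):
    """True if label a is a proper prefix of label d, component by component."""
    pa, pd = parse(a), parse(d)
    return len(pa) < len(pd) and pd[:len(pa)] == pa


def is_parent(p, c):
    """True if p is an ancestor of c and exactly one level above it."""
    return is_ancestor(p, c) and len(parse(c)) == len(parse(p)) + 1


if __name__ == "__main__":
    print(ancestors("5.1.3.1"))
    print(is_parent("2.5.3", "2.5.3.4"))
```

Comparing integer components rather than raw string prefixes matters here: as strings, "5.1" is a prefix of "5.10.3", although the two nodes are unrelated.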
Xu et al. (2009) proposed the Dynamic Dewey
encoding scheme (DDE), an extension of the
Dewey encoding scheme that turns the original
Dewey into a fully dynamic labelling scheme. The
advantage of DDE is that labels have variable
length: a label starts with one byte at the first level and
grows with the level value. This
makes it suitable for avoiding overflow
problems. In addition, it avoids
relabelling completely and supports high query
performance. The main drawback is the large label
size, especially when the depth increases and
insertions occur frequently between two siblings
using the midpoint technique.
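The growth of labels under repeated insertions between two siblings can be illustrated with a small sketch. It assumes the rule usually attributed to DDE, in which a label inserted between two siblings is their component-wise sum and sibling order follows the ratio of each component to the first one; both the rule as coded here and the function names are assumptions for illustration, not details stated in the text above.

```python
from fractions import Fraction


def dde_between(left, right):
    """Assumed DDE-style rule: the new label is the component-wise sum."""
    if len(left) != len(right):
        raise ValueError("sibling labels must have the same number of components")
    return tuple(a + b for a, b in zip(left, right))


def order_key(label):
    """Assumed ordering: compare components relative to the first component."""
    return tuple(Fraction(c, label[0]) for c in label)


def skewed_inserts(left, right, count):
    """Repeatedly insert just after `left`, returning the labels created."""
    created = []
    for _ in range(count):
        right = dde_between(left, right)
        created.append(right)
    return created


def label_bits(label):
    """Bits needed for the integer components (delimiters not counted)."""
    return sum(c.bit_length() for c in label)


if __name__ == "__main__":
    for lab in skewed_inserts((1, 2), (1, 3), 3):
        print(lab, label_bits(lab))
```

No existing label changes, but each skewed insertion produces larger components, which is the label-size cost noted above.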
Kobayashi et al. (2005) proposed the VLEI
encoding scheme and applied it to XML
labelling. The data type is a binary string, and the
scheme uses the number 9 as the identifier. For
example, when a child node is inserted, the label of
the node becomes the label of its parent node + 9 +
VLEI code, where the VLEI code is the new VLEI
sequence code. However, the VLEI encoding uses eight
bytes for the VLEI code. Its main drawback is
that it leads to overflow problems, especially with skewed
insertions.
Zhang et al. (2001) used a Range-based
labelling scheme, which determines the
structural relationships between nodes from
containment information. Each label is
represented as a fixed-length 3-tuple (start, end, depth). The interval
scheme does not result in large label sizes, but it can lead to
overflow problems. Together, start, end and depth
identify the exact position of an element: start is
generated by a pre-order traversal of the document
tree and gives the position at which the element occurs, end
is the maximal start among the elements in the sub-tree of the
current element, and depth gives the additional information
needed to determine the parent-child relationship.
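The (start, end, depth) labelling described above can be sketched as follows. This is a minimal illustration using Python's standard xml.etree.ElementTree; the numbering from 1, the root depth of 1, and the function names are assumptions made for the example and are not fixed by the text.

```python
import xml.etree.ElementTree as ET


def range_labels(root):
    """Assign (start, end, depth) to every element by pre-order traversal."""
    labels = {}
    counter = 0

    def visit(node, depth):
        nonlocal counter
        counter += 1
        start = counter
        end = start  # a leaf's end equals its own start
        for child in node:
            # the last visited descendant holds the maximal start in the sub-tree
            end = visit(child, depth + 1)
        labels[node] = (start, end, depth)
        return end

    visit(root, 1)
    return labels


def is_ancestor(a, d):
    """a contains d if d.start lies inside (a.start, a.end]."""
    return a[0] < d[0] <= a[1]


def is_parent(p, c):
    """Containment plus a depth difference of exactly one."""
    return is_ancestor(p, c) and c[2] == p[2] + 1


if __name__ == "__main__":
    root = ET.fromstring("<a><b><c/><d/></b><e/></a>")
    for node, label in range_labels(root).items():
        print(node.tag, label)
```

Containment alone cannot distinguish a parent from a more distant ancestor, which is why the depth value is needed for the parent-child test. The recursive traversal is kept for brevity; a very deep document would call for an explicit stack.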
In summary, the main drawback we have identified in the
existing labelling schemes covered in the related work
is the growth of label sizes. In
response, we present a comparison between
two schemes, focusing on labelling time
and memory size. Our work compares Range-based
encoding and Dewey encoding with the aim of
generating short labels, examining how the bits
of the label value are encoded using UTF-8, UTF-16 and
UTF-32 in terms of encoding and decoding time, and
achieving the fastest labelling time, which ultimately
affects query performance.
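As a rough illustration of how the choice of character encoding affects label size, the sketch below encodes one Dewey label and one Range-based label as delimited strings and counts the resulting bytes. Treating the whole label as a single string is a simplification, the sample label values are hypothetical, and the little-endian codec names are chosen only so that Python does not prepend a byte-order mark.

```python
# Codecs without a byte-order mark, so sizes reflect the label text only.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(label):
    """Return the number of bytes the label occupies under each encoding."""
    return {enc: len(label.encode(enc)) for enc in ENCODINGS}


def round_trips(label):
    """Check that decoding restores the original label under every encoding."""
    return all(label.encode(enc).decode(enc) == label for enc in ENCODINGS)


if __name__ == "__main__":
    dewey_label = "2.5.3.4"   # Dewey label with "." as delimiter
    range_label = "5,9,3"     # hypothetical start,end,depth with "," as delimiter
    print(dewey_label, encoded_sizes(dewey_label))
    print(range_label, encoded_sizes(range_label))
```

Because labels consist only of digits and delimiters, each character takes one byte in UTF-8, two in UTF-16 and four in UTF-32, so the encoded size scales directly with the label length.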
3 COMPARISONS BETWEEN DEWEY ENCODING AND THE RANGE-BASED ENCODING SCHEMES
In the Dynamic Dewey encoding scheme, labels have
variable length: a label starts with one byte at the first
level and grows with the level value. The
length of labels can therefore vary widely depending on the
node's position within the XML tree. However, prefix
labels naturally grow when the XML data is updated
by frequent insertions, causing overflow
problems. By contrast, in the Local Order Encoding
scheme, each node is assigned an integer number and
each label has a fixed length of one byte
per node, using the UTF-8 character encoding
(Yergeau 2003). Furthermore, in Dewey encoding,
each label is presented as a combination of its parent's label
and a postfix integer number joined by the delimiter “.”;
the delimiter is encoded and stored separately from the
label value (Li et al. 2008).
In contrast, in the Range-based labelling scheme,
each label is presented as a combination of the start, end and
depth values, using “,” as a delimiter. Furthermore, in
Quaternary encoding QED (Li and Ling 2005) and
SCOOTER encoding (O’Connor and Roantree 2012)
Comparison between Range-based and Prefix Dewey Encoding