2 BACKGROUND
2.1 B+-tree
A database management system (Garcia-Molina
et al., 2000) is a set of programs that allows users to
define the type of data they want to store and that
manages the data by providing efficient retrieval.
Efficient retrieval relies on appropriate data structures,
such as B-trees, B+-trees or hash files, used as indexes
within the database. An index is “a data structure that allows
for random access to arbitrary data within a field, or
a set of fields. In particular, an index lets us find a
record without having to look at more than a small
fraction of all possible records.” (Garcia-Molina et al.,
2000) Bayer and McCreight (1972) describe an index as
consisting of “index elements which are pairs (x, a) of
fixed size physically
adjacent data items, namely a key x and some asso-
ciated information a. The key x identifies a unique
element in the index, and the associated information
is typically a pointer to a record or a collection of
records in a random access file.” All indexes are based
on the same basic concept: a key and a reference to
data.
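The key-and-reference idea can be illustrated with a minimal sketch. The names IndexElement, record_offset and find_record are hypothetical and are not taken from the cited works; the sketch assumes integer keys and an in-memory index kept sorted by key, so that a lookup inspects only a logarithmic number of elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index element: a key x paired with associated information a.
struct IndexElement {
    int key;                    // x: identifies a unique element in the index
    std::size_t record_offset;  // a: here, the position of the record in a file
};

// Looks up `key` in an index sorted by key. Returns a pointer to the matching
// element, or nullptr if the key is absent.
const IndexElement* find_record(const std::vector<IndexElement>& index, int key) {
    // Debug-build check of the precondition (compiled out under NDEBUG).
    assert(std::is_sorted(index.begin(), index.end(),
                          [](const IndexElement& a, const IndexElement& b) {
                              return a.key < b.key;
                          }));
    // Binary search: first element whose key is not less than `key`.
    auto it = std::lower_bound(index.begin(), index.end(), key,
                               [](const IndexElement& e, int k) { return e.key < k; });
    if (it != index.end() && it->key == key) {
        return &*it;
    }
    return nullptr;
}
```

A flat sorted array is used here only to keep the sketch short; a B-tree or B+-tree applies the same key-and-reference idea across multiple levels of pages.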
The B-tree and its variant, the B+-tree, are efficient
data structures that are widely used as tree-based
multi-level indexes in database systems. Comer (1979)
observed that they had already become so widely used
that “the B-tree is, de facto, the standard organization
for indexes in a database system”. However, B+-trees can
support true indexed sequential access as virtual trees,
and they can compress separators, potentially producing
an even shallower tree than B-trees (Folk and Zoellick,
1992). A B-tree (Bayer and McCreight, 1972) is a
multi-way search tree designed to access and maintain
efficiently an index that is too large to hold in memory.
Such an index must be kept on external storage and is
organized in pages, which are the blocks of information
transferred between main memory and backup storage
such as hard disks. The power of B-trees lies in
the following significant advantages:
1. Storage utilization is guaranteed to be at least 50%
and should be considerably better on average
(Bayer and McCreight, 1972).
2. The balance is maintained dynamically at a relatively
low cost. No overly long branches exist, and random
insertions and deletions are accommodated while
balance is maintained (Folk and Zoellick, 1992).
The B+-tree retains the search and insertion
efficiencies of the B-tree but reduces the cost of
finding the next record in key order from O(log N) to
O(1), because its leaf nodes are linked to one another.
The B+-tree supports equality queries and range
queries efficiently. Range queries use the forward
or backward pointers in the leaf nodes to get all the
records in the requested range.
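The leaf-level traversal behind a range query can be sketched as follows. This is an illustrative sketch only, not the implementation described in this paper: the Leaf type and the range_scan function are hypothetical, keys are assumed to be integers, only forward pointers are shown, and the starting leaf is assumed to have been located already by an ordinary root-to-leaf search for the lower bound.

```cpp
#include <cassert>
#include <vector>

// Hypothetical B+-tree leaf: sorted keys plus a pointer to the right sibling.
struct Leaf {
    std::vector<int> keys;
    Leaf* next = nullptr;
};

// Collects every key in [lo, hi], starting at the leaf reached by a search
// for `lo`. Moving to the next leaf is a single pointer dereference.
std::vector<int> range_scan(const Leaf* start, int lo, int hi) {
    assert(lo <= hi);
    std::vector<int> out;
    for (const Leaf* leaf = start; leaf != nullptr; leaf = leaf->next) {
        for (int k : leaf->keys) {
            if (k < lo) continue;    // still before the requested range
            if (k > hi) return out;  // past the range: stop scanning
            out.push_back(k);
        }
    }
    return out;
}
```

A backward scan would follow a corresponding pointer to the left sibling in the same way.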
2.2 The STL Style
The Standard Template Library (STL) (Stepanov and
Lee, 1995) is a template-based C++ library of generic
data structures and algorithms that work together in
an efficient and flexible fashion. “The Standard Tem-
plate Library provides a set of well-structured generic
C++ components that work together in a seamless
way. Special care has been taken to ensure that all the
template algorithms work not only on the data struc-
ture in the library, but also on built-in C++ data struc-
tures.”
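The point that the template algorithms also work on built-in data structures can be shown with a short sketch. The helper count_even is a hypothetical name introduced only for illustration; it assumes a sequence of int values and forwards to the standard algorithm std::count_if.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Counts the even values in the range [first, last). The function is written
// against iterators only, so it accepts library containers and raw pointers
// into built-in arrays alike.
template <typename Iter>
int count_even(Iter first, Iter last) {
    return static_cast<int>(
        std::count_if(first, last, [](int x) { return x % 2 == 0; }));
}
```

The same function can be called as count_even(v.begin(), v.end()) for a std::vector<int> v and as count_even(raw, raw + 4) for a built-in array int raw[4], since a pointer satisfies the iterator requirements.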
There are six components in the STL organization.
Three components, in particular, can be considered
the core components of the library: template-based
container classes, iterators and generic algorithms
(template functions). The remaining three compo-
nents of the STL are also fundamental to the library
and contribute to its flexibility and portability: alloca-
tors, adapters and functors (function objects).
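As a rough illustration of how the six components fit together, the following sketch uses each of them once. The function names sorted_descending and top_after_push are hypothetical; everything else is taken from the C++ standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <stack>
#include <vector>

// Returns the input values sorted in descending order.
std::vector<int> sorted_descending(const std::vector<int>& input) {
    // Container, parameterized by an allocator (std::allocator is the default).
    std::vector<int, std::allocator<int>> v(input.begin(), input.end());
    // Generic algorithm, applied through iterators and customized by a
    // functor (function object), here std::greater<int>.
    std::sort(v.begin(), v.end(), std::greater<int>());
    return v;
}

// Pushes all values onto a stack and returns the top one.
// Precondition: `values` is not empty.
int top_after_push(const std::vector<int>& values) {
    assert(!values.empty());
    // Adapter: std::stack restricts the interface of an underlying container.
    std::stack<int, std::vector<int>> s;
    for (int x : values) s.push(x);
    return s.top();
}
```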
We adopt the STL style to design and implement the
B+-tree index because the STL supports good
programming practices and addresses several problems
with previous C++ container libraries in an
innovative way. There are a number of advantages to
using the STL:
1. “Standard” and “template”: The STL is made
up of “standard components”. Each of them has a
clear standard interface and well-defined
functionality. This makes all the components easy to
understand and to reuse. New components can also be
added with the same look as the standard ones.
“Templates” are a compiler-supported mechanism that
makes generic data structures, such as arrays
and lists, and generic algorithms, such as sort and
binary search, independent of the type
of data being manipulated.
2. Reuse: The STL supports the generic programming
paradigm, whose goal is to design algorithms so they
are fundamentally independent from the types they
act upon. The STL provides reusable components
to achieve code reuse based on templates, rather than
class inheritance. A large number of components
already exist with a complete implementation on hand.
This dramatically reduces the implementation time
of many large systems, in which a large
share of the code is simply imported from the
STL.
3. Smaller source code: The STL is easy to learn
because the library is quite small, owing to its high
degree of generality.
4. Flexibility: The use of generic algorithms allows
algorithms to be applied to many different structures.
Furthermore, the STL’s generic algorithms also work
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION