Consecutive bytes are stored in blocks. Typical
block sizes vary from 4 kilobytes to 64 kilobytes.
Blocks can be stored sequentially or organized and
referenced by a control structure. Logically, a
bitmap represents a set of objects as a set
membership vector. For every object in the database,
the corresponding Boolean element of the vector is
True if the object belongs to the set. For objects
that do not belong to the set, the corresponding
elements are False. Boolean values of the vector
are represented by bits.
Consider the following method for storing
category membership. Membership information for
each category C is represented by one bitmap, where
each bit corresponds to an object ID. Bit N in the
array is 1 if an object with the object ID N belongs
to category C, and 0 otherwise. Thus, the length of
the array in bits will be equal to the number of
objects in the database and access to this information
will be fast since it is equivalent to direct access to
an element of an array.
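The method above can be sketched in a few lines. This is only an illustrative sketch, not the storage engine described in the paper: the class name CategoryBitmap, its method names, and the example category are hypothetical, and the whole bitmap is held in a single in-memory byte array rather than in disk blocks.

```python
class CategoryBitmap:
    """Membership vector for one category: bit N is 1 iff object N belongs."""

    def __init__(self, num_objects):
        # One bit per object in the database, rounded up to whole bytes.
        self.num_objects = num_objects
        self.bits = bytearray((num_objects + 7) // 8)

    def add(self, object_id):
        # Set bit object_id to 1.
        self.bits[object_id // 8] |= 1 << (object_id % 8)

    def remove(self, object_id):
        # Reset bit object_id to 0 (mask kept within one byte).
        self.bits[object_id // 8] &= ~(1 << (object_id % 8)) & 0xFF

    def contains(self, object_id):
        # Direct access: one index computation, one mask.
        return bool(self.bits[object_id // 8] & (1 << (object_id % 8)))


students = CategoryBitmap(num_objects=20)   # hypothetical category
students.add(3)
students.add(17)
print(students.contains(3), students.contains(4))   # True False
print(len(students.bits))                           # 3 (bytes for 20 objects)
```

Membership tests do not depend on how many objects belong to the category, which is the direct-access property noted above.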
The size of the bitmap is proportional to the
domain size. Thus, one bit is allocated in a bitmap
for every abstract object regardless of whether it
belongs to a represented set or not. For example,
consider a set of objects that have certain Boolean
attribute values set to True. The bitmap requires
just one bit to represent each fact; it would otherwise
be represented in fact storage as a string consisting
of object ID, relation ID, and a Boolean value, thus
using at least 10 bytes of storage. When stored in a
B-tree, this string also requires some space to
maintain the B-tree block structure. Thus, for total
Boolean attributes, the bitmap is about 80 times more
space-efficient (at least 80 bits per fact versus one
bit). Access to facts in a B-tree requires a B-tree
search and unpacking of facts, both of which are
non-trivial and time-consuming compared with
retrieving a value from a bitmap, which in most
cases is a simple direct access operation.
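The space comparison can be checked with simple arithmetic. The sketch below assumes the 10-byte lower bound per fact stated in the text, ignores B-tree block overhead, and uses a hypothetical database of one million objects in which every object carries the attribute.

```python
FACT_BYTES = 10   # lower bound per stored fact, as stated in the text


def bitmap_bytes(num_objects):
    # One bit per object in the domain, rounded up to whole bytes.
    return (num_objects + 7) // 8


def fact_bytes(num_facts):
    # One string per fact; B-tree overhead is not counted here.
    return num_facts * FACT_BYTES


n = 1_000_000                              # hypothetical number of objects
print(bitmap_bytes(n))                     # 125000
print(fact_bytes(n))                       # 10000000
print(fact_bytes(n) / bitmap_bytes(n))     # 80.0
```

The ratio of 80 applies when every object has a fact. Because the bitmap allocates a bit for every object in the domain whether or not it belongs to the set, the advantage shrinks for sets that contain only a small fraction of the objects.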
Operations to access individual bits are simple.
Suppose m is the first object ID in a block and we
need to access the bit that corresponds to the object
with object ID n, where n >= m. The number of the byte
within the block that holds the bit is i=(n-m)/8, using
integer division. The bit number within the byte is
j=(n-m) MOD 8, where MOD is the modulo operation
(remainder of a division).
To read the bit, the following formula is used:
B[i]&(1<<j), where B represents the block as a byte
array. The result is 0 if the bit is 0 and non-zero if
the bit is 1. Use B[i]|=(1<<j) to set the bit to 1 and
B[i]&=~(1<<j) to reset the bit to 0. To set the bit to
a value x (0 or 1), use B[i]=(B[i]&~(1<<j))|(x<<j).
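As a minimal sketch of these operations, the functions below locate a bit from the offset of n from the start of the block, n - m, and then read or write it. The function names, the 4-kilobyte block size, and the object IDs are assumptions chosen for illustration.

```python
def locate(m, n):
    """Return (byte index i, bit index j) for object n in a block
    whose first object ID is m. Assumes n >= m."""
    offset = n - m
    return offset // 8, offset % 8


def read_bit(B, m, n):
    i, j = locate(m, n)
    # B[i] & (1 << j) is non-zero iff the bit is set; normalise to 0 or 1.
    return 1 if B[i] & (1 << j) else 0


def write_bit(B, m, n, x):
    """Set the bit for object n to x, where x is 0 or 1."""
    i, j = locate(m, n)
    # Clear bit j, then OR in the new value. The 0xFF mask keeps the
    # intermediate result within one byte.
    B[i] = (B[i] & ~(1 << j) & 0xFF) | (x << j)


block = bytearray(4096)     # assumed 4-kilobyte block, all bits 0
m = 100_000                 # hypothetical first object ID in the block
write_bit(block, m, 100_019, 1)
print(locate(m, 100_019))                                        # (2, 3)
print(read_bit(block, m, 100_019), read_bit(block, m, 100_020))  # 1 0
write_bit(block, m, 100_019, 0)
print(read_bit(block, m, 100_019))                               # 0
```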
Bitmaps have several attractive properties:
1. Bitmaps represent a set of objects in a compact
way since only one bit is used per object. Under
favourable conditions, this can be further
improved.
2. Operations on bitmaps are fast since only one
CPU instruction is needed to act on several
objects simultaneously.
3. A bitmap is a simple structure and the overhead
for accessing and maintaining it is small.
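The second property can be illustrated with set operations on two category bitmaps. In this sketch, Python integers stand in for machine words: each bitwise operation combines all bits at once, whereas a C implementation would process 32 or 64 objects per instruction. The helper names and the object IDs are hypothetical.

```python
def to_int(bitmap):
    # Little-endian, so bit N of the integer is bit N % 8 of byte N // 8.
    return int.from_bytes(bitmap, "little")


def members(word):
    """List the object IDs whose bits are set in a non-negative integer."""
    ids = []
    n = 0
    while word:
        if word & 1:
            ids.append(n)
        word >>= 1
        n += 1
    return ids


a = bytearray(2)            # category A over 16 objects
b = bytearray(2)            # category B over the same 16 objects
for n in (1, 5, 9, 12):
    a[n // 8] |= 1 << (n % 8)
for n in (5, 6, 12, 15):
    b[n // 8] |= 1 << (n % 8)

both = to_int(a) & to_int(b)        # intersection
either = to_int(a) | to_int(b)      # union
only_a = to_int(a) & ~to_int(b)     # difference

print(members(both))     # [5, 12]
print(members(either))   # [1, 5, 6, 9, 12, 15]
print(members(only_a))   # [1, 9]
```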
5 CONCLUSION
While designing a semantic database, the general
approach is to be flexible in selecting storage types.
Fact, record, and bitmap storage might be utilized
simultaneously for different purposes. Record
storage might be used for those attributes it suits
best, whereas fact storage might be used for other
attributes. Category membership might be stored as
bitmaps. With fact storage, it should be up to the
user to select the best storage type for various types
of data.
ACKNOWLEDGEMENTS
This material is based on work supported by the
National Science Foundation under Grants No.
HRD-0317692, EIA-0320956, EIA-0220562, CNS-
0426125, IIS-0326284, CCF-0330342, IIS-0086144,
and IIS-0209190.
REFERENCES
Rishe, N., 1992. Database Design: the Semantic Modeling
Approach, McGraw-Hill. 528 pp.
TPC-D, 1998. Transaction Processing Performance
Council. TPC Benchmark D, Standard Specification
Revision 2.1.
England, K., 2001. Microsoft SQL Server 2000
Performance Optimization and Tuning Handbook,
Digital Press; 1st edition, 320 pp.
Winter, R., 1999. Indexing Goes a New Direction,
Intelligent Enterprise, 2(2), pp. 70-73.
Jakobsson, H., 1997. Bitmap Indexing in Oracle Data
Warehousing. Database seminar at Stanford University.
http://www-db.stanford.edu/dbseminar/Archive/FallY97/slides/oracle.
ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION