SEMANTIC DATABASE ENGINE DESIGN

Naphtali Rishe, Armando Barreto, Maxim Chekmasov,

Dmitry Vasilevsky, Scott Graham, Sonal Sood

Florida International University, High Performance Database Research Center, Miami, FL

Ouri Wolfson

University of Illinois at Chicago, Chicago, IL

Keywords: Semantic binary data model, database management system.

Abstract: New types of data processing applications are no longer satisfied with the capabilities offered by the

relational data model. One example of this phenomenon is the growing use of the Internet as a source of

data. The data on the Internet is inherently non-relational. As a result, demand has developed for database

management systems natively built on advanced data models. The semantic binary data model (Rishe,

1992) satisfies the criteria for the models required for today’s applications by providing the ability to build

rich schemas with arbitrarily flexible relationships between objects. In this paper, we discuss a new design

for a semantic database management system which is based on the semantic binary data model. Our

challenge was to design and implement a database engine which, while being native to the model, is

reasonably efficient on a wide variety of industrial applications, and which surpasses relational systems in

performance and flexibility on those applications that require non-relational modelling. Special attention is

given to multi-platform support by the semantic database engine.

1 INTRODUCTION

The Semantic Binary Database Engine is a multi-

threaded, multi-platform computer program. Multi-

threading allows it to utilize the full CPU power of

multi-processor computers. Typically, two different

approaches are used for multi-threaded program

implementation. One approach is to use one thread

per CPU, a queue of work items, and non-blocking

operating system calls. Another approach is to use

one thread per request with blocking operating

system calls. While the first approach allows higher

performance, the second approach is easier to

implement and requires less effort to port to

different platforms.
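The first approach (a fixed set of workers draining a shared queue of work items) can be sketched as follows. This is a minimal illustration, not the engine's implementation: the names `run_worker_pool` and `handle_request` are hypothetical, the non-blocking operating system calls are not modelled, and Python threads are used only to show the structure (in standard CPython builds they do not run pure-Python computation on several CPUs at once).

```python
import os
import queue
import threading


def run_worker_pool(requests, handle_request, num_workers=None):
    """Process requests with a fixed pool of workers fed from one queue."""
    num_workers = num_workers or os.cpu_count() or 1
    work = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:          # sentinel: no more work for this worker
                break
            index, request = item
            outcome = handle_request(request)
            with results_lock:
                results[index] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for index, request in enumerate(requests):
        work.put((index, request))
    for _ in threads:
        work.put(None)                # one sentinel per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(requests))]
```

In the second approach, the queue disappears: each incoming request would get its own thread that simply blocks on I/O, which is why it is simpler to write and to port.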

Multi-platform support allows the engine to be

easily portable and to run on different platforms,

such as Microsoft Windows, Sun Solaris, HP-UX,

and Linux. It makes it possible for a client on one

platform to communicate with a server running on a

different platform. A detailed discussion of multi-

platform support is provided in section 5.

The Semantic Binary Database Engine consists of

two major parts – the Database Engine Kernel and

the User-Level Engine Environment as shown in

Figure 1. The interface between these two parts is

the Kernel API, which provides access to the

Kernel’s functionality.

Figure 1: Semantic Binary Database Engine.

The name 'Kernel' does not imply that it runs as a

part of the operating system kernel. The Database

Kernel consists of tightly coupled modules that

provide essential functionality and high performance

execution. The User-Level Engine Environment is a

set of loosely-coupled modules (add-ons) that have

access to the Kernel API and that provide user

programs with the service interfaces. The set of add-on

modules may vary depending on the DBMS

packaging. Examples of add-on modules are a

remote access module and database monitoring

tools.

Rishe N., Barreto A., Chekmasov M., Vasilevsky D., Graham S., Sood S. and Wolfson O. (2005).

SEMANTIC DATABASE ENGINE DESIGN.

In Proceedings of the Seventh International Conference on Enterprise Information Systems, pages 433-436

DOI: 10.5220/0002535604330436

SciTePress

2 DATABASE ENGINE KERNEL API

The Database Engine Kernel API is an interface

between the Database Kernel and the User-Level

Engine Environment. It has the following properties.

The Kernel API is a set of functions intended to

provide the functionality of the semantic database

and to hide the implementation details. While the

internal implementation of the Database Kernel and

modules in the User-Level Environment can be done

in an object-oriented fashion, it is preferable to keep

the API as a flat set of functions for easier and more

efficient inter-process communication.

The interface is accessible to user application programs

but is not intended for their direct use. The interface

was designed to support efficient execution rather

than ease of use. Add-on modules in the User-Level

Engine Environment provide easy-to-use interfaces

for user programs. This separation provides better

reliability and stability than an alternative design

with all the software modules fitted into the database

kernel since the system can more easily survive

faults in modules of the database engine that are

running outside the Kernel. It also makes the Kernel

code smaller, less prone to errors, and easier to

debug and maintain. It is important for the database

Kernel not to crash even if some ill-behaved

programs misuse the Kernel API.

Since the Database Engine Kernel is a separate

process, the Kernel API is an interface between

processes running on the same computer. The

functionality of remote access is not provided at this

level, but is instead provided by the Remote Access

Server, which is one of the add-ons in the User-

Level Engine Environment. The Remote Access

Server can be added or removed from the system

depending on the expected functionality of the

system. For example, it may not be needed for

embedded applications.

The Kernel API handles data in terms of facts that

are not yet encoded for any storage structure. This

allows users of the interface to see the database in its

semantic representation. At the same time, the

storage subsystems in the kernel may employ any

kind of encoding and data structures to physically

store the data.
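As an illustration of a flat, fact-oriented interface of this kind, the following sketch exposes plain functions that exchange integer handles and (object, relation, object) facts. Every name here (`Fact`, `kernel_open`, `kernel_add_fact`, `kernel_find`) is hypothetical and chosen for the example; the paper does not list the actual Kernel API functions, and an in-memory set stands in for the Kernel's storage.

```python
from collections import namedtuple

# Hypothetical semantic-level fact: (object, relation, object-or-value).
Fact = namedtuple("Fact", ["subject", "relation", "target"])

_sessions = {}      # integer handle -> set of facts (stand-in for Kernel state)
_next_handle = 1


def kernel_open():
    """Return an opaque integer handle; no internal pointers cross the API."""
    global _next_handle
    handle = _next_handle
    _next_handle += 1
    _sessions[handle] = set()
    return handle


def kernel_add_fact(handle, subject, relation, target):
    """Add a fact; report a bad handle with False rather than failing."""
    facts = _sessions.get(handle)
    if facts is None:
        return False
    facts.add(Fact(subject, relation, target))
    return True


def kernel_find(handle, subject=None, relation=None, target=None):
    """Return the set of facts matching every argument that is not None."""
    facts = _sessions.get(handle, set())
    return {
        f for f in facts
        if (subject is None or f.subject == subject)
        and (relation is None or f.relation == relation)
        and (target is None or f.target == target)
    }
```

Passing handles and values instead of pointers is one way such an interface could stay usable across process boundaries, and returning an error code for a bad handle reflects the requirement that misuse of the API must not crash the Kernel.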

3 USER-LEVEL ENGINE ENVIRONMENT

The User-Level Engine Environment is a set of

modules running as one or several processes on the

same computer where the Kernel runs. All these

modules (except the first three below) are

independent and can be designed and implemented

separately. It is important to separate them from the

Kernel modules to ensure stability and reliability of

the database engine.

The Local Semantic API module provides a

conventional semantic API for database

applications. This ensures compatibility with old

programs that use previous versions of the semantic

binary database engine. The semantic API was

designed with the assumption that the engine and the

user program would run in the same address space

(it uses pointers to internal memory structures).

While this is faster than remote access, it is not

secure since the database engine is not protected

from ill-behaved user programs. For this reason,

the current design employs a Kernel API

to protect the internals of the database and runs the

Local Semantic API module in the User-Level

Engine Environment.

The complex query language for semantic databases,

called AVDV, is described in (Vaschillo, 2000). The

Complex Query Executor analyzes an AVDV query

graph, builds the execution plan, then performs

queries and obtains results according to the

execution plan. Several other components in the

User-Level Engine Environment, such as the Web

Query Tool and the Semantic SQL Server, require

execution of complex queries and rely on this add-on

module.

The Semantic SQL Server module allows users to

query databases by means of semantic SQL (Rishe,

1999) and standard database protocols such as

ODBC. Semantic SQL Server accepts a SQL

statement and parses it. Parsing semantic SQL

statements involves discovering the relevant sub-tree

of the semantic schema given an unambiguous path

postfix as described in (Rishe et al., 2000). The

corresponding AVDV query graph can then be

constructed and passed to the Complex Query

Executor module for optimization and execution.

The result of execution is returned to the user.

The Export/Import module exports a database to, and

imports it from, standard interchange

formats. Common formats include Comma

Separated Values (CSV) and Extensible Markup

Language (XML). The module also provides export

to the proprietary native Semantic Definition

Language (SDL) and to the XML-based Semantic

Definition Language (XSDL).

ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

4 DATABASE ENGINE KERNEL

The Database Engine Kernel is the main component

of the Database Engine. The Kernel consists of

several modules that are tightly integrated to provide

maximum efficiency; all the modules run in one

address space. The modules have well defined

interfaces to communicate with each other, and the

internal details of each module’s implementation can

be designed independently. Figure 2 shows the basic

data flow between the modules.

Figure 2: Database Engine Kernel.

The Integrity Constraint Module is used to verify the

integrity constraints. The system reports updates to

the Integrity Constraint module. For some updates,

the system makes an immediate decision that the

operation should be rejected. Information about

other updates is stored and a decision is deferred

until commit time. For example, a

cardinality constraint can be verified right away,

while the decision on a totality constraint should be

deferred until transaction commit time.
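The difference between immediate and deferred checking can be sketched as follows. This is a toy model under stated assumptions, not the module's actual design: the class `IntegrityChecker` and its methods are hypothetical, the cardinality constraint is modelled as "at most one target per object" for selected relations, and the totality constraint as "every object created in the transaction must have at least one target" for selected relations.

```python
class IntegrityChecker:
    """Toy checker: cardinality is checked at once, totality at commit."""

    def __init__(self, single_valued, total):
        self.single_valued = set(single_valued)  # at most one target per object
        self.total = set(total)                  # every new object needs a target
        self.facts = {}                          # (object, relation) -> set of targets
        self.new_objects = set()                 # objects created in this transaction

    def on_create_object(self, obj):
        # Totality cannot be judged yet: the required facts may still arrive.
        self.new_objects.add(obj)

    def on_add_fact(self, obj, relation, target):
        targets = self.facts.setdefault((obj, relation), set())
        if relation in self.single_valued and targets and target not in targets:
            return False                         # immediate rejection
        targets.add(target)
        return True

    def on_commit(self):
        """Return deferred violations; an empty list means commit may proceed."""
        return sorted(
            (obj, relation)
            for obj in self.new_objects
            for relation in self.total
            if not self.facts.get((obj, relation))
        )
```

A second value for a single-valued relation is refused on the spot, whereas a newly created object without a required fact is only reported when `on_commit` runs, because the missing fact could legitimately be added later in the same transaction.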

The Transaction Coordinator carries out the

transaction lifecycle. The system supports two types

of transactions – transactions with versioning and

without versioning. The typical execution of a

transaction with versioning is shown in Figure 3.

The Concurrency Control module manages

transactions and ensures the consistent state of the

database. All requests for updates and queries in the

system are communicated to the module in the form

of lock requests and releases. The Concurrency

Control module stores enough information to decide

whether to grant or to delay a given lock request. The

module also verifies all the locks at the end of a

transaction and decides whether the transaction is

allowed to commit. Concurrency Control supports

several types of transactions and makes the final

commit/rollback decision according to the

transaction type.
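A minimal sketch of the grant-or-delay decision is given below. It assumes a conventional scheme with shared ("S") and exclusive ("X") locks, which the paper does not specify; the class `LockTable` and its methods are hypothetical, and the end-of-transaction verification that depends on the transaction type is not modelled.

```python
class LockTable:
    """Toy lock manager with shared ("S") and exclusive ("X") locks."""

    def __init__(self):
        self.holders = {}   # item -> {transaction id: mode}

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if the request must wait."""
        held = self.holders.setdefault(item, {})
        others = {t: m for t, m in held.items() if t != txn}
        if mode == "S":
            compatible = all(m == "S" for m in others.values())
        else:
            compatible = not others
        if not compatible:
            return False
        # Keep the stronger mode if the transaction already holds the item.
        if held.get(txn) != "X":
            held[txn] = mode
        return True

    def release_all(self, txn):
        """Drop every lock of a transaction, as at commit or rollback."""
        for held in self.holders.values():
            held.pop(txn, None)
```

Two readers may share an item, while a writer is delayed until it is the only holder.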

Figure 3: Transaction lifecycle.

All transaction updates are communicated to the

Transaction Log module. This module provides

storage of the local transaction log on a per-

transaction basis. The transaction log can be

retrieved later on to be run against the database. This

module also maintains the global database

transaction log and appends the local transaction log

to the global database transaction log at transaction

commit time.
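The relationship between the per-transaction logs and the global log can be sketched as follows. The class `TransactionLog` and its method names are hypothetical, updates are represented by arbitrary Python values, and the log is kept in memory, whereas a real engine would persist it.

```python
class TransactionLog:
    """Toy log: updates accumulate per transaction and join the global log at commit."""

    def __init__(self):
        self.global_log = []    # committed updates, in commit order
        self.local_logs = {}    # transaction id -> list of its updates

    def record(self, txn, update):
        self.local_logs.setdefault(txn, []).append(update)

    def commit(self, txn):
        # Append the whole local log to the global log in one step.
        self.global_log.extend(self.local_logs.pop(txn, []))

    def rollback(self, txn):
        # A rolled-back transaction leaves no trace in the global log.
        self.local_logs.pop(txn, None)

    def replay(self, apply):
        """Run every committed update against a database via `apply`."""
        for update in self.global_log:
            apply(update)
```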

Storage Codecs perform encoding and decoding

between storage representation and semantic

representation. Fact, Record, and Bitmap

representations can be used, depending on the situation.

When the components of the storage item are sent to

the module, it composes the storage item that will be

placed in the storage subsystem. When the storage

item is retrieved from the storage subsystem, the

module decodes it to semantic components. For

example, the fact representation codec takes two

object IDs and a relation ID and combines them into

a binary string when the information is to be stored.

On retrieval, it parses the binary string back

into the two object IDs and the relation ID.
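The fact codec described above can be illustrated with a fixed-width encoding. The layout used here is an assumption made for the example (relation ID first, then the two object IDs, each an unsigned 64-bit big-endian integer); the paper does not state the engine's actual byte layout, and the function names are hypothetical.

```python
import struct

# Assumed layout: relation ID, first object ID, second object ID,
# each stored as an unsigned 64-bit big-endian integer (24 bytes total).
_FACT_LAYOUT = struct.Struct(">QQQ")


def encode_fact(object_a, relation, object_b):
    """Combine two object IDs and a relation ID into one binary string."""
    return _FACT_LAYOUT.pack(relation, object_a, object_b)


def decode_fact(data):
    """Parse a binary string back into (object_a, relation, object_b)."""
    relation, object_a, object_b = _FACT_LAYOUT.unpack(data)
    return object_a, relation, object_b
```

With big-endian fixed-width fields, comparing two encoded strings byte by byte gives the same order as comparing the numeric IDs field by field, which is convenient when the strings serve as keys in an ordered structure such as a B-Tree.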

The Storage Subsystem is the module which

stores/retrieves information to/from files composed

of fixed-size blocks. The information is already

encoded for storage by the Storage Codec module. A

B-Tree structure is the main storage subsystem used

for the fact and record representations.

The Multi-version Disk Cache improves

performance of the disk subsystem by keeping the

content of disk blocks in memory and saving disk

I/O operations. If a block that is already in the cache

is requested, it is not retrieved from the disk and the

cached copy is used instead. If subsequent

modifications are made to the same block, the cache

keeps the block in memory and saves time on disk

write operations by waiting until the last

modification is made. In addition to these standard

write-back cache functions, the Multi-version Disk

Cache provides functionality specific to the database

engine. It allows the storage subsystems to lock

blocks in memory. Whenever a block is needed by

the storage subsystem for certain operations, such as

binary search within the block, the storage

subsystem does not create its own copy of the block.

Instead, it requests the disk cache to lock this block

in memory for the duration of the operation and uses the

same copy of the block. Sharing of the block copy is

possible since the modules in the Kernel, including

the disk cache, run in the same address space. This

type of sharing eliminates the necessity of block

copy operations. It is important that the lock is held

for only a short period of time, since all the locked

blocks have to be present in memory. If locks are

held for a long time, the system may run out of

memory.
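The in-memory locking of blocks can be sketched as a pin count per cached block. This is a toy model: the class `PinningCache`, its methods, and the `read_block` callback are hypothetical, and write-back of modified blocks is omitted.

```python
class PinningCache:
    """Toy block cache: pinned (locked) blocks are never evicted."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # callable: block number -> bytearray
        self.blocks = {}               # block number -> shared bytearray
        self.pins = {}                 # block number -> pin count

    def pin(self, number):
        """Return the cached buffer itself (no copy) and keep it in memory."""
        if number not in self.blocks:
            self._make_room()
            self.blocks[number] = self.read_block(number)
        self.pins[number] = self.pins.get(number, 0) + 1
        return self.blocks[number]

    def unpin(self, number):
        self.pins[number] -= 1
        if self.pins[number] == 0:
            del self.pins[number]

    def _make_room(self):
        if len(self.blocks) < self.capacity:
            return
        for number in list(self.blocks):
            if number not in self.pins:
                # A real cache would write back modified data before eviction.
                del self.blocks[number]
                return
        raise MemoryError("every cached block is pinned")
```

The `MemoryError` branch shows why locks must be short-lived: once every cached block is pinned, no further block can be brought into memory.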

The disk cache provides support for block

versioning. A block ID in the cache is two-

dimensional: it is composed of a sequential block

number in the database file and a sequential database

version (B, V). The transaction coordinator increases

the database version with every read-write

transaction and assigns the version to the

transaction. Whenever the transaction requests block

B for modification, the block ID is composed of B

and the database version V assigned to the

transaction. If the block does not already exist, a

new copy of the block is created. The new copy is

based on the block with the same block number B

and the database version that was current at the

beginning of the transaction. The old block is

retained for as long as a transaction in the system may still request it.
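The two-dimensional block ID (B, V) and the copy made on first modification can be sketched as follows. The class `VersionedBlocks` and its methods are hypothetical. One detail is an assumption of this sketch: when no copy exists at exactly the starting version, the newest copy with a version not greater than it is used as the base.

```python
class VersionedBlocks:
    """Toy multi-version block store keyed by (block number, version)."""

    def __init__(self):
        self.blocks = {}   # (block number, version) -> bytearray

    def latest_at(self, number, version):
        """Newest copy of block `number` whose version is <= `version`."""
        candidates = [v for (b, v) in self.blocks if b == number and v <= version]
        return self.blocks[(number, max(candidates))] if candidates else None

    def for_update(self, number, txn_version, start_version):
        """Copy of the block private to the transaction holding `txn_version`."""
        key = (number, txn_version)
        if key not in self.blocks:
            base = self.latest_at(number, start_version)
            self.blocks[key] = bytearray(base) if base is not None else bytearray()
        return self.blocks[key]
```

A reader working at an older version keeps seeing the old copy, while the writer's changes go to its own (B, V) entry.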

The Binary Server module hides the file structure of

the database and provides the user with flexible

storage options. The module stores the

database engine’s files, which consist of fixed-size blocks, on

disk. The Binary Server implements a simple file

system that can provide this functionality by using

one disk file or several disk files or even raw disks

not formatted by the operating system. The Binary

Server can also distribute the database across

multiple computers. All storage subsystems share

the same space of disk blocks provided by the

Binary Server. Whenever a block is freed from a

storage structure, it goes into the common pool of

free blocks. This allows for better management of

space allocated to the database.
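The idea of one block-number space spread over several files, with a single pool of free blocks, can be sketched as follows. The class `BlockSpace`, its methods, and the 4096-byte block size are assumptions made for the example; raw disks and distribution across computers are not modelled.

```python
class BlockSpace:
    """Toy allocator: one block-number space spread over several files."""

    def __init__(self, file_sizes_in_blocks, block_size=4096):
        self.block_size = block_size
        self.extents = []                 # (first block, block count, file index)
        first = 0
        for index, count in enumerate(file_sizes_in_blocks):
            self.extents.append((first, count, index))
            first += count
        # Common pool of free blocks; pop() hands out the lowest number first.
        self.free = list(range(first - 1, -1, -1))

    def allocate(self):
        return self.free.pop()

    def release(self, number):
        # A freed block returns to the pool shared by all storage structures.
        self.free.append(number)

    def locate(self, number):
        """Map a global block number to (file index, byte offset)."""
        for first, count, index in self.extents:
            if first <= number < first + count:
                return index, (number - first) * self.block_size
        raise ValueError("block number out of range")
```

Because storage structures only ever see global block numbers, the files behind them can be rearranged without the structures being aware of it.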

5 CONCLUSION

Semantic databases have many advantages over

relational databases that will allow them to grow in

popularity as the complexity of data increases.

However, the semantic database engine should be

implemented in a way that is not prohibitively

expensive for operations typical of relational

databases. We have shown that a number of

reasonable tradeoffs are possible in the design of the

semantic database engine that can make it

competitive on applications that are widely used

today.

This work shows how to build a framework for

investigating the advantages and disadvantages of

different approaches in each of the database engine

modules. A number of conclusions have

been made on the feasibility of particular choices

based on theoretical considerations, as well as

practical experience implementing various parts of

this design in several combinations.

Innovative technologies missing from previous

semantic database theory and prototype

implementations have been designed and discussed.

These technologies are expected to overcome some

of the shortcomings that have kept semantic

databases from being widely accepted in the field.

ACKNOWLEDGEMENTS

This material is based on work supported by the

National Science Foundation under Grants No.

HRD-0317692, EIA-0320956, EIA-0220562, CNS-

0426125, IIS-0326284, CCF-0330342, IIS-0086144,

and IIS-0209190.

REFERENCES

Rishe, N., 1992. Database Design: the semantic modeling

approach, McGraw-Hill. 528 pp.

Rishe, N., 1994. Semantic Schema Design Language,

available on request at http://hpdrc.cs.fiu.edu .

Vaschillo, A., 2000. A Semantic Paradigm for Intelligent

Data Access. Ph.D. Dissertation, Florida International

University, 143 pp.

Rishe, N., 1999. Semantic SQL, available on request at

http://hpdrc.cs.fiu.edu .

Rishe, N., et al., 2000. SemanticAccess: Semantic Interface

for Querying Databases. In Proceedings of the VLDB

Conference, pp. 591-594, September 10-14, 2000,

Cairo, Egypt.
