Authors:
Qiang Xue (1); Sakti Pramanik (1); Gang Qian (2) and Qiang Zhu (3)
Affiliations:
(1) Michigan State University, United States; (2) University of Central Oklahoma, United States; (3) The University of Michigan, United States
Keyword(s):
Hybrid Digital tree, indexing, string databases, prefix searches, substring searches.
Related Ontology Subjects/Areas/Topics:
Coupling and Integrating Heterogeneous Data Sources; Databases and Information Systems Integration; Enterprise Information Systems
Abstract:
There is an increasing demand for efficient indexing techniques to support queries on large string databases. This paper proposes a hybrid RAM/disk-based index structure called the Hybrid Digital tree (HD-tree). The HD-tree keeps internal nodes in RAM to minimize the number of disk I/Os, while keeping leaf nodes on disk so that the tree can index large databases. Experimental results on real data show that the HD-tree outperformed the Prefix B-tree for prefix and substring searches. In particular, for distinct random queries in the experiments, the average number of disk I/Os was reduced by a factor of two to three, and the running time was reduced by an order of magnitude.
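
The split between in-memory internal nodes and on-disk leaf nodes can be illustrated with a toy example. The sketch below is not the authors' HD-tree: it assumes a plain character trie of fixed depth held in memory, uses a Python dict as a stand-in for disk pages, and counts one simulated disk read per leaf page touched. The class name HybridTrieSketch, the depth parameter, and the "$page" key are all hypothetical, and only prefix search is shown (substring search is not covered).

```python
class HybridTrieSketch:
    """Toy hybrid index: routing trie in memory, string buckets on simulated disk."""

    def __init__(self, depth=2):
        self.depth = depth    # leading characters used for routing (assumption)
        self.root = {}        # in-memory internal nodes: nested dicts keyed by character
        self.disk = {}        # stand-in for disk pages: page id -> list of strings
        self.disk_reads = 0   # number of simulated leaf-page reads

    def insert(self, s):
        # Walk (and create) internal nodes in memory only.
        node = self.root
        for ch in s[:self.depth]:
            node = node.setdefault(ch, {})
        # Attach a leaf page to the node if it has none, then store the string there.
        page_id = node.setdefault("$page", len(self.disk))
        self.disk.setdefault(page_id, []).append(s)

    def _pages_under(self, node):
        # Yield the ids of every leaf page reachable from this internal node.
        for key, child in node.items():
            if key == "$page":
                yield child
            else:
                yield from self._pages_under(child)

    def prefix_search(self, prefix):
        # Routing happens in memory and costs no simulated I/O.
        node = self.root
        for ch in prefix[:self.depth]:
            node = node.get(ch)
            if node is None:
                return []
        # Only the candidate leaf pages are read from "disk".
        hits = []
        for page_id in self._pages_under(node):
            self.disk_reads += 1
            hits.extend(s for s in self.disk[page_id] if s.startswith(prefix))
        return sorted(hits)


index = HybridTrieSketch(depth=2)
for word in ["apple", "apply", "apt", "banana", "band", "a"]:
    index.insert(word)

print(index.prefix_search("app"), index.disk_reads)
```

In this sketch a query that misses in the in-memory trie returns without touching any leaf page, which is the intuition behind keeping internal nodes in RAM; how the real HD-tree organizes its nodes and pages is described in the paper itself.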