external balancing is presented in (Henrich, Six and
Widmayer, 1990). Algorithms for maintaining a
suffix tree structure in secondary storage are
presented in (Clark and Munro, 1996). A clustering
algorithm generating optimal worst-case external
path length mapping of tree structures is described in
(Diwan, Rane, Seshadri and Sudarshan, 1996). An
efficient dynamic programming algorithm to pack
trees is presented in (Gil and Itai, 1999). Another
approach to pack trees in hierarchical memory using
approximate algorithms is proposed in (Bender,
Demaine and Farach-Colton, 2002).
6 CONCLUSION
This work described an efficient algorithm for
paging unbalanced binary trees. The algorithm is
particularly applicable to computational biology,
where large trees that are built from biological
sequences and cannot be balanced are frequently
found. The algorithm obtains the best possible
allocation of nodes to pages whenever such an
allocation exists, and it proposes an efficient policy
for filling the pages of non-complete trees, based on
applying bin packing to the fringe of the tree. The
complexity of the algorithm was given; it depends on
the complexity of the packing algorithm. The
algorithm was implemented and experimental results
were presented.
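The fringe-packing idea can be illustrated with a minimal sketch. It assumes first-fit decreasing as the packing heuristic, which is only one possible choice and is not necessarily the one used in the experiments; the function name `first_fit_decreasing`, the list `fringe` and the page capacity of 7 nodes are hypothetical and serve only as an illustration. Each fringe subtree is represented by its node count and must fit entirely in one page.

```python
def first_fit_decreasing(subtree_sizes, page_capacity):
    """Pack fringe subtrees (given as node counts) into fixed-size pages.

    Assumption: first-fit decreasing is used as the bin-packing heuristic.
    Returns a list of pages, each page being a list of subtree sizes.
    """
    pages = []  # contents of each page
    free = []   # free node slots remaining in each page
    for size in sorted(subtree_sizes, reverse=True):
        if size > page_capacity:
            raise ValueError("a fringe subtree must fit in a single page")
        for i, room in enumerate(free):
            if size <= room:
                # Place the subtree in the first page with enough room.
                pages[i].append(size)
                free[i] -= size
                break
        else:
            # No existing page has room: open a new page.
            pages.append([size])
            free.append(page_capacity - size)
    return pages


# Hypothetical fringe: eight subtrees with 24 nodes in total.
fringe = [3, 1, 2, 5, 4, 2, 1, 6]
pages = first_fit_decreasing(fringe, page_capacity=7)
```

For this input the heuristic opens four pages, which matches the lower bound of ceil(24 / 7) = 4 pages. In general, first-fit decreasing is an approximation and does not always reach the minimum.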
Considering the average number of accessed
pages per search, the algorithm produces a page
allocation up to 55% better than sequential
allocation, up to 64% better than breadth-first and
depth-first allocation, and results that are very close
to those obtained with B-trees. Considering the
amount of unused space per page and the total
number of pages required, the algorithm achieves an
average page filling percentage of 98.62%. The
comparison shows that the proposed approach is the
only one that achieves both an average number of
page accesses per search close to the optimal and a
page filling percentage close to the optimal.
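The two evaluation criteria can be made concrete with a small sketch. It assumes that every node is searched with equal probability and that a page access is counted each time the root-to-node path moves to a different page (a single-page buffer). The function names, the dictionary-based tree representation and the six-node chain are hypothetical illustrations, not the experimental setup of this work.

```python
def average_page_accesses(children, root, page_of):
    """Mean number of page accesses per search, over all nodes.

    Assumptions: uniform search probability over nodes; one access is
    counted whenever the path enters a page different from the previous one.
    """
    total = 0
    count = 0
    # Each entry: (node, page of its parent, accesses made before this node).
    stack = [(root, None, 0)]
    while stack:
        node, parent_page, accesses = stack.pop()
        if page_of[node] != parent_page:
            accesses += 1
        total += accesses
        count += 1
        for child in children.get(node, ()):
            if child is not None:
                stack.append((child, page_of[node], accesses))
    return total / count


def fill_percentage(page_of, page_capacity):
    """Percentage of node slots in use across all allocated pages."""
    used_pages = set(page_of.values())
    return 100.0 * len(page_of) / (len(used_pages) * page_capacity)


# Hypothetical fully unbalanced tree: a chain 1 -> 2 -> ... -> 6.
children = {n: (n + 1, None) for n in range(1, 6)}

clustered = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}  # consecutive nodes share a page
scattered = {n: n % 2 for n in range(1, 7)}       # neighbours alternate pages

avg_clustered = average_page_accesses(children, 1, clustered)
avg_scattered = average_page_accesses(children, 1, scattered)
fill = fill_percentage(clustered, page_capacity=3)
```

Both allocations fill their two pages completely, yet the clustered one needs 1.5 page accesses per search on average and the scattered one 3.5. This is why the two criteria have to be considered together.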
Future work includes experimentally investigating
the behavior of the algorithm with other
approximation algorithms for packing the fringe, and
comparing those results to those obtained with
variations of B-trees. Another topic left as future
work is the evaluation of the algorithm on dynamic
data, with frequent insertions and removals of nodes,
as well as the impact of concurrent data access. The
evaluation of the algorithm on real data instead of
random data is also left as future work.
REFERENCES
Gonnet, G. H. ; Baeza-Yates, R. Handbook of Algorithms
and Data Structures: in Pascal and C. Addison-
Wesley, 1991, 424 p.
Cohen, J. Bioinformatics - An Introduction for Computer
Scientists. ACM Computing Surveys, v. 36, n. 2, p.
122-158, 2004.
Pedersen, C. N. S. Algorithms in Computational Biology.
PhD Dissertation, University of Aarhus, Denmark,
2000, 210 p.
Garey, M. R. ; Johnson, D. S. Computers and
Intractability: A Guide to the Theory of NP-
Completeness. W. H. Freeman and Company, 1979,
338 p.
Frakes, W. B. ; Baeza-Yates, R. Information Retrieval:
Data Structures and Algorithms. Prentice Hall, 1992,
464 p.
Baeza-Yates, R. ; Ribeiro-Neto, B. Modern Information
Retrieval. Addison-Wesley, 1999, 513 p.
Vitter, J. S. External Memory Algorithms and Data
Structures: Dealing with Massive Data. ACM
Computing Surveys, v. 33, n. 2, p. 209-271, 2001.
Henrich, A. ; Six, H. W. ; Widmayer, P. Paging Binary
Trees with External Balancing. Proceedings of the
15th International Workshop on Graph-theoretic
Concepts in Computer Science, p. 260-276,
Netherlands, 1990.
Clark, D. R. ; Munro, J. I. Efficient Suffix Trees on
Secondary Storage. Proceedings of the 7th Annual
ACM-SIAM Symposium on Discrete Algorithms, p.
383-391, Atlanta, 1996.
Diwan, A. A. ; Rane, S. ; Seshadri, S. ; Sudarshan, S.
Clustering Techniques for Minimizing External Path
Length. Proceedings of the 22nd VLDB Conference,
p. 342-353, India, 1996.
Gil, J. ; Itai, A. How to Pack Trees. Journal of Algorithms,
v. 32, n. 2, p. 108-132, 1999.
Bender, M. A. ; Demaine, E. D. ; Farach-Colton, M.
Efficient Tree Layout in a Multilevel Memory
Hierarchy. Proceedings of the 10th Annual European
Symposium on Algorithms, p. 165-173, Italy, 2002.
A SPACE-EFFICIENT ALGORITHM FOR PAGING UNBALANCED BINARY TREES