In this paper, we aim at managing high-resolution
rasterized spatial objects, which often occur in
modern geographical information systems. High
resolutions yield high accuracy for intersection
queries but require a large amount of storage space,
which in turn leads to high I/O cost during
query and update operations. In particular, the
performance of I/O-intensive join procedures is
primarily determined by the size of the voxel sets, i.e.
it depends on the resolution of the grid dividing the
data space into disjoint voxels¹.
1.1 Preliminaries
In this paper, we introduce an efficient sort-merge
join variant which is built on a cost-based
decompositioning algorithm for high-resolution
rasterized objects yielding a high approximation
quality while preserving low redundancy. Our
approach does not assume the presence of pre-
existing spatial indices on the relations.
We start with two relations R and S, both containing
sets of tuples (id, mbr, link), where id denotes a
unique object identifier, mbr denotes the minimal
bounding rectangle conservatively approximating the
respective object and link refers to an external file
containing the complete voxel set of the rasterized
object (cf. Figure 1a). In this paper, we assume that
the voxel representation of the objects is accurate
enough to determine intersecting objects without any
further refinement step. In order to carry out the
intersection tests efficiently, we decompose the high-
resolution rasterized objects. We store the generated
approximations in auxiliary temporary relations (cf.
Figure 1b), allowing us to reload certain
approximations on demand and thus keeping the
main-memory footprint small.
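
As a minimal sketch of the input format described above, the tuples (id, mbr, link) and the conservative MBR test could be modelled as follows. The class names `MBR` and `SpatialTuple`, the integer grid coordinates, and the file paths are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MBR:
    """Minimal bounding rectangle given by its lower and upper corner (inclusive)."""
    lo: Tuple[int, ...]
    hi: Tuple[int, ...]

    def intersects(self, other: "MBR") -> bool:
        # Two boxes intersect iff their extents overlap in every dimension.
        return all(
            a_lo <= b_hi and b_lo <= a_hi
            for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )


@dataclass(frozen=True)
class SpatialTuple:
    """One tuple (id, mbr, link) of relation R or S."""
    id: int
    mbr: MBR
    link: str  # hypothetical path of the external file holding the voxel set


r = SpatialTuple(1, MBR((0, 0), (10, 10)), "objects/r1.vox")
s = SpatialTuple(2, MBR((5, 5), (20, 20)), "objects/s2.vox")
t = SpatialTuple(3, MBR((11, 11), (12, 12)), "objects/s3.vox")

# Because the MBR is a conservative approximation, only pairs whose MBRs
# overlap can intersect; the voxel files of all other pairs need not be read.
candidates = [(a.id, b.id) for a in (r,) for b in (s, t) if a.mbr.intersects(b.mbr)]
```

Here `candidates` contains only the pair (1, 2), since the MBR of object 3 lies entirely outside that of object 1.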
To the best of our knowledge, there does not exist
any join algorithm which aims at managing complex
rasterized objects stored in large files (cf. Figure 1a).
In many application areas, e.g. GIS or CAD, only
coarse information like the minimal bounding boxes
of the elements is stored in a database along with
an object identifier. The detailed object description
is often kept in one large external file or likewise in a
BLOB (binary large object) stored in the database.
In this paper, we present an efficient version of
the sort-merge join which is based on this input
format and uses an analytical cost-based
decompositioning approach for generating suitable
approximations for complex rasterized objects.
¹ In this paper, we also use the term voxel to denote a 2D
pixel, indicating that our approach is also suitable for 3D
data.
1.2 Outline
The remainder of the paper is organized as follows:
Section presents a cost-based decompositioning
algorithm for generating approximations for high-
resolution objects. In Section , we introduce our new
efficient sort-merge join variant. In Section , we
present a detailed experimental evaluation
demonstrating the benefits of our approach. Finally,
in Section , we summarize our work, and conclude
the paper with a few remarks on future work.
2 COST-BASED DECOMPOSITION OF COMPLEX SPATIAL OBJECTS
In the following, the geometry of a spatial object is
assumed to be described by a sequence of voxels.
Definition 1 (rasterized objects)
Let O be the domain of all object identifiers and let
id ∈ O be an object identifier. Furthermore, let $\mathbb{N}^d$
be the domain of d-dimensional points. Then we call
a pair $O_{voxel} = (id, \{v_1, \ldots, v_n\}) \in O \times 2^{\mathbb{N}^d}$
a d-dimensional rasterized object. We call each of the $v_i$
an object voxel, where $i \in \{1, \ldots, n\}$.
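
Definition 1 can be illustrated with a small sketch in which a rasterized object is a pair of an identifier and a set of grid points. The type name `RasterizedObject` and the helper `intersects` are hypothetical, and the plain set test below only illustrates the definition; it is not the join algorithm of this paper.

```python
from typing import FrozenSet, NamedTuple, Tuple

Voxel = Tuple[int, ...]  # a d-dimensional grid point


class RasterizedObject(NamedTuple):
    """A pair (id, {v_1, ..., v_n}) as in Definition 1."""
    id: int
    voxels: FrozenSet[Voxel]


def intersects(a: RasterizedObject, b: RasterizedObject) -> bool:
    # Two rasterized objects intersect iff they share at least one voxel.
    return not a.voxels.isdisjoint(b.voxels)


a = RasterizedObject(1, frozenset({(0, 0), (1, 0), (1, 1)}))
b = RasterizedObject(2, frozenset({(1, 1), (2, 1)}))
c = RasterizedObject(3, frozenset({(5, 5)}))
```

With these objects, `intersects(a, b)` holds because both contain the voxel (1, 1), while `intersects(a, c)` does not.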
A rasterized object (cf. Figure 1) consists of a set of
d-dimensional points, which can be naturally ordered
in the one-dimensional case. If d is greater than 1,
such an ordering no longer exists. By means of
space-filling curves $\rho: \mathbb{N}^d \rightarrow \mathbb{N}$, all
multidimensional rasterized objects are mapped to a
set of integers. As a principal design goal, space-
filling curves achieve good spatial clustering
properties since voxels in close spatial proximity are
encoded by contiguous integers which can be
grouped together to intervals.
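
The mapping of voxels to integers and the grouping into intervals can be sketched as follows, using the Z-order as one possible space-filling curve in the two-dimensional case. The function names `z_value` and `to_intervals` and the `bits` parameter are assumptions made for this illustration.

```python
from typing import Iterable, List, Tuple


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Map a 2D voxel to its Z-order value by interleaving the bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


def to_intervals(values: Iterable[int]) -> List[Tuple[int, int]]:
    """Group integers into maximal runs of consecutive values (inclusive bounds)."""
    intervals: List[Tuple[int, int]] = []
    for v in sorted(set(values)):
        if intervals and v == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], v)  # extend the current run
        else:
            intervals.append((v, v))               # start a new run
    return intervals


# A 2x2 block of voxels plus one distant voxel.
voxels = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 3)]
intervals = to_intervals(z_value(x, y) for x, y in voxels)
# intervals == [(0, 3), (15, 15)]
```

The four neighbouring voxels receive the contiguous values 0 to 3 and collapse into a single interval, whereas the distant voxel (3, 3) forms an interval of its own.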
Examples of space-filling curves include the
lexicographic, Z- and Hilbert order (cf. Figure 2),
with the Hilbert order generating the fewest intervals
per object (Faloutsos C. et al., 1989; Jagadish H.
V., 1990) but also being the most complex linear
[Figure 2: Examples of space-filling curves in the two-dimensional case. Panels: lexicographic order, Z-order, Hilbert-order.]
EFFICIENT JOIN PROCESSING FOR COMPLEX RASTERIZED OBJECTS