Table 9: Raster Access Table.
It contains the operation name, the schema construct,
the number of accesses, and the access type (Read,
Write).
Although these tables provide an overview of
database performance, it is useful to derive a single
unit of measurement that expresses the cost of
operations in terms of accesses to the database,
independently of the type of data on which they are
performed.
To do this, we have performed extensive experiments
on real applications involving large heterogeneous
spatial datasets. The testing environment is based on
PostgreSQL 8.1 with the PostGIS spatial extension,
running on a dual Intel Xeon system with 2 GB of
RAM and Windows XP Professional.
We have executed several operations involving
different spatial data types, and have measured
several performance indicators, such as execution
time, RAM usage, and volume of data exchanged
with mass storage devices. We have based the
parameters of our estimation model on the last of
these indicators, since it is invariant with respect
to the hardware used and is not affected by the
noise introduced by the measurement software
itself.
Traditional estimation methods for alphanumeric
databases only distinguish the cost of read accesses
from that of write accesses, since all the data are
alphanumeric. In our case, in addition to
differentiating between read and write accesses, we
have focused on the following main types of data:
alphanumeric, points, lines, and polygons. The
notation used to express the different types of
accesses and the types of data on which they are
performed is as follows:
RAN = Access cost to read an alphanumeric value
WAN = Access cost to write an alphanumeric value
RPT = Access cost to read a point
WPT = Access cost to write a point
RLN = Access cost to read a line
WLN = Access cost to write a line
RPL = Access cost to read a polygon
WPL = Access cost to write a polygon
RRS = Access cost to read a raster image
WRS = Access cost to write a raster image
We observed that the access performance of the
point spatial data type is similar to that of
average-size alphanumeric data. Moreover, we
observed that the access cost of lines and polygons
grows linearly with the number of vertices. We also
noticed that RLN entails higher costs than RPL
due to the different storage methods that the Open
Geospatial Consortium defines for these two data
types (OGC, 2007). Since we have no information
on the expected number of vertices for these types
of data during the conceptual design phase, we
needed to estimate the cost of access operations on
them independently of this parameter. To this
end, we observed that varying the number of
vertices from a few up to 1000, for both types of
geometries, entailed a volume of data exchanged
with mass storage devices varying from 0.5 to 2
MB. Thus, since the number of vertices of most
real geometries falls in this range, we can assume
that the average number of MB exchanged is
bounded by a constant, i.e., it is O(1).
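The constant-cost argument can be made explicit with a rough worked bound. This is only a sketch: the symbols V(n), V_0, and k are introduced here for illustration, and a purely linear fit between the two reported endpoints is an assumption, not a measured result.

```latex
% V(n): MB exchanged with mass storage for a geometry with n vertices
% (hypothetical symbol, not used elsewhere in the paper).
% Assumed linear model, in line with the observed linear growth:
V(n) \approx V_0 + k\,n, \qquad 0 < n \le 1000
% Reported endpoints: about 0.5 MB for a few vertices, about 2 MB for 1000 vertices.
V_0 \approx 0.5\ \text{MB}, \qquad
k \approx \frac{2 - 0.5}{1000}\ \text{MB/vertex} = 0.0015\ \text{MB/vertex}
% The volume is therefore bounded by constants that do not depend on n:
0.5\ \text{MB} \lesssim V(n) \lesssim 2\ \text{MB}
\;\Longrightarrow\; V(n) = O(1)
```

Under this assumption, any fixed value chosen within the range differs from the actual volume by at most a factor of four.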
In conclusion, we have derived the following
relationships:
RPT ≅ RAN; WPT ≅ WAN ≅ 7*RPT;
RPL ≅ 70*RPT; RLN ≅ 140*RPT;
WLN ≅ WPL ≅ 350*RPT
We have assigned the value 1 to RAN and RPT,
yielding the following relationships:
RPT ≅ RAN ≅ 1; WPT ≅ WAN ≅ 7
RPL ≅ 70; RLN ≅ 140;
WLN ≅ WPL ≅ 350
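As an illustration of how these weights might be applied, the following minimal sketch computes the cost of an operation from the rows of an access table. The weights are the normalized values listed above; the row layout, the name `operation_cost`, and the sample operation on `Parcel` and `Owner` are hypothetical and are not taken from the case study. Raster accesses are left out because no fixed weight is given for them.

```python
from dataclasses import dataclass

# Normalized access costs (RAN = RPT = 1), as listed in the text.
ACCESS_COST = {
    ("alphanumeric", "R"): 1,    # RAN
    ("alphanumeric", "W"): 7,    # WAN
    ("point", "R"): 1,           # RPT
    ("point", "W"): 7,           # WPT
    ("line", "R"): 140,          # RLN
    ("line", "W"): 350,          # WLN
    ("polygon", "R"): 70,        # RPL
    ("polygon", "W"): 350,       # WPL
}


@dataclass(frozen=True)
class AccessRow:
    """One row of an access table (hypothetical layout)."""
    construct: str    # schema construct being accessed
    data_type: str    # "alphanumeric", "point", "line", or "polygon"
    accesses: int     # number of accesses
    access_type: str  # "R" (read) or "W" (write)


def operation_cost(rows):
    """Sum the number of accesses weighted by the normalized access cost."""
    return sum(
        row.accesses * ACCESS_COST[(row.data_type, row.access_type)]
        for row in rows
    )


# Hypothetical operation: read a parcel polygon, read two owner
# attributes, then write the polygon back.
rows = [
    AccessRow("Parcel", "polygon", 1, "R"),
    AccessRow("Owner", "alphanumeric", 2, "R"),
    AccessRow("Parcel", "polygon", 1, "W"),
]
print(operation_cost(rows))  # 70*1 + 1*2 + 350*1 = 422
```

Keeping the weights in a single lookup table means they can be re-tuned, for example when more is known about the data, without touching the access tables themselves.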
As for raster images, finding a proper
estimation at the conceptual level is a more complex
task, because we do not yet know parameters such as
resolution, compression, and bit depth, which
would be needed to estimate the bytes exchanged
with mass storage when manipulating them.
Moreover, it is more difficult to characterize the
types of operations and their access costs before
the lower-level design stages. Nevertheless, in our
experiments we have noticed that images with
resolution ranging from 48,000 pixels to 2
megapixels and bit depth from 16 to 32 bits, stored
in well-known compressed formats, occupy from
0.5 to 10 MB of disk space. Thus, the designer can
tune the estimated cost of operations on raster
images according to what s/he knows about the
images to be handled at this stage of the design
process. Based on the above, for images that are
not particularly large we can estimate an average
number of bytes exchanged that is twice that of
vector data.
4 CASE STUDY
The ER schema shown in Figure 3 represents the
conceptual schema for a portion of a spatial database