dimensional variants of this heuristic include the well-known R-tree, the packed R-tree (Roussopoulos and Leifker, 1985), the R*-tree (Beckmann et al., 1990), the R+-tree (Sellis et al., 1987), etc. Further heuristics for computing tight-fitting bounding boxes are based on simulated annealing or other optimization techniques, for example Powell's quadratically convergent methods (Lahanas et al., 2000).
A frequently used heuristic for computing a bounding box of a set of points is based on principal component analysis. The principal components of the point set define the axes of the bounding box. Once the axis directions are given, the spread of the bounding box is easily found by the extreme values of the projections of the points on the corresponding axes. Two prominent applications of this heuristic are the OBB-tree (Gottschalk et al., 1996) and the BOXTREE (Barequet et al., 1996). Both are hierarchical bounding box structures which support efficient collision detection and ray tracing. Computing a bounding box of a set of points in $\mathbb{R}^2$ and $\mathbb{R}^3$ by PCA is simple and requires linear time. The popularity of this heuristic, besides its speed, lies in its easy implementation and in the fact that PCA bounding boxes are usually tight fitting. Recently, (Dimitrov et al., 2007b) presented examples of discrete point sets in the plane showing that the worst-case ratio of the volume of the PCA bounding box to the volume of the minimum-volume bounding box tends to infinity (see Figure 1 for an illustration in $\mathbb{R}^2$). It has been shown in (Dimitrov et al., 2007a) that the continuous PCA version on convex point sets in $\mathbb{R}^3$ guarantees a constant approximation factor for the volume of the resulting bounding box. However, in many applications this guarantee comes at the cost of an extra $O(n \log n)$ running time for computing the convex hull of the input point set.
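To make the heuristic concrete, here is a minimal sketch of the discrete PCA bounding box computation in Python with NumPy; the function name and data layout are our own choices for illustration, not the paper's implementation:

```python
import numpy as np

def pca_bounding_box(points):
    """Discrete PCA bounding box of an (m, 3) point array.

    Returns the box axes (eigenvectors of the covariance matrix)
    and the extreme projection values along each axis.
    """
    c = points.mean(axis=0)                   # center of gravity
    centered = points - c
    C = centered.T @ centered / len(points)   # covariance matrix
    _, axes = np.linalg.eigh(C)               # columns are the box axes
    proj = centered @ axes                    # project points onto the axes
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    return axes, lo, hi, np.prod(hi - lo)     # axes, extents, box volume
```

Apart from the constant-size eigenvalue problem, this is a single pass over the points, which accounts for the linear running time mentioned above.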
In this paper, we study the impact of the rather theoretical results above on applications of several PCA variants in practice. We analyze the advantages and disadvantages of the different variants on realistic inputs, randomly generated inputs, and specially constructed (worst-case) instances. The main findings of our experimental study can be summarized as follows:
• The traditional discrete PCA algorithm works very well on most realistic inputs, but it gives a bad approximation ratio on special inputs with point clusters.
• The continuous PCA version cannot be fooled by point clusters. In practice, for realistic and randomly generated inputs, it achieves much better approximations than the guaranteed bounds. Its only weakness arises from symmetries in the input.
• To improve the performance of the algorithms, we apply two approaches. First, we combine the run-time advantage of PCA with the quality advantage of continuous PCA by a sampling technique; a sketch of this idea follows after this list. Second, we introduce a postprocessing step that overcomes most of the problems with specially constructed outliers.
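One plausible realization of such a sampling technique is to sample points on a triangulated surface proportionally to triangle area and feed the samples to the fast discrete PCA algorithm; with many samples this approximates continuous PCA over the surface. The sketch below is our own illustration under these assumptions, not the paper's code:

```python
import numpy as np

def sample_surface(vertices, triangles, n_samples, rng=None):
    """Sample n_samples points uniformly from a triangulated surface.

    vertices: (n, 3) float array; triangles: (t, 3) integer index array.
    """
    if rng is None:
        rng = np.random.default_rng()
    a, b, c = (vertices[triangles[:, i]] for i in range(3))
    # Pick triangles with probability proportional to their area.
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    idx = rng.choice(len(triangles), size=n_samples, p=areas / areas.sum())
    # Uniform sampling inside each chosen triangle via barycentric coordinates.
    u, v = rng.random((2, n_samples))
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    return a[idx] + u[:, None] * (b[idx] - a[idx]) + v[:, None] * (c[idx] - a[idx])
```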
The paper is organized as follows: In Section 2, we review the basics of principal component analysis. We also consider the continuous version of PCA and give closed-form solutions for the case when the point set is a polyhedron or a polyhedral surface. To the best of our knowledge, this is the first time that continuous PCA over the volume of a 3D body has been considered. A few additional bounding box algorithms and the experimental results are presented in Section 3. The conclusion is given in Section 4.
2 PCA
The central idea and motivation of PCA (Jolliffe, 2002) (also known as the Karhunen-Loève transform, or the Hotelling transform) is to reduce the dimensionality of a point set by identifying the most significant directions (principal components). Let $X = \{x_1, x_2, \ldots, x_m\}$, where $x_i$ is a $d$-dimensional vector, and let $c = (c_1, c_2, \ldots, c_d) \in \mathbb{R}^d$ be the center of gravity of $X$. For $1 \le k \le d$, we use $x_{ik}$ to denote the $k$-th coordinate of the vector $x_i$. Given two vectors $u$ and $v$, we use $\langle u, v \rangle$ to denote their inner product. For any unit vector $v \in \mathbb{R}^d$, the variance of $X$ in direction $v$ is

$$\mathrm{var}(X, v) = \frac{1}{m} \sum_{i=1}^{m} \langle x_i - c, v \rangle^2. \qquad (1)$$
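Eq. (1) translates directly into code; this small NumPy sketch (names chosen by us for illustration) evaluates the variance of a point set in a given direction:

```python
import numpy as np

def directional_variance(X, v):
    """var(X, v) as in Eq. (1): the mean squared projection of the
    centered points onto the unit vector v."""
    centered = X - X.mean(axis=0)        # subtract the center of gravity c
    return np.mean((centered @ v) ** 2)  # (1/m) * sum of <x_i - c, v>^2
```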
The most significant direction corresponds to the unit vector $v_1$ such that $\mathrm{var}(X, v_1)$ is maximum. In general, after identifying the $j$ most significant directions $B_j = \{v_1, \ldots, v_j\}$, the $(j+1)$-th most significant direction corresponds to the unit vector $v_{j+1}$ such that $\mathrm{var}(X, v_{j+1})$ is maximum among all unit vectors perpendicular to $v_1, v_2, \ldots, v_j$.
It can be verified that for any unit vector $v \in \mathbb{R}^d$,

$$\mathrm{var}(X, v) = \langle C v, v \rangle, \qquad (2)$$

where $C$ is the covariance matrix of $X$. $C$ is a symmetric $d \times d$ matrix whose $(i, j)$-th component $c_{ij}$, $1 \le i, j \le d$, is defined as

$$c_{ij} = \frac{1}{m} \sum_{k=1}^{m} (x_{ki} - c_i)(x_{kj} - c_j). \qquad (3)$$
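The identity (2) is easy to check numerically; the following snippet (our own illustration) builds $C$ as in Eq. (3) and compares both sides of Eq. (2) for a random unit vector:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 3))                  # m = 1000 points in R^3
centered = X - X.mean(axis=0)
C = centered.T @ centered / len(X)         # covariance matrix, Eq. (3)

v = rng.normal(size=3)
v /= np.linalg.norm(v)                     # random unit vector

lhs = np.mean((centered @ v) ** 2)         # var(X, v), Eq. (1)
rhs = v @ C @ v                            # <Cv, v>, Eq. (2)
assert np.isclose(lhs, rhs)
```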
The procedure of finding the most significant directions, in the sense mentioned above, can be formulated as an eigenvalue problem. If $\chi_1 > \chi_2 > \cdots > \chi_d$