in the subset yields the most representative cloud,
while maximizing the mean distance represents the
edges well. The entropy metric gives disappointing
results, which we assume largely to be the result of
the difficulty of estimating the entropy of a continu-
ous variable based on a small number of samples.
3.1.2 Subset Selection for Keyframe Selection
Results shown in Figure 2 demonstrate that it is pos-
sible to select a representative subset of 2D points
from a much larger set. In this section we investi-
gate whether this result also holds for selecting a set
of keyframes from the full set of frames, as illustrated
in Figure 4. The set of keyframes is said to be representative of the full set if the reconstruction error is
not significantly affected by restricting the estimation
of the shape basis to the set of keyframes rather than
the full set. To this end, three metrics for the selection
of keyframes are investigated:
- the minimum distance in the shape space,
- the minimum distance in the camera manifold,
- the entropy in the combined shape-camera space.
We use the Euclidean distance between deformation
coefficients as the distance metric in the shape space.
In the camera manifold, the distance between two camera parameter vectors is defined as the Euclidean distance between a given unit vector transformed into each camera's reference system. The differential entropy is retained as a metric in this paper because it allows us to combine the deformation coefficients and the camera parameters into a single measure, which is not straightforward with distance-based metrics and would require further investigation.
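As an illustration, these two distance measures could be computed as in the following minimal sketch; the function names, and the assumption that a camera is parameterized by a rotation matrix acting on a fixed unit vector, are ours and not part of the method.

```python
import numpy as np

def shape_distance(z_i, z_j):
    # Euclidean distance between two deformation-coefficient vectors.
    return np.linalg.norm(z_i - z_j)

def camera_distance(R_i, R_j, u=np.array([0.0, 0.0, 1.0])):
    # Distance in the camera manifold: transform the same unit vector u
    # into each camera's reference system and compare the results.
    return np.linalg.norm(R_i @ u - R_j @ u)
```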
Figure 3 shows that the minimum distance on the
shape manifold results in a lower reconstruction error
than the other methods, indicating that the observa-
tion of the different deformations is more important
to the overall accuracy than the observation from dif-
ferent vantage points. The results were obtained on a
316-frame sequence from (Torresani et al., 2008) con-
sisting of a person talking to the camera. At each step,
we eliminate the frame whose removal maximizes the respective metric and perform a PPCA reconstruction, restricting the estimation of the shape basis to the keyframes.
Figure 3 also shows that for keyframe set sizes of
about 35 frames there is little loss in accuracy.
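A greedy backward-elimination variant of this selection could look as follows. This is only a sketch under our reading of the procedure: the subset metric shown is the minimum pairwise distance (the entropy variant would substitute a different metric), the PPCA reconstruction of the remaining keyframes is omitted, and the helper names are illustrative. It assumes a target size of at least two frames.

```python
def min_pairwise_distance(items, dist):
    # Subset metric: the smallest pairwise distance within the subset.
    return min(dist(items[i], items[j])
               for i in range(len(items)) for j in range(i + 1, len(items)))

def select_keyframes(frames, dist, target_size):
    # Greedy backward elimination: repeatedly drop the frame whose
    # removal leaves the remaining set with the largest metric value.
    keyframes = list(frames)
    while len(keyframes) > target_size:
        best = max(range(len(keyframes)),
                   key=lambda i: min_pairwise_distance(
                       keyframes[:i] + keyframes[i + 1:], dist))
        keyframes.pop(best)
    return keyframes
```

Either of the distances sketched earlier (shape-space or camera-manifold) can be supplied as the `dist` argument.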
3.2 Overview of the Proposed Algorithm
We start by performing a 3D reconstruction of a bootstrap sequence with an existing off-line reconstruction method. For this, any off-line reconstruction al-
gorithm can be used, but we have chosen the PPCA
method (Torresani et al., 2008) for simplicity. The
length of the bootstrap sequence must be chosen with
some care: choosing it too short will result in a rough
initialization, while choosing it too long will increase
the initialization time. For the sequences used in
this paper we have found a bootstrap length of 60
frames to be a good middle ground. Using the keyframe selection discussed above, we select a subset of frames from the bootstrap sequence that accurately represents the whole bootstrap sequence. A rough initial reconstruction of the bootstrap window, obtained by limiting the iteration count, is sufficient for keyframe selection; we then improve the reconstruction only for the selected keyframes.
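The bootstrap phase can be summarized by the sketch below. The reconstruction and selection routines are passed in as callables because they stand in for the PPCA method and the subset selection described above, and the iteration counts are illustrative placeholders rather than values from the paper.

```python
def bootstrap(frames_2d, reconstruct, select_subset,
              bootstrap_len=60, n_keyframes=30,
              rough_iters=10, full_iters=100):
    # Rough reconstruction of the bootstrap window with a limited
    # iteration count: sufficient to drive the keyframe selection.
    window = frames_2d[:bootstrap_len]
    rough = reconstruct(window, max_iters=rough_iters)
    # Select a representative subset of the bootstrap window.
    keyframes = select_subset(rough, n_keyframes)
    # Improve the reconstruction only for the selected keyframes;
    # the initial shape basis is extracted from this refined result.
    refined = reconstruct(keyframes, max_iters=full_iters)
    return refined, keyframes
```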
Throughout the execution of the program we add frames to the keyframe set, and we cull the
keyframe set using the subset selection whenever its
cardinality exceeds a certain imposed size (heuristi-
cally chosen to be 1.5 times the initial size). The
initial history size is also chosen manually, and tests
have shown (see for example Figure 3) that a set of 30
frames is an acceptable choice.
For the on-line processing, we reconstruct each
frame sequentially using the shape basis we have ex-
tracted from the keyframe set, initializing the camera
matrix using the last estimated value. This consists of
alternately optimizing the camera parameters and the
deformation coefficients for a set number of iterations
using the update steps from (Torresani et al., 2008).
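This per-frame step can be illustrated with a simplified orthographic variant of the alternation. The sketch below is not the paper's update rule: the actual steps follow the PPCA formulation of (Torresani et al., 2008), including its noise model and priors, which are omitted here, and the array shapes are our own convention.

```python
import numpy as np

def reconstruct_frame(w2d, mean_shape, basis, R_prev, n_iters=10):
    # w2d: 2xP observed points, mean_shape: 3xP, basis: Kx3xP modes,
    # R_prev: 2x3 camera rows from the previous frame (warm start).
    K = basis.shape[0]
    R, z = R_prev, np.zeros(K)
    w = w2d - w2d.mean(axis=1, keepdims=True)     # remove 2D translation
    for _ in range(n_iters):
        # (1) Deformation coefficients: linear least squares given R.
        A = np.stack([(R @ basis[k]).ravel() for k in range(K)], axis=1)
        b = (w - R @ mean_shape).ravel()
        z, *_ = np.linalg.lstsq(A, b, rcond=None)
        # (2) Camera: orthographic Procrustes fit to the current shape.
        S = mean_shape + np.tensordot(z, basis, axes=1)
        U, _, Vt = np.linalg.svd(w @ S.T)
        R = U @ np.eye(2, 3) @ Vt                 # nearest row-orthonormal 2x3
    return R, z
```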
In the next step we add the newly processed frame
to the keyframe set if it represents a deformation or
vantage point which is not yet represented in that set, i.e. if the subset selection metric does not decrease significantly when the new frame is included. Once the keyframe set has grown past a predefined threshold,
we select a new subset of the large keyframe set to
serve as the new history representation from now on.
Finally, we extract an updated shape basis from the
updated history and continue with the on-line process-
ing. The update of the keyframe set and the extraction
of the new shape basis can be done in parallel with
the on-line reconstruction using parallel programming
paradigms (OpenMP or GPGPU). An overview of the
proposed algorithm is given in the form of a flowchart
in Figure 5.
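The surrounding on-line loop could then be organized as in the sketch below. The callables and the "significant decrease" threshold are placeholders for the components described above, not the paper's actual interface, and the keyframe-set update is shown in-line rather than on a parallel worker for brevity.

```python
def online_loop(stream, basis, keyframes, camera,
                reconstruct, subset_metric, select_subset, extract_basis,
                init_size=30, growth_factor=1.5, drop_tolerance=0.9):
    # Keep the keyframe set within growth_factor times its initial size.
    max_size = int(growth_factor * init_size)
    for frame in stream:
        # Reconstruct the current frame with the fixed shape basis,
        # warm-starting the camera from the previous frame.
        camera, coeffs = reconstruct(frame, basis, camera)
        # Add the frame as a keyframe only if it brings a new deformation
        # or vantage point, i.e. if including it does not significantly
        # lower the subset-selection metric (threshold is illustrative).
        candidate = keyframes + [(coeffs, camera)]
        if subset_metric(candidate) >= drop_tolerance * subset_metric(keyframes):
            keyframes = candidate
        # Cull the keyframe set and refresh the shape basis once the set
        # has grown past the threshold; in practice this can run in
        # parallel with the on-line reconstruction (OpenMP/GPGPU).
        if len(keyframes) > max_size:
            keyframes = select_subset(keyframes, init_size)
            basis = extract_basis(keyframes)
        yield camera, coeffs
```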
3.3 GOP Processing
Estimating the 3D shape one frame at a time incurs a
large amount of overhead, because we are performing