UNSUPERVISED LEARNING FOR TEMPORAL SEARCH SPACE
REDUCTION IN THREE-DIMENSIONAL SCENE RECOVERY
Tom Warsop and Sameer Singh
Research School of Informatics, Holywell Park, Loughborough University, Leicestershire, LE11 3TU, U.K.
Keywords:
Three-dimensional scene recovery, Search space reduction, Unsupervised learning.
Abstract:
Methods for three-dimensional scene recovery traverse scene spaces (typically along epipolar lines) to compute two-dimensional image feature correspondences. These methods ignore potentially useful temporal information from previously processed frames, which could be used to reduce search space traversal. In this work, we present a general framework which models relationships between image information and recovered scene information, specifically to improve the efficiency of three-dimensional scene recovery. We further present three methods implementing this framework, using either a naive Nearest Neighbour approach or a more sophisticated collection of associated Gaussians. Whilst all three methods reduce search space traversal, the Gaussian-based method performs best, as the other methods are subject to the (demonstrated) unwanted behaviours of convergence and oscillation.
1 INTRODUCTION
Recovering three-dimensional (3D) scene information from two-dimensional (2D) image information can be very useful. The work presented in this paper is part of a larger project concerned with recovering 3D scene information from a train-mounted, forward-facing camera.
Many methods have previously been applied to 3D scene recovery. As highlighted by (Favaro et al., 2003), a large proportion of these methods follow a similar pattern of execution. First, point-to-point correspondences among different images are established. These image correspondences are then used to infer three-dimensional geometry. The feature correspondences can be computed in one of two ways: either by searching the 2D image plane or by incorporating epipolar geometry.
The first set of methods does not take the 3D nature of the problem into account. These methods typically operate in two steps. First, image features are detected. Methods presented in the literature use Harris corners (Li et al., 2006), SIFT features (Zhang et al., 2010) and SURF features (Bay et al., 2008). More recently, to compensate for viewpoint changes in captured image information, (Chekhlov and Mayol-Cuevas, 2008) artificially enhanced the feature set for a single considered image point, computing spatial gradient descriptors for multiple affine-transformed versions of the image area surrounding a feature point. Feature correspondences are then computed by feature matching in subsequent frames.
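The detect-then-match step described above can be sketched in a few lines of NumPy. The sketch below matches descriptors between two frames by nearest-neighbour distance with a ratio test; the function name `match_features` and the `ratio` threshold value are illustrative assumptions, not details of any of the cited methods.

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches whose nearest distance is clearly smaller than the
    second-nearest (a ratio test to reject ambiguous correspondences)."""
    matches = []
    for i, d in enumerate(desc_a):
        # Euclidean distance from descriptor d to every descriptor in desc_b
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```

In a full pipeline, `desc_a` and `desc_b` would hold SIFT- or SURF-style descriptor vectors extracted from consecutive frames; only the indices of unambiguous matches are passed on to geometry estimation.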
It is, however, possible to incorporate 3D information into these feature correspondence computations. One of the most straightforward ways of integrating 3D information uses stereo cameras. Under such schemes, as can be seen in the work of (Zhang et al., 2009; Fabbri and Kimia, 2010; Li et al., 2010) and (Grinberg et al., 2010) (to name a few), epipolar scanlines across the left and right-hand images are searched for matching feature correspondences. It is possible to integrate these concepts into monocular camera configurations, such as in the method introduced by (Klein and Murray, 2007) known as Parallel Tracking and Mapping (PTAM), in which features are initialised with their 3D positions by searching along epipolar lines, defined by depth, between key frames of the image sequence. (Davison, 2003; Davison et al., 2007) presented a similar idea of feature initialisation in monoSLAM.
When recovering 3D information from image sequences, if they are processed in reverse chronological order, new scene elements to be processed appear at the image edges. This provides an interesting property: image areas recovered in subsequent image frames exhibit similar properties to those processed previously, as highlighted in Figure 1. It may therefore be possible to exploit this information, using relationships be-
Warsop, T. and Singh, S. UNSUPERVISED LEARNING FOR TEMPORAL SEARCH SPACE REDUCTION IN THREE-DIMENSIONAL SCENE RECOVERY. DOI: 10.5220/0003308405490554. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP 2011), pages 549-554. ISBN: 978-989-8425-47-8. Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)