our system. Second, registration and evaluation algorithms for 3D point clouds as well as the alignment of 2D images onto range information acquired by a LiDAR system are examined.
In the field of Structure from Motion, there are many different approaches to reconstructing a scene from a given video sequence. Nistér (2001) describes a system which processes the whole chain from images taken by an uncalibrated camera to a dense reconstruction of the scene.
An advanced system, working with multiple cameras mounted on a ground vehicle, is presented in (Akbarzadeh et al., 2006). An urban scene is reconstructed from long high-resolution video sequences. The collected GPS and IMU data are used only to geo-register the reconstructed models. Their system is divided into two parts: the first works online and generates a sparse 3D point cloud; in the second, the computationally expensive dense reconstruction for the whole video sequence is performed.
In (Nistér et al., 2006) visual odometry is compared to pose and position estimation based on DGPS (Differential GPS) and IMU measurements. It is shown that ego motion estimated from video can be used for accurate image-based navigation. However, as in the other systems described above, high-resolution video images are used instead of the IR images considered in our work.
There are several possibilities for evaluating an extracted 3D point cloud. A thorough survey of the registration and fusion of point clouds is given in (Rodrigues et al., 2002).
For the registration of point clouds, ICP (Iterative Closest Point), as introduced by Besl and McKay (1992), has been established as the standard approach. However, in the case of one dense and one sparse point cloud, as in our work, the standard algorithm has to be adjusted. In (Pottmann et al., 2004) an approach is suggested in which a point cloud is aligned to a surface. They introduce an alternative concept to the ICP algorithm which relies on local quadratic approximants to the squared distance function of the surface.
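As an aside, one iteration of the standard point-to-point ICP variant can be summarized in a short sketch (Python with NumPy and SciPy; the names and the nearest-neighbor search via a k-d tree are our illustrative choices, not part of any of the systems cited above):

    import numpy as np
    from scipy.spatial import cKDTree

    def icp_step(src, dst):
        # Match every source point to its nearest destination point,
        # then compute the rigid transform (R, t) minimizing the summed
        # squared distances (closed-form solution via SVD).
        _, idx = cKDTree(dst).query(src)
        matched = dst[idx]
        src_c, dst_c = src.mean(axis=0), matched.mean(axis=0)
        H = (src - src_c).T @ (matched - dst_c)   # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                        # proper rotation, det = +1
        t = dst_c - R @ src_c
        return src @ R.T + t                      # transformed source cloud

Iterating this step until convergence yields the usual ICP alignment; with one sparse and one dense cloud, matching from the sparse cloud into the dense one is the natural direction.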
Alignment of video onto 3D point clouds is the topic of (Zhao et al., 2005). They propose a method to align a point cloud extracted from video onto a point cloud obtained by a 3D sensor such as a laser scanner. After that, a 3D model of the scene is built, which differs from our objective of performing an evaluation.
3 EXPERIMENTAL SETUP
A helicopter is used as the sensor platform. The different sensors are installed in a pivot-mounted sensor carrier on the right side of the helicopter. The following sensors are used:
IR Camera. An AIM 640QMW is used to acquire mid-wavelength (3-5µm) infrared light. The lens has a focal length of 28mm and a field of view of 30.7° × 23.5°.
LiDAR. The Riegl Laser Q560 is a 2D scanning device which scans the scene in azimuth and elevation with short laser pulses. The distance is calculated from the time of flight of a pulse (see the sketch after this list). It covers almost the same field of view as the IR camera.
INS. The INS (Inertial Navigation System) is an Applanix POS AV system which is specially designed for airborne use. It consists of an IMU and a GPS system. The measured pose and position are Kalman-filtered to smooth out errors in the GPS.
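For illustration, the time-of-flight principle mentioned in the LiDAR item reduces to a one-line computation (a minimal sketch; the names are ours):

    C = 299792458.0                      # speed of light in m/s

    def tof_range(round_trip_time_s):
        # The pulse travels to the target and back, so the one-way
        # range is c * t / 2; a round trip of 1 microsecond thus
        # corresponds to roughly 150m.
        return 0.5 * C * round_trip_time_s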
The range resolution of the LiDAR system is about 0.02m according to the specifications given by the manufacturer. The specifications of the Applanix system state the following absolute accuracies (RMS): position 4-6m, velocity 0.05m/s, roll and pitch 0.03°, and true heading 0.1°. As described in the introduction, both the SfM system and the LiDAR system use the same INS and thus have the same absolute position and pose error. Therefore, the accuracy of the INS is irrelevant for the comparison of the two systems.
Because the noise of the laser system (a few centimeters) is low compared to that of the SfM system (a few meters), it is neglected in this paper.
For the later fusion to be successful, the measured data are first registered in time. The different sensor frequencies as well as the integration time of the IR camera are taken into account.
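A minimal sketch of such a time registration, assuming linear interpolation of the INS position samples and a pose taken at the middle of each IR integration interval (the names, rates, and interpolation scheme are our assumptions, not the implementation used here):

    import numpy as np

    def register_in_time(frame_starts, integration_time,
                         ins_times, ins_positions):
        # Assign each IR frame the INS state at the middle of its
        # integration interval; positions are interpolated per axis.
        # Orientations would need e.g. quaternion slerp instead.
        mid = frame_starts + 0.5 * integration_time
        return np.stack([np.interp(mid, ins_times, ins_positions[:, k])
                         for k in range(ins_positions.shape[1])], axis=1)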
The coordinate frames of both the IR camera and the laser scanner are given with respect to the INS reference coordinate frame. Therefore, the coordinate transformations between the IR camera and the laser scanner are known, and the later evaluation is possible.
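Since both sensors are referenced to the INS frame, the direct camera-to-laser transformation follows by composing homogeneous transforms (a sketch; the frame naming is ours):

    import numpy as np

    def camera_to_laser(T_ins_from_cam, T_ins_from_laser):
        # Both 4x4 transforms map into the INS reference frame, so
        # T_laser_from_cam = inv(T_ins_from_laser) @ T_ins_from_cam.
        return np.linalg.inv(T_ins_from_laser) @ T_ins_from_cam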
4 3D RECONSTRUCTION
The developed system automatically calculates a set of 3D points from given IR images and from pose and position information. For the implementation, Intel's computer vision library OpenCV (Intel, 2006) is used.
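As a minimal sketch of the kind of processing described below, feature tracking with an epipolar-constraint check could look as follows in OpenCV's current Python binding (the paper used the 2006 C interface; all parameter values are illustrative assumptions):

    import cv2

    def track_and_filter(prev_img, next_img):
        # prev_img, next_img: consecutive 8-bit grayscale frames.
        # Detect corners, track them into the next frame (KLT), and
        # reject matches violating the epipolar constraint via a
        # RANSAC estimate of the fundamental matrix.
        pts0 = cv2.goodFeaturesToTrack(prev_img, maxCorners=500,
                                       qualityLevel=0.01, minDistance=7)
        pts1, status, _ = cv2.calcOpticalFlowPyrLK(prev_img, next_img,
                                                   pts0, None)
        ok = status.ravel() == 1
        p0, p1 = pts0[ok].reshape(-1, 2), pts1[ok].reshape(-1, 2)
        _, inliers = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC,
                                            3.0, 0.99)
        keep = inliers.ravel() == 1
        return p0[keep], p1[keep]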
A system overview is given in figure 1. After initialization, detected features are tracked image by image. To minimize the number of mismatches between the corresponding features in two consecutive images, the algorithm checks the epipolar constraint by means