a scalar factor in translation. Knowledge of the correct transformation is important for the accuracy of a three-dimensional reconstruction. Some recent approaches address this problem of determining the true translation in this collaborative stereo configuration.
GNSS (Global Navigation Satellite System) services can be used to determine the relative pose of the mono cameras. The accuracy depends on the GPS localization (Dias et al., 2013). A high accuracy cannot be assumed permanently, and in some environments the availability cannot be guaranteed at all times.
Another work (Achtelik et al., 2011) estimates the relative pose of two UAVs by fusing the poses of downward-oriented mono cameras with IMU sensor data to resolve the scalar factor in translation. However, this approach needs a continuous relative movement between the UAVs to converge, which takes up to 8 seconds. Nonetheless, this approach finds application in swarm-based research projects.
The use of ultra-wideband technology, as shown in (Guo et al., 2017), can also determine the distance between two UAVs. The Jet Propulsion Laboratory (JPL) is working on similar approaches with two small UAVs (Roland Brockers, 2015). These UAVs fly in a tandem formation to create depth maps of the terrain. The distance between the UAVs is determined by an "antenna monitoring system". In addition, some solutions exist (Kwon et al., 2014) where two UAVs track a visible target (fiducial marker) on a ground vehicle. However, this case has the disadvantage that the target must be very large, or the distance to the target must be very small, to achieve accurate measurements.
Another possible solution is to use an external motion capture system to localize the UAVs (Ahmad et al., 2016). This is only applicable in properly equipped indoor rooms. Such a motion capture system is used in this work to realize a test environment.
3 COLLABORATIVE STEREO
WITH MASTER-SATELLITE
CONFIGURATION
This section outlines a novel method to realize a dynamic baseline for mobile robotic applications, as presented in (Sutorma, Andreas and Thiem, Jörg, 2018).
A flexible formation of at least three optically measuring systems (e.g. UAVs with cameras) makes this possible (see Fig. 1). A master (a calibrated stereo camera with a fixed baseline) is located behind the so-called satellites (mono cameras with a variable baseline). This master-satellites stereo (MSS) configuration allows the master to estimate the relative pose of the two mono cameras (satellites), whose positions may also vary over time. The satellites must be equipped with markers to estimate their translation and rotation with respect to each other (the extrinsic stereo parameters). Furthermore, the master may control the baseline of the satellite configuration to optimize the triangulation and resolution for different (near vs. far) scenarios. The variable baseline increases the accuracy over standard, fixed-baseline stereo methods.
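As an illustrative sketch (not the paper's implementation), the extrinsic stereo parameters of the satellite pair can be composed from the master's pose estimates of the two marker-equipped satellites; all function and variable names below are hypothetical assumptions:

```python
import numpy as np

def inv_se3(T):
    """Invert a 4x4 rigid-body transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def satellite_extrinsics(T_master_left, T_master_right):
    """Relative pose of the right satellite in the left satellite's frame:
    T_left_right = T_master_left^-1 @ T_master_right.
    Both inputs are the master's marker-based pose estimates (4x4 transforms)."""
    return inv_se3(T_master_left) @ T_master_right

# The current (variable) baseline length is the norm of the relative translation:
def baseline_length(T_left_right):
    return float(np.linalg.norm(T_left_right[:3, 3]))
```

Chaining the two pose estimates this way means the uncertainty of the extrinsic parameters is driven by the master's marker-pose accuracy, which motivates the error analysis below.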
[Figure: the two satellite mono cameras span a large, variable baseline; the master, a calibrated stereo camera behind them, has a small, fixed baseline.]
Figure 1: Variable baseline stereo configuration with two mono cameras and one calibrated stereo camera system.
4 ADVANTAGE
The advantage of collaborative stereo, and therefore also of our proposed MSS configuration, is the realization of a flexible and large baseline. This makes it possible to achieve a highly accurate depth estimation. However, as in all approaches to collaborative stereo, the relative pose of the two mono cameras has to be estimated in real time with a suitable uncertainty. This aspect has to be analyzed in theory and considered in practical testing. This section therefore presents the theory of the resulting depth error for different baselines. The reconstructed depth coordinate Z of an object point P = (X, Y, Z)^T in a rectified stereo image pair is calculated by the well-known relation Eq. 1
Z = (b · f_px) / d    (1)

with the stereo baseline b, the normalized focal length f_px and the disparity d. In our experimental environment the camera has a focal length of f_px = 1389 px. For the further calculation we consider the focal length as exact and expect a disparity error of 1 pixel.
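A minimal numeric sketch of Eq. 1 in Python, using the stated f_px = 1389 px; the baseline and disparity values are illustrative assumptions:

```python
f_px = 1389.0  # normalized focal length from the experimental setup [px]

def depth_from_disparity(b, d):
    """Eq. 1: reconstructed depth Z = b * f_px / d (b in metres, d in pixels)."""
    return b * f_px / d

# e.g. an assumed baseline of 0.5 m and a disparity of 70 px:
Z = depth_from_disparity(0.5, 70.0)
print(f"Z = {Z:.2f} m")  # ~9.92 m
```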
∆Z_d = (∂Z/∂d) · ∆d = −(b · f_px / d²) · ∆d    (2)
Replacing the disparity d with the expression

d = (b · f_px) / Z    (3)
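Substituting Eq. 3 into Eq. 2 gives ∆Z_d = −(Z² / (b · f_px)) · ∆d: the depth error grows quadratically with depth and shrinks with a larger baseline. A small numeric sketch, with the paper's f_px = 1389 px and 1 px disparity error; the baselines and depth are illustrative assumptions:

```python
f_px = 1389.0   # normalized focal length [px]
delta_d = 1.0   # assumed disparity error [px]

def depth_error(Z, b):
    """Eq. 2 with Eq. 3 substituted: dZ = -(Z^2 / (b * f_px)) * delta_d."""
    return -(Z ** 2) / (b * f_px) * delta_d

# Compare a small fixed baseline against a larger variable one at 10 m depth:
for b in (0.1, 1.0):
    print(f"b = {b} m -> |dZ| = {abs(depth_error(10.0, b)):.3f} m")
```

At 10 m depth the tenfold larger baseline reduces the disparity-induced depth error by the same factor of ten, which illustrates the advantage claimed above.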
A Testing-environment for a Mobile Collaborative Stereo Configuration with a Dynamic Baseline