a scalar factor in translation. Knowledge of the correct transformation is important for the accuracy of a three-dimensional reconstruction. Some recent approaches address this problem of determining the true translation in this collaborative stereo configuration.
GNSS (Global Navigation Satellite System) services can be used to determine the relative pose of the mono cameras. The accuracy depends on the GPS localization (Dias et al., 2013). A high accuracy cannot be assumed permanently, and in some environments the availability cannot be guaranteed at all times.
Another work (Achtelik et al., 2011) estimates the relative pose of two UAVs by fusing the poses of downward-oriented mono cameras with IMU sensor data to resolve the scalar factor in translation. However, this approach needs a continuous relative movement between the UAVs to converge, which takes up to 8 seconds. Nonetheless, this approach finds application in swarm-based research projects.
The use of ultra-wideband technology, as shown in (Guo et al., 2017), can also determine the distance between two UAVs. The Jet Propulsion Laboratory (JPL) is working on similar approaches with two small UAVs (Roland Brockers, 2015). These UAVs fly in a tandem formation to create depth maps of the terrain. The distance between the UAVs is determined by an "antenna monitoring system". In addition, some solutions exist (Kwon et al., 2014) where two UAVs track a visible target (fiducial marker) on a ground vehicle. However, this case has the disadvantage that the target must be very large, or the distance to the target must be very small, to achieve accurate measurements.
Another possible solution is to use an external motion capture system to localize the UAVs (Ahmad et al., 2016). This is only applicable in properly equipped indoor rooms. Such a motion capture system is used in this work to realize a test environment.
3 COLLABORATIVE STEREO
WITH MASTER-SATELLITE
CONFIGURATION
This section outlines a novel method to realize a dynamic baseline for mobile robotic applications, as presented in (Sutorma, Andreas and Thiem, Jörg, 2018).
A flexible formation of at least three optically measuring systems (e.g. UAVs with cameras) makes this possible (see Fig. 1). A master (a calibrated stereo camera with a fixed baseline) is located behind the so-called satellites (mono cameras with a variable baseline). This master-satellites stereo (MSS) configuration allows the master to estimate the relative pose of the two mono cameras (satellites), whose positions may also vary over time. The satellites must be equipped with markers to estimate their translation and rotation with respect to each other (the extrinsic stereo parameters). Furthermore, the master may control the baseline of the satellite configuration to optimize the triangulation and resolution for different (near vs. far) scenarios. The variable baseline increases the accuracy over standard, fixed-baseline stereo methods.
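As an illustrative sketch (not the paper's implementation), the extrinsic stereo parameters of the satellite pair can be composed from the master's pose estimates of the two marker-equipped satellites; all function and variable names below are hypothetical assumptions:

```python
import numpy as np

def inv_se3(T):
    """Invert a 4x4 rigid-body transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def satellite_extrinsics(T_master_left, T_master_right):
    """Relative pose of the right satellite in the left satellite's frame:
    T_left_right = T_master_left^-1 @ T_master_right.
    Both inputs are the master's marker-based pose estimates (4x4 transforms)."""
    return inv_se3(T_master_left) @ T_master_right

# The current (variable) baseline length is the norm of the relative translation:
def baseline_length(T_left_right):
    return float(np.linalg.norm(T_left_right[:3, 3]))
```

Chaining the two pose estimates this way means the uncertainty of the extrinsic parameters is driven by the master's marker-pose accuracy, which motivates the error analysis below.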
[Figure: the two satellite mono cameras span a large, variable baseline; the master, a calibrated stereo camera behind them, has a small, fixed baseline.]
Figure 1: Variable baseline stereo configuration with two mono cameras and one calibrated stereo camera system.
4 ADVANTAGE
The advantage of collaborative stereo, and therefore also of our proposed MSS configuration, is the realization of a flexible and large baseline. This makes it possible to achieve a highly accurate depth estimation. However, as in all approaches to collaborative stereo, the relative pose of the two mono cameras has to be estimated in real time with a suitable uncertainty. This aspect has to be analyzed in theory and considered in practical testing. This section therefore presents the theory of the resulting depth error for different baselines. The reconstructed depth coordinate Z of an object point P = (X, Y, Z)^T in a rectified stereo image pair is calculated by the well-known relation Eq. 1
Z = (b · f_px) / d    (1)

with the stereo baseline b, the normalized focal length f_px and the disparity d. In our experimental environment the camera has a focal length of f_px = 1389 px. For the further calculation we consider the focal length as exact and expect a disparity error of 1 pixel.
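A minimal numeric sketch of Eq. 1 in Python, using the stated f_px = 1389 px; the baseline and disparity values are illustrative assumptions:

```python
f_px = 1389.0  # normalized focal length from the experimental setup [px]

def depth_from_disparity(b, d):
    """Eq. 1: reconstructed depth Z = b * f_px / d (b in metres, d in pixels)."""
    return b * f_px / d

# e.g. an assumed baseline of 0.5 m and a disparity of 70 px:
Z = depth_from_disparity(0.5, 70.0)
print(f"Z = {Z:.2f} m")  # ~9.92 m
```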
∆Z_d = (∂Z/∂d) · ∆d = −(b · f_px / d²) · ∆d    (2)
Replacing the disparity d with the expression

d = (b · f_px) / Z    (3)
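Substituting Eq. 3 into Eq. 2 gives ∆Z_d = −(Z² / (b · f_px)) · ∆d: the depth error grows quadratically with depth and shrinks with a larger baseline. A small numeric sketch, with the paper's f_px = 1389 px and 1 px disparity error; the baselines and depth are illustrative assumptions:

```python
f_px = 1389.0   # normalized focal length [px]
delta_d = 1.0   # assumed disparity error [px]

def depth_error(Z, b):
    """Eq. 2 with Eq. 3 substituted: dZ = -(Z^2 / (b * f_px)) * delta_d."""
    return -(Z ** 2) / (b * f_px) * delta_d

# Compare a small fixed baseline against a larger variable one at 10 m depth:
for b in (0.1, 1.0):
    print(f"b = {b} m -> |dZ| = {abs(depth_error(10.0, b)):.3f} m")
```

At 10 m depth the tenfold larger baseline reduces the disparity-induced depth error by the same factor of ten, which illustrates the advantage claimed above.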
A Testing-environment for a Mobile Collaborative Stereo Configuration with a Dynamic Baseline