
The 3 × 1 column vector $(u\ v\ w)^T$ on the left-hand side of this equation represents the three-dimensional pixel coordinates of an observed point. Note that the third component is added to express the vector in homogeneous coordinates. On the right-hand side, the 3 × 3 matrix $K$ is the intrinsic matrix of the camera, the 3 × 3 matrix $R$ is the rotation matrix, and the 3 × 1 column vector $t$ represents the translation of the camera center. $I_{3 \times 3}$ represents the identity matrix of dimension 3. The final 4 × 1 column vector $(X_W\ 1)^T$ represents the observed 3D point in homogeneous coordinates in the World Coordinate System (WCS).
Moreover, the residual $r$ can be expressed as:
$$r = \begin{pmatrix} x \\ y \end{pmatrix} - \begin{pmatrix} u/w \\ v/w \end{pmatrix} \tag{2}$$
where $(x\ y)^T$ represents the observed pixel coordinates of a feature. These coordinates will be provided as input to the BA algorithm. The second vector $(u/w\ \ v/w)^T$ is calculated from the estimated homogeneous coordinates of the point in the three-dimensional WCS and represents the reprojection of an observed point in the image plane. By changing the values of $K$, $R$ and $t$, these reprojected coordinates will vary accordingly.
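The residual computation can be illustrated with a minimal sketch. It assumes that the projection equation has the common form $K R [I_{3 \times 3} \mid -t] (X_W\ 1)^T$, with $t$ the camera center in world coordinates; this form, the helper names `mat_vec`, `project` and `residual`, and all numeric values are assumptions chosen for illustration and are not taken from the text.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def project(K, R, t, X_w):
    """Return homogeneous pixel coordinates (u, v, w) of a world point.

    Assumption: the projection is K R [I | -t] (X_w 1)^T, with t the
    camera center expressed in world coordinates.
    """
    X_rel = [x - c for x, c in zip(X_w, t)]  # effect of [I | -t] on (X_w 1)^T
    X_cam = mat_vec(R, X_rel)                # rotate into the camera frame
    return mat_vec(K, X_cam)                 # apply the intrinsic matrix


def residual(observed_xy, uvw):
    """Residual r = (x, y)^T - (u/w, v/w)^T as in (2)."""
    x, y = observed_xy
    u, v, w = uvw
    return [x - u / w, y - v / w]


# Illustrative values only (hypothetical camera and point).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
X_w = [0.5, -0.25, 2.0]

uvw = project(K, R, t, X_w)        # (890.0, 355.0, 2.0)
r = residual((446.0, 177.0), uvw)  # observation minus reprojection
print(r)
```

With these values the reprojected point is (445, 177.5), so the residual against the observation (446, 177) is (1, -0.5). Changing any entry of `K`, `R` or `t` moves the reprojection and therefore the residual.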
In essence, a bundle adjustment algorithm will perform a nonlinear least squares (NLLS) optimization in order to minimize the sum of the squared residuals $r$ over all considered points and camera poses:
$$\min_{C, X} \sum_{i,j} \left\| u_{ij} - \pi(C_j, X_i) \right\|^2 \tag{3}$$
In (3), $u_{ij}$ represents the observations in pixel coordinates, i.e. the vector $(x\ y)^T$ defined in (2). $\pi(C_j, X_i)$ represents the projection of a point $X_i$ into the camera plane of camera $C_j$. Hence, this term corresponds to $(u/w\ \ v/w)^T$ defined in (2). Note that this is a nonlinear operation because of the division by $w$.
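As a rough illustration of the cost in (3), the sketch below sums the squared reprojection errors over a small set of observations. The function `project_pinhole` is a deliberately simplified, hypothetical stand-in for $\pi(C_j, X_i)$ (an axis-aligned camera without rotation), and the dictionary layout and all values are assumptions made for the example.

```python
def project_pinhole(camera, point):
    """Simplified stand-in for pi(C_j, X_i).

    Assumption: the camera is axis-aligned (no rotation) and described by a
    focal length f, a principal point (cx, cy) and a center in world
    coordinates. The division by Z is what makes the projection nonlinear.
    """
    X, Y, Z = (p - c for p, c in zip(point, camera["center"]))
    return (camera["f"] * X / Z + camera["cx"],
            camera["f"] * Y / Z + camera["cy"])


def total_cost(observations, cameras, points):
    """Sum of squared residuals over all observed (point i, camera j) pairs."""
    cost = 0.0
    for (i, j), (x, y) in observations.items():
        px, py = project_pinhole(cameras[j], points[i])
        cost += (x - px) ** 2 + (y - py) ** 2
    return cost


# Illustrative values only: two cameras, two points, four observations.
cameras = [
    {"f": 100.0, "cx": 0.0, "cy": 0.0, "center": (0.0, 0.0, 0.0)},
    {"f": 100.0, "cx": 0.0, "cy": 0.0, "center": (1.0, 0.0, 0.0)},
]
points = [(0.0, 0.0, 2.0), (1.0, 1.0, 4.0)]
observations = {
    (0, 0): (0.0, 0.0),    # matches the projection exactly
    (0, 1): (-50.0, 1.0),  # off by 1 pixel in y
    (1, 0): (25.0, 25.0),  # matches the projection exactly
    (1, 1): (2.0, 25.0),   # off by 2 pixels in x
}

cost = total_cost(observations, cameras, points)
print(cost)
```

Here the two imperfect observations contribute 1 and 4, giving a total cost of 5. An NLLS solver would adjust the camera and point parameters to reduce this value.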
3 METHOD
As a first building block, a pipeline to perform bundle adjustment based on the measurements made by a single agent is proposed. This is referred to as Single-Agent Bundle Adjustment (SABA). We start with a description of this algorithm, before moving on to the Multi-Agent Bundle Adjustment (MABA) system.
3.1 Single-Agent Bundle Adjustment
Figure 3: MABA algorithm pipeline.
As illustrated by (3), the first step is to reproject the observed features in the camera plane. This can be easily achieved by multiplying the estimated World Coordinates of the points with the camera pose matrix $T \in SE(3)$, then with the camera intrinsic matrix to obtain the pixel coordinates that are ultimately used to calculate the residual.
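The reprojection step described above can be sketched as follows. The sketch assumes that the pose matrix is a 4 × 4 homogeneous transform mapping world coordinates into the camera frame (the text does not state the direction of the transform); the names `mat_vec` and `reproject` and the numeric values are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def reproject(T_cw, K, X_w):
    """Map a world point to pixel coordinates.

    Assumption: T_cw is a 4x4 element of SE(3) that maps homogeneous world
    coordinates to the camera frame; K is the 3x3 intrinsic matrix.
    """
    X_h = list(X_w) + [1.0]          # homogeneous world coordinates
    X_c = mat_vec(T_cw, X_h)[:3]     # point in the camera frame
    u, v, w = mat_vec(K, X_c)        # homogeneous pixel coordinates
    return (u / w, v / w)            # dehomogenize


# Illustrative pose: 90 degree rotation about the z-axis, shift of 1 along z.
T_cw = [[0.0, -1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 1.0]]
K = [[400.0, 0.0, 320.0],
     [0.0, 400.0, 240.0],
     [0.0, 0.0, 1.0]]

pixel = reproject(T_cw, K, (1.0, 0.5, 3.0))
print(pixel)
```

For this pose the point lands at (-0.5, 1, 4) in the camera frame and at pixel (270, 340) after applying `K`.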
Once all features are transformed to the Pixel Coordinate System (PCS), the actual bundle adjustment is carried out using the Ceres library. This library allows for the solution of NLLS problems using various solution methods, such as the Gauss-Newton method or the Levenberg-Marquardt algorithm. Optionally, the resulting camera trajectory can be smoothed to reduce the effect of outliers caused by incorrect pose estimates. This will result in a smaller Absolute Pose Error (APE) and Relative Pose Error (RPE) between the estimated trajectory and the ground truth.
When calculating the optimal camera poses and point locations in the map, the algorithm can be run in two different modes. The most intuitive approach is to run the algorithm in the so-called 'unbounded' mode, which imposes no additional constraints on the refined parameters. Although this is the least constrained problem, it might result in some camera positions being heavily displaced in order to minimize the global cost function. To avoid this effect, which has a negative impact on the APE and RPE after alignment with the ground truth, the algorithm can be run in the 'bounded' mode. In this mode, the displacement of the camera poses and the point coordinates is limited to avoid large outliers. In each of the three principal directions, the displacement of the camera positions is clamped to $[-\delta, \delta]$, where $\delta \in \mathbb{R}^+$ is a user-supplied parameter. Camera rotations and 3D point coordinates are clamped in a similar manner.
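The interval used in the bounded mode can be illustrated with a small sketch. It only shows how a displacement is restricted to $[-\delta, \delta]$ per component relative to the initial estimate; it is not a constrained solver, and the function names and values are hypothetical.

```python
def clamp(value, lower, upper):
    """Restrict value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def bound_update(initial, refined, delta):
    """Limit each component of a refined estimate to initial +/- delta.

    Assumption: the bound is applied per component relative to the initial
    estimate, as a post-hoc illustration of the interval [-delta, delta].
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return [i + clamp(r - i, -delta, delta) for i, r in zip(initial, refined)]


# Illustrative values only.
initial_position = [1.0, 2.0, 3.0]
refined_position = [1.25, 5.0, 2.0]   # y and z moved by more than delta

bounded = bound_update(initial_position, refined_position, delta=0.5)
print(bounded)
```

Only the second and third components are affected here, since their displacements (3.0 and -1.0) exceed the bound of 0.5. In Ceres, box constraints of this kind can be expressed through `Problem::SetParameterLowerBound` and `Problem::SetParameterUpperBound`; whether the pipeline described here uses those calls is not stated in the text.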
Although this imposes additional constraints on the optimization problem, it tends to avoid large outliers, hence improving the alignment of the estimated trajectory with the ground truth.