rate the depth-map. The bundle adjustment used in
SfM methods is not suitable for small motions;
therefore, a modified bundle adjustment under an inverse
depth representation is proposed in (Ha et al., 2016).
In this case, the reprojection error is computed by
mapping the points from the distorted to the undistorted
domain. The sparse three-dimensional points are
created by random depth initialization (Yu and Gallup,
2014), then plane-sweeping-based image matching
(Collins, 1996) is employed to create the depth-map.
Finally, a Markov Random Field approach (Komodakis and
Paragios, 2009) is employed to regularize the
estimated depth-map.
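As an illustration of the reprojection error described above, the following Python sketch evaluates the residual of a single feature under an inverse depth parameterization. It is not the implementation of (Ha et al., 2016): the small-angle rotation R ≈ I + [r]x, the two-coefficient radial model (k1, k2) applied in the distorted-to-undistorted direction, pixel coordinates taken relative to the principal point, and all function names are assumptions made for this example.

```python
def undistort(xd, yd, f, k1, k2):
    """Map a distorted pixel to the undistorted domain.

    Coordinates are relative to the principal point (assumption).
    The radial model 1 + k1*r^2 + k2*r^4 is an assumed form.
    """
    r2 = (xd * xd + yd * yd) / (f * f)
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return xd * scale, yd * scale


def project_small_motion(xu, yu, w, f, rot, trans):
    """Project a reference-frame point with inverse depth w into another frame.

    rot = (rx, ry, rz) is a small-angle rotation, so R ~ I + [rot]_x.
    trans = (tx, ty, tz) is the camera translation.
    The 3D point is X = (x, y, 1) / w; we work with w * (R X + t),
    which gives the same pixel after the perspective division.
    """
    rx, ry, rz = rot
    tx, ty, tz = trans
    x, y = xu / f, yu / f
    px = x - rz * y + ry + w * tx
    py = rz * x + y - rx + w * ty
    pz = -ry * x + rx * y + 1.0 + w * tz
    return f * px / pz, f * py / pz


def reprojection_residual(ref_px, obs_px, w, f, k1, k2, rot, trans):
    """Residual between the undistorted observation and the projected reference point."""
    xu0, yu0 = undistort(ref_px[0], ref_px[1], f, k1, k2)
    xui, yui = undistort(obs_px[0], obs_px[1], f, k1, k2)
    px, py = project_small_motion(xu0, yu0, w, f, rot, trans)
    return xui - px, yui - py
```

In a bundle adjustment, residuals of this kind would be summed over all features and frames and minimized jointly over the focal length, the distortion coefficients, the per-frame motion, and the per-point inverse depths.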
1.2 Problem Statement
Although the DfSM algorithm is specifically designed for
small baselines, the estimated camera poses become
unreliable if the motion is too small. It is
assumed that the minimum baseline required by
this approach is met when a large number of frames,
approximately 30, is acquired (Ha et al.,
2016). Because of limited memory and execution-time
constraints on mobile devices, we are restricted
to only 10-15 frames for depth-map generation
on these devices. With so few frames, the self-calibrating
bundle adjustment may converge too slowly or
not converge at all. In addition, because of the lack of
features near the image border, the estimated radial
distortion parameters can drift beyond their bounds and
may not give meaningful estimates.
Due to these problems, the bundle adjustment (BA) does not always give
correct estimates of the camera parameters and inverse
depth values. One way to tackle this issue
might be to use a very large number of feature points,
on the order of 10,000 or more, or to include an additional
photometric bundle adjustment (Alismail et al.,
2017) step if one is restricted to a small number of
feature points. In addition, one can bound the camera
parameters during the optimization. However, these
solutions introduce additional complexity to the
system optimization. A good initialization
for the bundle adjustment is therefore vital for depth-map
accuracy, so we propose a Rank-1 factorization
technique suited to the inverse depth
representation.
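To give an intuition for why a rank-1 model suits the inverse depth representation, consider a minimal sketch under assumptions that are ours and not taken from the text above: if rotation and forward translation are negligible, the horizontal displacement of feature j in frame i is approximately a_i * w_j, where a_i is the focal length times the lateral translation of frame i and w_j is the inverse depth of the point. The displacement matrix is then approximately rank 1 and can be factorized, here by alternating least squares. The function name `rank1_factorize` is hypothetical, and this generic illustration should not be read as the formulation used in this paper.

```python
def rank1_factorize(D, iters=50):
    """Approximate D[i][j] ~ a[i] * w[j] by alternating least squares.

    D is a list of rows (frames) of displacements (one column per feature).
    Assumes D is not all zeros and that the initial guess (all ones) is not
    orthogonal to the dominant factor, which holds for positive inverse depths.
    """
    n_rows, n_cols = len(D), len(D[0])
    w = [1.0] * n_cols
    a = [0.0] * n_rows
    for _ in range(iters):
        # Fix w, solve for each a[i] in the least-squares sense.
        ww = sum(v * v for v in w)
        a = [sum(D[i][j] * w[j] for j in range(n_cols)) / ww
             for i in range(n_rows)]
        # Fix a, solve for each w[j] in the least-squares sense.
        aa = sum(v * v for v in a)
        w = [sum(D[i][j] * a[i] for i in range(n_rows)) / aa
             for j in range(n_cols)]
    return a, w


# Synthetic example: three frames, three features.
true_a = [1.0, 2.0, -1.0]    # per-frame motion terms (illustrative values)
true_w = [0.5, 0.25, 1.0]    # per-point inverse depths (illustrative values)
D = [[ai * wj for wj in true_w] for ai in true_a]
a, w = rank1_factorize(D)
```

The factors are recovered only up to a common scale (a can be multiplied by s and w divided by s), which is consistent with the scale ambiguity of uncalibrated reconstruction. Such a factorization would serve as a starting point for the bundle adjustment, not as a replacement for it.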
1.3 Summary
In this paper, we describe an uncalibrated Depth
from Small Motion technique using rank-1 initialization.
This approach provides a better initialization
for the bundle adjustment procedure, which otherwise takes too
long or does not converge under DfSM. It is particularly
suited to speeding up the DfSM algorithm for
deployment on consumer smartphone devices. The Rank-1 factorization
not only speeds up convergence but
also provides a good initialization for accurate depth-map
generation. Thanks to the rank-1 initialization, the self-calibrating
Bundle Adjustment (BA) is able to converge
in as few as 10-20 iterations with 10 images. We
also propose a gridded feature extraction to speed up
the feature tracking stage of the algorithm. Finally,
we optimized various parts of the original algorithm
(Ha et al., 2016) using GPU OpenCL and CPU
multi-threading techniques. This makes it possible to
carry out detailed experiments on a mobile device running
the Android platform.

Figure 1: DfSM framework. Our main technical contributions
are in the dashed enclosed boxes.
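One common way to realize a gridded feature extraction is to divide the image into cells and keep only the strongest candidates in each cell, which spreads the features over the image, including regions near the border. The Python sketch below illustrates this bucketing idea; the function name, the (x, y, score) candidate format, and the per-cell limit are assumptions for illustration and may differ from the procedure used in this paper.

```python
def select_gridded_features(candidates, width, height, rows, cols, per_cell=1):
    """Keep the strongest `per_cell` candidates in each grid cell.

    candidates: iterable of (x, y, score) tuples from any corner detector.
    width, height: image size in pixels.
    rows, cols: grid layout.
    """
    cell_w = width / cols
    cell_h = height / rows
    buckets = {}
    for x, y, score in candidates:
        # Clamp so that points on the right/bottom edge fall in the last cell.
        c = min(int(x / cell_w), cols - 1)
        r = min(int(y / cell_h), rows - 1)
        buckets.setdefault((r, c), []).append((x, y, score))

    selected = []
    for key in sorted(buckets):
        # Sort each cell by descending detector score and keep the best ones.
        best = sorted(buckets[key], key=lambda p: p[2], reverse=True)[:per_cell]
        selected.extend(best)
    return selected


# Example: four candidates on a 100x100 image split into a 2x2 grid.
points = [(10, 10, 0.9), (12, 11, 0.5), (90, 90, 0.1), (55, 10, 0.7)]
kept = select_gridded_features(points, 100, 100, rows=2, cols=2)
```

Limiting the number of features per cell also bounds the total number of tracks, which keeps the cost of the tracking stage predictable on a mobile device.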
In the next section, we present the uncalibrated
rank-1 factorization for the DfSM problem. Experiments
and a performance evaluation of the proposed
method against optimized CPU-only
implementations are provided in Section 3. Finally, we
conclude and outline future directions for this work.
2 DfSM WITH RANK-1 INITIALIZATION
Fig. 1 gives a general overview of the DfSM
algorithm used for depth-map generation in this work.
First, good features that are consistent across all video
frames are extracted using the gridded feature tracking
approach proposed in this work. Then, we initialize the
bundle adjustment procedure using the rank-1 factorization
technique; its outputs are the optimized camera
parameters and the inverse depth values of the points.
Finally, the estimated inverse depth values and
the camera parameters are used in a dense matching
method to create the depth-map. In the remainder
of this section, we first introduce the coordinate
representation used in this paper, then explain the
proposed gridded feature extraction and the Rank-1 initi-
VISAPP 2019 - 14th International Conference on Computer Vision Theory and Applications