Table 1: Calibration results under different constraints.

Condition               Focal X   Focal Y   Principal point    Reprojection error
Common                  2209      2190      (988, 887)         (0.19, 0.25)
No distortion           2962      2884      (1307, 414)        (0.33, 0.32)
Aspect ratio = 1        2166      2166      (1003, 928)        (0.19, 0.25)
Principal point fixed   2243      2243      (1023.5, 767.5)    (0.20, 0.25)
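The constraint conditions in Table 1 map naturally onto standard calibration flags. The following sketch shows how they could be reproduced with OpenCV's cv2.calibrateCamera; this assumes OpenCV is used (the paper does not state the library), and the object/image point lists are placeholders.

```python
import cv2

def calibrate(obj_pts, img_pts, image_size, flags=0):
    """Run cv2.calibrateCamera under a given set of constraint flags.

    obj_pts / img_pts: per-view lists of 3D board points and 2D corners.
    image_size: (width, height) of the calibration images.
    """
    # Initial guess for the camera matrix; some constraint flags refer to it
    # (e.g. CALIB_FIX_ASPECT_RATIO keeps the fx/fy ratio of this guess).
    K0 = cv2.initCameraMatrix2D(obj_pts, img_pts, image_size)
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, image_size, K0, None,
        flags=flags | cv2.CALIB_USE_INTRINSIC_GUESS)
    return rms, K, dist

# The four conditions of Table 1 expressed as OpenCV flag combinations.
conditions = {
    "common": 0,
    "no distortion": (cv2.CALIB_ZERO_TANGENT_DIST | cv2.CALIB_FIX_K1
                      | cv2.CALIB_FIX_K2 | cv2.CALIB_FIX_K3),
    "aspect=1": cv2.CALIB_FIX_ASPECT_RATIO,              # force fx == fy
    "principal point fixed": cv2.CALIB_FIX_PRINCIPAL_POINT,
}
```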
Table 2: Average minimum costs of the different methods (SGBM, BM and MC-CNN).

Scene                SGBM       BM       MC-CNN
Doll                 120.236    20.92    1.1056
Toy brick            120.6858   21.78    1.0581
Toy brick and cup    126.8454   19.55    0.9194
Toy brick and lamp   113.8195   16.643   1.3634
our framework for performance comparison are BM, SGBM and MC-CNN. BM and SGBM run with a window size of 13 × 13, and P1 and P2 are set to 18 and 32, respectively. MC-CNN uses the fast model trained on the KITTI datasets. The parameter α is fixed at 1, and β is 20, 10 and 0.5 for SGBM, BM and MC-CNN, respectively. The cost aggregation results are tabulated in Table 2, and the results of the different methods are shown in Tables 3–5. We choose the quarter-resolution ('Q') Middlebury dataset as the input, and the evaluation metric is BPR1.0.
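As a concrete illustration of these settings, the following is a minimal sketch assuming the OpenCV implementations of BM and SGBM (the paper does not specify the library); the 13 × 13 window and P1 = 18, P2 = 32 are taken from the text, while the disparity range is a placeholder.

```python
import cv2

num_disparities = 128   # placeholder; OpenCV requires a multiple of 16
block_size = 13         # 13 x 13 matching window

bm = cv2.StereoBM_create(numDisparities=num_disparities, blockSize=block_size)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=num_disparities,
    blockSize=block_size,
    P1=18,              # penalty for neighbouring disparity changes of 1
    P2=32,              # penalty for larger disparity changes
)

# left_img / right_img: rectified grayscale images (placeholders)
# disp_bm   = bm.compute(left_img, right_img).astype(float) / 16.0
# disp_sgbm = sgbm.compute(left_img, right_img).astype(float) / 16.0
```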
In the cost function of Equation (6), we test only CT and NCC because the image intensity is shifted by the lens change and auto-exposure. The improvement from our framework is less significant when applied to MC-CNN. This might be due to the design of the cost function used for MC-CNN training: if only the best solution is considered during training, the network does not provide good correspondence candidates for our algorithm, and the standard MC-CNN pipeline already includes a process similar to SGBM. Figure 7 shows the results with and without the SGBM process. There are clear differences, mainly because the basic algorithm cannot provide suitable candidates.
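To make the choice of CT and NCC concrete, the sketch below shows the two matching costs in their usual patch-based form (an illustration, not the paper's exact implementation). CT keeps only the intensity ordering inside the window, and NCC removes the window mean and scale, which is why both tolerate the intensity shifts introduced by the lens change and auto-exposure.

```python
import numpy as np

def census(patch):
    """Census signature: 1 where a neighbour is brighter than the centre."""
    centre = patch[patch.shape[0] // 2, patch.shape[1] // 2]
    return (patch > centre).flatten()

def ct_cost(patch_l, patch_r):
    """Hamming distance between the census signatures of two patches."""
    return np.count_nonzero(census(patch_l) != census(patch_r))

def ncc_cost(patch_l, patch_r, eps=1e-6):
    """1 - NCC, so that smaller values mean better matches."""
    a = patch_l.astype(float) - patch_l.mean()
    b = patch_r.astype(float) - patch_r.mean()
    ncc = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - ncc
```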
Figure 8 compares the results of the several algorithms evaluated in this paper. For a fair comparison, we use the same input size and exclude some machine-learning-based methods, choosing SGBM+Dis as the reference result. The overall performance of the framework is robust across the whole dataset, although the improvement on some scenes is not very significant, mainly due to the different illumination conditions of the left and right images.
3.1 Optical Zoom Image Dataset
We use zoom-lens cameras to construct a new dataset with ground truth. The evaluation metric on our dataset is changed to BPR2.0 because the manually labeled ground truth is not very precise. We test the zoom correspondence methods mentioned previously, as well as the effect of the number of candidates. The toy brick + BM result appears anomalous: when we calculate the BPR of the zoom2 disparity map computed with BM, its value is close to 80%. Thus, we believe that the two initial disparity maps must have a certain level of correctness for the framework to work.
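For reference, the BPR values quoted here follow the usual bad-pixel-rate definition: the percentage of ground-truth pixels whose disparity error exceeds a threshold (1.0 px for BPR1.0, 2.0 px for BPR2.0). A minimal sketch, assuming invalid ground-truth pixels are marked with 0:

```python
import numpy as np

def bad_pixel_rate(disp, gt, tau, invalid=0):
    """Percentage of ground-truth pixels with |disp - gt| > tau."""
    valid = gt != invalid                          # skip unlabeled pixels
    bad = np.abs(disp[valid] - gt[valid]) > tau
    return 100.0 * bad.mean()

# bpr1 = bad_pixel_rate(disparity, ground_truth, tau=1.0)   # BPR1.0
# bpr2 = bad_pixel_rate(disparity, ground_truth, tau=2.0)   # BPR2.0
```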
We then increase the number of disparity candidates to test the relationship between the result and the candidate quantity. For some algorithms such as SGBM, candidates cannot be chosen simply by sorting the cost, so only MC-CNN and BM are tested. Figure 9 shows that if we increase the number of disparity candidates without any constraint, the results become unpredictable. Thus, we add a new constraint that a correspondence pair only takes candidates ranked in the top three, and the results improve. MC-CNN has the additional problem that we cannot be sure whether the top three costs correspond to the required candidates: the cost curve shown in Figure 7 only indicates the best disparity point, and if this point is removed, the whole curve looks like a horizontal line with noise. This test shows that the overall framework is stable as long as the number of candidates is sufficient.
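The top-three constraint described above can be sketched as follows for costs where sorting is meaningful (BM and MC-CNN); cost_volume is a hypothetical (H, W, D) array holding one matching cost per pixel and disparity hypothesis.

```python
import numpy as np

def top_k_candidates(cost_volume, k=3):
    """For each pixel, return the k disparity hypotheses with the lowest cost."""
    order = np.argsort(cost_volume, axis=2)   # sort along the disparity axis
    return order[:, :, :k]                    # shape (H, W, k)

# candidates = top_k_candidates(cost_volume, k=3)
# A zoom correspondence is accepted only if it falls inside these candidates.
```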
4 CONCLUSION
In this work, a stereo matching framework using zoom images is proposed. With zoom image pairs, we are able to reduce the error and the uncertain regions in the disparity map. Compared to existing stereo matching algorithms, our approach improves the disparity results with less computation. The proposed framework can be adapted to existing local and global stereo matching methods, and even to machine-learning-based matching algorithms. In future work,