
2.4 Adjusted WTA and Postprocessing
The WTA method is used to select the optimal disparity $d_{r,c}$ for
the pixel at the position $(r, c)$ in the left image. The WTA method
takes into account the number of pixels that support the decision by
choosing among the trustworthy disparity candidates. The trustworthy
disparity candidates have at least
$N_s = K_p \cdot \max\{N_p^{r,c}\}$ pixels participating in the cost
aggregation, where $N_p^{r,c}$ is a $D \times 1$ vector holding the
number of pixels participating in the cost aggregation for each
possible disparity value, and $K_p$ is the ratio coefficient,
$0 < K_p \le 1$. The optimal disparity $d_{r,c}$ is found as:

$$d_{r,c} = \arg\min_{d_i} \{ C_{nSSD}^{r,c}(d_i) \mid N_p^{r,c}(d_i) > N_s \}, \tag{4}$$

where $r = 1, \ldots, R$ and $c = 1, \ldots, C$, for an image of
dimension $R \times C$ pixels. The postprocessing step performs
$L \times L$ median filtering on the disparity map $d$ to eliminate
spurious disparities.
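The selection rule in eq. (4) and the median postprocessing can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the (R, C, D) array layout, the edge-replicating border handling in the median filter, and the fallback to disparity index 0 when no candidate qualifies are all assumptions made here.

```python
import numpy as np


def adjusted_wta(cost, support, k_p=0.5):
    """Pick, per pixel, the lowest-cost disparity among trustworthy candidates.

    cost    : (R, C, D) array, aggregated matching cost per disparity
    support : (R, C, D) array, number of pixels that took part in the
              cost aggregation for each disparity (N_p in the text)
    k_p     : ratio coefficient, 0 < k_p <= 1
    Returns an (R, C) array of disparity indices.
    """
    # N_s = K_p * max over disparities of N_p, computed per pixel
    n_s = k_p * support.max(axis=2, keepdims=True)
    # Candidates must satisfy the strict inequality used in eq. (4)
    trustworthy = support > n_s
    # Untrustworthy candidates get infinite cost so argmin ignores them.
    # Assumption: if no candidate qualifies, argmin falls back to index 0.
    masked = np.where(trustworthy, cost, np.inf)
    return masked.argmin(axis=2)


def median_filter(disp, size=5):
    """L x L median filter with edge replication (border mode is assumed)."""
    pad = size // 2
    padded = np.pad(disp, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))
```

In this sketch a disparity with the lowest cost is still rejected when too few pixels supported it, which is the difference from a plain argmin over the cost volume.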
3 EXPERIMENT RESULTS AND DISCUSSION
We have used the Middlebury stereo benchmark (Scharstein and
Szeliski, 2002) to evaluate the performance of the sparse window
technique. The parameters of the algorithm are fixed for all four
stereo pairs: $T_L = 10$, $T_R = 10$, $w_x = 15$, $W_x = 31$,
$\sigma_n^2 = 0.5$. In the process of pixel selection, we declare the
window textureless if, in more than $w_x + 1$ columns and in more than
$w_x + 1$ rows, more than half of the pixels from the left window are
selected for matching. The structuring element in the erosion step is
a square $N_E \times N_E$, $N_E = 5$. Dilation is performed with a
square $N_D \times N_D$, $N_D = 3$, structuring element if there are
fewer than $N_{min}$ columns with fewer than $N_{min}$ pixels or if
there are fewer than $N_{min}$ rows with fewer than $N_{min}$ pixels,
$N_{min} = 5$. The WTA parameter is $K_p = 0.5$. The postprocessing
step is $L \times L$ median filtering with $L = 5$. These parameters
have been found empirically.
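The textureless-window test described above can be illustrated with a short NumPy sketch. The text does not spell out exactly how "more than half of the pixels" is counted, so this sketch assumes one reading: a column (or row) counts as dense when more than half of its pixels are selected, and the window is declared textureless when more than $w_x + 1$ columns and more than $w_x + 1$ rows are dense. The function name and the boolean-mask input are hypothetical.

```python
import numpy as np


def is_textureless(selected, w_x=15):
    """Textureless test for one window under the reading stated above.

    selected : 2-D boolean mask, True where a pixel of the left window
               was selected for matching (e.g. 31 x 31 for W_x = 31)
    w_x      : window parameter from the text (w_x = 15)
    """
    selected = np.asarray(selected, dtype=bool)
    rows, cols = selected.shape
    # A column is dense if more than half of its pixels are selected
    dense_cols = np.count_nonzero(selected.sum(axis=0) > rows / 2)
    # A row is dense if more than half of its pixels are selected
    dense_rows = np.count_nonzero(selected.sum(axis=1) > cols / 2)
    return bool(dense_cols > w_x + 1 and dense_rows > w_x + 1)
```

With $W_x = 31$ and $w_x = 15$, the threshold $w_x + 1 = 16$ means that more than half of the 31 columns and more than half of the 31 rows must be dense under this reading.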
The disparity maps obtained by our algorithm (with offset
compensation) for the stereo pairs from the Middlebury database are
shown in the third column of Figure 1. The leftmost column contains
the left images of the four stereo pairs. The first row shows the
Tsukuba stereo pair, followed by Venus, Teddy and Cones. Ground truth
(GT) disparity maps are in the second column. The fourth column shows
the bad pixel maps, where wrong disparities are shown in black,
occlusion regions in gray, and correctly calculated disparity values
in white. The quantitative results in the Middlebury stereo evaluation
framework are presented in Table 1. The table shows the ranking of the
results together with the error percentages for the nonoccluded region
(NONOCC), for all pixels (ALL), and for the discontinuity region
(DISC). We consider the ranking in the NONOCC column the most
important. We do not deal with the occluded and discontinuity regions
in our algorithm. The results show that our hybrid technique preserves
the edges of the objects. The disparities of some narrow structures
are successfully detected and recovered, although their dimensions are
much smaller than the size of the window. Examples of such narrow
objects are most noticeable in the Tsukuba disparity map (the lamp
reconstruction) and in the Cones disparity map (pens in a cup in the
lower right corner). On the other hand, the disparities of the large
low-textured surfaces in the stereo pairs Venus and Teddy are also
successfully recovered with the same sparse window technique.
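The error percentages per region and the black/gray/white bad pixel maps described above can be computed along the following lines. This is a sketch under stated assumptions: the function names and mask inputs are hypothetical, and the error threshold of 1 disparity level is an assumption made here, since the text does not state the threshold used.

```python
import numpy as np


def bad_pixel_percentage(disp, gt, region, threshold=1.0):
    """Percentage of pixels in `region` whose disparity error exceeds threshold.

    disp, gt : (R, C) computed and ground-truth disparity maps
    region   : (R, C) boolean mask (e.g. NONOCC, ALL, or DISC pixels)
    threshold: error tolerance in disparity levels (assumed value)
    """
    region = np.asarray(region, dtype=bool)
    n = np.count_nonzero(region)
    if n == 0:
        raise ValueError("evaluation region is empty")
    bad = np.abs(np.asarray(disp, float) - np.asarray(gt, float)) > threshold
    return 100.0 * np.count_nonzero(bad & region) / n


def bad_pixel_map(disp, gt, nonocc, threshold=1.0):
    """Image with wrong pixels black (0), occluded gray (128), correct white (255)."""
    nonocc = np.asarray(nonocc, dtype=bool)
    bad = np.abs(np.asarray(disp, float) - np.asarray(gt, float)) > threshold
    out = np.full(bad.shape, 255, dtype=np.uint8)
    out[bad] = 0
    # Occluded pixels are drawn gray regardless of their error
    out[~nonocc] = 128
    return out
```

Passing an all-True mask as `region` gives the ALL figure, while a nonoccluded mask gives NONOCC.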
The images in the Middlebury database have different sizes and
disparity ranges, as well as different radiometric properties. The
stereo pairs Tsukuba (384×288 pixels) and Venus (434×383) have
disparity ranges from 0 to 15 and from 0 to 19, respectively. The
radiometric properties of the images in these stereo pairs are almost
identical, and our algorithm gives even better results without the
offset compensation given by eq. (2). The error percentages for the
nonoccluded regions for these two pairs without the offset
compensation are 2.53% and 0.62%, respectively; see Figure 2. The
upper row of Figure 2 shows the disparity maps calculated using the
sparse window technique without the offset compensation step for all
four stereo pairs from the evaluation framework, and the lower row
contains the corresponding bad pixel maps, with the same color coding
as in the previous figure. The stereo pairs Teddy (450×375 pixels) and
Cones (450×375) have disparity ranges from 0 to 59. The images of
these stereo pairs are not radiometrically identical. The sparse
window matching without the offset compensation step results in very
large errors; see Figure 2. The error percentages for the nonoccluded
regions for the stereo pairs Teddy and Cones without the offset
compensation are 17.5% and 13.8%, respectively.
4 CONCLUSIONS
We introduced a new sparse window technique for
local stereo matching. The algorithm is simple to
implement, as it is based on pixel selection by
thresholding, normalized sum of squared differences
cost and plain median filtering in the postprocessing
step. Our algorithm does not suffer from the com-
mon pitfalls of the window-based matching. It does
VISAPP 2011 - International Conference on Computer Vision Theory and Applications