COMBINATION OF CORRELATION MEASURES FOR DENSE
STEREO MATCHING
Sylvie Chambon
Institut Français des Sciences et Technologies des Transports, de l’Aménagement et des Réseaux (IFSTTAR)
Champs-sur-Marne, France
Alain Crouzil
Institut de Recherche en Informatique de Toulouse (IRIT), Toulouse, France
Keywords:
Stereovision, Matching, Correlation, Classic measures, Robust statistics, Fusion.
Abstract:
In the context of dense stereo matching of pixels, we study the combination of different correlation mea-
sures. Considering the previous work about correlation measures, we use some measures that are the most
significant in five kinds of measures based on: cross-correlation, classic statistics, image derivatives, non-
parametric statistics and robust statistics. More precisely, this study validates the possible improvement of
stereo-matching by combining complementary correlation measures and it also highlights the two measures
that can be combined in order to take advantage of the different methods: the Gradient Correlation measure (GC)
and the Smooth Median Absolute Deviation measure (SMAD). Finally, we introduce a fusion algorithm that
automatically combines correlation measures.
1 INTRODUCTION
Finding homologous pixels in a stereo pair of images is one of the most
important steps in recovering the 3D structure of a scene by stereovision.
Many methods have been proposed in the literature, where local methods are
distinguished from global ones. More precisely, matching methods can be
described in terms of essential components, a term first introduced by
(Scharstein and Szeliski, 2002). These components are: the matching cost,
the optimization method and the introduction of multiple passes, i.e. to
improve the matching performance, some approaches apply several methods in
sequence. This description leads to a categorization into four types: local
methods, global methods (without a correlation measure), mixed methods
(global methods with a correlation measure) and methods with multiple
passes. Our purpose is to introduce a multipass algorithm based on a
combination of local methods.
Local methods are easy to implement, fast, quite efficient and,
consequently, intensively used. Unfortunately, characterising how effective
the existing correlation measures are, i.e. whether they obtain correct
matches in the different areas of the image, is still an open issue. In our
work on local costs (Chambon and Crouzil, 2011), the influence of different
measures on the quality of stereo matching results has been studied, in
particular near occluded regions, and in (Chambon and Crouzil, 2004) we
demonstrated that a measure based on a robust statistics tool combined with
a cross-correlation measure obtains better performance than a correlation
measure used alone. These results raise three new questions:
Which correlation measures are the most complementary, so as to cover all
the matching difficulties?
Is it advantageous to combine numerous measures and, if so, how many?
Following up the previous questions, can we propose an algorithm that
combines more than one measure to obtain a fully dense and correct matching
(or disparity) map (see footnote 1), and is it more efficient than the
method based on a single measure?
Footnote 1: A disparity (or matching) map represents, for each pixel, the
distance between the pixel and its correspondent. When the disparity map is
represented by a grey-level image, the lighter the pixel, the larger the
distance. Black pixels are occluded pixels.
Chambon S. and Crouzil A.
COMBINATION OF CORRELATION MEASURES FOR DENSE STEREO MATCHING.
DOI: 10.5220/0003333305980603
In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2011), pages 598-603
ISBN: 978-989-8425-47-8
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)
Existing methods are briefly presented before the description of the data
set used for validating the proposed method. Then, the combination study is
described, leading to the proposal of a matching algorithm based on merging
the results obtained from various correlation measures. Finally, results
are presented.
2 CORRELATION MEASURES
The principle of a local cost, i.e. a correlation measure, is to consider
that two homologous pixels and their respective neighborhoods are similar
from a photometric point of view. The main difficulties of these methods
are illumination changes, untextured areas and occlusions. Many measures
have been introduced to tackle these difficulties. Based on the results of
41 measures on a benchmark of 42 images, presented in (Chambon and Crouzil,
2011), we propose to study the complementarity of these measures and, in
particular, of the best measure of each family.
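The local principle described above can be sketched as a winner-take-all search. This is only an illustration under stated assumptions, not the implementation used in the paper: the images are assumed rectified, so that the candidate for the left pixel at column j lies on the same row of the right image at column j - d; the plain sum of absolute differences (`sad`) is a stand-in cost, not one of the five measures studied here; and the names `window`, `sad`, `match_pixel` and `max_disp` are hypothetical.

```python
def window(img, i, j, n_v=1, n_h=1):
    """Vector of grey levels in the (2*n_v+1) x (2*n_h+1) window centred on (i, j)."""
    return [img[i + p][j + q]
            for p in range(-n_v, n_v + 1)
            for q in range(-n_h, n_h + 1)]


def sad(f_l, f_r):
    """Stand-in dissimilarity (sum of absolute differences): lower is better."""
    return sum(abs(a - b) for a, b in zip(f_l, f_r))


def match_pixel(img_l, img_r, i, j, max_disp, cost=sad, n_v=1, n_h=1):
    """Return the disparity d in [0, max_disp] whose right window best matches.

    Assumes rectified images (same row in both images) and that (i, j) is far
    enough from the image border for the left window to fit.
    """
    f_l = window(img_l, i, j, n_v, n_h)
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        j_r = j - d
        if j_r - n_h < 0:  # the right window would leave the image
            break
        c = cost(f_l, window(img_r, i, j_r, n_v, n_h))
        if c < best_cost:
            best_d, best_cost = d, c
    return best_d
```

Any dissimilarity with the same signature can be passed as `cost`; a similarity such as NCC would have to be negated first, since the loop keeps the lowest score.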
Table 1: Notations used for the description of the measures.
$I_w$ : The images, with $w \in \{l, r\}$ (left and right).
$I_w^{i,j}$, $\mathbf{p}_w^{i,j}$ : The grey level of the pixel $\mathbf{p}_w^{i,j}$ of coordinates $(i, j)$ in image $I_w$ is $I_w^{i,j}$. Moreover, $\mathbf{p}_r^{x,y}$ is the corresponding pixel of $\mathbf{p}_l^{i,j}$.
$N_f$ : The number of pixels in the neighborhood: $N_f = (2N_v + 1) \times (2N_h + 1)$, $N_v, N_h \in \mathbb{N}$.
$\mathbf{f}_w$ : The vector of grey levels of pixels in the correlation windows (in $I_w$): $\mathbf{f}_w = (\cdots I_w^{i+p,\,j+q} \cdots)^T = (\cdots f_w^k \cdots)^T$, where $^T$ is the matrix transposition operator and $p \in [-N_v; N_v]$, $q \in [-N_h; N_h]$.
$\overline{\mathbf{f}_w}$ : The mean of the grey levels in $\mathbf{f}_w$.
$f_w^k$ : The element $k$ of vector $\mathbf{f}_w$.
$L_P$ : The $L_P$ norms: $\|\mathbf{f}_w\|_P = \left( \sum_{k=0}^{N_f-1} |f_w^k|^P \right)^{1/P}$ with $P \in \mathbb{N}$, and $\|\mathbf{f}_w\| = \|\mathbf{f}_w\|_2$.
In the following description, when no explicit reference is given, the
reader should consult (Aschwanden and Guggenbühl, 1992). We briefly present
the notations in Table 1 and the five best measures of the different
families that are considered.
(1) Family 1. Cross Correlation-based Measures – All these measures are
based on a scalar product (Moravec, 1980) and NCC (Normalized Cross
Correlation) is the most efficient one:
$$\mathrm{NCC}(\mathbf{f}_l, \mathbf{f}_r) = \frac{\mathbf{f}_l \cdot \mathbf{f}_r}{\|\mathbf{f}_l\| \, \|\mathbf{f}_r\|}. \tag{1}$$
(2) Family 2. Classical Statistics-based Measures – Several types of
measures can be used: measures that are based on a distance and/or locally
centered, variance-based measures and fourth-order cumulant-based measures
(Rziza and Aboutajdine, 2001). The best one is LSAD (Locally scaled Sum of
Absolute Differences), defined by:
$$\mathrm{LSAD}(\mathbf{f}_l, \mathbf{f}_r) = \left\| \mathbf{f}_l - \frac{\overline{\mathbf{f}_l}}{\overline{\mathbf{f}_r}} \, \mathbf{f}_r \right\|_1. \tag{2}$$
(3) Family 3. Derivatives-based Measures – Instead of using grey levels,
these measures employ the derivatives of the images at different orders
(Seitz, 1989). Most of the existing measures use only the direction of the
gradient vectors (Ullah et al., 2001), but this kind of information can
induce errors, in particular with low-norm gradient vectors, whose
direction is not reliable. In consequence, the best-performing measure is
based on the similarity of the image gradient vectors: GC (Gradient
Correlation) (Crouzil et al., 1996). If the gradient vector at
$\mathbf{p}_w^{i,j}$ in $I_w$ is $\nabla I_w^{i,j}$ and its norm is denoted
by $\|\nabla I_w^{i,j}\|$, the definition of GC is:
$$\mathrm{GC}(\mathbf{f}_l, \mathbf{f}_r) = \frac{\sum_A \left\| \nabla I_l^{i+p,\,j+q} - \nabla I_r^{v+p,\,w+q} \right\|}{\sum_A \left( \left\| \nabla I_l^{i+p,\,j+q} \right\| + \left\| \nabla I_r^{v+p,\,w+q} \right\| \right)}, \tag{3}$$
with $\sum_A = \sum_{p=-N_v}^{N_v} \sum_{q=-N_h}^{N_h}$, where $(v, w)$ are the coordinates of the candidate pixel in the right image.
(4) Family 4. Non-parametric Statistics-based Measures – They are based on
the order of the grey levels inside the correlation window (Kaneko et al.,
2002; Bhat and Nayar, 1998). Using the order of the grey levels allows
these measures to be robust against noise and occlusions but, sometimes, it
also gives an ambiguous result, i.e. the best correlation score is obtained
for the wrong correspondent. The best-performing measure of this family is
CENSUS (Zabih and Woodfill, 1994). The similarity measure uses a transform
that produces a bit chain which represents the pixels with an intensity
lower than the central pixel:
$$R_\tau(\mathbf{f}_w) = \bigotimes_{k \in [0; N_f-1]} \xi\left( f_w^{N_f/2}, f_w^k \right),$$
where $\bigotimes$ denotes concatenation and $\xi(f_w^{N_f/2}, f_w^k) = 1$ if $f_w^k < f_w^{N_f/2}$ and 0 elsewhere.
CENSUS is the sum of the Hamming distances, denoted by $D_H$, between the
codes of each pixel of the correlation window:
$$\mathrm{CENSUS}(\mathbf{f}_l, \mathbf{f}_r) = \sum_{k=0}^{N_f-1} D_H\left( R_\tau(\mathbf{f}_l), R_\tau(\mathbf{f}_r) \right). \tag{4}$$
(5) Family 5. Robust Measures – We are particularly concerned with the
occlusion problem, which appears in the vicinity of a pixel near a depth
discontinuity. In fact, some pixels lie on a first level of depth whereas
the other pixels lie on a second level. This can disturb the matching
process and introduce erroneous matches. To take this problem into account,
robust statistics tools are introduced as correlation measures, like
partial correlation (Lan and Mohr, 1997) or pseudo-norms (Delon and Rougé,
2004). The most efficient is SMAD, the Smooth Median Absolute Deviation
(Rousseeuw and Croux, 1992):
$$\mathrm{SMAD}(\mathbf{f}_l, \mathbf{f}_r) = \sum_{k=0}^{h-1} \left( \mathbf{f}_l - \mathbf{f}_r - \mathrm{med}(\mathbf{f}_l - \mathbf{f}_r) \right)^2_{k:N_f-1}, \tag{5}$$
where the ordered values of $\mathbf{f}_w$ are represented by
$(\mathbf{f}_w)_{0:N_f-1} \leq \ldots \leq (\mathbf{f}_w)_{N_f-1:N_f-1}$.
It can be interpreted as a robust centered (median) and truncated distance
and, in our experiments, $h = \frac{N_f}{2}$.
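The window-vector measures of Eqs. (1), (2), (4) and (5) can be sketched as follows, taking the two grey-level vectors as plain Python lists of equal length. This is an illustrative sketch, not the authors' code. Several choices are assumptions: `h` defaults to the integer part of N_f / 2; `census` computes a single Hamming distance between the two window codes (the outer sum over k in Eq. (4) is not reproduced); and no guard is added against zero norms or a zero mean.

```python
import math
from statistics import mean, median


def ncc(f_l, f_r):
    """Normalized cross-correlation, Eq. (1): a similarity, higher is better."""
    dot = sum(a * b for a, b in zip(f_l, f_r))
    norm_l = math.sqrt(sum(a * a for a in f_l))
    norm_r = math.sqrt(sum(b * b for b in f_r))
    return dot / (norm_l * norm_r)


def lsad(f_l, f_r):
    """Locally scaled sum of absolute differences, Eq. (2): lower is better."""
    scale = mean(f_l) / mean(f_r)
    return sum(abs(a - scale * b) for a, b in zip(f_l, f_r))


def census_code(f):
    """Bit chain: 1 where the grey level is lower than the central one."""
    center = f[len(f) // 2]
    return [1 if v < center else 0 for v in f]


def census(f_l, f_r):
    """Hamming distance between the two census codes: lower is better."""
    return sum(a != b for a, b in zip(census_code(f_l), census_code(f_r)))


def smad(f_l, f_r, h=None):
    """Smooth median absolute deviation, Eq. (5): lower is better.

    Only the h smallest squared, median-centred differences are summed, so
    up to N_f - h outlying pixels (e.g. occluded ones) are ignored.
    """
    diff = [a - b for a, b in zip(f_l, f_r)]
    med = median(diff)
    squared = sorted((d - med) ** 2 for d in diff)
    if h is None:
        h = len(diff) // 2  # assumption: integer part of N_f / 2
    return sum(squared[:h])
```

Note that NCC is a similarity whereas LSAD, CENSUS and SMAD are dissimilarities, so a matching loop has to maximise the former and minimise the latter.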
Robust and non-parametric measures (families 4 and 5) are efficient in the
presence of noise and/or occlusions, whereas the classic ones (families 1
and 2) obtain better results when there are no major problems. The
derivative-based measures have been designed to be more efficient in the
presence of noise but, most of the time, they are clearly less efficient
than the other ones, except GC, which seems to have better results than the
others, in particular in low textured areas. Interested readers can find
more details about all the measures in (Chambon and Crouzil, 2011).
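Since GC works on gradient fields rather than on grey levels, a separate sketch may help. The gradient operator is not specified in this section, so the central differences used below are an assumption, as are the function names and the convention of returning the worst score (1.0) when both windows have no gradient at all. By the triangle inequality the score lies in [0, 1], with 0 for identical gradient fields.

```python
import math


def gradient(img, i, j):
    """Central-difference gradient at row i, column j (assumed operator)."""
    g_i = (img[i + 1][j] - img[i - 1][j]) / 2.0
    g_j = (img[i][j + 1] - img[i][j - 1]) / 2.0
    return g_i, g_j


def gc(img_l, img_r, i, j, v, w, n_v=1, n_h=1):
    """Gradient correlation of Eq. (3) between left (i, j) and right (v, w).

    Lower is better. Both windows, enlarged by one pixel for the central
    differences, are assumed to lie inside the images.
    """
    num = 0.0
    den = 0.0
    for p in range(-n_v, n_v + 1):
        for q in range(-n_h, n_h + 1):
            gl = gradient(img_l, i + p, j + q)
            gr = gradient(img_r, v + p, w + q)
            num += math.hypot(gl[0] - gr[0], gl[1] - gr[1])
            den += math.hypot(gl[0], gl[1]) + math.hypot(gr[0], gr[1])
    if den == 0.0:
        return 1.0  # no gradient information: treated as worst score (assumption)
    return num / den
```

Because only gradients are compared, adding a constant to one image leaves the score unchanged, which is one way to see why this family tolerates additive illumination changes.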
3 EVALUATION PROTOCOL
To validate our approach, 42 images, with their ground truth or reference
disparity maps, have been tested (see Figure 1 for examples): 1 random-dot
stereogram, 2 synthetic pairs (Murs), one real image pair (footnote 2) and,
finally, 38 real pairs introduced by Scharstein and Szeliski: 9 in 2002
(Tsukuba) (Scharstein and Szeliski, 2002), 2 in 2003 (Cones) (Scharstein
and Szeliski, 2003), 6 in 2005 and 22 in 2006 (Aloe). The last ones are the
most complex scenes. The most substantial evaluation protocol to highlight
the different performances of global methods is given by the authors of
(Scharstein and Szeliski, 2002) (footnote 3). Compared to their protocol,
our comparison is based on all their 38 images instead of 4.
Footnote 2: http://www.irit.fr/Benoit.Bocquillon/MYCVR/research.php
Footnote 3: http://vision.middlebury.edu/stereo/eval/
Figure 1: Examples of data used in our tests (left images (a) and
disparities (b), see footnote 1) for Murs, Tsukuba (2002), Cones (2003) and
Aloe (2006). Interested readers can find more explanations about the
estimation of these reference maps (ground truth), both in the cited papers
and in the cited web page of section 3 (active vision is used and/or some
constraints about the geometry of the scene are introduced).
Many criteria can be used to evaluate the quality of the results based on
ground truth (Chambon and Crouzil, 2004). However, for this evaluation, we
use the percentage of erroneous matches, denoted ER, and the evaluation of
the complementarity of the results (also based on ER), because they are the
two most important aspects to consider in order to evaluate the impact of
the proposed fusion algorithm.
4 COMPLEMENTARITY STUDY
To evaluate the complementarity of similarity mea-
sures, we analyse the percentage of erroneous
matches (ER) for each measure used alone, and for
each combination, by supposing that the correct cor-
respondent is always kept (when one of the measures
that are combined finds the exact correspondent), see
Table 2 for the combination of 2 measures and Fig-
ure 2 for the percentage of erroneous matches with
more than 2 measures. We use these notations:
$M_i$, with $i \in \{1; \ldots; N_m\}$, the $N_m$ tested measures;
$d_{th}(p_l)$, the disparity of the pixel $p_l$ given by the ground truth;
$d_i(p_l)$, the disparity given by the algorithm based on the correlation measure $M_i$;
$d_{tc}^{N_m}(p_l)$, the theoretical or optimal combination of $N_m$ measures.
More formally, the optimal combination of the results over $N_m$ measures
($N_m \in \{2; \ldots; 41\}$ because 41 measures were studied in our
previous work), denoted $d_{tc}^{N_m}$, is simply estimated by following
this rule, for each pixel $p_l$:
if $\exists i \in \{1; \ldots; N_m\}$ where $d_i(p_l) = d_{th}(p_l)$
then $d_{tc}^{N_m} = d_{th}(p_l)$ and $d_{tc}^{N_m}$ is correct
else $d_{tc}^{N_m}$ is erroneous.
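The rule above amounts to an oracle that counts a pixel as correct as soon as one of the combined maps agrees with the ground truth. A minimal sketch follows, assuming disparity maps stored as lists of rows of integers and exact equality as the correctness test (the paper does not state a tolerance here); the function names are hypothetical.

```python
def error_rate(disp, truth):
    """ER: percentage of pixels whose disparity differs from the ground truth."""
    pairs = [(d, t)
             for row_d, row_t in zip(disp, truth)
             for d, t in zip(row_d, row_t)]
    return 100.0 * sum(d != t for d, t in pairs) / len(pairs)


def oracle_error_rate(maps, truth):
    """ER of the theoretical combination d_tc of the given disparity maps.

    A pixel is erroneous only if none of the maps gives the ground-truth
    disparity at that pixel.
    """
    rows, cols = len(truth), len(truth[0])
    errors = 0
    for r in range(rows):
        for c in range(cols):
            if not any(m[r][c] == truth[r][c] for m in maps):
                errors += 1
    return 100.0 * errors / (rows * cols)
```

By construction the oracle ER of a set of maps can never exceed the ER of any single map in the set, which is why it serves as a lower bound for any practical fusion of those maps.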
We have tested these configurations:
$(C_1)$ $N_m = 2$: All the 41 × 41 combinations have been evaluated, which
highlights the best combination: GC and SMAD, with a mean percentage ER of
only 14.13% on the 42 images.
$(C_2)$ $N_m = 41$: All the 41 measures have been theoretically combined
and the results show that the percentage ER can be decreased to 7.26%.
$(C_3)$ $N_m \in \{3; \ldots; 40\}$: Starting from the best combination
GC-SMAD, any kind of measure can be added, the performances being quite
equivalent. With 3 measures, the percentage ER decreases to about 13% and
then goes slowly to the minimum percentage ER (about 1% for each added
measure) reached by the optimal combination of 41 measures. Moreover, when
more than 10 measures are used, ER is close to this minimum.
First, the results show how much local matching with one correlation
measure can theoretically be improved and, second, which measures are the
most complementary. In Table 2, we can observe that combining different
measures can greatly improve the results: on the whole image, the
improvement ranges from 7% (2 measures combined) to 17% (41 measures).
Table 2: Percentages of erroneous matches (ER) with each of the 5 best
correlation measures and the best combination of 2 correlation measures.
MEASURE   ER      MEASURE   ER
NCC       23.2    LSAD      23.3
GC        21      CENSUS    20.2
SMAD      27.9    GC-SMAD   14.13
Figure 2: Percentage of erroneous matches versus the num-
ber of correlation measures theoretically combined. This
graph illustrates the maximal number of measures that are
interesting to combine (10) but it also highlights the biggest
improvement obtained with only 2 measures.
The following analyses illustrate these results. Using four maps, see
Figure 3, we propose to visualize:
(1) The comparison of the two most complementary measures – As expected
from the definitions of these measures, this visualization illustrates that
SMAD compensates for the weaknesses of GC in occlusion areas (or near
occlusion areas) whereas GC compensates for the weaknesses of SMAD in
non-occluded areas and, in particular, in low textured areas, see (a) in
Figure 3.
(2) The areas with 1 correct correspondent over 5, 10 or 41 correspondents
– The most distinctive measure is GC, i.e. it is the most complementary
measure to the other measures. SMAD is the second most complementary
measure to the others, see (b) for 5, (c) for 10 and (d) for 41 in Figure
3. Moreover, with 10 different correlation measures, the results are quite
close to the results with 41 measures. Our last conclusion is that
combining more than 2 measures seems to be interesting because, most of the
time, more than one measure obtains the correct correspondent. This last
remark has inspired the fusion algorithm that is described in the next
section.
Figure 3: Study of complementarity (an example with an image of Figure 1),
panels (a) to (d) – The pixels in grey levels correspond to pixels with
more than 1 correct match over the $N_m$ combinations. The darker the
pixel, the higher the number of correct matches. The color codes (one color
per measure among NCC, LSAD, GC, CENSUS and SMAD) are used when only one
measure gives the correct match. Moreover, we distinguish the occluded
areas (Occ.) from the non-occluded areas (Non-occ.). In (a), it shows that
SMAD is efficient near occlusions whereas GC is more efficient than SMAD in
low textured areas.
In conclusion, we have decided to present an algorithm that combines $N_m$
different measures and we illustrate the interest of this kind of algorithm
by using the two complementary measures: GC and SMAD.
5 ALGORITHM OF FUSION
In our first proposition of combination (Chambon and Crouzil, 2004), the
algorithm was designed to take occlusions into account. In consequence, as
expected, the results are good in non-occluded areas and also in occluded
areas. The goal of this new algorithm is to improve this work by combining
the advantages of each measure according to each kind of region, in order
to take more difficulties into account, like low textured or noisy regions.
Towards this goal, instead of detecting the occlusions, we work directly on
how to merge the disparities by taking into account their variations in
several matching maps (each map has been obtained with a different
correlation measure).
Our method of fusion is based on two steps, the principle being to estimate
a disparity map with each measure and then to merge the results by applying
the following two rules:
(1) If more than one disparity map gives the same match, the correspondence
is validated and this result is considered as reliable.
(2) In an “undetermined area” (i.e. rule (1) is not satisfied), the “most
reliable” disparity is kept. The difficulty is to determine the most
reliable one. In this paper, we consider the disparities found in the
neighborhood in the matching map of each considered measure.
Formally, these two rules can be defined as:
(1) Initialization – For each pixel $p_l$, the term $d_f^{N_m}$ is the
final disparity, after the fusion of $N_m$ correlation measures.
If $\exists d$ | ($d = \arg\max_e M_e(p_l)$) && ($M_d(p_l) \geq \frac{N_m}{2}$)
then $d_f^{N_m}(p_l) \leftarrow d$
else the disparity is undetermined.
We define: $M_e(p_l) = \#\{i \mid d_i(p_l) = e\}$.
(2) Refinement – For each pixel $p_l$ without disparity, we estimate the
ambiguity, denoted by $A$, of each possible disparity $d_i(p_l)$. To
estimate the function $A$, which represents how reliable the estimated
disparity is, we suppose that if most of the neighbors have the same
disparity (in the same result, obtained with the same correlation measure),
the estimated disparity can be considered as sure. In consequence, to
estimate $A$, we compare the studied disparity with the mean of the
disparities in the neighborhood, denoted by $\mathcal{N}$. The disparity
with the lowest ambiguity is kept only if this ambiguity is not important,
i.e. lower than a given threshold $\varepsilon$.
For (each pixel $p_l$)
$d = \arg\min_{d_i(p_l),\, i \in \{1; \ldots; N_m\}} A(d_i(p_l))$ with
$A(d_i(p_l)) = \left| d_i(p_l) - \frac{1}{\#\mathcal{N}(p_l)} \sum_{p_k \in \mathcal{N}(p_l)} d_i(p_k) \right|$
with $\mathcal{N}(p_l)$ the neighborhood of $p_l$ (footnote 4).
If ($A(d) < \varepsilon$) (footnote 5)
then $d_f^{N_m}(p_l) \leftarrow d$
else $p_l$ is occluded.
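The two steps can be sketched as below. This is a reading of the rules under stated assumptions rather than the authors' implementation. The number of agreeing maps required in the initialization is taken as `max(2, ceil(N_m / 2))`, which combines the wording of rule (1), more than one map, with the formal condition. Border pixels average over the neighbors that exist. Input maps are lists of rows of integers, at least 2 × 2, with no missing values. `None` marks an occluded pixel in the output. All names are hypothetical.

```python
from collections import Counter
from math import ceil


def neighborhood_mean(dmap, r, c):
    """Mean disparity over the 8 neighbors of (r, c) that lie inside the map."""
    rows, cols = len(dmap), len(dmap[0])
    values = [dmap[r + dr][c + dc]
              for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)
              and 0 <= r + dr < rows and 0 <= c + dc < cols]
    return sum(values) / len(values)


def ambiguity(dmap, r, c):
    """A: gap between a disparity and the mean of its neighbors in the same map."""
    return abs(dmap[r][c] - neighborhood_mean(dmap, r, c))


def fuse(maps, eps=1.0):
    """Fuse the disparity maps obtained with N_m correlation measures."""
    n_m = len(maps)
    needed = max(2, ceil(n_m / 2))  # assumed agreement threshold
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Step 1, initialization: keep a disparity shared by enough maps.
            d, votes = Counter(m[r][c] for m in maps).most_common(1)[0]
            if votes >= needed:
                fused[r][c] = d
                continue
            # Step 2, refinement: keep the least ambiguous candidate if it
            # is below the threshold; otherwise the pixel stays occluded.
            best = min(maps, key=lambda m: ambiguity(m, r, c))
            if ambiguity(best, r, c) < eps:
                fused[r][c] = best[r][c]
    return fused
```

The ambiguity is always evaluated inside the map a candidate comes from, following the remark that neighbors are compared "in the same result obtained with the same correlation measure".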
6 MATCHING RESULTS
For this part, the fusion algorithm has been tested with the two most
complementary measures: GC and SMAD. In order to try to detect occlusions
and erroneous matches, we use the symmetry constraint, which consists in
estimating correspondences from the left image to the right image and then
from the right to the left, and in considering non-coherent matches as
occluded pixels (these occluded pixels are shown in black in each disparity
map). Table 3 shows the improvements in the percentage of erroneous matches
obtained with the new fusion algorithm on all the 42 tested images. The
decrease of this percentage ranges from 2.47 to 4.08 (with complex images,
i.e. the images that are difficult to match because of the occlusion areas
or the untextured areas). However, this improvement does not reach the
theoretical maximal improvement shown in Table 2. Another way to appreciate
the quality of the results is to look at the disparity maps given in Figure
4. The disparity maps obtained by fusion are the best ones because they
contain fewer false negatives than the others. Moreover, the occlusion
areas are better delimited (the contours are clean and contain no “holes”).
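The symmetry constraint mentioned above can be sketched as a left-right consistency check. The conventions are assumptions: rectified images, non-negative integer disparities in both maps, a left pixel at column c with disparity d corresponding to the right pixel at column c - d on the same row, and exact equality as the coherence test. `None` stands for a pixel declared occluded, and the function name is hypothetical.

```python
def cross_check(disp_lr, disp_rl):
    """Keep a left-to-right match only if the right-to-left match agrees."""
    rows, cols = len(disp_lr), len(disp_lr[0])
    checked = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d = disp_lr[r][c]
            if d is None:
                continue  # already undetermined in the left-to-right map
            c_right = c - d  # column of the correspondent in the right image
            if 0 <= c_right < cols and disp_rl[r][c_right] == d:
                checked[r][c] = d
    return checked
```

Applied to each single-measure map or to the fused map, this yields the black (occluded) pixels of the disparity maps, at the price of running the matching twice.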
As in the first step we have to estimate the disparity map induced by each
measure, the execution time is the sum of the execution times of each
correlation-based algorithm. The fusion itself (second step) does not take
much time in comparison to the first step. In consequence, the higher the
number of merged results, the higher the execution time, and the execution
time depends on the chosen measures. In our test, for example with Tsukuba,
GC takes 17.6 s and SMAD 39.77 s, so the fusion algorithm finally takes
about 1 minute.
Footnote 4: The 8 neighbors have been taken into account.
Footnote 5: We have chosen $\varepsilon = 1$.
Table 3: Percentage of erroneous matches – H represents the images with
untextured areas, like the Tsukuba pair; O, the images with a lot of
occlusions, like the Aloe pair; and R, the images with no major
difficulties, like the Cones pair (see Figure 4 for these images). The term
Tc refers to the results obtained with a theoretical or optimal fusion, see
Table 2. The percentage of erroneous matches with the new method is better
than that obtained with the GC measure alone, in particular with complex
scenes.
METHOD     H+O    O      H      R      Total
GC alone   25.6   17.5   19.6   15.9   20.9
Fusion     22.1   13.5   16.9   13.5   17.5
Tc         19.4   10.8   15.4   10.8   15.3
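To read Table 3 in terms of gains, the short sketch below subtracts the Fusion and Tc rows from the GC row, category by category. Only the figures printed in the table are used; the rounding to one decimal and the variable names are choices made for this illustration.

```python
categories = ["H+O", "O", "H", "R", "Total"]
er = {
    "GC alone": [25.6, 17.5, 19.6, 15.9, 20.9],
    "Fusion":   [22.1, 13.5, 16.9, 13.5, 17.5],
    "Tc":       [19.4, 10.8, 15.4, 10.8, 15.3],
}

# Decrease of ER obtained by the fusion, in percentage points.
gain = {c: round(g - f, 1)
        for c, g, f in zip(categories, er["GC alone"], er["Fusion"])}

# Decrease that the theoretical combination Tc would give.
best = {c: round(g - t, 1)
        for c, g, t in zip(categories, er["GC alone"], er["Tc"])}

for c in categories:
    print(f"{c:6s} fusion gain {gain[c]:4.1f}   theoretical gain {best[c]:4.1f}")
```

Comparing the two columns shows, per category, how much of the theoretically available decrease the fusion of GC and SMAD actually recovers.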
Figure 4: Disparity maps for the Tsukuba, Cones and Aloe images – (a) left
image, (b) disparity map with SMAD, (c) with GC, (d) with FUSION. The
fusion results present fewer false negatives, in particular for Cones and
Aloe. The example of Tsukuba illustrates the limits of the method and the
need to combine more than 2 measures.
7 CONCLUSIONS
In this paper, we proposed a study of the complementarity of correlation
measures, illustrated with visualization maps, and we introduced a new way
to combine complementary measures. Moreover, we highlighted the most
complementary measures: GC and SMAD. The tests on 42 images illustrate the
improvement in performance of the new fusion algorithm compared to classic
correlation matching, i.e. matching based on one correlation measure alone.
These results are encouraging but also exhibit the limits of this approach,
which might lead us to investigate a fusion approach based on a voting
method in the neighborhood of the studied pixel, or to distinguish the most
reliable measures (in the first step of the algorithm). Moreover, we will
study the influence of the number of measures involved in the proposed
algorithm.
REFERENCES
Aschwanden, P. and Guggenbühl, W. (1992). Experimental
results from a comparative study on correlation type
registration algorithms. In Förstner, W. and Ruwiedel,
S., editors, Robust computer vision: Quality of Vision
Algorithms, pages 268–282. Wichmann.
Bhat, D. and Nayar, S. (1998). Ordinal measures for image
correspondence. PAMI, 20(4):415–423.
Chambon, S. and Crouzil, A. (2004). Towards correlation-
based matching algorithms that are robust near occlu-
sions. In ICPR, volume 3, pages 20–23.
Chambon, S. and Crouzil, A. (2011). Occlusions handling
in dense stereo matching. Pattern Recognition. sub-
mitted.
Crouzil, A., Massip-Pailhes, L., and Castan, S. (1996). A
new correlation criterion based on gradient fields sim-
ilarity. In ICPR, volume 1, pages 632–636.
Delon, J. and Rougé, B. (2004). Analytic study of the stereo-
scopic correlation. Research report 2004-19, CMLA
(ENS Cachan).
Kaneko, S., Murase, I., and Igarashi, S. (2002). Robust im-
age registration by increment sign correlation. Pattern
Recognition, 35(10):2223–2234.
Lan, Z. and Mohr, R. (1997). Robust location based par-
tial correlation. Technical Report RR-3186, INRIA,
France.
Moravec, H. (1980). Obstacle Avoidance and Navigation in
the Real World by a Seeing Robot Rover. PhD thesis,
Carnegie Mellon University.
Rousseeuw, P. and Croux, C. (1992). Explicit scale
estimators with high breakdown point. In Dodge, Y.,
editor, L1-Statistical Analysis and Related Methods,
pages 77–92. Elsevier.
Rziza, M. and Aboutajdine, D. (2001). Dense disparity
map estimation using cumulants. In Conference on
Telecommunications, ConfTele.
Scharstein, D. and Szeliski, R. (2002). A taxonomy and
evaluation of dense two-frame stereo correspondence
algorithms. IJCV, 47(1):7–42.
Scharstein, D. and Szeliski, R. (2003). High-Accuracy
Stereo Depth Maps Using Structured Light. In CVPR,
volume 1, pages 195–202.
Seitz, P. (1989). Using local orientational information as
image primitive for robust object recognition. In Vi-
sual Communication and Image Processing IV, vol-
ume SPIE–1199, pages 1630–1639.
Ullah, F., Kaneko, S., and Igarashi, S. (2001). Orienta-
tion Code Matching For Robust Object Search. IE-
ICE Transactions on Information and Systems, E-84-
D(8):999–1006.
Zabih, R. and Woodfill, J. (1994). Non-parametric local
transforms for computing visual correspondence. In
ECCV, pages 151–158.