COMBINATION OF CORRELATION MEASURES FOR DENSE STEREO MATCHING

Sylvie Chambon

Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux (IFSTTAR)

Champs-sur-Marne, France

Alain Crouzil

Institut de Recherche en Informatique de Toulouse (IRIT), Toulouse, France

Keywords:

Stereovision, Matching, Correlation, Classic measures, Robust statistics, Fusion.

Abstract:

In the context of dense stereo matching of pixels, we study the combination of different correlation measures. Building on previous work on correlation measures, we use the most significant measures of five families, based on cross-correlation, classic statistics, image derivatives, non-parametric statistics and robust statistics. More precisely, this study confirms that stereo matching can be improved by combining complementary correlation measures, and it identifies the two measures whose combination takes the best advantage of the different methods: the Gradient Correlation measure (GC) and the Smooth Median Absolute Deviation measure (SMAD). Finally, we introduce a fusion algorithm that automatically combines correlation measures.

1 INTRODUCTION

Finding homologous pixels in a stereo pair of images is one of the most important steps in recovering the 3D structure of a scene by stereovision. Many methods have been proposed in the literature, where local methods are distinguished from global ones. More precisely, matching methods can be described by their essential components, a term first introduced by (Scharstein and Szeliski, 2002). These components are the matching cost, the optimization method and the use of multiple passes, i.e. some approaches improve matching performance by applying several methods in sequence. This description leads to a four-type categorization: local methods, global methods (without a correlation measure), mixed methods (global methods with a correlation measure) and multipass methods. Our purpose is to introduce a multipass algorithm based on the combination of local methods.

Local methods are easy to implement, fast and quite efficient, and are consequently widely used. Unfortunately, characterising how effective the existing correlation measures are, i.e. how well they obtain correct matches in the different areas of the image, is still an open issue. In our work on local costs (Chambon and Crouzil, 2011), the influence of different measures on the quality of stereo matching results was studied, in particular near occluded regions and, in (Chambon and Crouzil, 2004), we demonstrated that a measure based on a robust statistics tool combined with a cross correlation measure gives better performance than a correlation measure used alone. These results raise three new questions:

• Which correlation measures are the most complementary, so that together they cover all the matching difficulties?

• Is it advantageous to combine numerous measures and, if so, how many?

• Following up on the previous questions, can we propose an algorithm that combines more than one measure to obtain a fully dense and correct matching (or disparity) map, and is it more efficient than a method based on a single measure?

A disparity (or matching) map gives, for each pixel, the distance between the pixel and its correspondent. When the disparity map is displayed as a grey-level image, the brighter the pixel, the larger the distance. Black pixels are occluded pixels.


Chambon S. and Crouzil A.

COMBINATION OF CORRELATION MEASURES FOR DENSE STEREO MATCHING.

DOI: 10.5220/0003333305980603

In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2011), pages 598-603

ISBN: 978-989-8425-47-8

© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

Existing methods are briefly presented before the description of the data set used to validate the proposed method. Then, the combination study is described, leading to the proposed matching algorithm, which merges the results obtained with various correlation measures. Finally, results are presented.

2 CORRELATION MEASURES

The principle of a local cost, i.e. a correlation measure, is to assume that two homologous pixels and their respective neighborhoods are similar from a photometric point of view. The main difficulties of these methods are illumination changes, untextured areas and occlusions. Many measures have been introduced to tackle these difficulties. Based on the results of 41 measures on a benchmark of 42 images, presented in (Chambon and Crouzil, 2011), we propose to study the complementarity of these measures and, in particular, of the best measure of each family.

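Table 1: Notations used for the description of the measures.

$I_w$: the images, with $w \in \{l, r\}$ (left and right).

$I^w_{i,j}$: the grey level of the pixel $p^w_{i,j}$ of coordinates $(i, j)$ in image $I_w$. Moreover, $p^r_{x,y}$ is the correspondent pixel of $p^l_{i,j}$.

$N$: the number of pixels in the neighborhood: $N = (2N_x + 1) \times (2N_y + 1)$, with $N_x, N_y \in \mathbb{N}^*$.

$\mathbf{f}_w$: the vector of grey levels of the pixels in the correlation window (in $I_w$): $\mathbf{f}_w = (\cdots I^w_{i+p,j+q} \cdots)^T = (\cdots f_{w,k} \cdots)^T$, where $T$ is the matrix transposition operator and $p \in [-N_x; N_x]$, $q \in [-N_y; N_y]$.

$\bar{f}_w$: the mean of the grey levels in $\mathbf{f}_w$.

$f_{w,k}$: the element $k$ of vector $\mathbf{f}_w$.

$\|\mathbf{f}_w\|_P$: the $L_P$ norms: $\|\mathbf{f}_w\|_P = \left(\sum_{k=0}^{N-1} |f_{w,k}|^P\right)^{1/P}$, with $P \in \mathbb{N}^*$, and $\|\mathbf{f}_w\| = \|\mathbf{f}_w\|_2$.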
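To make the notations of Table 1 concrete, here is a minimal Python sketch (ours, not the authors' code) that builds the window vector $\mathbf{f}_w$, its mean $\bar{f}_w$ and its $L_P$ norm. It assumes a grey-level image stored as a list of rows, $(i, j)$ read as (row, column), and a window lying entirely inside the image; the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grey levels stored as img[row][col] (assumption)


def window_vector(img: Image, i: int, j: int, n_x: int, n_y: int) -> List[float]:
    """f_w: grey levels I_{i+p, j+q}, p in [-n_x, n_x], q in [-n_y, n_y].

    Assumes (i, j) = (row, column) and a window fully inside the image.
    The vector has N = (2*n_x + 1) * (2*n_y + 1) elements, ordered by p then q.
    """
    return [img[i + p][j + q]
            for p in range(-n_x, n_x + 1)
            for q in range(-n_y, n_y + 1)]


def window_mean(f: List[float]) -> float:
    """Mean grey level of the window (bar f_w in Table 1)."""
    return sum(f) / len(f)


def lp_norm(f: List[float], P: int = 2) -> float:
    """L_P norm (sum_k |f_k|^P)^(1/P); with P = 2 this is ||f_w||."""
    return sum(abs(v) ** P for v in f) ** (1.0 / P)
```

For instance, `lp_norm([3.0, 4.0])` returns `5.0` and `lp_norm([1.0, -2.0, 3.0], 1)` returns `6.0`. The ordering of $\mathbf{f}_w$ does not matter for the measures below, as long as the left and right windows use the same one.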

In the following description, when no explicit reference is given, the reader should consult (Aschwanden and Guggenbühl, 1992). We briefly present the notations in Table 1 and the best measure of each of the five families considered.

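(1) Family 1. Cross Correlation-based Measures – All these measures are based on a scalar product (Moravec, 1980), and NCC (Normalized Cross Correlation) is the most efficient one:

$$\mathrm{NCC}(\mathbf{f}_l, \mathbf{f}_r) = \frac{\mathbf{f}_l \cdot \mathbf{f}_r}{\|\mathbf{f}_l\| \, \|\mathbf{f}_r\|}. \tag{1}$$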

(2) Family 2. Classical Statistics-based Measures – This family includes distance-based and/or locally centred measures, variance-based measures and fourth-order cumulant-based measures (Rziza and Aboutajdine, 2001). The best one is LSAD (Locally Scaled Sum of Absolute Differences), defined by:

$$\mathrm{LSAD}(\mathbf{f}_l, \mathbf{f}_r) = \left\| \mathbf{f}_l - \frac{\bar{f}_l}{\bar{f}_r}\,\mathbf{f}_r \right\|_1. \tag{2}$$

(3) Family 3. Derivatives-based Measures – Instead of grey levels, these measures use the derivatives of the images at different orders (Seitz, 1989). Most existing measures use only the direction of the gradient vectors (Ullah et al., 2001), but this information can induce errors, in particular for gradient vectors with a low norm, whose direction is not reliable. Consequently, the best-performing measure, GC (Gradient Correlation) (Crouzil et al., 1996), is based on the similarity of the image gradient vectors. If the gradient vector at $p^w_{i,j}$ in $I_w$ is $\nabla I^w_{i,j}$ and its norm is denoted by $\|\nabla I^w_{i,j}\|$, GC is defined by:

$$\mathrm{GC}(\mathbf{f}_l, \mathbf{f}_r) = \frac{\sum_{(p,q)} \left\| \nabla I^l_{i+p,j+q} - \nabla I^r_{x+p,y+q} \right\|}{\sum_{(p,q)} \left( \|\nabla I^l_{i+p,j+q}\| + \|\nabla I^r_{x+p,y+q}\| \right)} \tag{3}$$

with $\sum_{(p,q)} = \sum_{p=-N_x}^{N_x} \sum_{q=-N_y}^{N_y}$.

(4) Family 4. Non-parametric Statistics-based Measures – They are based on the order of the grey levels inside the correlation window (Kaneko et al., 2002; Bhat and Nayar, 1998). Using the order of the grey levels makes these measures robust against noise and occlusions but, sometimes, it also gives ambiguous results, i.e. the best correlation score is obtained for the wrong correspondent. The best-performing measure of this family is CENSUS (Zabih and Woodfill, 1994). It relies on a transform that produces a bit chain representing the pixels whose intensity is lower than that of the central pixel:

$$R(\mathbf{f}_w) = \bigotimes_{k \in [0;N-1]} \xi(f_{w,k}, f_{w,c}),$$

where $\otimes$ denotes bit concatenation, $f_{w,c}$ is the grey level of the central pixel of the window, and $\xi(f_{w,k}, f_{w,c}) = 1$ if $f_{w,k} < f_{w,c}$ and 0 otherwise. CENSUS is the sum of the Hamming distances, denoted by $D_H$, between the codes of each pixel of the correlation window:

$$\mathrm{CENSUS}(\mathbf{f}_l, \mathbf{f}_r) = \sum_{k=0}^{N-1} D_H\big(R(\mathbf{f}^k_l), R(\mathbf{f}^k_r)\big), \tag{4}$$

where $\mathbf{f}^k_w$ is the window vector centred on the $k$-th pixel of the correlation window.


(5) Family 5. Robust Measures – We are particularly concerned with the occlusion problem, which appears in the neighborhood of a pixel near a depth discontinuity: some pixels of the window lie on a first depth level whereas the other pixels lie on a second one. This can disturb the matching process and introduce erroneous matches. To take this problem into account, robust statistics tools have been introduced as correlation measures, like partial correlation (Lan and Mohr, 1997) or pseudo-norms (Delon and Rougé, 2004). The most efficient is SMAD, the Smooth Median Absolute Deviation (Rousseeuw and Croux, 1992):

$$\mathrm{SMAD}(\mathbf{f}_l, \mathbf{f}_r) = \sum_{k=0}^{h-1} \left( \big(\mathbf{f}_l - \mathbf{f}_r - \mathrm{med}(\mathbf{f}_l - \mathbf{f}_r)\big)^2 \right)_{k:N-1} \tag{5}$$

where the square is taken element-wise and the ordered values of $\mathbf{f}_w$ are represented by $(f_w)_{0:N-1} \leq \ldots \leq (f_w)_{N-1:N-1}$. It can be interpreted as a robust, centred (median) and truncated distance and, in our experiments, $h =$
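The three measures that only need the window vectors, NCC (Eq. 1), LSAD (Eq. 2) and SMAD (Eq. 5), can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the window vectors are plain lists as in Table 1, and the default value of $h$ in `smad` is an assumption. Note that NCC is a similarity (higher is better) whereas LSAD and SMAD are dissimilarities (lower is better).

```python
import math
from statistics import median
from typing import Optional, Sequence


def ncc(f_l: Sequence[float], f_r: Sequence[float]) -> float:
    """Eq. (1): normalized cross correlation; 1.0 for proportional windows."""
    dot = sum(a * b for a, b in zip(f_l, f_r))
    norm_l = math.sqrt(sum(a * a for a in f_l))
    norm_r = math.sqrt(sum(b * b for b in f_r))
    if norm_l == 0.0 or norm_r == 0.0:
        return 0.0  # assumption: an all-zero window is treated as uncorrelated
    return dot / (norm_l * norm_r)


def lsad(f_l: Sequence[float], f_r: Sequence[float]) -> float:
    """Eq. (2): L1 distance after scaling f_r by the ratio of the window means."""
    mean_l = sum(f_l) / len(f_l)
    mean_r = sum(f_r) / len(f_r)
    scale = mean_l / mean_r  # assumes a non-zero mean in the right window
    return sum(abs(a - scale * b) for a, b in zip(f_l, f_r))


def smad(f_l: Sequence[float], f_r: Sequence[float], h: Optional[int] = None) -> float:
    """Eq. (5): sum of the h smallest squared median-centred differences."""
    diffs = [a - b for a, b in zip(f_l, f_r)]
    med = median(diffs)
    residuals = sorted((d - med) ** 2 for d in diffs)
    if h is None:
        h = len(diffs) // 2  # hypothetical default, not taken from the paper
    return sum(residuals[:h])
```

As a usage example, `smad([10.0, 10.0, 10.0, 10.0], [0.0, 0.0, 0.0, 50.0], h=3)` returns `0.0`: the single outlying pixel, as could happen near an occlusion, falls outside the $h$ smallest residuals, whereas summing all four squared residuals (`h=4`) gives `2500.0`.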

Robust and non-parametric measures (families 4 and 5) are efficient in the presence of noise and/or occlusions, whereas the classic ones (families 1 and 2) obtain better results when there is no major problem. The derivative-based measures were designed to be more efficient in the presence of noise but, most of the time, they are clearly less efficient than the others, except GC, which seems to give better results than the other measures, in particular in low-textured areas. Interested readers can find more details about all the measures in (Chambon and Crouzil, 2011).
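GC (Eq. 3) and CENSUS (Eq. 4) need more than the raw window vector: GC needs image gradients and CENSUS needs the census code of every pixel of the window. The sketch below is one possible Python implementation under assumptions that the paper does not fix: gradients are approximated by central differences, the census code of each window pixel uses a neighborhood of the same size as the correlation window, images are lists of rows indexed as (row, column), and the windows plus the margin these computations need lie inside the image. Function names are hypothetical. Both functions return dissimilarities, equal to 0 for identical windows.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grey levels stored as img[row][col] (assumption)


def gradient(img: Image, i: int, j: int) -> Tuple[float, float]:
    """Central-difference gradient at (i, j); the operator is an assumption."""
    gi = (img[i + 1][j] - img[i - 1][j]) / 2.0
    gj = (img[i][j + 1] - img[i][j - 1]) / 2.0
    return gi, gj


def gc(img_l: Image, img_r: Image, i: int, j: int, x: int, y: int,
       n_x: int, n_y: int) -> float:
    """Eq. (3): sum of gradient-difference norms over sum of gradient norms, in [0, 1]."""
    num, den = 0.0, 0.0
    for p in range(-n_x, n_x + 1):
        for q in range(-n_y, n_y + 1):
            gl = gradient(img_l, i + p, j + q)
            gr = gradient(img_r, x + p, y + q)
            num += math.hypot(gl[0] - gr[0], gl[1] - gr[1])
            den += math.hypot(*gl) + math.hypot(*gr)
    # Assumption: two flat windows (no gradient at all) are treated as identical.
    return num / den if den > 0.0 else 0.0


def census_code(img: Image, i: int, j: int, n_x: int, n_y: int) -> List[int]:
    """Bit chain R: 1 where a neighbour is darker than the central pixel (i, j)."""
    centre = img[i][j]
    return [1 if img[i + p][j + q] < centre else 0
            for p in range(-n_x, n_x + 1)
            for q in range(-n_y, n_y + 1)]


def census(img_l: Image, img_r: Image, i: int, j: int, x: int, y: int,
           n_x: int, n_y: int) -> int:
    """Eq. (4): sum of Hamming distances between codes of corresponding window pixels."""
    total = 0
    for p in range(-n_x, n_x + 1):
        for q in range(-n_y, n_y + 1):
            code_l = census_code(img_l, i + p, j + q, n_x, n_y)
            code_r = census_code(img_r, x + p, y + q, n_x, n_y)
            total += sum(a != b for a, b in zip(code_l, code_r))
    return total
```

Because CENSUS only compares orderings, it is unchanged by any increasing transformation of the grey levels; GC, in contrast, uses both the direction and the norm of the gradients, so that weak, unreliable gradients contribute little to the score.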

3 EVALUATION PROTOCOL

To validate our approach, 42 images with their ground truth or reference disparity maps have been tested (see Figure 1 for examples): 1 random-dot stereogram, 2 synthetic pairs (Murs) and one real image pair, and, finally, 38 real pairs introduced by Scharstein and Szeliski (9 in 2002 (Tsukuba) (Scharstein and Szeliski, 2002), 2 in 2003 (Cones) (Scharstein and Szeliski, 2003), 6 in 2005 and 22 in 2006 (Aloe)). The last ones are the most complex scenes. The most extensive evaluation protocol for highlighting the different performances of global methods is the one given by the authors of (Scharstein and Szeliski, 2002). Compared to their protocol, our comparison is based on all 38 of their images instead of only a subset of them.

http://www.irit.fr/~Benoit.Bocquillon/MYCVR/research.php

http://vision.middlebury.edu/stereo/eval/

[Figure 1: Murs, Tsukuba (2002), Cones (2003) and Aloe (2006); (a) left images, (b) disparities]

Figure 1: Examples of data used in our tests: left images (a) and disparities (b). Interested readers can find more explanations about the estimation of these reference maps (ground truth) both in the cited papers and in the web pages cited in Section 3 (active vision is used and/or some constraints about the geometry of the scene are introduced).

Many criteria can be used to evaluate the quality of the results based on ground truth (Chambon and Crouzil, 2004). However, for this evaluation, we use the percentage of erroneous matches, denoted ER, and the evaluation of the complementarity of the results (also based on ER), because they are the two most important aspects to consider when evaluating the impact of the proposed fusion algorithm.

4 COMPLEMENTARITY STUDY

To evaluate the complementarity of similarity measures, we analyse the percentage of erroneous matches (ER) for each measure used alone and for each combination, assuming that the correct correspondent is always kept whenever one of the combined measures finds it; see Table 2 for the combination of 2 measures and Figure 2 for the percentage of erroneous matches with more than 2 measures. We use these notations:

• $M_i$, with $i \in \{1;\ldots;N_m\}$, the $N_m$ tested measures;

• $d_{gt}(p)$ the disparity of the pixel $p$ given by the ground truth;

• $d_{M_i}(p)$ the disparity given by the algorithm based on the correlation measure $M_i$;

• $d_{opt}(p)$ the theoretical or optimal combination of measures.

More formally, the optimal combination of the results over $N_m$ measures ($N_m \in \{2;\ldots;41\}$, because 41 measures were studied in our previous work), denoted $d_{opt}$, is simply estimated by the following rule, for each pixel $p$:

if $\exists\, i \in \{1;\ldots;N_m\}$ such that $d_{M_i}(p) = d_{gt}(p)$

then $d_{opt}(p) = d_{gt}(p)$ and $d_{opt}(p)$ is correct

else $d_{opt}(p)$ is erroneous.
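The ER criterion and the optimal combination rule above can be sketched as follows. This is an illustration under assumptions not stated in the paper: disparity maps are flattened into lists, a match counts as correct only when its disparity equals the ground truth exactly, pixels without ground truth (`None`) are ignored, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence

Disparity = Optional[int]  # None marks a pixel without disparity (assumption)


def error_rate(estimated: Sequence[Disparity], truth: Sequence[Disparity]) -> float:
    """ER in percent: share of evaluated pixels whose disparity differs from the ground truth.

    Assumptions: exact equality is required, and pixels whose ground truth is None are ignored.
    """
    evaluated = [(d, g) for d, g in zip(estimated, truth) if g is not None]
    errors = sum(1 for d, g in evaluated if d != g)
    return 100.0 * errors / len(evaluated)


def optimal_combination(maps: List[Sequence[Disparity]],
                        truth: Sequence[Disparity]) -> List[Disparity]:
    """d_opt: keep the ground-truth disparity wherever at least one measure found it."""
    combined: List[Disparity] = []
    for k, g in enumerate(truth):
        if any(m[k] == g for m in maps):
            combined.append(g)     # correct: some measure M_i found the exact correspondent
        else:
            combined.append(None)  # erroneous: no combined measure found it
    return combined
```

On a toy example with `truth = [1, 2, 3, 4]`, `m1 = [1, 0, 3, 0]` and `m2 = [0, 2, 0, 0]`, the two measures alone give ER values of 50% and 75%, while `optimal_combination([m1, m2], truth)` returns `[1, 2, 3, None]`, i.e. an ER of 25%. The combination can only be as good as the best measure on each pixel, which is why it is called theoretical or optimal.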

We have tested these conﬁgurations:

VISAPP 2011 - International Conference on Computer Vision Theory and Applications


(1) $N_m = 2$: all the 41 × 41 combinations have been evaluated, and the best combination is GC and SMAD, with a mean ER of only 14.13% over the 42 images.

(2) $N_m = 41$: all 41 measures have been theoretically combined, and the results show that ER can be decreased to 7.26%.

(3) $N_m \in \{3;\ldots;40\}$: when the best combination GC-SMAD is used, adding any other kind of measure gives roughly equivalent performance. With 3 measures, ER decreases to about 13% and then decreases slowly (by about 1% for each added measure) towards the minimum reached by the optimal combination of 41 measures. Moreover, when more than 10 measures are used, ER is close to this minimum.

First, the results show how much local matching with one correlation measure can theoretically be improved and, second, which measures are the most complementary. Table 2 shows that combining different measures can greatly improve the results: on the whole image, the improvement ranges from 7% (2 measures combined) to 17% (41 measures).

Table 2: Percentages of erroneous matches (ER) with each of the 5 best correlation measures and with the best combination of 2 correlation measures.

MEASURE    ER (%)    MEASURE    ER (%)

NCC        23.2      LSAD       23.3

GC         21        CENSUS     20.2

SMAD       27.9      GC-SMAD    14.13

Figure 2: Percentage of erroneous matches versus the number of correlation measures theoretically combined. This graph shows the maximal number of measures that are worth combining (10), and it also highlights that the largest improvement is obtained with only 2 measures.

The following analyses illustrate these results. Using four maps (Figure 3), we propose to visualize:

(1) The comparison of the two most complementary measures – As expected from the definitions of these measures, this visualization shows that SMAD compensates for the weaknesses of GC in (or near) occluded areas, whereas GC compensates for the weaknesses of SMAD in non-occluded areas, in particular low-textured ones, see (a) in Figure 3.

(2) The areas with 1 correct correspondent over 5, 10 or 41 correspondents – The most distinctive measure is GC, i.e. it is the measure most complementary to the others. SMAD is the second most complementary measure, see (b) for 5, (c) for 10 and (d) for 41 in Figure 3. Moreover, with 10 different correlation measures, the results are quite close to the results with 41 measures. Our last conclusion is that combining more than 2 measures seems worthwhile because, most of the time, more than one measure obtains the correct correspondent. This last remark inspired the fusion algorithm described in the next section.

[Figure 3: panels (a) to (d); color codes for NCC, LSAD, GC, CENSUS and SMAD in non-occluded (Non-occ.) and occluded (Occ.) areas]

Figure 3: Study of complementarity (an example with an image of Figure 1). Pixels shown in grey levels are pixels with more than 1 correct match over the $N_m$ combined measures: the darker the pixel, the higher the number of correct matches. The color codes are used when only one measure gives the correct match. Moreover, occluded areas (Occ.) are distinguished from non-occluded areas (Non-occ.). Panel (a) shows that SMAD is efficient near occlusions whereas GC is more efficient than SMAD in low-textured areas.

In conclusion, we present an algorithm that combines $N_m$ different measures, and we illustrate the interest of this kind of algorithm using the two complementary measures GC and SMAD.


5 ALGORITHM OF FUSION

In our first combination proposal (Chambon and Crouzil, 2004), the algorithm was designed to take occlusions into account. Consequently, as expected, the results are good both in non-occluded and in occluded areas. The goal of this new algorithm is to improve on this work by combining the advantages of each measure according to each kind of region, in order to handle more difficulties, like low-textured or noisy regions. Towards this goal, instead of detecting occlusions, we work directly on how to merge the disparities by taking into account their variations in several matching maps (each map being obtained with a different correlation measure).

Our fusion method is based on two steps: a disparity map is first estimated with each measure, then the results are merged by applying the following two rules:

(1) If more than one disparity map gives the same match, the correspondence is validated and this result is considered reliable.

(2) In an “undetermined area” (i.e. where rule (1) is not satisfied), the “most reliable” disparity is kept. The difficulty is to determine which disparity is the most reliable. In this paper, we consider the disparities found in the neighborhood of the pixel in the matching map of each considered measure.

Formally, these two rules can be defined as:

(1) Initialization for each pixel $p$ – The term $d_F(p)$ denotes the final disparity, after the fusion of $N_m$ correlation measures.

If $\exists\, d \mid \big(d = \arg\max_{e} n_p(e)\big) \wedge \big(n_p(d) \geq 2\big)$

then $d_F(p) \leftarrow d$

else the disparity is undetermined.

We define: $n_p(e) = \#\{i \mid d_{M_i}(p) = e\}$.

(2) Refinement – For each pixel $p$ without a disparity, we estimate the ambiguity, denoted by $A$, of each candidate disparity $d_{M_i}(p)$. To estimate the function $A$, which measures how unreliable the estimated disparity is, we assume that if most of the neighbors have the same disparity (in the map obtained with the same correlation measure), the estimated disparity can be considered reliable. Consequently, to estimate $A$, we compare the studied disparity with the mean of the disparities in the neighborhood, denoted by $\mathcal{N}(p)$. The disparity with the lowest ambiguity is kept only if this ambiguity is small, i.e. lower than a given threshold $\varepsilon$.

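For each undetermined pixel $p$:

$$d = d_{M_{i^*}}(p), \quad i^* = \arg\min_{i \in \{1, \ldots, N_m\}} A\big(d_{M_i}(p)\big), \quad \text{with} \quad A\big(d_{M_i}(p)\big) = \left| d_{M_i}(p) - \frac{1}{\#\mathcal{N}(p)} \sum_{k \in \mathcal{N}(p)} d_{M_i}(k) \right|$$

with $\mathcal{N}(p)$ the neighborhood of $p$.

If $A(d) < \varepsilon$ then $d_F(p) \leftarrow d$, else $p$ is occluded.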
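The two rules of the fusion algorithm can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: disparity maps are lists of rows of integers with `None` for pixels without a disparity, ties in the vote go to the disparity counted first, neighbors outside the image or without a disparity are skipped, and the defaults (8 neighbors, $\varepsilon = 1$) follow the values given in the footnotes of this paper.

```python
from collections import Counter
from typing import List, Optional, Tuple

DispMap = List[List[Optional[int]]]  # disp[row][col]; None = no disparity (assumption)


def fuse(maps: List[DispMap], eps: float = 1.0) -> DispMap:
    """Merge N_m disparity maps (one per correlation measure) into a single map.

    Rule (1): keep the most voted disparity when at least two maps agree on it.
    Rule (2): for undetermined pixels, keep the candidate with the lowest ambiguity
    A = |d - mean of its 8 neighbours in the same map| if A < eps; otherwise the
    pixel stays None, i.e. it is considered occluded.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    fused: DispMap = [[None] * cols for _ in range(rows)]
    undetermined: List[Tuple[int, int]] = []

    # Rule (1): votes n_p(e) = #{i | d_{M_i}(p) == e}.
    for r in range(rows):
        for c in range(cols):
            votes = Counter(m[r][c] for m in maps if m[r][c] is not None)
            if votes:
                d, n = votes.most_common(1)[0]  # ties: first counted wins (assumption)
                if n >= 2:
                    fused[r][c] = d
                    continue
            undetermined.append((r, c))

    # Rule (2): refinement by neighbourhood ambiguity, computed in each measure's own map.
    for r, c in undetermined:
        best_d: Optional[int] = None
        best_a = float("inf")
        for m in maps:
            d = m[r][c]
            if d is None:
                continue
            # 8-neighbourhood; out-of-image or empty neighbours are skipped (assumption).
            neigh = [m[rr][cc]
                     for rr in range(r - 1, r + 2)
                     for cc in range(c - 1, c + 2)
                     if (rr, cc) != (r, c) and 0 <= rr < rows and 0 <= cc < cols
                     and m[rr][cc] is not None]
            if not neigh:
                continue
            a = abs(d - sum(neigh) / len(neigh))
            if a < best_a:
                best_d, best_a = d, a
        if best_a < eps:
            fused[r][c] = best_d
    return fused
```

For example, with one map full of 5s and a second map identical except for a 9 at the centre, the centre pixel gets one vote for each value and is undetermined; rule (2) then keeps 5, whose ambiguity is 0, so the fused map is full of 5s. If the two centre candidates were 2 and 9, their ambiguities (3 and 4) would both exceed $\varepsilon = 1$ and the centre would be marked occluded.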

6 MATCHING RESULTS

For this part, the fusion algorithm has been tested by merging the two most complementary measures: GC and SMAD. To detect occlusions and erroneous matches, we use the symmetry constraint, which consists in estimating correspondences from the left image to the right image, then from the right image to the left one, and considering non-coherent matches as occluded pixels (these occluded pixels are shown in black in each disparity map). Table 3 shows the improvement in the percentage of erroneous matches obtained with the new fusion algorithm on all the 42 tested images. This percentage decreases by 2.47 to 4.08 points, the largest decrease being obtained with complex images, i.e. images that are difficult to match because of occluded or untextured areas. However, this improvement does not reach the theoretical maximal improvement shown in Table 2. Another way to appreciate the quality of the results is to look at the disparity maps given in Figure 4. The disparity maps obtained by fusion are the best ones because they contain fewer false negatives than the others. Moreover, the occlusion areas are better delimited (the contours are clean and contain no “holes”).
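The symmetry constraint can be sketched as a left-right consistency check. The following Python function is a hypothetical illustration under assumptions that the paper does not spell out: rectified images, so that the left pixel (r, c) with disparity d corresponds to the right pixel (r, c - d); right-to-left disparities stored with the same (positive) sign; and exact agreement required between the two directions.

```python
from typing import List, Optional

DispMap = List[List[Optional[int]]]  # disp[row][col]; None = no disparity (assumption)


def left_right_check(disp_lr: DispMap, disp_rl: DispMap) -> DispMap:
    """Keep a left-to-right disparity only if the right-to-left map points back to it.

    Assumptions: rectified images, left (r, c) matches right (r, c - d), exact agreement.
    Non-coherent pixels are set to None, i.e. treated as occluded.
    """
    rows, cols = len(disp_lr), len(disp_lr[0])
    out: DispMap = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d = disp_lr[r][c]
            if d is None or not 0 <= c - d < cols:
                continue  # no match, or match outside the right image: occluded
            if disp_rl[r][c - d] == d:
                out[r][c] = d  # coherent in both directions
    return out
```

With `disp_lr = [[0, 1, 1]]` and `disp_rl = [[1, 1, 0]]`, the first left pixel is rejected because its right correspondent points elsewhere, and the result is `[[None, 1, 1]]`.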

As the first step requires estimating the disparity map induced by each measure, the execution time is the sum of the execution times of each correlation-based algorithm; the fusion itself takes little time in comparison. Consequently, the more results are merged, the longer the execution time, which also depends on the chosen measures. In our tests, for example with Tsukuba, GC takes 17.6 s and SMAD 39.77 s, so the whole fusion algorithm takes about 1 minute.

The 8 neighbors have been taken into account.

We have chosen ε = 1.


Table 3: Percentage of erroneous matches. H represents the images with untextured areas, like the Tsukuba pair; O, the images with many occlusions, like the Aloe pair; and R, the images with no major difficulties, like the Cones pair (see Figure 4 for these images). Tc refers to the results obtained with a theoretical or optimal fusion, see Table 2. The percentage of erroneous matches with the new method is lower than with the GC measure alone, in particular for complex scenes.

METHOD H+O O H R Total

GC alone 25.6 17.5 19.6 15.9 20.9

Fusion 22.1 13.5 16.9 13.5 17.5

Tc 19.4 10.8 15.4 10.8 15.3

[Figure 4: rows (a) to (d) for the Tsukuba, Cones and Aloe images]

Figure 4: Disparity maps: (a) left image, (b) disparity map with SMAD, (c) with GC, (d) with FUSION. The fusion results contain fewer false negatives, in particular for Cones and Aloe. The Tsukuba example illustrates the limits of the method and the need to combine more than 2 measures.

7 CONCLUSIONS

In this paper, we proposed a study of the complementarity of correlation measures, illustrated with visualization maps, and we introduced a new way to combine complementary measures. Moreover, we highlighted the most complementary measures: GC and SMAD. The tests on 42 images show the improvement in performance of the new fusion algorithm compared with classic correlation matching, i.e. matching based on one correlation measure alone. These results are encouraging but also exhibit the limits of this approach, which might lead us to investigate a fusion approach based on a voting method in the neighborhood of the studied pixel, or to distinguish the most reliable measures (in the first step of the algorithm). Moreover, we will study the influence of the number of measures involved in the proposed algorithm.

REFERENCES

Aschwanden, P. and Guggenbühl, W. (1992). Experimental results from a comparative study on correlation type registration algorithms. In Förstner, W. and Ruwiedel, S., editors, Robust Computer Vision: Quality of Vision Algorithms, pages 268–282. Wichmann.

Bhat, D. and Nayar, S. (1998). Ordinal measures for image correspondence. PAMI, 20(4):415–423.

Chambon, S. and Crouzil, A. (2004). Towards correlation-based matching algorithms that are robust near occlusions. In ICPR, volume 3, pages 20–23.

Chambon, S. and Crouzil, A. (2011). Occlusions handling in dense stereo matching. Pattern Recognition. Submitted.

Crouzil, A., Massip-Pailhes, L., and Castan, S. (1996). A new correlation criterion based on gradient fields similarity. In ICPR, volume 1, pages 632–636.

Delon, J. and Rougé, B. (2004). Analytic study of the stereoscopic correlation. Research report 2004-19, CMLA (ENS Cachan).

Kaneko, S., Murase, I., and Igarashi, S. (2002). Robust image registration by increment sign correlation. Pattern Recognition, 35(10):2223–2234.

Lan, Z. and Mohr, R. (1997). Robust location based partial correlation. Technical Report RR-3186, INRIA, France.

Moravec, H. (1980). Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover. PhD thesis, Carnegie Mellon University.

Rousseeuw, P. and Croux, C. (1992). Explicit scale estimators with high breakdown point. In Dodge, Y., editor, L1-Statistical Analysis and Related Methods, pages 77–92. Elsevier.

Rziza, M. and Aboutajdine, D. (2001). Dense disparity map estimation using cumulants. In Conference on Telecommunications, ConfTele.

Scharstein, D. and Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV, 47(1):7–42.

Scharstein, D. and Szeliski, R. (2003). High-Accuracy Stereo Depth Maps Using Structured Light. In CVPR, volume 1, pages 195–202.

Seitz, P. (1989). Using local orientational information as image primitive for robust object recognition. In Visual Communication and Image Processing IV, volume SPIE-1199, pages 1630–1639.

Ullah, F., Kaneko, S., and Igarashi, S. (2001). Orientation Code Matching For Robust Object Search. IEICE Transactions on Information and Systems, E84-D(8):999–1006.

Zabih, R. and Woodfill, J. (1994). Non-parametric local transforms for computing visual correspondence. In ECCV, pages 151–158.
