SIMULATING DYNAMICAL SYSTEMS FOR EARLY VISION

Babette Dellen (1,2) and Florentin Wörgötter

(1) Bernstein Center for Computational Neuroscience, Max-Planck Institute for Dynamics and Self-Organization, Bunsenstrasse 10, Göttingen, Germany

(2) Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Llorens i Artigas 4-6, 08028 Barcelona, Spain

(3) Bernstein Center for Computational Neuroscience, University Göttingen, Bunsenstrasse 10, Göttingen, Germany

Keywords: Early vision, Stereo matching, Energy minimization, Dynamical systems.

Abstract: We propose a novel algorithm for stereo matching using a dynamical systems approach. The stereo correspondence problem is first formulated as an energy minimization problem. From the energy function, we derive a system of differential equations describing the corresponding dynamical system of interacting elements, which we solve using numerical integration. Optimization is introduced by means of a damping term and a noise term, an idea similar to simulated annealing. The algorithm is tested on the Middlebury stereo benchmark.

1 INTRODUCTION

In stereo vision, 3D information is reconstructed from stereo image pairs, i.e. two images of the same scene taken from different viewpoints. Algorithmic solutions to this problem are of interest not only for the field of computer vision [Scharstein and Szeliski, 2002], but also for related fields such as computational neuroscience [Roe et al., 2007]. Different approaches have been compared in a study by Scharstein and Szeliski (2002). In general, we distinguish between local algorithms and methods based on global optimization. Local methods are mainly characterized by their matching cost computation and cost aggregation steps, while global algorithms formulate a global energy function which is then minimized. This energy minimization problem is, in general, NP-hard, and global algorithms are distinguished by the minimization procedure they use. Common methods are simulated annealing [Marroquin et al., 1987, Geman and Geman, 1984, Barnard, 1989], graph cuts [Scharstein and Szeliski, 2002, Boykov et al., 2001], and max flow [Roy, 1999]. If global optimization is reduced to independent scanlines, methods such as dynamic programming or scanline optimization can be used to compute a solution in polynomial time [Scharstein and Szeliski, 2002].

In this paper, we propose a novel framework for computing approximate solutions to the energy minimization problem, using early stereo vision as an example. From the energy function, a system of ordinary differential equations that determines the temporal evolution of the system can be derived. Each pixel represents a “mass point” moving along a single dimension, with an amplitude encoding the disparity estimate (or label) of the pixel. Each mass moves under the influence of a data force, which is derived from the image data, and interacts with its neighbors via an interaction force. The resulting system of differential equations is solved using a fourth-order Runge-Kutta method with fixed step size. A damping force ensures that the dynamical system settles in a stable state.
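The fixed-step fourth-order Runge-Kutta integration mentioned above can be illustrated with the classical RK4 update. The sketch below is a minimal, self-contained version for a state stored as a flat list of floats; the function names `rk4_step` and `damped_oscillator` are hypothetical, and the single damped mass point is only a toy stand-in for the coupled pixel system (the damping constant 0.2 and step size 0.1 are borrowed from Sections 2.3 and 3 purely for illustration).

```python
from typing import Callable, List

State = List[float]

def rk4_step(f: Callable[[float, State], State], t: float, y: State, h: float) -> State:
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    # Weighted average of the four slope estimates (weights 1, 2, 2, 1).
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def damped_oscillator(t: float, y: State, gamma: float = 0.2) -> State:
    """Toy system: a unit mass on a unit spring with damping gamma; state y = [z, v]."""
    z, v = y
    return [v, -z - gamma * v]

if __name__ == "__main__":
    state, t, h = [1.0, 0.0], 0.0, 0.1   # displaced and at rest; fixed step size
    for _ in range(2000):
        state = rk4_step(damped_oscillator, t, state, h)
        t += h
    print(state)   # the damping has driven both z and v close to zero
```

Because the step size is fixed, every iteration costs exactly four evaluations of the right-hand side, which matters when that evaluation has to be carried out for every pixel of an image.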

2 THE MODEL SYSTEM

2.1 Stereo Vision as Energy Minimization

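The general framework we consider can be defined as follows. Let P be the set of pixels in an image. The goal is to find, for each pixel $p \in P$, a disparity $z_p$ that minimizes the global energy, obtained by summing over all pixels $p \in P$ the terms

$$E(z_p) = E_{\text{data}}(z_p) + \sum_{q \in N(p)} E_{\text{int}}(z_p, z_q), \qquad (1)$$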

where N(p) is the neighborhood of pixel p. The data term $E_{\text{data}}(z_p)$ measures how well the disparity values agree with the input data. The interaction term $\sum_{q \in N(p)} E_{\text{int}}(z_p, z_q)$ encodes the smoothing assumptions of the algorithm.

Dellen B. and Wörgötter F. (2009). SIMULATING DYNAMICAL SYSTEMS FOR EARLY VISION. In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications (VISAPP 2009), pages 525-528. DOI: 10.5220/0001800905250528. SciTePress.

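In this work, the data energy is derived from the stereo images $I_{\text{left}}$ and $I_{\text{right}}$ using absolute differences and a parameter k,

$$E_{\text{data}}(z_p) = k\,\bigl|I_{\text{left}}(p + z_p) - I_{\text{right}}(p)\bigr|, \qquad (2)$$

where $p + z_p$ denotes pixel p shifted horizontally (along the scanline) by $z_p$.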
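A minimal sketch of Eq. (2) and of the data force of Eq. (4) along one scanline is given below. Because $z_p$ is continuous, $I_{\text{left}}$ has to be evaluated between pixel centres; the linear interpolation, the clamping at the image border, and the central-difference estimate of the gradient with half-width h are assumptions, since the text does not state how the gradient of the data potential is computed. The function names are hypothetical.

```python
from typing import Sequence

def sample(row: Sequence[float], x: float) -> float:
    """Linearly interpolate a scanline at a real-valued position x (clamped to the row)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = min(int(x), len(row) - 2)          # index of the left interpolation node
    frac = x - i
    return (1.0 - frac) * row[i] + frac * row[i + 1]

def data_energy(left: Sequence[float], right: Sequence[float],
                x: int, z: float, k: float = 0.1) -> float:
    """Eq. (2): k * |I_left(x + z) - I_right(x)| for the pixel at column x."""
    return k * abs(sample(left, x + z) - right[x])

def data_force(left: Sequence[float], right: Sequence[float],
               x: int, z: float, k: float = 0.1, h: float = 0.5) -> float:
    """Eq. (4): F_data = -dE_data/dz, estimated with a central difference."""
    e_plus = data_energy(left, right, x, z + h, k)
    e_minus = data_energy(left, right, x, z - h, k)
    return -(e_plus - e_minus) / (2.0 * h)

if __name__ == "__main__":
    left = [3.0 * i for i in range(10)]         # intensity ramp
    right = [3.0 * (i + 2) for i in range(10)]  # same ramp, true disparity 2
    for z in (1.0, 2.0, 3.0):
        print(z, data_energy(left, right, 3, z), data_force(left, right, 3, z))
```

In this toy ramp the true disparity at column 3 is 2: the force is positive for z below 2, zero at 2, and negative above it, so the mass point is pushed toward the matching disparity. With real images the absolute difference has many local minima, which is why the noise term of Eq. (8) is needed.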

We further assume symmetric interactions between two pixels p and q with

$$E_{\text{int}}(z_p, z_q) = f(z_p - z_q). \qquad (3)$$

The function f will be specified later on.

2.2 Dynamical Systems Formulation

The energy function corresponds to a system of interacting elements moving under the influence of a data force

$$F_{\text{data}}(z_p) = -\nabla E_{\text{data}}(z_p), \qquad (4)$$

where $\nabla E_{\text{data}}(z_p)$ is the gradient of the data potential with respect to $z_p$, and an interaction force (on pixel p)

$$F^{\text{int}}_{p,q}(z_p, z_q) = -\nabla E_{\text{int}}(z_p, z_q). \qquad (5)$$

A schematic of the model is shown in Fig. 1.

Figure 1: Schematic of the dynamical system. The pixels (“mass points”) are connected via elastic interaction forces $F_{\text{int}}$. The image data exerts a data force $F_{\text{data}}$ on each mass point, which tends to move the mass toward the position of minimum data cost.

We define the interaction energy implicitly by choosing a discontinuity-preserving interaction force

$$F^{\text{int}}_{p,q}(z_p, z_q) = \kappa\,\bigl(d_{\max} - |z_p - z_q|\bigr)\,\frac{z_p - z_q}{d_{\max}} \qquad (6)$$

if $|z_p - z_q| \le d_{\max}$, and zero otherwise. The parameter $d_{\max}$ defines the maximum disparity, and κ determines the maximum amount of smoothing.
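Eq. (6) translates directly into a small function; the sketch below is illustrative, with a hypothetical function name and $d_{\max} = 16$ chosen only for the example.

```python
def interaction_force(zp: float, zq: float, kappa: float, d_max: float) -> float:
    """Eq. (6): discontinuity-preserving interaction force on pixel p from neighbour q."""
    diff = zp - zq
    if abs(diff) > d_max:
        return 0.0   # disparities too far apart: treated as a discontinuity, no coupling
    return kappa * (d_max - abs(diff)) * diff / d_max

if __name__ == "__main__":
    kappa, d_max = 10.0, 16.0   # kappa_0 = 10 as in Section 3; d_max is illustrative
    for zq in (5.0, 4.0, -3.0, -11.0, -20.0):
        print(5.0 - zq, interaction_force(5.0, zq, kappa, d_max))
```

The magnitude $\kappa(d_{\max} - s)\,s/d_{\max}$, with $s = |z_p - z_q|$, grows for small differences, reaches its maximum $\kappa d_{\max}/4$ at $s = d_{\max}/2$, and falls back to zero at $s = d_{\max}$. Large disparity jumps therefore receive little or no smoothing, which is what makes the force discontinuity preserving.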

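The dynamics of the system are described by a system of ordinary differential equations (each mass point having unit mass),

$$\frac{dz_p}{dt} = v_p, \qquad (7)$$

$$\frac{dv_p}{dt} = F_{\text{data}}(z_p) + \tau - \gamma v_p - \sum_{q \in N(p)} F^{\text{int}}_{p,q}(z_p, z_q), \qquad (8)$$

where τ is a noise term and $\gamma v_p$ a damping term with damping constant γ. These additional forces drive the dynamical system toward a local minimum: the damping dissipates kinetic energy so that the system comes to rest, while the noise, which is reduced over time (Section 2.3), helps it escape shallow minima early on.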
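The right-hand side of Eqs. (7) and (8) for a whole image can be sketched as follows, using the four-neighbourhood described in Section 3. This is a minimal sketch under stated assumptions: the data force is passed in as a callable (for example built from Eq. (2)), the noise τ is given as a separate value per pixel, and border pixels simply have fewer neighbours. The names `rhs` and `NEIGHBOURS` are hypothetical.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))   # up, down, left, right

def interaction_force(zp: float, zq: float, kappa: float, d_max: float) -> float:
    """Eq. (6), repeated here so the sketch is self-contained."""
    diff = zp - zq
    if abs(diff) > d_max:
        return 0.0
    return kappa * (d_max - abs(diff)) * diff / d_max

def rhs(z: Grid, v: Grid, data_force: Callable[[int, int, float], float], tau: Grid,
        kappa: float, gamma: float, d_max: float) -> Tuple[Grid, Grid]:
    """Right-hand side of Eqs. (7)-(8) for every pixel of an H x W grid (unit masses)."""
    height, width = len(z), len(z[0])
    dz = [row[:] for row in v]                     # Eq. (7): dz_p/dt = v_p
    dv = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            f_int = 0.0
            for di, dj in NEIGHBOURS:
                ni, nj = i + di, j + dj
                if 0 <= ni < height and 0 <= nj < width:   # skip neighbours outside the image
                    f_int += interaction_force(z[i][j], z[ni][nj], kappa, d_max)
            # Eq. (8): data force + noise - damping - summed interaction forces
            dv[i][j] = data_force(i, j, z[i][j]) + tau[i][j] - gamma * v[i][j] - f_int
    return dz, dv
```

With the minus sign in Eq. (8), the interaction term pulls $z_p$ toward the disparities of its neighbours: two adjacent pixels at rest with slightly different disparities accelerate toward each other.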

2.3 Finding a Local Minimum

The system of differential equations is solved using a fourth-order Runge-Kutta technique with a step size of 0.1, starting from random initial conditions. Cooling is introduced through the damping force and the noise term τ. Over the course of the simulation, we decrease the noise according to

$$\tau = p\,\frac{n - t}{n}, \qquad (9)$$

where n is the total number of iterations and t is the current iteration number. The number p is drawn from a Gaussian distribution with a standard deviation of 5 pixels. We further found it advantageous to decrease the smoothing parameter in the same way, such that

$$\kappa = \kappa_0\,\frac{n - t}{n}, \qquad (10)$$

where $\kappa_0$ is the initial value of κ.
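The cooling schedules of Eqs. (9) and (10) can be sketched as below. The zero mean of the Gaussian and the fresh draw of p at every call are assumptions (the text gives only the standard deviation); the function names are hypothetical. Presumably, at each iteration t a new τ would be drawn, κ set from Eq. (10), and one Runge-Kutta step of size 0.1 taken.

```python
import random

def annealing_factor(t: int, n: int) -> float:
    """Linear cooling factor (n - t) / n shared by Eqs. (9) and (10): 1 at t = 0, 0 at t = n."""
    return (n - t) / n

def noise_term(t: int, n: int, rng: random.Random, sigma: float = 5.0) -> float:
    """Eq. (9): tau = p * (n - t) / n, with p drawn from N(0, sigma^2) (zero mean assumed)."""
    p = rng.gauss(0.0, sigma)
    return p * annealing_factor(t, n)

def smoothing(t: int, n: int, kappa0: float = 10.0) -> float:
    """Eq. (10): kappa = kappa0 * (n - t) / n."""
    return kappa0 * annealing_factor(t, n)

if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so the run is reproducible
    n = 8000                  # total number of iterations used in Section 3
    for t in (0, 2000, 4000, 6000, 8000):
        print(t, smoothing(t, n), round(noise_term(t, n, rng), 3))
```

Both the noise amplitude and the smoothing strength decay linearly and reach zero at the final iteration, so the last part of a run is purely deterministic and damped.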

2.4 Boundary Conditions

The amplitude of the dynamical variable $z_p$ is restricted to a predefined disparity range. We realize this boundary condition by including a potential barrier with $E(z_p) = c$ if $z_p > d_{\max}$ or $z_p < 0$. The parameter c is chosen to be larger than the maximum absolute difference between image pixels. Furthermore, if during the computation $z_p > d_{\max} + 1$, we push the value back to $z_p = d_{\max} + 1$. The same strategy is used if $z_p < -1$, in which case the value is pushed back to $z_p = -1$.
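The two boundary mechanisms can be sketched as follows. A constant barrier has zero gradient everywhere except at its edges, so the way it produces a restoring force depends on how the gradient is discretized; the central difference with half-width h used here is an assumption, as is leaving the velocity unchanged when a value is pushed back. The function names and the value c = 256 (just above the largest difference of 8-bit intensities) are illustrative.

```python
def barrier_energy(z: float, d_max: float, c: float) -> float:
    """Potential barrier of Section 2.4: cost c outside the disparity range [0, d_max]."""
    return c if (z > d_max or z < 0.0) else 0.0

def barrier_force(z: float, d_max: float, c: float, h: float = 0.5) -> float:
    """Central-difference estimate of -dE/dz for the barrier (assumed discretization)."""
    return -(barrier_energy(z + h, d_max, c) - barrier_energy(z - h, d_max, c)) / (2.0 * h)

def push_back(z: float, d_max: float) -> float:
    """Hard limits: values above d_max + 1 or below -1 are reset to those bounds."""
    if z > d_max + 1.0:
        return d_max + 1.0
    if z < -1.0:
        return -1.0
    return z

if __name__ == "__main__":
    d_max, c = 15.0, 256.0
    print(barrier_force(-0.2, d_max, c))   # pushes upward, back into the range
    print(barrier_force(15.2, d_max, c))   # pushes downward, back into the range
    print(push_back(20.0, d_max), push_back(-3.0, d_max))
```

The barrier acts as a soft constraint near the edges of the range, while the push-back step guarantees that no value ever drifts further than one pixel outside it.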

3 RESULTS

We evaluated the performance of the algorithm using the Middlebury stereo benchmark (www.middlebury.edu/stereo) [Scharstein and Szeliski, 2002], which contains four stereo pairs: Tsukuba, Venus, Teddy, and Cones. The parameters were kept constant for all stereo pairs, with $\kappa_0 = 10$, $\gamma = 0.2$, and $k = 0.1$. On the grid, each pixel was allowed to interact with its nearest neighbors to the left, right, top, and bottom.

The results of the algorithm are presented in Fig. 2. Since the disparities are formulated as continuous variables, the algorithm returns subpixel disparities. The resulting disparity maps capture the basic structure of the scene. Depth discontinuities are mostly resolved; however, in regions with occlusions the correct disparity at boundaries could not always be found, for example for the lamp arm in Tsukuba (Fig. 2b) or for the Teddy stereo pair (Fig. 2f). Decreasing $\kappa_0$ may reduce these undesired blurring effects. However, a lower smoothing parameter may also increase the convergence time.

[Figure 2 panels: (a) Tsukuba Left, (b) Tsukuba Result, (e) Teddy Left, (f) Teddy Result, (g) Cones Left, (h) Cones Result]

Figure 2: Disparity results (b,d,f,h) for the Middlebury stereo data set (a,c,e,g). The algorithm returns a dense disparity map which captures the basic 3D structure of the scene. Near occlusion edges, however, errors are visible, as can be seen for the lamp arm (b).

We ranked the method using the Middlebury stereo evaluation benchmark. Fig. 3 shows a table of results for an error threshold of 0.75 pixels. On average, the results of the method are comparable to those of other stereo algorithms such as dynamic programming and scanline optimization. In its current stage, the algorithm performs worse than graph cuts.

The algorithm was run for $n = 8000$ iterations; however, the results are usually stable after 4000–6000 iterations. The computation of 1000 iterations takes ≈ 3.4 min using non-optimized code. This is in the range of computation times of other two-dimensional energy optimization algorithms, such as graph cuts and simulated annealing.
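For orientation, the reported per-iteration cost translates into total running times as follows; this is only arithmetic on the numbers above and assumes that the cost per iteration stays constant over a run:

$$n = 8000:\quad 8 \times 3.4\ \text{min} \approx 27\ \text{min}; \qquad 4000 \text{ to } 6000 \text{ iterations}:\quad 4 \times 3.4 \text{ to } 6 \times 3.4\ \text{min} \approx 14 \text{ to } 20\ \text{min}.$$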

4 CONCLUSIONS

4.1 Summary

We proposed a dynamical systems approach to the energy minimization problem in early stereo vision. The stereo problem is first posed as an energy minimization problem. Then, we derive a system of ordinary differential equations describing the dynamical system corresponding to the energy function. We explicitly compute the evolution of the dynamical system using a Runge-Kutta technique. The inclusion of a damping force ensures that the system converges to a local minimum. Overall, the algorithm delivers satisfactory results for the images tested from the Middlebury stereo database, and the basic 3D structure of the scene could be captured correctly. Its performance is comparable to that of other approaches based on global optimization [Scharstein and Szeliski, 2002], except graph cuts [Boykov et al., 2001], which show better results.

Energy functions for stereo vision have also been minimized by applying a gradient descent method to the associated Euler-Lagrange partial differential equation [Alvarez and Sánchez, 2000, Maier et al., 2003]. Since, in classical mechanics, Newton's laws of motion follow from the Euler-Lagrange equations of a Lagrangian, possible relations to our method should be investigated in the future.

4.2 Future Work

In the future, the performance of the algorithm may be improved by modifying the parameters of the system or the energy function itself, or by increasing the number of nearest neighbors. The performance could also be improved by increasing the number of iterations, although this would also increase the computation time.

Occlusions, which occur in almost all images, are not handled by the algorithm. This causes errors in the disparity estimation, in particular near object boundaries. These problems could be reduced by incorporating an occlusion detection method into our algorithm [Egnal and Wildes, 2002].

Figure 3: Ranking from the Middlebury stereo database for an error threshold of 0.75 pixels. The results are comparable to those obtained using dynamic programming or scanline optimization.

The speed of the algorithm could be improved using a coarse-to-fine approach. Because of the local character of the algorithm, a parallel implementation, e.g. on a graphics processing unit, would be feasible as well.

The proposed model might be of interest for models of human stereo vision. Interacting neuronal elements might be able to use damping and/or noise to optimize their responses.

ACKNOWLEDGEMENTS

This work has received support from the BMBF-funded BCCN Göttingen and the EU Project Drivsco under Contract No. 016276-2.

REFERENCES

Alvarez, L. and Sánchez, J. (2000). 3-D geometry reconstruction using a color image stereo pair and partial differential equations. Cuadernos del Instituto Universitario de Ciencias y Tecnologías Cibernéticas, 6:1–26.

Barnard, S. T. (1989). Stochastic stereo matching over scale. International Journal of Computer Vision, 3(1):17–32.

Boykov, Y., Veksler, O., and Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(11):1222–1239.

Egnal, G. and Wildes, R. (2002). Detecting binocular half-occlusions: Empirical comparisons of five approaches. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(8):1122–1133.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Analysis and Machine Intelligence, 6(6):721–741.

Maier, D., Rössle, A., Hesser, J., and Männer, R. (2003). Dense disparity maps respecting occlusions and object separation using partial differential equations. In Sun, C., Talbot, H., Ourselin, S., and Adriaanen, T., editors, Proc. VIIth Digital Image Computing: Techniques and Applications, pages 613–622.

Marroquin, J., Mitter, S., and Poggio, T. (1987). Probabilistic solution of ill-posed problems in computational vision. Journal of the American Statistical Association, 82(397):76–89.

Roe, A. W., Parker, A. J., Born, R. T., and DeAngelis, G. C. (2007). Disparity channels in early vision. Journal of Neuroscience, 27(44):11820–31.

Roy, S. (1999). Stereo without epipolar lines: A maximum flow formulation. International Journal of Computer Vision, 34(2/3):147–161.

Scharstein, D. and Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47:7–42.
