SIMULATING DYNAMICAL SYSTEMS FOR EARLY VISION
Babette Dellen (1,2) and Florentin Wörgötter (3)
(1) Bernstein Center for Computational Neuroscience, Max-Planck Institute for Dynamics and Self-Organization, Bunsenstrasse 10, Göttingen, Germany
(2) Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Llorens i Artigas 4-6, 08028 Barcelona, Spain
(3) Bernstein Center for Computational Neuroscience, University Göttingen, Bunsenstrasse 10, Göttingen, Germany
Keywords:
Early vision, Stereo matching, Energy minimization, Dynamical systems.
Abstract:
We propose a novel algorithm for stereo matching using a dynamical systems approach. The stereo correspondence problem is first formulated as an energy minimization problem. From the energy function, we derive a system of differential equations describing the corresponding dynamical system of interacting elements, which we solve using numerical integration. Optimization is introduced by means of a damping term and a noise term, an idea similar to simulated annealing. The algorithm is tested on the Middlebury stereo benchmark.
1 INTRODUCTION
In stereo vision, 3D information is reconstructed from stereo image pairs, i.e. two images of the same scene taken from different viewpoints. Algorithmic solutions to this problem are of interest not only for the field of computer vision [Scharstein and Szeliski, 2002], but also for related fields such as computational neuroscience [Roe et al., 2007]. Different approaches have been compared in a study by Scharstein and Szeliski (2002). In general, we distinguish between local algorithms and methods based on global optimization. Local methods are mainly characterized by their matching cost computation and cost aggregation steps, while global algorithms formulate a global energy function which is then minimized. This energy minimization problem is in general NP-hard.
Global algorithms are distinguished by the minimization procedure they use. Common methods are simulated annealing [Marroquin et al., 1987, Geman and Geman, 1984, Barnard, 1989], graph cuts [Scharstein and Szeliski, 2002, Boykov et al., 2001], and max flow [Roy, 1999]. If global optimization is reduced to independent scanlines, methods such as dynamic programming or scanline optimization can be used to compute a solution in polynomial time [Scharstein and Szeliski, 2002].
In this paper, we propose a novel framework for computing approximate solutions to the energy minimization problem, using early stereo vision as an example. From the energy function, a system of ordinary differential equations that determines the temporal evolution of the system can be derived. Each pixel represents a "mass point" moving along a single dimension, with an amplitude encoding the disparity estimate (or label) of the pixel. Each mass moves under the influence of a data force, which is derived from the image data, and interacts with its neighbors via an interaction force. The resulting system of differential equations is solved using a fourth-order Runge-Kutta method with fixed step size. A damping force ensures that the dynamical system settles into a stable state.
2 THE MODEL SYSTEM
2.1 Stereo Vision as Energy Minimization
The general framework we consider can be defined as follows. Let $P$ be the set of pixels in an image. The goal is to find a disparity $z_p$ for each pixel $p \in P$ such that the disparities minimize a global energy

$$E(z_p) = E_{\mathrm{data}}(z_p) + \sum_{q \in N(p)} E_{\mathrm{int}}(z_p, z_q) , \tag{1}$$
where $N(p)$ is the neighborhood of pixel $p$. The data term $E_{\mathrm{data}}$ measures how well the disparity values agree with the input data. The interaction term $\sum_{q \in N(p)} E_{\mathrm{int}}(z_p, z_q)$ encodes the smoothing assumptions of the algorithm.

Dellen B. and Wörgötter F. (2009). SIMULATING DYNAMICAL SYSTEMS FOR EARLY VISION. In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, pages 525-528. DOI: 10.5220/0001800905250528. Copyright © SciTePress
In this work, the data energy is derived from the stereo images $I_{\mathrm{left}}$ and $I_{\mathrm{right}}$,

$$E_{\mathrm{data}}(z_p) = k \, \lvert I_{\mathrm{left}}(x_p + z_p, y_p) - I_{\mathrm{right}}(x_p, y_p) \rvert , \tag{2}$$

using absolute differences and a parameter $k$.
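Because $z_p$ is a continuous variable, eq. (2) needs $I_{\mathrm{left}}$ at non-integer positions. The paper does not state how this is done. The following minimal Python sketch therefore assumes linear interpolation along the scanline $y_p$ and clamping at the image border. The helper names `sample_linear` and `data_energy` are hypothetical, and image rows are plain lists of intensities.

```python
import math


def sample_linear(row, x):
    """Linearly interpolate a 1D intensity row at a real-valued position x.

    Positions outside the row are clamped to the border pixels
    (a hypothetical choice; border handling is not specified in the paper).
    """
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(math.floor(x))
    x1 = min(x0 + 1, len(row) - 1)
    w = x - x0
    return (1.0 - w) * row[x0] + w * row[x1]


def data_energy(left_row, right_row, x_p, z_p, k=0.1):
    """Eq. (2) on one scanline y_p: k * |I_left(x_p + z_p) - I_right(x_p)|."""
    return k * abs(sample_linear(left_row, x_p + z_p) - right_row[x_p])


left = [0, 10, 20, 30, 40]
right = [20, 30, 40, 40, 40]
print(data_energy(left, right, 1, 2.0))  # integer disparity, exact match
print(data_energy(left, right, 1, 1.5))  # subpixel disparity, interpolated sample
```

Linear interpolation makes $E_{\mathrm{data}}$ piecewise linear in $z_p$, which keeps the force derived from it bounded.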
We further assume symmetric interactions between two pixels $p$ and $q$ with

$$E_{\mathrm{int}}(z_p, z_q) = f(z_p - z_q) . \tag{3}$$

The function $f$ will be specified later on.
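In eq. (3), $E_{\mathrm{int}}$ depends only on the difference $z_p - z_q$. The chain rule therefore gives derivatives of equal magnitude and opposite sign with respect to the two disparities, so the interaction acts on $p$ and $q$ as an action-reaction pair. A short derivation, added here for illustration:

$$\frac{\partial E_{\mathrm{int}}(z_p, z_q)}{\partial z_p} = f'(z_p - z_q), \qquad \frac{\partial E_{\mathrm{int}}(z_p, z_q)}{\partial z_q} = -f'(z_p - z_q) .$$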
2.2 Dynamical Systems Formulation
The energy function corresponds to a system of interacting elements moving under the influence of a data force

$$F^{\mathrm{data}}_p(z_p) = -\nabla E_{\mathrm{data}}(z_p) , \tag{4}$$

where $\nabla E_{\mathrm{data}}(z_p)$ is the gradient of the data potential, and an interaction force (on pixel $p$)

$$F^{\mathrm{int}}_{p,q}(z_p, z_q) = \nabla_{z_p} E_{\mathrm{int}}(z_p, z_q) . \tag{5}$$
A schematic of the model is shown in Fig. 1.
[Figure 1: schematic of mass points acted on by data forces $F^{\mathrm{data}}$ and coupled by interaction forces $F^{\mathrm{int}}$]
Figure 1: Schematic of the dynamical system. The pixels ("mass points") are connected via elastic interaction forces $F^{\mathrm{int}}$. The image data exerts a data force $F^{\mathrm{data}}$ on each mass point, which tends to move the mass towards the position that corresponds to minimum data cost.
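Eq. (4) requires the derivative of $E_{\mathrm{data}}$ with respect to $z_p$, but $E_{\mathrm{data}}$ is only defined through sampled images. The paper does not say how this gradient is evaluated. One possible approximation, assumed here, is a central finite difference with step `h`. The names `data_force` and `toy_energy` are hypothetical, and `toy_energy` is an illustrative potential with its minimum at $z = 2$.

```python
def data_force(energy, z, h=0.5):
    """Eq. (4), approximated: F_data(z) = -dE_data/dz via a central difference.

    energy: callable z -> E_data(z) for one pixel (e.g. built from eq. (2)).
    h: finite-difference step in pixels (a hypothetical choice).
    """
    return -(energy(z + h) - energy(z - h)) / (2.0 * h)


def toy_energy(z):
    """Hypothetical absolute-difference-like data potential, minimal at z = 2."""
    return 0.1 * abs(z - 2.0)


print(data_force(toy_energy, 1.0))  # positive: pushes z up towards 2
print(data_force(toy_energy, 3.0))  # negative: pushes z down towards 2
```

The step `h` trades sensitivity to image noise against how sharply the minimum is resolved. Half a pixel is only an illustrative value.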
We define the interaction energy implicitly by choosing a discontinuity-preserving interaction force

$$F^{\mathrm{int}}_{p,q}(z_p, z_q) = \kappa \, (d_{\max} - \lvert z_p - z_q \rvert) \, (z_p - z_q) / d_{\max} \tag{6}$$

if $\lvert z_p - z_q \rvert \le d_{\max}$, and zero otherwise. The parameter $d_{\max}$ defines the maximum disparity, and $\kappa$ determines the maximum amount of smoothing.
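Eq. (6) translates directly into code. Integrating it with respect to $z_p$ also yields a potential $f(\Delta) = \kappa(\Delta^2/2 - \lvert\Delta\rvert^3/(3 d_{\max}))$ for $\lvert\Delta\rvert \le d_{\max}$, constant beyond that. This is one choice of $f$ in eq. (3) that is consistent with eq. (5), assuming the interaction forces are subtracted in the equation of motion (8) so that they pull $z_p$ towards $z_q$. The potential is our derivation and is not stated in the paper. The function names are hypothetical.

```python
def interaction_force(z_p, z_q, kappa, d_max):
    """Eq. (6): kappa * (d_max - |z_p - z_q|) * (z_p - z_q) / d_max inside the cutoff, else 0."""
    diff = z_p - z_q
    if abs(diff) > d_max:
        return 0.0
    return kappa * (d_max - abs(diff)) * diff / d_max


def interaction_potential(z_p, z_q, kappa, d_max):
    """A potential f(z_p - z_q) whose z_p-derivative is interaction_force (our derivation).

    f = kappa * (|d|**2 / 2 - |d|**3 / (3 * d_max)), constant for |d| > d_max.
    """
    d = min(abs(z_p - z_q), d_max)
    return kappa * (d ** 2 / 2.0 - d ** 3 / (3.0 * d_max))


print(interaction_force(3.0, 1.0, kappa=10.0, d_max=16.0))   # pair 2 pixels apart
print(interaction_force(20.0, 0.0, kappa=10.0, d_max=16.0))  # beyond the cutoff
```

The force vanishes continuously as the disparity difference approaches $d_{\max}$, so the potential levels off at $\kappa d_{\max}^2/6$. Large jumps therefore cost only a bounded amount, which is what makes the interaction discontinuity-preserving.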
The dynamics of the system are described by a system of ordinary differential equations

$$\frac{dz_p}{dt} = v_p , \tag{7}$$

$$\frac{dv_p}{dt} = F^{\mathrm{data}}_p(z_p) + \tau - \gamma v_p - \sum_{q \in N(p)} F^{\mathrm{int}}_{p,q}(z_p, z_q) , \tag{8}$$

where $\tau$ is a noise term and $-\gamma v_p$ is a damping term with damping constant $\gamma$. These additional forces have been added to drive the dynamical system towards a local minimum.
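The right-hand side of eqs. (7) and (8) can be sketched for a small grid with the four-neighbor coupling used in the experiments. This minimal pure-Python sketch makes several assumptions. The data force is supplied by a hypothetical callable `f_data(i, j, z)`. The noise is given per pixel. Pixels on the image border simply have fewer neighbors. The interaction force of eq. (6) is subtracted, so that it pulls each disparity towards its neighbors.

```python
def rhs(z, v, f_data, noise, kappa, gamma, d_max):
    """Right-hand side of eqs. (7)-(8) on an H x W grid with 4-neighbor coupling.

    z, v, noise: H x W lists of lists (disparities, velocities, noise tau per pixel).
    f_data(i, j, z_ij): hypothetical callable returning the data force of pixel (i, j).
    Returns (dz/dt, dv/dt) as new H x W lists of lists.
    """
    H, W = len(z), len(z[0])
    dz_dt = [[v[i][j] for j in range(W)] for i in range(H)]  # eq. (7)
    dv_dt = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = f_data(i, j, z[i][j]) + noise[i][j] - gamma * v[i][j]
            # Left, right, up, and down neighbors; border pixels have fewer.
            for di, dj in ((0, -1), (0, 1), (-1, 0), (1, 0)):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W:
                    diff = z[i][j] - z[ni][nj]
                    if abs(diff) <= d_max:
                        # Subtract the eq. (6) force, which pulls z_p towards z_q.
                        acc -= kappa * (d_max - abs(diff)) * diff / d_max
            dv_dt[i][j] = acc  # eq. (8)
    return dz_dt, dv_dt


no_data = lambda i, j, z: 0.0
print(rhs([[0.0, 2.0]], [[0.0, 0.0]], no_data, [[0.0, 0.0]], 10.0, 0.2, 16.0))
```

Packing `z` and `v` into one state and passing this function to a Runge-Kutta step would give one iteration of the scheme. That packing is left out to keep the sketch short.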
2.3 Finding a Local Minimum
The system of differential equations is solved using a fourth-order Runge-Kutta method with a step size of 0.1, starting from random initial conditions. Cooling is introduced through the damping force and the noise term $\tau$. Over time, we decrease the noise according to

$$\tau = p_r \, (n_i - t) / n_i , \tag{9}$$

where $n_i$ is the total number of iterations and $t$ is the current iteration number. The number $p_r$ is drawn from a Gaussian distribution with a standard deviation of 5 pixels. We further found it advantageous to decrease the smoothing parameter in the same way, such that

$$\kappa = \kappa_n \, (n_i - t) / n_i . \tag{10}$$
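The integration scheme and the cooling schedule of eqs. (9) and (10) can be sketched as follows. The classical fourth-order Runge-Kutta step is standard. The remaining choices are assumptions. $p_r$ is drawn with zero mean, independently at every iteration, and held constant within one Runge-Kutta step. The toy quadratic data potential with its minimum at $z = 3$ in the hypothetical `simulate_single_mass` only serves to show a damped, annealed mass settling.

```python
import random


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for d(state)/dt = f(state).

    state is a flat list of floats; f returns a list of the same length.
    """
    def shifted(scale, k):
        return [s + scale * ki for s, ki in zip(state, k)]

    k1 = f(state)
    k2 = f(shifted(dt / 2.0, k1))
    k3 = f(shifted(dt / 2.0, k2))
    k4 = f(shifted(dt, k3))
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def schedule(t, n_i, kappa_n, sigma=5.0, rng=random):
    """Eqs. (9)-(10): noise tau and smoothing kappa both decay linearly to zero.

    p_r is drawn from a Gaussian with standard deviation sigma; the zero
    mean is an assumption.
    """
    decay = (n_i - t) / n_i
    tau = rng.gauss(0.0, sigma) * decay
    kappa = kappa_n * decay
    return tau, kappa


def simulate_single_mass(n_i=2000, dt=0.1, gamma=0.2, z0=0.0, seed=0):
    """Toy demo: one damped, annealed mass in a hypothetical potential 0.5 * (z - 3)**2."""
    rng = random.Random(seed)
    state = [z0, 0.0]  # [z, v]
    for t in range(n_i):
        tau, _ = schedule(t, n_i, kappa_n=10.0, rng=rng)  # no neighbors, so kappa is unused

        def f(s, tau=tau):
            z, v = s
            # Eqs. (7)-(8) without interaction: dz/dt = v, dv/dt = F_data + tau - gamma * v
            return [v, -(z - 3.0) + tau - gamma * v]

        state = rk4_step(f, state, dt)
    return state[0]


print(simulate_single_mass())  # ends close to the potential minimum at 3
```

On a full disparity map, the state would hold $z_p$ and $v_p$ for every pixel, the right-hand side would include the interaction forces, and the `kappa` returned by `schedule` would replace the constant $\kappa$ at each iteration.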
2.4 Boundary Conditions
The amplitude of the dynamical variable $z_p$ is restricted to a predefined disparity range. We realize this boundary condition by including a potential barrier with $E(z_p) = c$ if $z_p > d_{\max}$ or $z_p < 0$. The parameter $c$ is chosen to be larger than the maximum absolute difference between image pixels. Furthermore, if $z_p > d_{\max} + 1$ during the computation, we push the value back to $z_p = d_{\max} + 1$. The same strategy is used if $z_p < -1$, in which case the value is pushed back to $z_p = -1$.
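The boundary handling can be written as a few small functions. This sketch makes three assumptions. The hard limits are $-1$ and $d_{\max} + 1$. Only $z_p$ (not $v_p$) is reset when a limit is hit. The barrier enters the force through the same kind of central difference as the data term. The names `barrier_energy`, `barrier_force`, and `clamp_disparity` are hypothetical, and the value of `c` in the example is illustrative.

```python
def barrier_energy(z_p, d_max, c):
    """Potential barrier: E(z_p) = c if z_p > d_max or z_p < 0, and 0 otherwise."""
    return c if (z_p > d_max or z_p < 0.0) else 0.0


def barrier_force(z_p, d_max, c, h=0.5):
    """Restoring force -dE/dz from a central difference (step h is a hypothetical choice).

    Non-zero only within h of the range limits, where it points back into [0, d_max].
    """
    return -(barrier_energy(z_p + h, d_max, c) - barrier_energy(z_p - h, d_max, c)) / (2.0 * h)


def clamp_disparity(z_p, d_max):
    """Hard limits: values beyond d_max + 1 or below -1 are pushed back to that limit."""
    return min(max(z_p, -1.0), d_max + 1.0)


c = 300.0  # illustrative: larger than the 255 maximum difference of 8-bit images
print(barrier_force(16.2, 16.0, c), barrier_force(-0.2, 16.0, c))
print(clamp_disparity(20.0, 16.0), clamp_disparity(-3.0, 16.0))
```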
3 RESULTS
We evaluated the performance of the algorithm using the Middlebury stereo benchmark (www.middlebury.edu/stereo) [Scharstein and Szeliski, 2002], which contains four stereo pairs: Tsukuba, Venus, Teddy, and Cones. The parameters were kept constant for all stereo pairs, with $\kappa_n = 10$, $\gamma = 0.2$, and $k = 0.1$. On the grid, each pixel was allowed to interact with its four nearest neighbors (left, right, up, and down).
The results of the algorithm are presented in Fig. 2. Since the disparities are formulated as continuous variables, the algorithm returns subpixel disparities. The resulting disparity maps capture the basic structure of the scene. Depth discontinuities are mostly resolved. However, in regions with occlusions the correct disparity at boundaries could not always be found, for example for the lamp arm in Tsukuba (Fig. 2b) or for the Teddy stereo pair (Fig. 2f). Decreasing $\kappa_n$ may reduce these undesired blurring effects, but a lower smoothing parameter may also increase the convergence time.

VISAPP 2009 - International Conference on Computer Vision Theory and Applications

[Figure 2 panels: (a) Tsukuba Left, (b) Tsukuba Result, (c) Venus Left, (d) Venus Result, (e) Teddy Left, (f) Teddy Result, (g) Cones Left, (h) Cones Result]
Figure 2: Disparity results (b, d, f, h) for the Middlebury stereo data set (a, c, e, g). The algorithm returns a dense disparity map that captures the basic 3D structure of the scene. Near occlusion edges, however, errors are visible, as can be seen for the lamp arm (b).
We ranked the method using the Middlebury stereo evaluation benchmark. Fig. 3 shows a table of results for an error threshold of 0.75 pixels. On average, the results of the method are comparable to those of other stereo algorithms such as dynamic programming and scanline optimization. In its current form, the algorithm performs worse than graph cuts.
The algorithm was run for $n_i = 8000$ iterations, although the results are usually stable after 4000 to 6000 iterations. Computing 1000 iterations takes 3.4 min using non-optimized code. This is in the range of the computation times of other two-dimensional energy optimization algorithms, e.g. graph cuts and simulated annealing.
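Assuming that the run time grows linearly with the number of iterations, the reported timings imply the following total cost per stereo pair:

$$n_i \times \frac{3.4\ \mathrm{min}}{1000} = 8000 \times 0.0034\ \mathrm{min} = 27.2\ \mathrm{min} .$$

Under the same assumption, stopping after 4000 to 6000 iterations would take about 13.6 to 20.4 min.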
4 CONCLUSIONS
4.1 Summary
We proposed a dynamical systems approach to the energy minimization problem in early stereo vision. The stereo problem is first posed as an energy minimization problem. Then, we derive a system of ordinary differential equations describing the dynamical system that corresponds to the energy function. We explicitly compute the temporal evolution of the dynamical system using a fourth-order Runge-Kutta method. The inclusion of a damping force ensures that the system converges to a local minimum. Overall, the algorithm delivers satisfactory results for the images tested from the Middlebury stereo database, and the basic 3D structure of the scenes could be captured correctly. Its performance is comparable to that of other approaches based on global optimization [Scharstein and Szeliski, 2002], except graph cuts [Boykov et al., 2001], which perform better.
Energy functions for stereo vision have also been minimized by applying a gradient descent method to the associated Euler-Lagrange partial differential equation [Alvarez and Sánchez, 2000, Maier et al., 2003]. Since, in classical mechanics, the Euler-Lagrange equations are equivalent to Newton's laws of motion, possible relations to our method should be investigated in the future.
4.2 Future Work
In the future, the performance of the algorithm may be improved by modifying the parameters of the system or the energy function itself, or by increasing the number of nearest neighbors. The performance could also be improved by increasing the number of iterations, although this would also increase the computation time.
Occlusions, which occur in almost all images, are not handled by the algorithm. This causes errors in the disparity estimates, in particular near object boundaries. These problems could be reduced by incorporating an occlusion detection method into our algorithm [Egnal and Wildes, 2002].

Figure 3: Ranking from the Middlebury stereo database for an error threshold of 0.75 pixels. The results are comparable to those obtained using dynamic programming or scanline optimization.
The speed of the algorithm could be improved using a coarse-to-fine approach. Because of the local character of the algorithm, a parallel implementation, e.g. on a graphics processing unit, would also be feasible.
The proposed model might be of interest for models of human stereo vision. Interacting neuronal elements might be able to utilize damping and/or noise to optimize their responses.
ACKNOWLEDGEMENTS
This work has received support from the BMBF-funded BCCN Göttingen and the EU Project Drivsco under Contract No. 016276-2.
REFERENCES
Alvarez, L. and Sánchez, J. (2000). 3-D geometry reconstruction using a color image stereo pair and partial differential equations. Cuadernos del Instituto Universitario de Ciencias y Tecnologías Cibernéticas, 6:1–26.
Barnard, S. T. (1989). Stochastic stereo matching over scale. International Journal of Computer Vision, 3(1):17–32.
Boykov, Y., Veksler, O., and Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(11):1222–1239.
Egnal, G. and Wildes, R. (2002). Detecting binocular half-occlusions: Empirical comparisons of five approaches. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(8):1122–1133.
Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Analysis and Machine Intelligence, 6(6):721–741.
Maier, D., Rössle, A., Hesser, J., and Männer, R. (2003). Dense disparity maps respecting occlusions and object separation using partial differential equations. In Sun, C., Talbot, H., Ourselin, S., and Adriaanen, T., editors, Proc. VIIth Digital Image Computing: Techniques and Applications, pages 613–622.
Marroquin, J., Mitter, S., and Poggio, T. (1987). Probabilistic solution of ill-posed problems in computational vision. Journal of the American Statistical Association, 82(397):76–89.
Roe, A. W., Parker, A. J., Born, R. T., and DeAngelis, G. C. (2007). Disparity channels in early vision. Journal of Neuroscience, 27(44):11820–31.
Roy, S. (1999). Stereo without epipolar lines: A maximum flow formulation. International Journal of Computer Vision, 34(2/3):147–161.
Scharstein, D. and Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47:7–42.