lateral detection are complementary: a vehicle either has texture, which yields strong local features that can be matched across different views, or has large flat regions, which can be processed by the proposed quadrilateral detector. This method does not require vehicle models and is therefore more robust to the variability of vehicle types.
This paper is organized as follows: Section 2 briefly describes general stereo vision issues; Section 3 presents the stereo construction algorithm; 3D point processing is discussed in Section 4; the quadrilateral detector is discussed in Section 5; Section 6 presents the results of the proposed method; Section 7 concludes and gives perspectives for this approach.
2 GENERAL STEREO VISION ISSUES
Our approach requires a stereo vision algorithm that is efficient and fast enough to work in real time. Two major issues must be addressed: the calibration of the camera system and the identification of matching points between the two images. We focus here on the second point, assuming that the first one has already been treated (Zhang, 1998), (Zhang, 2000). A considerable number of methods address the stereo correspondence problem. They can be classified into two main categories: dense matching methods, which give a correspondence map for each pixel in the image, and sparse matching methods, which give correspondences only for certain points of interest. The dense matching methods, exhaustively surveyed by Scharstein and Szeliski (Scharstein and Szeliski, 2002), do not suit the application requirements, since the objects we try to detect have uniform texture or reflective surfaces (roads and cars). Matching pixels of such surfaces is very difficult. Furthermore, these algorithms generally demand considerable computational resources. The sparse matching methods first require a feature identification step (edge detection, corner detection, ...), which is carried out separately in the two images. These features can then be matched with local or global algorithms. On the one hand, global algorithms search for a global matching solution for all features by minimizing cost functions. We can cite dynamic programming methods (Ohta and Kanade, 1985), (Kim et al., 2005), graph cut methods (Boykov et al., 2001) and belief propagation methods (Yang et al., 2006). These methods are effective, but both belief propagation and graph cuts are typically computationally expensive, so real-time performance is difficult to achieve (Yang et al., 2006). The dynamic programming method has been tested in the framework of this project but does not significantly improve the results. On the other hand, local methods, which depend only on values within a finite window around the considered pixel, are considerably faster.
Our approach uses multiple types of features and matches them by normalized cross-correlation, which is both simple and robust. The resulting difference of horizontal coordinates between two matching points, called the disparity, gives an estimate of the distance of the point in the 3D world.
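For a rectified pair, the disparity d of a matched point relates to its depth through the standard relation Z = f b / d, with f the focal length in pixels and b the baseline. The sketch below only illustrates this textbook relation; the numerical values in the example are placeholders, not parameters of the system described here.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Standard rectified-stereo relation: Z = f * b / d.

    disparity_px : horizontal coordinate difference between matched points
    focal_px     : focal length expressed in pixels
    baseline_m   : distance between the two camera centres, in metres
    """
    if disparity_px <= 0:
        return float("inf")  # zero or negative disparity: invalid / point at infinity
    return focal_px * baseline_m / disparity_px

# Placeholder values: a 30-pixel disparity with f = 800 px and a 0.4 m baseline
# corresponds to a depth of roughly 10.7 m.
print(disparity_to_depth(30.0, 800.0, 0.4))
```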
3 STEREO CONSTRUCTION
Both acquired images are first rectified using the camera calibration data and corrected for optical distortion if present. The use of rectified images significantly reduces the complexity of the process, since two corresponding points in the left and right images have equal vertical coordinates.
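As an illustration of this rectification step only (the actual implementation is not specified in the paper; the OpenCV routines and the calibration inputs K1, D1, K2, D2, R, T below are assumptions), a rectified pair can be obtained as follows:

```python
import cv2

def rectify_pair(img_l, img_r, K1, D1, K2, D2, R, T):
    """Rectify a stereo pair so that corresponding points share the same row.

    K1, K2 : 3x3 camera matrices;  D1, D2 : distortion coefficients
    R, T   : rotation and translation from the left to the right camera,
             all obtained beforehand from calibration (e.g. Zhang's method).
    """
    size = (img_l.shape[1], img_l.shape[0])
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    map1l, map2l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    map1r, map2r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map1l, map2l, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map1r, map2r, cv2.INTER_LINEAR)
    return rect_l, rect_r, Q  # Q can reproject disparities to 3D points
```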
Different features are identified separately in each image. The first characteristic points used in our implementation are maximum phase congruency points (points where the Fourier components of the image are maximally in phase). These are less sensitive to differences in overall contrast between the two images and yield more points than more classical features such as Harris corners. The phase congruency is computed with wavelet transforms as described in (Kovesi, 1999).
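A minimal sketch of this feature extraction step is given below. The phase congruency computation itself (Kovesi's log-Gabor wavelet measure) is not reproduced here; phase_congruency is an assumed callable returning a per-pixel map with values in [0, 1], and the threshold and neighbourhood radius are illustrative values.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def feature_points(image, phase_congruency, threshold=0.4, radius=2):
    """Keep pixels where the phase congruency map is a strong local maximum.

    `phase_congruency` is an assumed callable implementing the wavelet-based
    measure of (Kovesi, 1999); it must return an array of the same shape as
    `image`.
    """
    pc = phase_congruency(image)
    local_max = maximum_filter(pc, size=2 * radius + 1)   # per-pixel neighbourhood max
    mask = (pc == local_max) & (pc > threshold)           # local maxima above threshold
    rows, cols = np.nonzero(mask)
    return list(zip(rows, cols)), pc   # feature coordinates + the map used for matching
```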
When all maximum phase congruency points have been obtained, each point of the left image is compared to the points lying on the same horizontal line in the right image. A maximum disparity is set to reduce the search space and accelerate the process.
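A minimal sketch of this search step, assuming rectified images; the disparity bound below is illustrative, not the value used in the system.

```python
def candidates_on_row(left_pt, right_points, max_disparity=80):
    """Right-image features that may match `left_pt`: same row (rectified
    images) and a positive disparity below the chosen bound."""
    row, col = left_pt
    return [(r, c) for (r, c) in right_points
            if r == row and 0 < col - c <= max_disparity]
```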
Several similarity measures between the pixel areas surrounding two candidate points have been studied in the literature.
Our method uses the normalized cross-correlation of the phase congruency values in square windows (W_1, W_2) around the two points, defined by

$$
C(W_1, W_2) = \frac{\sum_{(i,j)} \left(p_1(i,j) - \bar{p}_1\right)\left(p_2(i,j) - \bar{p}_2\right)}{\left\lVert p_1(i,j) - \bar{p}_1 \right\rVert \, \left\lVert p_2(i,j) - \bar{p}_2 \right\rVert} \qquad (1)
$$

where the sum is taken over the indices (i, j) of the points in the square windows W_1 and W_2, p_1(i, j) and p_2(i, j) are the phase congruency values at pixel (i, j) in image 1 and image 2 respectively, and \bar{p}_1, \bar{p}_2 are their means over the square windows W_1 and W_2.
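The sketch below is a direct transcription of Eq. (1) on the two phase congruency maps; the half-window size is an illustrative value, and the function can be used to score each candidate pair returned by the row search sketched above.

```python
import numpy as np

def window_ncc(pc_left, pc_right, pt_left, pt_right, half=5):
    """Normalized cross-correlation of Eq. (1) between the phase congruency
    windows W1 and W2 centred on pt_left and pt_right."""
    (r1, c1), (r2, c2) = pt_left, pt_right
    if min(r1, c1, r2, c2) < half:
        return -1.0                      # window would leave the image on top/left
    w1 = pc_left[r1 - half:r1 + half + 1, c1 - half:c1 + half + 1]
    w2 = pc_right[r2 - half:r2 + half + 1, c2 - half:c2 + half + 1]
    if w1.shape != w2.shape:
        return -1.0                      # window truncated at the right/bottom border
    a = w1 - w1.mean()                   # p1(i,j) - mean over W1
    b = w2 - w2.mean()                   # p2(i,j) - mean over W2
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else -1.0
```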
A list of scores in the right image is obtained for each point of the left image, and similarly for each point of the right image. A "winner-take-all" strategy is then used to match the different points: a match is considered valid if its correlation is maximum among all the correlations computed on the