adjustable velocity. This prototype allows us to
capture video sequences with rotations, translations
and combined motions including zoom effect and
abrupt motions.
The experimental data set consists of nine video
sequences acquired at 30 fps in a real
scene with brightness changes. To compare the
different operators, we chose rotation, translation in
the y-direction (which creates a zoom effect), and
combined rotation and translation, because these
kinds of motion are the most disturbing for the
matching process. Table 1 gives details on the video
sequences for the 3 types of motion: number of
frames and shift or rotation angle between
frames. Several velocities of the robot arm were
tested during acquisition: low, medium, or high
velocity. Furthermore, to simulate more abrupt
motions and considerable transformations, we
matched distant key frames. The motions in these
video sequences are faster than normal human
motion. For example, the lowest translation
velocity is 45 cm per second and the lowest
rotation velocity is 100 degrees per second.
Based on the state of the art presented in the
previous section, we have chosen to compare SIFT,
because it is the most robust; cross correlation with
the Harris corner detector, because it is the fastest;
and the SURF descriptor, which is considered a good
compromise between computation time and
robustness. To obtain well-distributed Harris points,
we divided the images into buckets of 15×15
pixels. The ZNCC correlation score is computed over
an 11×11 pixel ROI, with a minimum threshold of 0.8.
Cross correlation is used with the Harris detector and
also with the SURF detector to highlight the influence
of the detector on the matching process.
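The correlation-based matching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `zncc`, `is_match` and `bucket_corners` are hypothetical, patches are plain lists of grey-level rows, and keeping a single strongest corner per bucket is an assumption (the text only states that the image is divided into 15×15 buckets).

```python
import math


def zncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Returns a score in [-1, 1]; returns 0.0 when either patch is constant.
    """
    a = [float(v) for row in patch_a for v in row]
    b = [float(v) for row in patch_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and of the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def is_match(patch_a, patch_b, threshold=0.8):
    """Accept a candidate pair when its ZNCC score reaches the threshold."""
    return zncc(patch_a, patch_b) >= threshold


def bucket_corners(corners, bucket=15):
    """Keep the strongest corner in each bucket x bucket cell (assumption).

    corners: iterable of (x, y, response) tuples.
    """
    best = {}
    for x, y, response in corners:
        key = (int(x) // bucket, int(y) // bucket)
        if key not in best or response > best[key][2]:
            best[key] = (x, y, response)
    return sorted(best.values())
```

Because the score subtracts each patch mean and divides by the patch energy, an affine brightness change (gain and offset) between two 11×11 patches leaves the score unchanged, which is why ZNCC suits a scene with brightness changes.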
For evaluation, we observe the robustness and
the computation time. The most popular metrics for
robustness are ROC and Recall-Precision curves.
Both are based on the number of correct matches
and the number of false matches obtained for an
image pair. We use the total number of correct
matches (inliers) and the percentage of inliers
compared to the total number of matched points
(inliers and outliers), described by the equation (1).
\[
\%\,\text{inliers} = \frac{\text{correct matches}}{\text{correct matches} + \text{false matches}} \qquad (1)
\]
The numbers of correct and false matches are
determined with the Least Median of Squares (LMedS)
algorithm (Zhang, 1998) by estimating the
fundamental matrix for the image pair. The maximum
distance from a point to its epipolar line, beyond
which the point is considered an outlier and is not
used for computing the final fundamental matrix, is 1
pixel. The desired level of confidence that the
matrix is correct is 99%. The only
constraint of this method is that we must have at
least eight matched features. The computation of the
two-view geometry requires that the matches
originate from a 3D scene and that the motion is
more than a pure rotation. This condition is respected
because the camera is fixed slightly off the rotation
axis on the robot clip.
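The inlier test and the ratio of equation (1) can be illustrated with a short sketch. It assumes a fundamental matrix `F` has already been estimated (the LMedS estimation itself is not shown), measures the distance to the epipolar line in the second image only, and uses the hypothetical names `epipolar_distance` and `inlier_ratio`; multiply the ratio by 100 to express it as a percentage.

```python
import math


def epipolar_distance(F, p, q):
    """Distance from q (second image) to the epipolar line F * p of point p.

    F is a 3x3 matrix given as nested lists; p and q are (x, y) pixel
    coordinates in the first and second image respectively.
    """
    x, y = p
    a = F[0][0] * x + F[0][1] * y + F[0][2]
    b = F[1][0] * x + F[1][1] * y + F[1][2]
    c = F[2][0] * x + F[2][1] * y + F[2][2]
    norm = math.hypot(a, b)
    if norm == 0:
        return math.inf  # degenerate line: treat the match as an outlier
    return abs(a * q[0] + b * q[1] + c) / norm


def inlier_ratio(F, matches, max_dist=1.0):
    """Fraction of matches within max_dist pixels of their epipolar line."""
    if not matches:
        return 0.0
    inliers = sum(1 for p, q in matches
                  if epipolar_distance(F, p, q) <= max_dist)
    return inliers / len(matches)
```

For example, with the fundamental matrix of a pure horizontal translation, the epipolar lines are the image rows, so the distance reduces to the vertical offset between the two matched points.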
To carry out our comparative study, we apply
the following procedure to each video sequence:
1. Fix the number of frames to skip (frame jump)
between the images to match.
2. Extract distinctive features in the images and
match them using the different descriptors.
3. Select inliers from these candidates by estimating
the fundamental matrix using the LMedS method.
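Step 1 of the procedure above can be sketched as follows. The names `frame_pairs` and `inter_frame_motion` are hypothetical, the frame jump is interpreted here as the index offset between the two matched frames, and the motion estimate assumes a constant velocity over the interval; none of these choices is confirmed by the text.

```python
def frame_pairs(n_frames, jump):
    """Index pairs (i, i + jump) of the frames to match in one sequence.

    Assumption: 'jump' is the index offset between the two matched frames,
    and every frame that has a partner 'jump' frames later is used.
    """
    if jump < 1:
        raise ValueError("jump must be at least 1")
    return [(i, i + jump) for i in range(n_frames - jump)]


def inter_frame_motion(velocity_per_second, fps, jump=1):
    """Motion between matched frames, assuming constant velocity."""
    return velocity_per_second * jump / fps
```

Under the constant-velocity assumption, the lowest rotation velocity of 100 degrees per second at 30 fps corresponds to roughly 3.3 degrees between consecutive frames, and the lowest translation velocity of 45 cm per second to 1.5 cm; a larger frame jump scales both proportionally.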
3 RESULTS
In this section, we present in Figure 1 and Table 2 an
extract of the results of all the experiments carried out
and discuss the performance of the tested descriptors.
3.1 Image Rotation
Matching is tested between images with rotation
angles between 7 and 120 degrees, obtained by varying
the velocity and the frame jump. The number of inliers
clearly decreases for higher rotation velocities. The
SIFT descriptor is the most robust to rotation, followed
by SURF, which fails under fast rotation. Harris-based
matching is more disturbed than SURF-based
matching.
3.2 Image Scale Change
Scale change is achieved by a translation of up to 1370
mm. All descriptors have a similar robustness (% of
inliers), slightly lower for cross correlation. SURF
yields the lowest number of inliers at all
velocities. The number of inliers decreases as the
velocity of the robot arm increases, but much less than
in the rotation case.
3.3 Combined Motion
Combined motion is performed by simultaneously
rotating and translating the robot arm (between 4
and 92 degrees with 1370 mm shift). The performan-
VISAPP 2010 - International Conference on Computer Vision Theory and Applications