$V_r$, so that a frame $I_t$ belongs to $S_r$ only if the
distance $|t - k|$ to a frame $I_k \in V_r$ is minimal among
all the frames in $S$.
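The nearest-frame rule above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name and the example frame indices are hypothetical, and the reference set is simply passed in as a list of frame indices, since its construction is defined before this passage.

```python
from bisect import bisect_left

def nearest_in_selection(selected, reference):
    """For each reference index k, keep the index t in `selected`
    that minimises |t - k| (ties go to the smaller index)."""
    selected = sorted(selected)   # must be non-empty
    picked = []
    for k in sorted(reference):
        pos = bisect_left(selected, k)
        # Only the neighbours just below and at/above k can be the closest.
        candidates = selected[max(pos - 1, 0):pos + 1]
        t = min(candidates, key=lambda s: abs(s - k))
        # Results are non-decreasing, so checking the last entry avoids duplicates.
        if not picked or picked[-1] != t:
            picked.append(t)
    return picked

# Hypothetical indices: frames kept in S, and a reference set with one frame every 5.
s_indices = [0, 3, 4, 9, 10, 15]
reference_indices = list(range(0, 16, 5))
decimated = nearest_in_selection(s_indices, reference_indices)
```

The binary search keeps the lookup at O(log |S|) per reference frame, although a linear scan would give the same result for short sequences.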
In both tests, w was heuristically set to the video
frame rate, since this corresponds to a reasonable camera
shake of about 1 s, while in the decimated test r = ⌈w/2⌉.
For this setup it was experimentally verified that |S| ≃
|V|/2, while the maximum distance roughly equals the
video frame rate. This also implies a computation about
four times faster, since 3D reconstruction algorithms have
a time complexity of at least $O(n^2)$ and halving the
number of frames n therefore cuts the cost to a quarter.
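The speed-up estimate can be checked with a few lines of arithmetic. The sketch below assumes a purely quadratic cost model, which the text states only as a lower bound; the function name and the frame counts are illustrative.

```python
def relative_cost(n_selected, n_full, exponent=2):
    """Cost of processing n_selected frames relative to n_full frames,
    assuming the running time grows as n ** exponent."""
    return (n_selected / n_full) ** exponent

# With |S| roughly |V| / 2 and a quadratic cost, the selected
# sequence needs about a quarter of the time of the full one.
ratio = relative_cost(500, 1000)
speedup = 1.0 / ratio
```

For algorithms with a higher-order cost the gain would be larger, which is why the factor of four is best read as a conservative estimate.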
The 3D reconstruction was obtained using the freely
available state-of-the-art SfM pipeline VisualSFM
(Wu, 2013), where matches between keypoints
are computed on a window of h successive frames,
with h = 7 for the Monk sequence and h = 13 for the
other video sequences. Note that the window size is
expressed in terms of successive frames given as input
to VisualSFM. In the case of the full sequence V, the
window size is doubled in order to preserve the spatial
consistency between the frame matches.
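As an illustration of the windowed matching, the sketch below enumerates the frame pairs that fall inside a window of h successive frames. It assumes that each frame is matched against the h frames that follow it in the input order, which is one possible reading of the window definition; the function name is hypothetical and no VisualSFM interface is used.

```python
def window_pairs(n_frames, h):
    """List the pairs (i, j), i < j, where frame j is among the
    h frames following frame i in the input order."""
    pairs = []
    for i in range(n_frames):
        last = min(i + h, n_frames - 1)   # clamp the window at the sequence end
        for j in range(i + 1, last + 1):
            pairs.append((i, j))
    return pairs

# A selected sequence matched with window h, and a full sequence with
# about twice the frames matched with the doubled window 2h.
pairs_selected = window_pairs(100, 7)
pairs_full = window_pairs(200, 14)
```

Since the full sequence has about twice as many frames over the same video, a window of 2h frames there spans roughly the same stretch of video as h frames of the selected sequence.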
No further methods are included in the evaluation
since, to the best of our knowledge, no similar methods
exist, and a comparison with keyframe selection
strategies (Seo et al., 2008) or deblurring methods
(Lee et al., 2011) would be unfair, as they differ in
purpose and use additional information. In particular,
DWAFS aims at providing a fast data preprocessing step
to be used for other tasks. It works on a simple gradient
statistic and does not require complex, time-consuming
image processing, such as the computation of image
feature keypoints, previous poses and 3D structure, as
most keyframe selectors and deblurring methods do.
These methods rather represent possible final tasks
that can benefit from DWAFS.
3.2 Results
Results for the full and decimated sequences are
reported in Figs. 6 and 7, respectively. In particular,
the histograms show the total number of 3D points of
the reconstructed model, the corresponding mean
reprojection error and average track length, and the
average number of feature points found on each frame.
Note that for the full sequence V the average track
length bar is halved so that track lengths cover a
comparable spatial extent, since V contains about twice
as many frames as S and C.
It is reasonable to expect that the product of the
average track length and the number of 3D points is
roughly equal to the product of the mean number of
features per image and the total number of image
frames. Hence, in order to provide a more accurate,
defined and dense 3D reconstruction, not only must a
higher number of 3D points be found, but also more
features on the images or longer tracks. Both cases
improve the reconstruction accuracy and decrease the
estimation errors, by providing a denser 3D point cloud
in the former case or a more robust and stable
reconstruction in the latter case.
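The balance between tracks and features can be illustrated on a toy example. The sketch below assumes that a track is stored as the list of frame indices in which its 3D point is observed, and that every counted feature belongs to exactly one track; in that idealised case both products equal the total number of observations. The data and the function name are made up for illustration.

```python
import math

def track_statistics(tracks, n_frames):
    """Return (number of 3D points, average track length,
    mean number of tracked features per frame)."""
    n_points = len(tracks)
    total_observations = sum(len(track) for track in tracks)
    avg_track_length = total_observations / n_points
    mean_features_per_frame = total_observations / n_frames
    return n_points, avg_track_length, mean_features_per_frame

# Toy data: four 3D points observed over five frames.
tracks = [[0, 1, 2], [1, 2], [0, 1, 2, 3], [2, 3, 4]]
n_frames = 5
n_points, avg_len, mean_feat = track_statistics(tracks, n_frames)

tracks_side = avg_len * n_points       # average track length x number of 3D points
features_side = mean_feat * n_frames   # mean features per image x number of frames
balanced = math.isclose(tracks_side, features_side)
```

On real data the equality holds only approximately, because features that are detected but never matched into a track are counted on the image side only.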
Referring to Fig. 6, the DWAFS strategy notably
improves the reconstruction in the case of the Monk
and Desktop2 sequences. The relatively small decrease
of the average track length can be attributed to the
higher number of features found on the images. More
robust and stable points are retained first, so that
further improvements can only come from adding points
that last less along the sequence and thus have clearly
shorter tracks, which decreases the average track
length. Nevertheless, the reprojection error remains
lower.
In the case of the Desktop0 and Desktop1 sequences,
although fewer 3D points are found than with the full
sequence V, a higher number of features with longer
tracks is found on the images. Together with the lower
reprojection error, this indicates that the V models
contain fragmented and unstable tracks. Multiple tracks
are then associated with the same 3D point, which
appears duplicated, giving the misleading impression of
a denser model.
Concerning the comparison between the DWAFS
sequence S and its complement C, there is a noticeable
difference between them for fast camera movements
and shakes with evident blur (Monk, Desktop0 and
Desktop1 videos). For slow camera movements
(Desktop2 sequence) the gap is reduced, but better
results are still obtained for S. Note that the
reprojection error for C is lower than that of the full
sequence V, as the latter has a higher number of points
per image. Nevertheless, the reprojection error of C is
higher than that of S, even though the DWAFS sequence S
has a higher number of points per image.
Finally, Fig. 7 shows that better 3D reconstructions
are obtained in the case of decimated sequences. The
difference decreases when the video contains slow
camera movements, in which case results are quite
similar. Note that in the decimated case denser 3D
models are found with respect to the full image
sequence, since sparser sequences are used with the
same frame window h (see Sect. 3.1) to get the feature
matches, which implies a larger effective spatial
window.
Moreover, inspection of all the histograms suggests
that the percentile parameter m is quite stable, with
a peak at m = 95.
VISAPP 2015 - International Conference on Computer Vision Theory and Applications