2. An adaptation of this approach to use these local
refinements to stitch videos (Section 4);
3. Quantitative evaluation for both image and video
stitching using different cameras (Section 5).
2 RELATED WORK
Image alignment and stitching methods are among the
oldest classes of algorithms in computer vision
(Szeliski, 2006), and they make it possible to
create panoramas. Several
techniques and methods have already been developed
for tackling this problem (Szeliski, 2006). (Brown
and Lowe, 2007) proposed a technique to build
panoramic images based on invariant features and
homography warping. (Jia and Tang, 2008), on
the other hand, used a structure deformation method
based on 1D features.
Moreover, the advent of dual-fisheye cameras
enabled a new method for the creation of 360°
panoramas, which is gaining noticeable adoption,
as evidenced by applications using such cameras
that range from surveillance (Al-Harasis and
Sababha, 2019) to visual feedback for telepresence
robots (Dong et al., 2019). This also includes the
generation of panoramas through fisheye cameras
mounted on drones (Zia et al., 2019) and a series
of 360° journalistic videos made by the New York
Times (Times, 2017). However, in this particular type
of stitching, challenges arise due to the distortion
caused by the fisheye lenses and, in the case of
dual-fisheye images, the limited region of
overlap.
In fisheye image stitching, most methods include
three stages: fisheye image unwarping, image
alignment, and image blending. An example is the
work of (Ho and Budagavi, 2017), who developed
a method to stitch dual-fisheye images. In the
unwarping step, they perform an equirectangular
projection of each fisheye image. Differently from
our work, the alignment step uses an affine warp
based on a precomputed calibration procedure. In the
final step, a ramp function is applied to generate a
seamless blend. In a follow-up paper, they adopted a
similar approach but, instead of an affine warp,
used an RMLS deformation to align the points in a
local manner (Ho et al., 2017). As in their previous
work, the alignment is precomputed in an offline
calibration phase that determines how the deformation
will take place. However, in configurations that
differ too much from the setup used to calibrate the
control and target points, noticeable artifacts
appear.
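For context on the unwarping stage that these pipelines share, the sketch below maps a single fisheye image onto an equirectangular strip under an idealized equidistant lens model. The function, its assumed field of view, and the output resolution are illustrative only and do not reproduce the calibration-based procedures of the works above.

```python
import numpy as np
import cv2

def fisheye_to_equirect(fisheye, out_w=1024, out_h=512, fov_deg=190.0):
    """Map one fisheye image to an equirectangular strip.

    Assumes an ideal equidistant fisheye centered in the frame; real
    lenses need a calibrated distortion model, so treat this as a sketch.
    """
    h, w = fisheye.shape[:2]
    cx, cy, radius = w / 2.0, h / 2.0, min(w, h) / 2.0
    fov = np.radians(fov_deg)

    # Longitude/latitude grid for the output panorama strip.
    lon = (np.linspace(0, 1, out_w) - 0.5) * fov       # [-fov/2, fov/2]
    lat = (0.5 - np.linspace(0, 1, out_h)) * np.pi     # [pi/2, -pi/2]
    lon, lat = np.meshgrid(lon, lat)

    # Unit viewing ray for each output pixel (camera looks along +z).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Equidistant projection: image radius grows linearly with the
    # angle between the ray and the optical axis.
    theta = np.arccos(np.clip(z, -1.0, 1.0))
    r = radius * theta / (fov / 2.0)
    phi = np.arctan2(y, x)

    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy - r * np.sin(phi)).astype(np.float32)
    return cv2.remap(fisheye, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```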
In another work, (Souza et al., 2018) use clusters
of image features to estimate a homography that
aligns the images. In (Lo et al., 2018), the alignment
step is computed by a local mesh warping based on
features extracted from the image, which does not
require a calibration stage. Also, the blending step is
performed using seam cuts and multi-band blending.
What these two studies have in common is that
they only perform a global alignment to compose the
panoramic image, whereas our work also applies local
refinements to it.
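To illustrate what a purely global, feature-based alignment looks like in practice, the sketch below estimates a single homography between two overlapping images with OpenCV. ORB features, brute-force matching, and RANSAC are generic assumptions here, not the clustering scheme of (Souza et al., 2018) nor the mesh warping of (Lo et al., 2018).

```python
import cv2
import numpy as np

def global_homography(img_a, img_b, min_matches=10):
    """Estimate a homography mapping img_b onto img_a from feature matches.

    Generic sketch: ORB keypoints + brute-force matching + RANSAC.
    """
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY) if img_a.ndim == 3 else img_a
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY) if img_b.ndim == 3 else img_b

    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(gray_a, None)
    kp_b, des_b = orb.detectAndCompute(gray_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_b, des_a), key=lambda m: m.distance)
    if len(matches) < min_matches:
        raise RuntimeError("not enough feature matches for alignment")

    # Matched point pairs: query indexes img_b, train indexes img_a.
    src = np.float32([kp_b[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```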
Regarding the refinement of the stitched image,
various approaches were developed. In (Dhiman
et al., 2018), a method is proposed to minimize
ghosting and brightness artifacts in 360° High
Dynamic Range (HDR) images. Along with this,
numerous dual-fisheye camera calibration methods
were developed, such as those presented in (Gao and
Shen, 2017) and (Aghayari et al., 2017).
After the stitching has been done, it is important
to have a way to assess the quality of the final
panorama. In (Azzari et al., 2008), a methodology
and a synthetic dataset, with a reference panorama,
are proposed to allow a quantitative evaluation of
image stitching algorithms based on the mean squared
error (MSE) metric. Meanwhile, (Ghosh et al.,
2012) robustly apply several metrics, some of which
rely on a ground-truth panorama and some of which do
not. The main idea is to quantitatively assess some
aspects of the stitched image, such as image blending
quality and image registration quality. Furthermore,
in (Dissanayake et al., 2015), a quantitative method
that does not require a reference ground truth is
presented to assess the quality of stitched images.
That work also uses a qualitative approach, based on
surveys and image rankings, to provide an alternative
metric and to validate the quantitative methods.
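To make this kind of reference-based evaluation concrete, the short sketch below computes the MSE between a stitched panorama and a ground-truth panorama over an optional validity mask. It is only a minimal illustration of the metric and does not reproduce the full protocols of the works cited above.

```python
import numpy as np

def panorama_mse(stitched, reference, mask=None):
    """Mean squared error between a stitched panorama and a reference.

    Minimal sketch of reference-based evaluation in the spirit of
    (Azzari et al., 2008); the exact protocol of that work may differ.
    """
    diff = stitched.astype(np.float64) - reference.astype(np.float64)
    if mask is not None:                  # optional valid-pixel mask
        diff = diff[mask.astype(bool)]
    return float(np.mean(diff ** 2))
```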
3 IMAGE STITCHING
Similar to (Souza et al., 2018), our method uses
local invariant features and template matching to
perform global alignment of dual-fisheye images. The
main difference is the addition of local refinements
to improve the stitching quality in three steps.
Figure 1 illustrates the flow of our method. First,
we use a graph cut to estimate the stitching
seam (Kwatra et al., 2003). Then
we employ RMLS to apply a local transformation
around misaligned features (Schaefer et al., 2006).
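To give an intuition for the seam-estimation step, the sketch below finds a low-cost vertical seam through the overlap region with simple dynamic programming. It is a deliberately simplified stand-in for the graph-cut formulation of (Kwatra et al., 2003), kept here only to illustrate the idea of cutting where the two images agree.

```python
import numpy as np

def estimate_seam(overlap_a, overlap_b):
    """Find one x-coordinate per row where switching between the two
    overlapping images is cheapest (their pixels agree the most).

    Simplified dynamic-programming seam, not the graph cut of
    (Kwatra et al., 2003).
    """
    # Per-pixel disagreement between the two images in the overlap.
    diff = (overlap_a.astype(np.float64) - overlap_b.astype(np.float64)) ** 2
    cost = diff.sum(axis=-1) if diff.ndim == 3 else diff
    h, w = cost.shape

    # Accumulate the cheapest path from the top row downwards,
    # allowing the seam to shift by at most one column per row.
    acc = cost.copy()
    for y in range(1, h):
        left = np.r_[np.inf, acc[y - 1, :-1]]
        right = np.r_[acc[y - 1, 1:], np.inf]
        acc[y] += np.minimum(acc[y - 1], np.minimum(left, right))

    # Backtrack the seam from the cheapest bottom-row pixel.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(acc[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(acc[y, lo:hi]))
    return seam
```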
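The local-refinement step can be pictured with the following sketch, which nudges pixels toward their matched positions using Gaussian-weighted displacements around a set of residual feature correspondences. This is an intentionally simplified surrogate for the RMLS deformation of (Schaefer et al., 2006), meant only to convey how a warp can be confined to the neighborhood of misaligned features; the parameter `sigma` and the function name are illustrative assumptions.

```python
import numpy as np
import cv2

def local_refine(image, src_pts, dst_pts, sigma=40.0):
    """Warp `image` so that pixels near each src point move toward the
    matching dst point, with influence decaying with distance.

    Gaussian-weighted displacement field; a simplified surrogate for
    RMLS (Schaefer et al., 2006), not that formulation itself.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)

    num_x = np.zeros((h, w))
    num_y = np.zeros((h, w))
    den = np.full((h, w), 1e-8)
    for (sx, sy), (dx, dy) in zip(src_pts, dst_pts):
        wgt = np.exp(-((xs - dx) ** 2 + (ys - dy) ** 2) / (2.0 * sigma ** 2))
        num_x += wgt * (sx - dx)   # where to sample from, relative to dst
        num_y += wgt * (sy - dy)
        den += wgt

    # Backward mapping: for each output pixel, where to sample in `image`.
    map_x = (xs + num_x / den).astype(np.float32)
    map_y = (ys + num_y / den).astype(np.float32)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REFLECT)
```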
Thereafter, we evaluate each of the improvements
to check if it resulted in a better stitched image to