Adaptive Reference Image Selection for Temporal Object Removal from Frontal In-vehicle Camera Image Sequences

Toru Kotsuka¹, Daisuke Deguchi², Ichiro Ide¹ and Hiroshi Murase¹
¹Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya-shi, Aichi, Japan
²Information & Communication Headquarters, Nagoya University, Furo-cho, Chikusa-ku, Nagoya-shi, Aichi, Japan

Keywords:
In-vehicle Camera, Temporal Object Removal, Adaptive Reference Image Selection.
Abstract:
In recent years, image inpainting has been widely used to remove undesired objects from an image. In particular, the removal of temporal objects, such as pedestrians and vehicles, from street-view databases such as Google Street View has many applications in Intelligent Transportation Systems (ITS). To remove temporal objects, Uchiyama et al. proposed a method that combines multiple image sequences captured along the same route. However, the output quality of this method often degrades when spatial alignment within an image group fails. For example, a large temporal object that exists in only one image creates a region that has no correspondence in the other images of the group, so the image composed from the aligned images becomes distorted. One solution to this problem is to adaptively select, as the reference image for spatial alignment, an image containing only small temporal objects. Therefore, this paper proposes a method that removes temporal objects by integrating multiple image sequences with an adaptive reference image selection mechanism.
1 INTRODUCTION
In recent years, image inpainting has been widely used to remove undesired objects from an image. In particular, there is a strong need for the removal of temporal objects (e.g. pedestrians and vehicles) from street-view databases such as Google Street View (http://www.google.co.jp/help/maps/streetview/) so that they can be used for Intelligent Transportation Systems (ITS) technologies such as geo-localization of vehicles (Matsumoto et al., 2000).
Methods to remove temporal objects in an image can be categorized into three approaches: (1) using a single image, (2) using a single image sequence, and (3) using multiple image sequences. The first approach synthesizes the background scene of a manually selected target region (Bertalmio et al., 2000). It requires only one image as an input, but it cannot restore the true background scene.
The second approach integrates frames captured as one image sequence (Kawai et al., 2014). Using the difference of appearances between frames, this method can remove temporal objects automatically and restore most of the background scene. However, some temporal objects, for example parked vehicles, cannot be removed since they are observed as static objects in the image sequence. The third approach integrates multiple image sequences captured along the same route (Uchiyama et al., 2010). By using multiple image sequences, this approach can remove temporal objects even if they are observed as static objects in a certain image sequence. Therefore, a method that removes temporal objects using multiple image sequences is the most suitable for constructing a street-view database.
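As an illustration of the third approach only, the following is a minimal sketch of how images captured at the same location could be fused once they are spatially aligned. It is not the implementation of any of the cited methods; the function name and the assumptions (NumPy arrays of identical size, alignment already done) are introduced here purely for illustration. A pixel-wise median suppresses temporal objects because such an object covers a given pixel in only a minority of the images.

import numpy as np

def fuse_image_group(aligned_images):
    """Pixel-wise median fusion of spatially aligned images captured at
    the same location (hypothetical helper, not a published method).

    A temporal object occupies a given pixel in only a minority of the
    images, so the median keeps the static background at that pixel.
    """
    stack = np.stack(aligned_images, axis=0).astype(np.float32)
    fused = np.median(stack, axis=0)
    return fused.astype(np.uint8)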
The method proposed by Uchiyama et al. uses frame alignment between image sequences and spatial alignment inside an image group as a preprocessing step to integrate multiple image sequences. Through frame alignment, all images captured at the same location are grouped. Then, spatial alignment is performed by aligning all images with a reference image selected from each image group. Finally, an image without temporal objects is generated by fusing the images in each image group. However, this method can produce a poor output image when spatial alignment inside an image group fails. For example, a temporal object region that exists in only one image cannot be aligned with the other images. Generally, in such cases, it is necessary to estimate correspondences from the surroundings of the temporal object. However, if the temporal object in the reference image is large, this estimation will not work correctly.
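The spatial alignment step is where the choice of reference image matters. As an illustration only (it does not reproduce the concrete alignment procedure of Uchiyama et al.), a feature-based homography alignment in OpenCV could look as follows. When a large temporal object covers the reference image, many matched features lie on that object, so the estimated homography, and hence the warped image, becomes unreliable; this is what motivates the adaptive reference image selection proposed in this paper.

import cv2
import numpy as np

def align_to_reference(image, reference):
    """Warp `image` onto the coordinate frame of `reference` using a
    homography estimated from ORB feature matches (illustrative sketch)."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(image, None)
    kp2, des2 = orb.detectAndCompute(reference, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    # If a large temporal object covers the reference, many matches lie on
    # it and the RANSAC homography below is estimated unreliably.
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = reference.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))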