method does not rely on edge information in images,
and does not use an image prior learned from a train-
ing dataset. Thus, it works properly even if there is
no strong edge information in the reflection, and it
does not suffer from the domain shift problem.
2 RELATED WORK
In recent years, various approaches have been pro-
posed for image signal separation. These methods
fall into two classes: single-image-based methods and
multiple-image-based methods.
Single-image-based separation is an ill-posed prob-
lem, and hence these methods must incorporate a pri-
ori knowledge. Levin et al. (Levin et al., 2004) pro-
posed a technique for separating reflections so that
the brightness gradient of the recovered image ap-
proaches that of images in a natural image database.
More recently, deep learning techniques have been
used to learn image features of reflections and to sep-
arate reflections from images (Fan et al., 2017; Zhang
et al., 2018; Wan et al., 2018). These methods enable
us to separate reflections more accurately and faster
than before. However, since they learn the image fea-
tures of reflections only from the training data, an
enormous amount of training data is required to deal
with the various types of reflections that occur in var-
ious scenes. As a result, these methods often suffer
from the domain shift problem caused by the limited
amount of data. In contrast, our method is based on
the imaging model of reflection and does not rely on
a training dataset, so it does not suffer from the do-
main shift problem.
Multiple-image-based methods exploit the prop-
erty that the motion of the background scene dif-
fers from that of the reflection on glass windows.
Yu and Brown (Yu and Brown, 2013) proposed a
method for separating images into foreground and
background by matching image features using SIFT.
Xue et al. (Xue et al., 2015) recovered dense motion
fields from sparse motion fields obtained from edge
information, and showed that the dense motion fields
enable foreground and background images to be sep-
arated more accurately. Nandoriya et al. (Nandoriya
et al., 2017) also used image edge information to
obtain an initial motion field and to separate fore-
ground and background information in video frames.
However, all these methods require edge information
to obtain motion fields in images, and hence they
cannot separate foreground and background informa-
tion accurately when the input images have no abrupt
intensity changes and no image features can be ex-
tracted. In contrast, our method does not require
abrupt intensity changes in images and does not need
to extract image features to separate the image signals.

Figure 2: The in-vehicle camera observes not only the
background scene, but also reflections of objects in the
vehicle.

Figure 3: An observed image which contains reflections
from objects in the vehicle, i.e., the two gray squares.
3 TRANSMISSION AND
REFLECTION
We first consider an imaging model of an in-vehicle
camera, in which reflected light and transmitted light
are simultaneously captured in a single image.
In the image of an in-vehicle camera, we often ob-
serve light reflected by the windshield of the vehicle,
I_D, as well as light transmitted through the wind-
shield from outside, I_B. Fig. 2 shows the scene where
the two types of light I_B and I_D enter the in-vehicle
camera, and Fig. 3 shows the image I_R observed by
the camera. The image I_R includes regions that are
darker than the surrounding area. This is the reflec-
tion caused by objects in the vehicle.
Xue et al. formulated the observed image as the alpha
blending of the background image and the foreground
image at a fixed ratio as follows:
I_R(x) = (1 − α(x)) I_B(x) + α(x) I_D(x),    (1)

where x = [x, y]^T is an image pixel, and α(x) denotes
the mixing ratio at x, which ranges from 0 to 1.
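For concreteness, the following minimal NumPy sketch composes an observed image according to Eq. (1). The image size, the layer contents, and the constant mixing ratio are illustrative assumptions, not values from this paper.

```python
import numpy as np

# Hypothetical grayscale layers with intensities in [0, 1].
H, W = 480, 640
I_B = np.random.rand(H, W)       # background scene transmitted through the windshield
I_D = np.zeros((H, W))           # objects inside the vehicle (reflection layer)
I_D[100:200, 150:250] = 0.5      # e.g. a gray square, as in Fig. 3

# Per-pixel mixing ratio alpha(x) in [0, 1]; constant here for simplicity.
alpha = np.full((H, W), 0.3)

# Eq. (1): alpha blending of the background and reflection layers.
I_R = (1.0 - alpha) * I_B + alpha * I_D
```

Note that this blend attenuates the background term I_B by (1 − α), which is exactly the property questioned next.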
Eq. (1) seems correct, but it is actually physically
wrong, since in this model the light I_B from the out-
side of the vehicle is attenuated by (1 − α), which
does not happen in reality. Reflection is caused