
3 LF IMAGE SYNTHESIS BASED ON MIXED-FOCUS IMAGE
3.1 Light-Field Reconstruction
The mixed-focus image described above contains, within a single image, pixels captured at different focal lengths. It can therefore be regarded as the sum of several sparse images, each captured at one focal length with only a limited number of pixels. To estimate the LF from such an image, this section first considers how to estimate the LF from multiple images taken at different focal lengths. To simplify the discussion, we consider the case where a 2D LF is acquired as a 1D image and the 2D LF is estimated from this image.
First, let us consider how an image is composed from the LF. Let L(x, u) denote a ray of light traveling from a point x on the lens in the direction u. If the distance from the lens to the imaging surface is 1 and the focal length is f, the incident position x' of the ray L(x, u) on the imaging surface can be expressed by the lens imaging formula as follows:

$$x' = x - \frac{x}{f} - u \qquad (2)$$
Assuming that all rays incident on the lens reach the image sensor, the brightness I(x') observed at x' can be expressed as follows:

$$I(x') = \int_x L\left(x,\, -x' + x - \frac{x}{f}\right) dx \qquad (3)$$
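To make the integral concrete, the following sketch (Python with NumPy, not from the paper) discretizes Eq. (3) over the lens coordinate x for a toy 2D light field; the grid sizes, ranges, and the example light field are illustrative assumptions.

```python
# A minimal numerical sketch of Eq. (3), assuming a toy 2D light field
# sampled on regular grids for the lens coordinate x and direction u.
import numpy as np

n_x, n_u = 32, 64                 # samples over lens position x and direction u
xs = np.linspace(-0.5, 0.5, n_x)  # lens aperture coordinates
us = np.linspace(-1.0, 1.0, n_u)  # ray directions / image coordinates

# Toy light field L(x, u): a single Lambertian point emits the same
# intensity along every ray that passes through it.
L = np.exp(-((us[None, :] - 0.3 * xs[:, None]) ** 2) / 0.01)

def render_image(L, xs, us, f):
    """Synthesize the 1D image of Eq. (3) for focal length f."""
    image = np.zeros_like(us)
    for i, x_prime in enumerate(us):        # reuse the u grid for x'
        u = xs - xs / f - x_prime           # invert Eq. (2): u = x - x/f - x'
        # Nearest-neighbour lookup of L(x, u); rays leaving the grid contribute 0.
        idx = np.round((u - us[0]) / (us[1] - us[0])).astype(int)
        valid = (idx >= 0) & (idx < len(us))
        # Riemann sum over x (the constant dx factor is omitted).
        image[i] = L[np.arange(len(xs))[valid], idx[valid]].sum()
    return image

I_f = render_image(L, xs, us, f=2.0)  # one refocused image per focal length
```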
This can be regarded as summing sub-aperture images: L(x, u), the set of rays passing through the point x on the lens, forms the sub-aperture image at x, and these sub-aperture images are added together after being translated in accordance with the focal length. If multiple images with different focal lengths f are obtained, Eq. (3) provides one such relationship for each focal length, forming a system of simultaneous equations. Therefore, if more focal-length images can be obtained than the density of the LF to be estimated, the LF can be estimated by solving this system. However, in the mixed-focus images used in this study, the number of pixels that can be captured at each focal length is limited. Therefore, a more efficient method of representing and estimating the LF is needed.
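Because Eq. (3) is linear in L, the simultaneous equations can be written as one linear system over the vectorized LF, with one row per observed pixel per focal length. The sketch below illustrates this construction under the same toy discretization as above; the focal lengths and nearest-neighbour sampling are assumptions, and this is not the paper's solver.

```python
# Stacking Eq. (3) for several focal lengths into A @ l = b, where l is
# the vectorized light field. All sizes and values are illustrative.
import numpy as np

n_x, n_u = 16, 32
xs = np.linspace(-0.5, 0.5, n_x)
us = np.linspace(-1.0, 1.0, n_u)

def system_matrix(xs, us, f):
    """Linear map from the vectorized LF to the image at focal length f."""
    A = np.zeros((len(us), len(xs) * len(us)))     # one row per pixel x'
    for i, x_prime in enumerate(us):
        u = xs - xs / f - x_prime                  # Eq. (2) solved for u
        idx = np.round((u - us[0]) / (us[1] - us[0])).astype(int)
        valid = (idx >= 0) & (idx < len(us))
        A[i, np.arange(len(xs))[valid] * len(us) + idx[valid]] = 1.0
    return A

focal_lengths = [1.5, 2.0, 3.0]                    # illustrative values
A = np.vstack([system_matrix(xs, us, f) for f in focal_lengths])
L_true = np.random.rand(n_x, n_u)                  # stand-in light field
b = A @ L_true.ravel()                             # observed focal stack
# With fewer focal lengths than lens samples the system is underdetermined
# (here 3 < 16), which is exactly the limitation noted above; lstsq then
# returns only the minimum-norm solution.
L_est, *_ = np.linalg.lstsq(A, b, rcond=None)
```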
3.2 Light Field Representation
Let us consider the LF as a set of sub-aperture images taken from different viewpoints. In this case, light radiated uniformly in all directions from objects in the scene, i.e., the diffuse reflection component, can be represented by applying a disparity-based viewpoint transformation to an image taken from a certain viewpoint. On the other hand, specular reflection components and areas hidden by occlusion cannot be represented in this way. However, since such components are very rare in general scenes, they can be represented as sparse images.
In this study, the Deep Image Prior (DIP) (Ulyanov et al., 2018) is used to represent the all-in-focus image. In DIP, a noise image is input to a neural network with an encoder-decoder structure, and the output image is obtained by optimizing the parameters of the network according to the objective. This enables image inpainting and noise reduction without the need for training data (Hashimoto et al., 2021). In the following, the image obtained with network parameters θ and noise input N is denoted I(N, θ). Estimating the all-in-focus image is thus equivalent to finding the optimal θ.
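A minimal sketch of the DIP optimization loop, assuming PyTorch; the tiny encoder-decoder, the input size, and the mean-squared objective are placeholders, not the architecture and loss actually used in the paper.

```python
# Deep Image Prior: a fixed noise tensor N is pushed through a network,
# and only the network parameters theta are optimized, so I(N, theta)
# parameterizes the image itself.
import torch
import torch.nn as nn

net = nn.Sequential(                      # toy encoder-decoder
    nn.Conv2d(8, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)
N = torch.randn(1, 8, 64, 64)             # fixed noise input
target = torch.rand(1, 3, 64, 64)         # stand-in for the observation
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(200):                      # optimize theta, never N
    opt.zero_grad()
    I = net(N)                            # the image I(N, theta)
    loss = ((I - target) ** 2).mean()     # task-specific objective goes here
    loss.backward()
    opt.step()
```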
Next, we consider how to apply a viewpoint transformation to the all-in-focus image I(N, θ) to represent sub-aperture images from different viewpoints. To perform a viewpoint transformation, the per-pixel disparity between the two images can be estimated and each pixel shifted by this disparity. However, such pixel-shifting operations are highly nonlinear and difficult to use in an optimization framework. In this study, we use a disparity transformation method based on weight maps (Luo et al., 2018).
In this method, shifted copies I_j of the image to be transformed are prepared, where I_j is the image shifted by T_j pixels. Assuming that a mask image W_j is available, in which a pixel is set to 1 if its disparity is T_j and to 0 otherwise, the disparity-transformed image I' can be generated as follows:

$$I' = \sum_j \mathrm{diag}(W_j)\, I_j \qquad (4)$$
In other words, if W_j can be estimated appropriately, the viewpoint transformation can be applied without shifting pixels. Furthermore, since disparity is determined by the depth of an object, a common W_j can be used for all sub-aperture images by preparing appropriately shifted images. In other words, the image I_d^k at the k-th viewpoint can be described as follows:

$$I_d^k = \sum_j \mathrm{diag}(W_j)\, I_{jk} \qquad (5)$$
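The sketch below illustrates Eqs. (4) and (5), implementing diag(W_j) I_j as an element-wise product between the weight map and the shifted image. The soft weight maps, the candidate disparities, and the assumption that the shift for viewpoint k scales linearly as k·T_j are illustrative choices, not details fixed by the paper.

```python
# Weight-map disparity transformation: every candidate disparity T[j]
# gets a shifted copy of the image, and the weight maps W[j] select,
# per pixel, which shift applies.
import numpy as np

def shift(img, t):
    """Translate a 2D image horizontally by t pixels
    (np.roll wraps at the border, acceptable for this sketch)."""
    return np.roll(img, t, axis=1)

def sub_aperture(I0, W, T, k):
    """Eq. (5): warp the all-in-focus image I0 to viewpoint k using the
    common weight maps W. Pixels with disparity T[j] are shifted by
    k * T[j], assuming disparity scales linearly with viewpoint offset."""
    return sum(W[j] * shift(I0, k * T[j]) for j in range(len(T)))

H, width, J = 32, 32, 4
T = [-2, -1, 0, 1]                        # candidate disparities (pixels)
I0 = np.random.rand(H, width)             # stand-in all-in-focus image
raw = np.random.rand(J, H, width)
W = raw / raw.sum(axis=0)                 # soft weight maps summing to 1 per pixel

views = {k: sub_aperture(I0, W, T, k) for k in range(-2, 3)}  # Eq. (5)
I_prime = views[1]                        # k = 1 reduces to Eq. (4)
```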