Figure 1: Schematic of the image-to-image translation network.
It could also support communication between patients
and healthcare professionals by aiding in
explanations of MRI images from other patients.
In the remainder of this paper, we discuss related
work in Section 2, introduce our proposed method in
Section 3, present our experimental methods and
results in Section 4, and give our conclusions in
Section 5.
2 RELATED WORK
GANs have previously been used for image-to-image domain adaptation. Pix2pix (Isola et al., 2017) is an early, well-known method for image-to-image translation that uses conditional GANs to learn a mapping between paired images. However, Pix2pix relies on supervised learning with a large number of paired images, so a labeling cost is incurred. CycleGAN (Zhu et al., 2017) performs image-to-image translation without learning from paired images. The model has a cyclic architecture consisting of two generators and two discriminators and uses a cycle consistency loss for unsupervised learning. CycleGAN can be trained with a relatively small number of unpaired images. However, it is difficult to make the generated images reflect specific characteristics (color, illumination, etc.) of a dataset. More recently, StarGAN-v2 (Choi et al., 2018; Choi et al., 2020) was proposed for image-to-image translation across domains; it can reflect styles by obtaining style codes from the datasets automatically. However, it requires a large dataset for training, and it is difficult to specify a particular style in an unbiased dataset because the style codes are obtained automatically by capturing bias in the training dataset. DeepHist (Avi-Aharon et al., 2020) can reflect specific color characteristics in generated images by using a differentiable network that applies kernel density estimation to the color histograms of the generated image. However, it requires paired images for training.
3 PROPOSED METHOD
Here we propose a network model for image-to-image
translation that is based on the cyclic architecture of
CycleGAN and is able to reflect target color
characteristics. For training, our method requires only
a relatively small amount of data and does not require
paired images. Furthermore, we introduce an architecture that accepts color histograms of the target domain. This section describes the details of this architecture and the losses estimated during training.
3.1 Overview of the Network
The network consists of two generators and two
discriminators as shown in Figure 1. 𝑋 and 𝑌 denote
the domains in image-to-image translation. 𝐺 and 𝐹
denote the generators from 𝑋 to 𝑌 and from 𝑌 to 𝑋, respectively. Each of them generates a synthetic image (𝑋_fake and 𝑌_fake) in the target domain from a real image (𝑋_real and 𝑌_real) in the source domain. 𝐷_𝑋 and 𝐷_𝑌 denote the discriminators, and 𝑋_hist and 𝑌_hist denote the color histograms for 𝑋 and 𝑌, respectively. These histograms are input into 𝐹 and 𝐺, respectively, so that the generated synthetic images reflect the color characteristics obtained from reference images (𝑋_ref and 𝑌_ref) in the respective
target domains. Spectral normalization (Miyato et al.,
2018) is adopted for both generators and
discriminators to stabilize training of this network.
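To make the structure concrete, the following is a minimal PyTorch-style sketch of the cycle in Figure 1, with two histogram-conditioned generators and two spectrally normalized discriminators. The class names, layer sizes, and the way the histogram is injected at the bottleneck are illustrative assumptions, not the exact implementation.

```python
# Minimal sketch (not the released implementation) of the cyclic setup in
# Figure 1: two histogram-conditioned generators (G: X->Y, F: Y->X) and two
# spectrally normalized discriminators (D_X, D_Y). All sizes are assumptions.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class HistConditionedGenerator(nn.Module):
    """Encoder-decoder generator that receives a target-domain RGB histogram."""
    def __init__(self, hist_bins=256, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Project the concatenated R, G, B histograms (3 * hist_bins values)
        # to one feature vector that is appended at the bottleneck.
        self.hist_proj = nn.Linear(3 * hist_bins, feat_ch)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * feat_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat_ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, img, hist):
        z = self.encoder(img)                          # B x C x H' x W'
        h = self.hist_proj(hist)                       # B x C
        h = h[:, :, None, None].expand(-1, -1, z.size(2), z.size(3))
        return self.decoder(torch.cat([z, h], dim=1))  # translated image


def make_discriminator(feat_ch=64):
    """PatchGAN-style discriminator with spectral normalization on every conv."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(3, feat_ch, 4, stride=2, padding=1)), nn.LeakyReLU(0.2, inplace=True),
        spectral_norm(nn.Conv2d(feat_ch, feat_ch, 4, stride=2, padding=1)), nn.LeakyReLU(0.2, inplace=True),
        spectral_norm(nn.Conv2d(feat_ch, 1, 4, padding=1)),
    )


# The four networks of the cycle.
G, F = HistConditionedGenerator(), HistConditionedGenerator()   # X -> Y, Y -> X
D_X, D_Y = make_discriminator(), make_discriminator()

# One forward pass of the cycle with dummy data.
x_real = torch.randn(1, 3, 64, 64)
y_hist = torch.rand(1, 3 * 256)    # flattened reference histogram from domain Y
x_hist = torch.rand(1, 3 * 256)    # flattened reference histogram from domain X
y_fake = G(x_real, y_hist)         # X -> Y
x_cycled = F(y_fake, x_hist)       # Y -> X; compared with x_real by the cycle consistency loss
```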
3.2 Importing Color Characteristics
The architecture adopted for 𝐺 and 𝐹 is shown in
Figure 2. The color distribution is imported following previous methods. First, an RGB histogram is obtained from an image in the target domain. The histograms of the three color channels are concatenated and fed into the middle layer between the encoder and the decoder of the generator. Injecting the histograms at this point supplies color information after the spatial features have been convolved. A translated image is output from the decoder. To evaluate the color of the output image,
the L2 loss between the histograms of the source and output images is computed. The histograms of the output image are obtained by kernel density estimation because it is differentiable and therefore allows backpropagation and updating of the network.
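As an illustration, the following PyTorch-style sketch shows one way to implement a differentiable color histogram by kernel density estimation and the L2 loss between histograms; the Gaussian kernel, 256 bins, bandwidth, and function names are assumptions made for this example, not necessarily the kernel defined by the equation below.

```python
# Illustrative sketch of a differentiable color histogram obtained by kernel
# density estimation, together with the L2 histogram loss. The Gaussian
# kernel, 256 bins, and bandwidth are assumptions made for this example.
import torch


def soft_histogram(img, bins=256, bandwidth=0.02):
    """Differentiable per-channel histogram of an image with values in [0, 1].

    img: tensor of shape (B, 3, H, W); returns a tensor of shape (B, 3, bins).
    """
    B, C, _, _ = img.shape
    centers = torch.linspace(0.0, 1.0, bins, device=img.device)   # bin centers
    x = img.reshape(B, C, -1, 1)                                  # pixel values
    # Gaussian kernel weight of every pixel with respect to every bin center.
    weights = torch.exp(-0.5 * ((x - centers) / bandwidth) ** 2)
    hist = weights.sum(dim=2)                                     # (B, C, bins)
    return hist / hist.sum(dim=2, keepdim=True)                   # normalize


def histogram_l2_loss(output_img, reference_hist):
    """L2 loss between the histogram of the output image and a reference one."""
    return torch.mean((soft_histogram(output_img) - reference_hist) ** 2)


# Because every operation above is differentiable, the loss can be
# backpropagated through the generator that produced the output image.
output = torch.rand(1, 3, 64, 64, requires_grad=True)  # stands in for a generated image
ref = torch.rand(1, 3, 256)
ref = ref / ref.sum(dim=2, keepdim=True)                # dummy reference histogram
loss = histogram_l2_loss(output, ref)
loss.backward()                                         # gradients reach `output`
```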
Kernel density estimation is performed using the following probability density function: