Figure 1: Schematic of the image-to-image translation network.
 
It could also support communication between patients
and healthcare professionals by helping to explain
MRI images from other patients.
In the remainder of this paper, we discuss related 
work in Section 2, introduce our proposed method in 
Section  3,  present  our  experimental  methods  and 
results  in  Section  4,  and  give  our  conclusions  in 
Section 5. 
2  RELATED WORK 
GANs have previously been used for image-to-image
domain adaptation. Pix2pix (Isola et al., 2017) is an
early, well-known method for image-to-image
translation that uses conditional GANs trained on
paired images. However, Pix2pix relies on supervised
learning and needs a large number of paired images,
which incurs a labeling cost. CycleGAN (Zhu et al.,
2017) performs image-to-image translation without
paired images. The model has a cyclic architecture
consisting of two generators and two discriminators,
and uses a cycle consistency loss for unsupervised
learning. CycleGAN requires only a relatively small
number of unpaired images for training. However, it
is difficult to adapt the generated image to specific
characteristics of a dataset (color, illumination,
etc.). More recently, StarGAN-v2 (Choi et al., 2018;
Choi et al., 2020) has been proposed for image-to-image
translation across domains; it can reflect styles by
automatically obtaining style codes from datasets.
However, it requires a large dataset for training, and
because the style codes are obtained automatically by
capturing bias in the training dataset, it is difficult
to specify a particular style in an unbiased dataset.
DeepHist (Avi-Aharon et al., 2020) can reflect specific
color characteristics in generated images by using a
differentiable network with kernel density estimation
of color histograms from the generated image.
However, paired images are required for training.
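To make the idea of a differentiable histogram more concrete, the following is a minimal sketch of a histogram computed by kernel density estimation. The Gaussian kernel, the bin count, the bandwidth, and the function name `soft_histogram` are assumptions chosen for illustration; they are not taken from Avi-Aharon et al. (2020).

```python
import math

def soft_histogram(values, num_bins=8, bandwidth=0.05):
    """Kernel-density estimate of a histogram for values in [0, 1].

    Every value adds a Gaussian bump to every bin centre, so the result
    varies smoothly with the inputs (hard binning would not).
    The Gaussian kernel and the default parameters are assumptions.
    """
    centers = [(k + 0.5) / num_bins for k in range(num_bins)]
    hist = [
        sum(math.exp(-0.5 * ((v - c) / bandwidth) ** 2) for v in values)
        for c in centers
    ]
    total = sum(hist)
    if total == 0.0:  # no input values: return an all-zero histogram
        return hist
    return [h / total for h in hist]  # normalise so the bins sum to 1

# Example: two intensities near 0.1 and one near 0.9.
hist = soft_histogram([0.1, 0.1, 0.9])
```

Because each bin is a smooth function of every input value, a small change in a pixel intensity produces a small change in the histogram, which is the property that gradient-based training needs.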
3  PROPOSED METHOD 
Here we propose a network model for image-to-image
translation that is based on the cyclic architecture of
CycleGAN and is able to reflect target color
characteristics. For training, our method requires only
a relatively small amount of data and does not require
paired images. Furthermore, we introduce an
architecture that accepts color histograms for the
target domain. This section describes the details of
this architecture and the estimation of losses during
training.
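For orientation, the cycle consistency loss of the original CycleGAN (Zhu et al., 2017) is commonly written as follows, where G maps domain X to domain Y and F maps Y back to X. This is the standard formulation, given only as background; it is an assumption that the proposed method builds on a term of this form, and the losses actually used are those described later in this section.

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[ \lVert F(G(x)) - x \rVert_1 \right]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[ \lVert G(F(y)) - y \rVert_1 \right]
```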
3.1  Overview of the Network 
The network consists of two generators and two
discriminators, as shown in Figure 1. 𝑋 and 𝑌 denote
the domains in image-to-image translation. 𝐺 and 𝐹
denote the generators from 𝑋 to 𝑌 and from 𝑌 to 𝑋,
respectively. Each of them generates a synthetic
image (𝑋_syn and 𝑌_syn) in the target domain from
a real image (𝑋_real and 𝑌_real) in the source domain.
𝐷_𝑋 and 𝐷_𝑌 denote the discriminators, and 𝑋_hist and
𝑌_hist denote the color histograms for 𝑋 and 𝑌,
respectively. These histograms are input into 𝐹 and
𝐺, respectively, so that the generated synthetic images
reflect the color characteristics obtained from
reference images (𝑋_ref and 𝑌_ref) in the respective
target domains. Spectral normalization (Miyato et al.,
2018) is applied to both generators and
discriminators to stabilize training of this network.
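As an illustration of the color histograms that are passed to the generators, the sketch below computes a concatenated per-channel histogram from a reference image. It assumes 8-bit RGB pixels given as tuples; the bin count, the per-channel normalization, and the function name `rgb_histogram` are hypothetical choices, not values stated in this paper.

```python
def rgb_histogram(pixels, num_bins=4):
    """Concatenate per-channel histograms of 8-bit RGB pixels.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    Returns a list of length 3 * num_bins (R bins, then G, then B);
    the bins of each channel sum to 1. Bin count is an assumption.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("need at least one pixel")
    bin_width = 256 / num_bins
    channels = [[0.0] * num_bins for _ in range(3)]
    for pixel in pixels:
        for ch, value in enumerate(pixel):
            # Clamp to the last bin to guard against rounding at 255.
            index = min(int(value // bin_width), num_bins - 1)
            channels[ch][index] += 1.0
    n = len(pixels)
    return [count / n for hist in channels for count in hist]

# Example: a tiny stand-in for a reference image in the target domain.
reference_pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (128, 128, 128)]
hist_y = rgb_histogram(reference_pixels)
```

In this sketch `hist_y` plays the role of the histogram for domain 𝑌; a vector of this kind would accompany the source image as a second input to the generator that produces images in that domain.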
3.2  Importing Color Characteristics 
The architecture adopted for 𝐺 and 𝐹 is shown in
Figure 2. The color distribution is imported following
previous methods. First, an RGB histogram is obtained
from an image in the target domain. The histograms
for the individual color channels are concatenated
and fed into the middle layer between the encoder
and the decoder of the generator. Importing the
histograms at this point supplies color information
after the spatial features have been extracted by
convolution. A translated image is output from the
decoder. To evaluate the color of the output image,
the L2 loss between the histograms of the source and
output images is computed. The histograms of the
output image are obtained by kernel density
estimation because it is differentiable, which enables
backpropagation and updating of the network. Kernel
density estimation is done using the following
probability density function: