Image-to-Image Translation Based on CycleGAN: From CT to MRI
Chenjie Ni
School of Artificial Intelligence, Southeast University, Nanjing, China
Keywords: Computed Tomography, Magnetic Resonance Imaging, Image translation, CycleGAN.
Abstract: Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are of equal importance in routine examinations. However, in some cases one of the two modalities may be unavailable due to practical constraints. It is therefore necessary to establish a connection between CT and MRI images. Building on the idea of image-to-image translation, this study proposes using the Cycle-Consistent Generative Adversarial Networks (CycleGAN) architecture to learn a mapping between these two kinds of medical images. Combining a ResNet generator with a Patch Generative Adversarial Networks (PatchGAN) discriminator, the CycleGAN model is trained bidirectionally to achieve cyclic translation. Both qualitative and quantitative evaluations are conducted and highlight the model's effectiveness in translating CT or MRI images in either direction. The CycleGAN model excels particularly in cycle consistency, meaning a realistic recovery of the translated images. This study therefore presents a powerful way to achieve mutual conversion between CT and MRI images, which is especially meaningful for diagnosis with limited information, and it suggests the broader potential of image-to-image translation in medical image processing. Future research can build on this study to further improve image clarity and reduce noise so that the generated results can be used for clinical diagnosis.
1 INTRODUCTION
CT and MRI are two basic ways of obtaining information about a diseased region during diagnosis (Kidwell and Hsia 2006). Yet each method has its own advantages and limitations. The advantages of CT lie in its short examination time, low cost, and wide application range (Cantatore and Müller 2011). However, CT involves ionizing radiation and is not suitable for pregnant women and children, and its contrast resolution is relatively low. MRI, by contrast, is non-invasive, offers diverse imaging parameters, and allows free choice of the imaging orientation (van Beek and Hoffman 2008). But it also brings drawbacks such as long scanning times, high noise levels, and expensive equipment. In addition, because of the strong magnetic field involved, it cannot be used for patients with ferromagnetic material in their bodies. Given the equal importance of these two methods, it is necessary to establish a connection between CT and MRI images to provide more information for constrained diagnosis.
Many studies have proposed meaningful methods to build this link or to create new images from existing information. For example, Han attempted to reconstruct CT images from MRI using a deep convolutional neural network (Han 2017). Toda et al. used semi-conditional Information Generative Adversarial Networks (InfoGAN) to synthesize CT images of certain types of lung cancer (Toda et al 2021). Alrashedy et al. proposed Brain Generative Adversarial Networks (BrainGAN), combining Generative Adversarial Network (GAN) architectures with Convolutional Neural Network (CNN) models to generate MRI images (Alrashedy et al 2022). Kwon et al. used auto-encoding generative adversarial networks to generate 3D brain MRI images (Kwon et al 2019). However, these studies, like most existing methods, achieve only unidirectional image synthesis, such as synthesizing MRI images from CT images. This deficiency constrains doctors from obtaining full information about their patients. In recent years, however, the task of image-to-image translation has been broadly discussed, bringing new ideas for connecting CT and MRI images (Isola et al 2017). Using a training set of aligned image pairs, image-to-image translation aims to learn the mapping between an input image and an output image. While there are already many applications of image
translation (e.g. Chen et al 2021), little attention has been paid to the field of medicine. The idea of image translation is well suited to constructing a bidirectional conversion pathway between CT and MRI.
Given the facts above, the main objective of this study is to enable free conversion between CT and MRI images. Specifically, the ct2mri dataset is first preprocessed, including partitioning into a training set and a test set as well as resizing the images. Second, the CycleGAN architecture is introduced to achieve the translation process (Zhu et al 2017). CycleGAN is a powerful model that can learn to translate images between different styles without paired examples. This independence from paired images is especially helpful for connecting CT and MRI, because their images often differ greatly in their properties. The CycleGAN training process can be summarized as training one generator-discriminator pair for each direction. For valid image translation, constraints on the loss are added to ensure consistent content across different styles. Through these pairs of generators and discriminators, image features are extracted and reorganized to construct mappings between the two domains. A direct connection between images of different domains is thus learned, allowing the model to convert related images into each other. The experimental results demonstrate satisfying performance in the bidirectional translation of CT and MRI images. Such a translation model can help doctors quickly and effectively obtain the necessary information when conditions are limited, for example when one of the medical images is unavailable for patient-related reasons.
2 METHODOLOGY
2.1 Dataset Description and Preprocessing
The dataset used in this study, called CT and MRI brain scans, is sourced from Kaggle (CT and MRI brain scans 2020). It contains a total of 4974 brain-scan images from CT and MRI examinations. The sizes of these images are not uniform; since CycleGAN training is unpaired, it is unaffected by whether image sizes correspond. The images have been pre-adjusted so that the brain scans are centered and occupy approximately the same proportion of every image.
The goal of the experiment is to learn a mapping between the CT and MRI images in the dataset. To be loaded by a CycleGAN implementation for image-to-image translation, all CT and MRI brain images are organized into a directory structure and labeled as domain A and domain B respectively, with 2486 CT images in A and 2488 MRI images in B. For model evaluation, a training set and a test set are created from each domain with a 70%/30% split, as sketched in the code below. Fig. 1 displays typical CT and MRI images from this collection.
Figure 1: An illustration of a CT and MRI image from the
dataset of CT and MRI brain scans (Picture credit:
Original).
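To make the data preparation concrete, the following minimal sketch performs the 70%/30% split described above. The folder names (ct, mri, trainA, testA, and so on) follow the common CycleGAN directory convention and are assumptions, not the exact paths used in this study.

```python
import os
import random
import shutil

def split_domain(src_dir, train_dir, test_dir, train_ratio=0.7, seed=0):
    """Copy images from src_dir into a 70/30 train/test split."""
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * train_ratio)
    for dst, subset in [(train_dir, files[:n_train]), (test_dir, files[n_train:])]:
        os.makedirs(dst, exist_ok=True)
        for name in subset:
            shutil.copy(os.path.join(src_dir, name), os.path.join(dst, name))

# Domain A holds the 2486 CT images; domain B the 2488 MRI images.
split_domain("ct", "trainA", "testA")
split_domain("mri", "trainB", "testB")
```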
2.2 Proposed Approach
The core issue of the proposed approach for CT and MRI image translation lies in constructing a complete CycleGAN structure. This involves choosing a proper network structure for the generator and discriminator in each direction, as well as a powerful loss function to drive the training process. The discriminator adopts a PatchGAN structure with a patch size of 70x70, while the generator is built from several ResNet blocks. With regard to the loss function, a GAN loss and a cycle consistency loss are combined to ensure better performance. Fig. 2 illustrates the structure of the system.
Figure 2: Composition of the model (Picture credit:
Original).
2.2.1 ResNet Generator
ResNet is a well-known convolutional neural network that mitigates vanishing and exploding gradient problems. The ResNet block takes this a step further: it distills the core idea of ResNet, the "skip connection", into a reusable layer structure consisting of two convolutional layers plus a skip connection. This design largely prevents the degradation of deep neural networks. The generators in this study are mainly made up of 9 ResNet blocks, with reflection padding inside the convolutional layers to preserve edge information. In addition, downsampling is applied before the ResNet blocks to reduce the subsequent computational complexity, and upsampling after the ResNet blocks recovers the image size. The outermost convolutional layers use 7x7 kernels, with 64 generator filters as the base width, to produce the generated output. Together these operations achieve better feature extraction, push the generator to produce more realistic images, and enhance robustness. Fig. 3 shows the basic sequence of the generator structure.
Figure 3: Sequential layer structure of the Resnet generator
(Picture credit: Original).
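As an illustration, a ResNet block of the kind described here can be sketched in PyTorch as follows. This is a minimal sketch following the standard CycleGAN generator design (two 3x3 convolutions with reflection padding and instance normalization, plus a skip connection), not the exact implementation used in the study.

```python
import torch.nn as nn

class ResnetBlock(nn.Module):
    """One of the 9 residual blocks in the generator: two 3x3 convolutions
    with reflection padding and instance normalization, plus a skip connection."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),          # reflection padding preserves edges
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(dim, dim, kernel_size=3),
            nn.InstanceNorm2d(dim),
        )

    def forward(self, x):
        # The skip connection: output = input + residual.
        return x + self.block(x)
```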
2.2.2 PatchGAN Discriminator
PatchGAN was originally proposed to address the blurry generations that arise with L1 or L2 losses alone. Instead of judging the whole image at once, PatchGAN focuses on local image patches and penalizes structure at the patch scale. Scanning convolutionally across the image, this discriminator decides whether each patch is real or fake and then aggregates all responses into the final output. Such a patch-based structure has fewer parameters than a discriminator that processes full images, which greatly accelerates training. Moreover, a PatchGAN discriminator can be applied to arbitrarily sized images, which is convenient for this study. Following the suggestions in the original paper, the patch size in this study is set to 70x70 for optimal performance. Architecturally, the PatchGAN discriminator consists of three main convolutional layers with an increasing number of filters, followed by a final convolution that collapses the features into one output channel holding the predicted results.
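The PyTorch sketch below illustrates a 70x70 PatchGAN discriminator of the kind described above. The layer widths (ndf = 64, doubling per layer) follow the common reference implementation and are assumptions rather than reported settings.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    """A convolution-normalization-activation block of the discriminator."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class PatchDiscriminator(nn.Module):
    """A 70x70 PatchGAN: the filter count grows layer by layer, and a final
    convolution collapses the features into a one-channel map of per-patch
    real/fake scores."""
    def __init__(self, in_channels=3, ndf=64):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_channels, ndf, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),                 # no norm on first layer
            conv_block(ndf, ndf * 2, stride=2),
            conv_block(ndf * 2, ndf * 4, stride=2),
            conv_block(ndf * 4, ndf * 8, stride=1),
            nn.Conv2d(ndf * 8, 1, 4, stride=1, padding=1),   # one output channel
        )

    def forward(self, x):
        return self.model(x)  # a grid of patch predictions, one per 70x70 patch
```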
2.2.3 Loss Function
Choosing the right loss function is critical when training deep learning models, especially generative ones. For this image-to-image translation task, the full objective consists of two main terms. The first is the adversarial loss; here an improved version of the vanilla GAN loss proposed in Zhu et al.'s study, the LSGAN loss, is used:
l_{GAN}(G, D_Y) = \mathbb{E}_{y \sim p_{data}(y)}\left[(D_Y(y) - 1)^2\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[D_Y(G(x))^2\right]   (1)
The above formula gives the form of the LSGAN loss, where G represents the generator mapping from domain X to domain Y and D_Y denotes the discriminator on domain Y. The LSGAN loss substitutes a least-squares loss for the original negative log-likelihood, which brings more stable training as well as better performance. For the opposite direction there is an analogous term l_{GAN}(F, D_X), where F maps Y back to X.
The second term is the cycle consistency loss:

l_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\lVert G(F(y)) - y \rVert_1\right]   (2)
where G and F denote the two generators. The cycle consistency loss guarantees that a full cycle of translation brings the input back as close to the original image as possible. The full objective is then the combination of these terms:
l(G, F, D_X, D_Y) = l_{GAN}(G, D_Y) + l_{GAN}(F, D_X) + \lambda \, l_{cyc}(G, F)   (3)
where λ controls the relative weight of the two types of loss; its value was determined through hyperparameter tuning to ensure optimal performance. To prevent overfitting, instance normalization is used. Compared with traditional batch normalization, instance normalization performs better in image translation because, for this type of task, every pixel of the input sample matters to the training process.
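The following sketch shows how the generator-side terms of Eqs. (1)-(3) might be computed in one training step (the discriminators are updated separately with their own LSGAN terms). The value lam = 10.0 is the CycleGAN paper's default and stands in for the tuned λ, whose exact value is not reported here.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()   # LSGAN replaces the log-likelihood with least squares
l1 = nn.L1Loss()     # cycle consistency is an L1 penalty
lam = 10.0           # cycle-consistency weight; an assumed (paper-default) value

def generator_objective(G, F, D_X, D_Y, real_x, real_y):
    """Generator-side objective of Eq. (3) for one batch."""
    fake_y = G(real_x)               # X -> Y translation
    fake_x = F(real_y)               # Y -> X translation
    pred_fake_y = D_Y(fake_y)
    pred_fake_x = D_X(fake_x)
    # Adversarial (LSGAN) terms, Eq. (1): generators push D's output toward 1.
    loss_gan = mse(pred_fake_y, torch.ones_like(pred_fake_y)) \
             + mse(pred_fake_x, torch.ones_like(pred_fake_x))
    # Cycle-consistency terms, Eq. (2): translate back and compare to the input.
    loss_cyc = l1(F(fake_y), real_x) + l1(G(fake_x), real_y)
    return loss_gan + lam * loss_cyc
```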
2.3 Implementation Details
Several important aspects of the training process deserve mention. First, Adam is chosen as the optimizer for all generators and discriminators in the CycleGAN because of its strong performance for gradient descent in high-dimensional spaces. As for hyperparameters, the learning rate is fixed at 0.0002 for the first 50 training epochs and then decreases linearly to zero over the subsequent 50 epochs. This ensures that the model learns quickly at the beginning and keeps the parameters almost unchanged near the end, reducing the risk of overfitting. The momentum term of Adam is set to 0.5. Limited by the available hardware (an RTX 3060 GPU), the batch size is constrained to 2, and the model is trained for a total of 100 epochs.
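A minimal sketch of this optimization setup in PyTorch follows. The second Adam moment, beta2 = 0.999, is an assumption (the PyTorch default); the paper only specifies the momentum term.

```python
import itertools
import torch

def make_optimizer(G, F, total_epochs=100, fixed_epochs=50):
    """Adam with beta1 = 0.5 (the momentum term in the text) and a learning
    rate of 0.0002, held constant for the first 50 epochs and then decayed
    linearly to zero over the remaining 50."""
    params = itertools.chain(G.parameters(), F.parameters())
    optimizer = torch.optim.Adam(params, lr=0.0002, betas=(0.5, 0.999))

    def linear_decay(epoch):
        # 1.0 up to epoch 50, then a straight line down to 0 at epoch 100.
        return 1.0 - max(0, epoch - fixed_epochs) / float(total_epochs - fixed_epochs)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=linear_decay)
    return optimizer, scheduler
```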
3 RESULTS AND DISCUSSION
For a generative model, evaluation usually focuses on inspecting the outputs produced by the trained model on the test set. Specifically, the results of this study are discussed through visualization as well as generation accuracy. For testing and evaluation, 744 unpaired CT and MRI images are prepared for translation. Only the results from CT images to MRI images and back to CT are shown here, because for CycleGAN the translations in the two directions (CT-MRI-CT and MRI-CT-MRI) should be equivalent in performance.
3.1 Visualization Analysis
Some typical test outputs are selected and shown in Fig. 4 below. From left to right, the columns display the generated MRI image, the original MRI image, the restored CT image, and the original CT image.
Figure 4: Typical outputs of the constructed CycleGAN
(Picture credit: Original).
As Fig. 4 shows, the CycleGAN model constructed in this study effectively maps the given CT images into MRI ones, with the necessary details as well as correct contours. Thanks to the carefully designed ResNet generator, the model has strong feature extraction ability and can rebuild most of the detailed information of the real images. The PatchGAN discriminator further enhances the refinement of the generator by serving as the adversarial part, forcing the generator to pay more attention to details. Some defects can be observed, such as residual information carried over from the original image; this is caused by the nature of CycleGAN, which tends to preserve content. Nevertheless, the model establishes a valid connection between CT and MRI images from a visual perspective.
At the same time, the model almost perfectly recovers the translated images back into the original ones. This means the CycleGAN model in this study has strong cycle consistency, which should be attributed to the powerful constraint that the cycle consistency loss places on the generated image content. The results also imply that the chosen value of λ is not so extreme as to cause failures in image generation, indicating successful hyperparameter tuning.
3.2 Generation Accuracy
In this work, the structural similarity index measure (SSIM) is used to assess the trained model's generation accuracy. The SSIM metric compares two images along three key features: luminance, contrast, and structure. Computed over the outputs of the test set, the model achieves an average score of 0.4038 on the generated MRI images and 0.9642 on the recovery of the translated images. The SSIM metric thus provides a quantified summary of the CycleGAN model's performance. Combined with the visualization results, it can be concluded that the model reproduces most of the image details but still faces challenges in image brightness and clarity, caused by CycleGAN's tendency to preserve the original structural information, as discussed above. This observation suggests that structural alterations to the CycleGAN model are needed to eliminate the excess information.
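As a sketch, this SSIM evaluation can be reproduced with scikit-image; the pairing of generated and reference images is assumed to be prepared elsewhere.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def average_ssim(pairs):
    """Average SSIM over (generated, reference) grayscale image pairs.
    `pairs` is assumed to be an iterable of 8-bit NumPy arrays."""
    scores = [ssim(gen, ref, data_range=255) for gen, ref in pairs]
    return float(np.mean(scores))

# Applied separately to the generated MRI images and to the recovered CT
# images, this kind of computation yields the 0.4038 and 0.9642 scores
# reported above.
```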
4 CONCLUSION
This article introduces an approach employing a
CycleGAN architecture to decipher the intricate
mapping relationship between CT and MRI images,
with a curated dataset of brain scans serving as the
primary data source. The model exhibits remarkable
performance in feature analysis and extraction,
leveraging the Resnet and PatchGAN architectures for
its generator and discriminator components. This
choice empowers the model to excel in capturing
salient features and fostering discriminative
capabilities.
An extensive series of experiments has been
meticulously conducted to evaluate the proposed
methodology, employing a range of qualitative and
quantitative metrics. The results garnered from these
experiments on the CT and MRI brain scan dataset are
highly promising. The CycleGAN model successfully
forges a meaningful connection between CT and MRI
images, preserving intricate details and structural
integrity. Moreover, the model demonstrates robust
cycle consistency, affirmed through both visual
inspection and the SSIM.
The model's remarkable image generation
capabilities can be attributed to ResNet's ability to
retain vital input information and PatchGAN's
effectiveness in scrutinizing generated images at the
patch level. It is important to acknowledge that future
research endeavors will be primarily dedicated to
refining the model's architecture to address any
identified limitations. Additionally, the exploration of
a diverse range of models for enhancing performance
in the domain of image translation will remain a focal
point in upcoming research pursuits. This
commitment to continuous improvement underscores
the model's potential contributions to the field of
medical imaging.
REFERENCES
C. S. Kidwell, A. W. Hsia, "Imaging of the brain and cerebral vasculature in patients with suspected stroke: advantages and disadvantages of CT and MRI," Current Neurology and Neuroscience Reports, vol. 6, 2006, pp. 9-16.
A. Cantatore, P. Müller, "Introduction to computed tomography," Kgs. Lyngby: DTU Mechanical Engineering, 2011.
E. J. R. van Beek, E. A. Hoffman, "Functional imaging: CT and MRI," Clinics in Chest Medicine, vol. 29, 2008, pp. 195-216.
X. Han, "MR-based synthetic CT generation using a deep convolutional neural network method," Medical Physics, vol. 44, 2017, pp. 1408-1419.
R. Toda et al., "Synthetic CT image generation of shape-controlled lung cancer using semi-conditional InfoGAN and its applicability for type classification," International Journal of Computer Assisted Radiology and Surgery, vol. 16, 2021, pp. 241-251.
H. H. N. Alrashedy et al., "BrainGAN: brain MRI image generation and classification framework using GAN architectures and CNN models," Sensors, vol. 22, 2022, p. 4297.
G. Kwon, C. Han, D. Kim, "Generation of 3D brain MRI using auto-encoding generative adversarial networks," International Conference on Medical Image Computing and Computer-Assisted Intervention, Cham: Springer, 2019.
P. Isola et al., "Image-to-image translation with conditional adversarial networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
Z. Chen et al., "Semantic segmentation for partially occluded apple trees based on deep learning," Computers and Electronics in Agriculture, vol. 181, 2021, p. 105952.
J. Y. Zhu et al., "Unpaired image-to-image translation using cycle-consistent adversarial networks," Proceedings of the IEEE International Conference on Computer Vision, 2017.
CT and MRI brain scans, Kaggle, 2020, https://www.kaggle.com/datasets/darren2020/ct-to-mri-cgan