number of redundant images and explains the over-
fitting behaviour and the failure results in previous
work. For this reason, only every third image is taken
in the training set to yield a set with 10,027 image
pairs, while all of the images in the test set are used.
4.2 Training Setup
All experiments were implemented in PyTorch and
performed on an NVIDIA TITAN XP graphics card.
TIR2Lab (Berg et al., 2018) and TIC-CGAN (Kuang
et al., 2018) were re-implemented and trained as ex-
plained in the original papers.
The proposed model, TICPan, was trained using the
ADAM optimizer with default PyTorch parameters,
and the weights were initialized with He normal
initialization (He et al., 2015). All experiments were
trained for 1000 epochs; the learning rate was
initialized to 8e-4 and decayed after 400 epochs. The
LeakyReLU slope was set to α = 0.2 and the dropout
rate to 0.5.
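As a concrete illustration, the initialization and learning-rate schedule above can be sketched in plain NumPy; the decay factor (0.1) and the layer size are hypothetical, since the paper does not state them, and in practice this configuration would be handed to PyTorch's Adam optimizer and `torch.nn.init.kaiming_normal_`:

```python
import numpy as np

def he_normal(fan_in, fan_out, seed=0):
    """He normal initialization: zero-mean Gaussian with std = sqrt(2 / fan_in)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def learning_rate(epoch, base_lr=8e-4, decay_epoch=400, gamma=0.1):
    """Step schedule: hold base_lr, then decay once after `decay_epoch`.
    The decay factor `gamma` is an assumption; the paper only says 'decay'."""
    return base_lr if epoch < decay_epoch else base_lr * gamma

w = he_normal(512, 256)          # std should be close to sqrt(2/512)
```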
In each training batch, 32 crops of size 160 × 160
were randomly extracted. For each iteration, a random
augmentation was applied by flipping horizontally or
vertically and rotating within the range [−90°, 90°].
Since the number of training images in KAIST-MS is
14 times larger than in ULB17-VT.v2, the number of
training iterations on ULB17-VT.v2 was increased to
match the model trained on KAIST-MS.
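The augmentation pipeline can be sketched as follows; restricting the rotation to quarter turns (instead of the continuous [−90°, 90°] range) and the 0.5 flip probabilities are simplifying assumptions for this illustration:

```python
import numpy as np

def augment(pair, crop=160, seed=0):
    """Random 160x160 crop plus flips and a rotation, applied identically to
    both images of a (thermal, rgb) pair of HxW(xC) arrays. Rotation is
    restricted here to {-90, 0, 90} degrees as a stand-in for [-90, 90]."""
    rng = np.random.default_rng(seed)
    thermal, rgb = pair
    h, w = thermal.shape[:2]
    top = rng.integers(0, h - crop + 1)          # random crop origin
    left = rng.integers(0, w - crop + 1)
    out = [img[top:top + crop, left:left + crop] for img in (thermal, rgb)]
    if rng.random() < 0.5:                       # horizontal flip
        out = [np.flip(img, axis=1) for img in out]
    if rng.random() < 0.5:                       # vertical flip
        out = [np.flip(img, axis=0) for img in out]
    k = int(rng.integers(-1, 2))                 # -1, 0, or +1 quarter turns
    return [np.rot90(img, k) for img in out]
```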
For validation, the peak signal-to-noise ratio
(PSNR), structural similarity (SSIM), and root-mean-
square error (RMSE) were computed between the
generated colourized images and the ground-truth
RGB images.
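PSNR and RMSE are straightforward to compute; a minimal sketch is shown below, assuming 8-bit images with a peak value of 255 (SSIM is typically taken from a library such as scikit-image's `structural_similarity` and is omitted here):

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error between two images; lower is better."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(pred, target, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    err = rmse(pred, target)
    return float("inf") if err == 0 else 20.0 * np.log10(peak / err)
```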
4.3 Quantitative Evaluation
The proposed model was evaluated on transforming
thermal infrared images into RGB images and compared
with the state of the art using the metrics shown
in Table 1.
The evaluation of the proposed model was performed
on the full colourized thermal image, which is the
result of fusing the predicted visual LF information
with the input thermal HF information. This resulted
in a higher pixel-wise error compared to other models
since the HF content of the image was taken from the
thermal domain. However, our method achieved com-
parable results with the synthesized images as shown
in Fig. 3.
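The fusion step can be illustrated with a simple additive pansharpening-style sketch: extract the thermal high-frequency residual with a low-pass filter and inject it into each predicted colour channel. The box filter, its kernel size, the additive injection, and the [0, 1] value range are all assumptions for illustration, not the paper's exact decomposition:

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box low-pass filter with edge padding ('same' output size)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, "valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, "valid"), 0, out)

def fuse(pred_rgb, thermal, k=5):
    """Inject the thermal HF residual into each predicted colour channel.
    `pred_rgb` is HxWx3 and `thermal` HxW, both with values in [0, 1]."""
    hf = thermal - box_blur(thermal, k)          # high-frequency detail
    return np.clip(pred_rgb + hf[..., None], 0.0, 1.0)
```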
It is believed that pixel-wise metrics are not
suitable for the colourization problem, where the
perception of the image plays an important role.
TIR2Lab achieved higher evaluation values although
its generated images are uninterpretable. TIC-CGAN
has 12.266 million parameters, which explains the
overfitting behaviour in its generated images.
TICPan-BN was excluded because it has the lowest
evaluation values and produced images of lower quality.
4.4 Qualitative Evaluation
Four examples from the ULB17-VT.v2 dataset are
presented in Fig. 8. The TIR2Lab model generated
approximately correct colour representations for trees,
albeit with a blur effect, but failed to produce fine
textures and to preserve the image content. On the
other hand, the TIC-CGAN model generated better
colour quality with fine textures, and its images were
more realistic. This over-fitting behaviour is clearly
recognizable when the test image comes from the same
distribution as the densely represented images in the
training set, such as image number (650).
TICPan generates images that have strong true
colour values for objects that are relatively fixed in
space and time, such as sky, tree leaves, and streets
and buildings. The sky is represented in white or light
blue, trees in different shades of green, and streets
and buildings are also represented with approximated
true colour values. However, objects like humans are
represented in grey or black due to the clipping effect.
Our method ensures that an object's thermal signature
does not disappear or become deformed in the image
transformation. The model cannot predict true colour
values for varying objects, but it predicts an averaged
colour value, represented in grey, and the final
pansharpening process maintains their appearance in
the generated colourized images.
In Fig. 9, four examples are presented on the
KAIST-MS dataset. The TIR2Lab method produced
approximately correct chrominance values, but its
images are heavily blurred and it struggles to recover
fine textures accurately. The produced artefacts are
very obvious in the generated images and some ob-
jects, such as the walking person in (S6V3I03016)
are missing in their outputs. The TIC-CGAN model
produced better perceptual colourized thermal images
with realistic textures and fine details, but they suffer
from the same problems of missing and deformed
objects. This is due to the GAN adversarial loss,
which learns the dataset distribution
and estimates what should appear in each location,
and also because of the large size of the model and its
over-fitting behaviour. This is seen in (S8V2I01723)
in the falsely generated road surface markings and in
the missing person in (S6V3I03016). In contrast, the
proposed TICPan model does not generate very plau-
sible colour values in the KAIST-MS dataset but it
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications
352