3 PROPOSED SYSTEM
The purpose of our system is to visualize thermal information as a 3D model by superimposing thermal information on RGB images. Our system consists of three stages: camera calibration, 3D reconstruction, and superimposition of the images. First, we fix the RGB and thermal cameras together and calibrate them to obtain the intrinsic and extrinsic parameters. After that, we move the equipment to obtain a video sequence. We then reconstruct a 3D point cloud using LSD-SLAM and add thermal information to each point in the cloud. Finally, we superimpose the thermal information on the RGB images and create a mesh by Delaunay triangulation.
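The meshing step is not detailed further in this section, so the following minimal sketch (in Python, using SciPy) shows one way it could be realized, assuming the 2D image projections of the reconstructed points are available; the function name is illustrative, not part of our implementation.

```python
# Minimal sketch of the meshing step: triangulate the 2D image
# projections of the reconstructed points and reuse the resulting
# connectivity as triangle faces over the 3D point cloud.
from scipy.spatial import Delaunay

def mesh_from_projections(points_3d, points_2d):
    """points_3d: (N, 3) reconstructed points with thermal values;
    points_2d: (N, 2) pixel coordinates of the same points."""
    tri = Delaunay(points_2d)          # Delaunay triangulation in image space
    return points_3d, tri.simplices    # vertices and (M, 3) face indices
```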
Thermal images contain fewer brightness features than RGB images, and temperature varies more smoothly than RGB values. Commonly used key point descriptors such as SIFT (Lowe, 1999) and SURF (Bay et al., 2006) therefore fail to obtain feature points, which makes it difficult to track camera motion or reconstruct structure from motion. Direct methods (Engel et al., 2013) such as LSD-SLAM, which do not use feature points, are also unable to track camera motion because thermal images do not provide enough brightness variation. Figure 3 shows the semi-dense maps of the same scene acquired by each camera. The semi-dense map from the thermal camera contains far fewer depth values than the map from the RGB camera, and tracking is lost if only the thermal camera is used. Thus, pairing the thermal camera with an RGB camera is a better way to visualize thermal information as a 3D structure.
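This feature scarcity can be checked directly: the sketch below counts the SIFT key points detected in an RGB frame and a thermal frame of the same scene using OpenCV; the file names are placeholders.

```python
# Illustrative check (not part of the pipeline): compare SIFT key
# point counts in an RGB frame versus a thermal frame of one scene.
import cv2

rgb = cv2.imread("rgb_frame.png", cv2.IMREAD_GRAYSCALE)
thermal = cv2.imread("thermal_frame.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_rgb = sift.detect(rgb, None)
kp_thermal = sift.detect(thermal, None)

# Smooth temperature gradients typically yield far fewer key points.
print(len(kp_rgb), len(kp_thermal))
```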
3.1 Camera Calibration
Because a standard calibration board has a uniform temperature, we cannot calibrate thermal cameras in the same way that we calibrate RGB cameras. Thus, we must develop a calibration board that can be captured by both RGB and thermal cameras. (Prakash et al., 2006) heated a calibration board with a flood lamp; based on the emissivity difference between the black and white regions, they could detect the checkered pattern. We developed a checkered calibration board that creates temperature differences within the pattern itself. We use an electric carpet and thermal insulation material to generate the temperature difference, allowing calibration images such as those shown in Figure 2 to be obtained. With this calibration board, we can calibrate both the thermal and RGB cameras. We use the method of (Zhang, 2000) to calculate the intrinsic parameters.
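The sketch below illustrates this intrinsic calibration with OpenCV's implementation of Zhang's method; the board dimensions, square size, and file paths are assumptions.

```python
# Sketch of intrinsic calibration (Zhang's method) for one camera
# from images of the heated checkered board.
import glob
import cv2
import numpy as np

PATTERN = (9, 6)   # inner-corner count of the board (assumed)
SQUARE = 0.03      # square size in metres (assumed)

# 3D board coordinates of the corners on the z = 0 plane.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib/*.png"):   # heated-board views (placeholder)
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(img, PATTERN)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = img.shape[::-1]          # (width, height)

# Intrinsic matrix K and distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, size, None, None)
```

Running the same procedure on the thermal camera's images and then calling cv2.stereoCalibrate on the paired detections yields the rotation and translation between the two cameras, i.e., the extrinsic parameters.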
3.2 Reconstructing the Thermal 3D Point Cloud
SLAM is one method of reconstructing a 3D model with a monocular camera. It tracks the camera's position and rotation and maps the 3D points used to represent the scene. In our work, we use LSD-SLAM, one such SLAM method, to visualize thermal information as 3D structures. Figure 4 shows a flow chart of the reconstruction part of our system.
3.2.1 The Features of LSD-SLAM
Methods like PTAM and ORB-SLAM use a monocular RGB camera and detect feature points from images. In such methods, camera pose and translation are estimated from the feature points, and only the feature points are used to build the resulting map, so the 3D point clouds are very sparse. In contrast, LSD-SLAM uses the pixel values of the images themselves to estimate camera pose and translation, which results in more robust estimation and denser 3D point clouds than previous methods.
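The sketch below illustrates the photometric error that direct methods minimize, assuming a reference depth map and a candidate pose are given; it is a simplified illustration rather than LSD-SLAM's actual implementation.

```python
# Sketch of the photometric error used by direct methods: warp
# high-gradient reference pixels into the current frame through
# their depth and a candidate pose, then compare intensities.
import numpy as np

def photometric_error(I_ref, I_cur, depth, pose, K, pixels):
    """pose: 4x4 rigid transform from the reference frame to the
    current frame; pixels: (N, 2) integer coordinates of
    high-gradient reference pixels."""
    K_inv = np.linalg.inv(K)
    err = 0.0
    for u, v in pixels:
        p = depth[v, u] * (K_inv @ np.array([u, v, 1.0]))  # back-project
        q = pose[:3, :3] @ p + pose[:3, 3]                 # rigid motion
        u2, v2 = (K @ q)[:2] / q[2]                        # re-project
        ui, vi = int(round(u2)), int(round(v2))
        if 0 <= vi < I_cur.shape[0] and 0 <= ui < I_cur.shape[1]:
            err += (float(I_ref[v, u]) - float(I_cur[vi, ui])) ** 2
    return err  # minimized over the pose during direct tracking
```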
3.2.2 Process of LSD-SLAM
Camera pose and translation are estimated by comparing the pixel values of the input frame with those of the key frame. In LSD-SLAM, the semi-dense map of each input frame and the frame's camera pose and position are computed by matching against a key frame and the 3D map recovered up to the previous frame. The semi-dense map is then integrated into the 3D map. If the matching score falls below a pre-defined threshold, the key frame is replaced with the input frame. In the proposed method, a thermal image is also captured whenever the key frame is replaced, so that temperature is added to the 3D map as well.
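One plausible realization of this temperature assignment, sketched below, projects each 3D point into the thermal image using the calibrated thermal intrinsics K_t and the extrinsics (R, t) between the cameras, and samples the temperature at the resulting pixel; the function is illustrative, not LSD-SLAM code.

```python
# Sketch: assign a temperature to each map point by projecting it
# into the thermal image captured at the new key frame.
import numpy as np

def sample_temperatures(points_3d, thermal_img, K_t, R, t):
    """points_3d: (N, 3) map points in the RGB camera's frame;
    R, t: extrinsics from the RGB to the thermal camera."""
    temps = np.full(len(points_3d), np.nan)   # NaN where not visible
    for i, p in enumerate(points_3d):
        q = R @ p + t                         # into thermal camera frame
        if q[2] <= 0:
            continue                          # behind the camera
        u, v = (K_t @ q)[:2] / q[2]           # project to pixel coords
        ui, vi = int(round(u)), int(round(v))
        if 0 <= vi < thermal_img.shape[0] and 0 <= ui < thermal_img.shape[1]:
            temps[i] = thermal_img[vi, ui]    # temperature of the point
    return temps
```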
3.2.3 Map Constitution
The map is constituted from the key frames. Each key frame contains the pixel values of the input image, a depth map, and the variance of the depth map. Not every pixel in the depth map retains a depth value: the map keeps only the depth estimates that satisfy a threshold computed from the values of the surrounding pixels. This results in a semi-dense map. Because computing a semi-dense map requires less computational power, the system can run on a CPU.
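A minimal sketch of one plausible selection rule is given below; the gradient measure and threshold value are assumptions, not the exact criterion of LSD-SLAM.

```python
# Sketch: keep a depth estimate only where the intensity gradient,
# computed from neighboring pixel values, exceeds a threshold.
import numpy as np

def semi_dense_mask(gray, grad_threshold=12.0):
    """Returns True at pixels that retain a depth value."""
    gy, gx = np.gradient(gray.astype(np.float32))
    return np.hypot(gx, gy) > grad_threshold

# Pixels where the mask is False carry no depth value in the map.
```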