MSER-based Framework for Classification of Objects in Thermal
Images
Alia Aljasmi^1 and Andrzej Śluzek^{1,2,a}
1 Khalifa University, Abu Dhabi, U.A.E.
2 Warsaw University of Life Sciences-SGGW, Warsaw, Poland
Keywords: Thermal Images, MSER, Object Detection, Shape Descriptors, Object Classification.
Abstract:
In this paper, the problem of multi-class object recognition in thermal images is discussed. An alternative
model of thermal objects is investigated, where an object is represented by multiple shapes extracted by MSER
detectors. The shapes are nested within the largest MSER outlining the object (which might be the actual
outline of the object, the outline of its thermal footprint or the outline of its largest prominent fragment).
We show, using a multi-class dataset of thermal images captured in indoor environments, that the proposed
methodology is a feasible solution for various object classification problems in thermal imaging. In particular,
no object-specific algorithms are needed, so the method is applicable to most typical applications of
thermal cameras (subject to the general limitations of data captured by thermal imaging devices). The presented
work is considered a preliminary feasibility study exploring the potential and limits of thermal image classification
in more sophisticated machine vision problems.
1 INTRODUCTION
Thermal images are an alternative representation
of visually difficult environments (poor illumi-
nation, foggy/smoky conditions, confusing pat-
terns/camouflage etc.) which nevertheless contain ob-
jects of distinctive temperature profiles. The most
popular applications in visual surveillance and moni-
toring tasks include, see (Gade and Moeslund, 2014),
detection and tracking of moving objects (humans,
animals, vehicles, e.g., (Wang et al., 2010; Fernandez-
Caballero et al., 2014; Zhou et al., 2009; Christiansen
et al., 2014; Iwasaki et al., 2013), etc.), inspection,
security and quality control, and other selected indus-
trial applications (e.g., (Sirmacek et al., 2011; Vidas
et al., 2013; Ginesu et al., 2004; Ng et al., 2007; Meri-
audeau et al., 2010), etc.).
However, applications of thermal imaging in typ-
ical problems of multi-class object identification are
rather limited. This can be attributed to the follow-
ing factors. First, the spatial resolution of thermal
cameras is still low, compared to standard cameras.
Secondly, the visual distinctiveness in thermal images
is rather poor due to heat radiation and dissipation.
Therefore, objects in thermal images are typically
blurred and poorly contrasted, and regions
(often with boundaries only approximately delimited)
are the sole available representation of those objects.
Correspondingly, very few experimental works have
been reported on classification of several types of objects
within the same task, where only thermal imaging
is used (e.g. (Meis et al., 2003)). The majority
of thermal imaging applications focus on
object detection and subsequent tracking. Not surprisingly,
the diversity of features used in such works is
also limited (mostly binary regions and/or characteristics
of their boundaries) and the reported results are
not very impressive, even with features hand-crafted
for specific problems and a limited number of considered
classes (as in (Meis et al., 2003)).
a https://orcid.org/0000-0003-4148-2600
In this paper, object classification in thermal imaging
is discussed from a more general perspective,
even though we (indirectly) focus on indoor tasks
(e.g. visual surveillance in dark premises). Primar-
ily, we investigate an alternative model of objects in
thermal images (where each instance of an object is
represented by multiple regions extracted by MSER
detectors (Matas et al., 2002; Nistér and Stewénius,
2008), as explained in Section 2). Subsequently, in
Section 3, we use a simple classification method to:
• Identify 3D objects from a range of diversified
classes using regions extracted by an MSER detector
from thermal images.
• Distinguish (from a sequence of thermal frames)
between rigid (fixed-geometry) and articulated
(e.g. animals, humans, walking toys, etc.) objects.
This is a supplementary objective.
The experimental results are also discussed in Section 3.
Finally, the paper is briefly summarized in
Section 4.
Aljasmi, A. and Śluzek, A.
MSER-based Framework for Classification of Objects in Thermal Images.
DOI: 10.5220/0008116105660572
In Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2019), pages 566-572
ISBN: 978-989-758-380-3
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
2 MSER-BASED MODELS OF
THERMAL OBJECTS
In thermal images, even objects with distinctive tem-
perature profiles are normally seen as diluted silhou-
ettes on images of rather poor quality. Such im-
ages can be subsequently binarized (into objects and
background) for further analysis and processing. Un-
fortunately, standard image thresholding algorithms
(e.g. (Sezgin and Sankur, 2004; Puneet and Garg,
2013)) cannot handle typical effects of thermal im-
ages (as illustrated in Fig. 1) so that accurate outlines
of the actual objects may not be reliably extracted.
Therefore, we propose an alternative approach
based on maximally stable extremal regions
(MSERs). MSERs are the image fragments which are
least sensitive to variations of the binarization threshold
(see (Matas et al., 2002; Nistér and Stewénius,
2008)). Therefore, an MSER detector can identify
even poorly contrasted fragments in infrared images,
as long as these fragments have the most distinctive
thermal profiles within the processed image.
Moreover, MSERs can be detected at very small
computational costs (including on-chip implementation
of the detector (Sluzek et al., 2019)), which makes
them particularly attractive for low-cost systems (e.g.
IoT devices).
Figure 1: Examples of raw-data thermal images of objects.
Formally, binary MSER regions $Q(t)$ (where $t$ indicates
the threshold level) are detected as local minima
of the growth rate function $q(t)$ defined by the
derivative of the region’s area over the threshold values:
\[ q(t) = \frac{\frac{d}{dt}\,|Q(t)|}{|Q(t)|}, \tag{1} \]
where $|\cdot|$ represents the area of a region.
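As an illustration of Eq. 1, the following minimal sketch approximates the growth rate by a central difference over discrete threshold levels and reports the thresholds at which it has a strict local minimum. The function names, the step `delta` and the area profile are hypothetical and serve only as an example; a practical detector tracks many nested regions at once, which is not shown here.

```python
def growth_rate(areas, delta=1):
    """Approximate q(t) of Eq. 1 by a central difference over the threshold.

    areas[t] is the pixel count |Q(t)| of one nested region at threshold t.
    """
    q = {}
    for t in range(delta, len(areas) - delta):
        if areas[t] > 0:
            # Magnitude of the area change, relative to the current area.
            # The absolute value is an assumption: the area may shrink or
            # grow with t depending on the polarity of the region.
            change = abs(areas[t + delta] - areas[t - delta])
            q[t] = change / (2 * delta * areas[t])
    return q


def stable_thresholds(q):
    """Thresholds at which q(t) is a strict local minimum."""
    ts = sorted(q)
    return [t for prev, t, nxt in zip(ts, ts[1:], ts[2:])
            if q[t] < q[prev] and q[t] < q[nxt]]


# Hypothetical area profile of one region for thresholds t = 0..9.
areas = [400, 390, 300, 210, 200, 199, 198, 190, 100, 20]
print(stable_thresholds(growth_rate(areas)))
```

In this profile the area barely changes around the middle thresholds, so that plateau is where the region would be reported as maximally stable.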
The MSER detector has been reported in some applications
of thermal images, e.g. (Lahouli et al.,
2018). Nevertheless, the most significant advantage
of MSERs in thermal images, i.e. their ability to extract
not only the outlines of objects but also distinctive
internal fragments of objects or their thermal
shadows on the surrounding scenes, etc. (as illustrated
by the examples in Fig. 2), is apparently not yet fully
exploited in the available literature. Therefore, we
propose to represent thermal objects by the family
of MSERs nested within the largest MSER outlining
the object.
This largest MSER can be the actual outline of the
object shape, the outline of its thermal footprint (i.e.
incorporating the heat radiation effects on the object
neighborhood) or the outline of the largest prominent
fragment within the object (if the whole object is in-
distinguishably blended with its background). Certain
practical constraints are obviously applied, namely re-
moval of too small/large MSERs from the processed
thermal images or rejection of MSERs which cannot
(for various reasons) represent physical objects.
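The construction described above can be sketched as follows, assuming each detected MSER is available as a set of pixel coordinates. The function name and the area limits are hypothetical, and the rejection of MSERs that cannot represent physical objects is not modeled because the criteria are not specified here.

```python
def object_model(regions, min_area, max_area):
    """Keep the largest admissible MSER and the MSERs nested within it.

    regions: iterable of frozensets of (row, col) pixels, one per MSER.
    min_area, max_area: hypothetical size limits (in pixels).
    """
    # Remove regions that are too small or too large.
    kept = [r for r in regions if min_area <= len(r) <= max_area]
    if not kept:
        return []
    # The largest remaining MSER is taken as the outline of the object.
    outer = max(kept, key=len)
    # A region is nested if all of its pixels belong to the outer region.
    nested = [r for r in kept if r is not outer and r <= outer]
    return [outer] + nested


def rect(r0, c0, r1, c1):
    """Helper for the example: a filled rectangle as a set of pixels."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


regions = [
    rect(0, 0, 10, 10),    # outline of the object (100 pixels)
    rect(2, 2, 6, 6),      # internal fragment, nested in the outline
    rect(20, 20, 24, 24),  # unrelated region elsewhere in the image
    rect(3, 3, 4, 4),      # single pixel, below the size limit
]
model = object_model(regions, min_area=4, max_area=1000)
print([len(r) for r in model])
```

Set inclusion is used as the nesting test because MSERs detected at successive thresholds are nested by construction.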
An example of a simple thermal object repre-
sented by two binary shapes of its MSERs is given
in Fig. 3.
The practicality of this method has been tested in
two experiments:
• In the first experiment, the objective is to classify
objects detected in thermal images into a collection
of exemplary classes of 3D objects.
• In the second experiment, we attempt to classify
moving objects into either the rigid category (bodies
of fixed geometry) or the articulated category (i.e.
mechanical or biological bodies changing their
configuration while in motion).
Figure 2: Examples of MSER detection in thermal images.
Figure 3: Example of a thermal object represented by two
shapes of its MSERs.
All dataset and test images for the experiments were
captured in natural indoor environments at 640 × 480
resolution, using a FLIR C2 thermal camera.
2.1 Dataset for Multi-class Recognition
For the multi-class recognition experiment, a dataset
of over 700 images has been collected for 13 diver-
sified objects, including 8 rigid objects (glass plant,
cup, bottle of water, bottle of juice, iron, plate, sta-
pler and kettle) and 5 fully or partially articulated
objects (natural flower, pot plant, woman in abaya,
bicycle and teddybear). For rigid objects, the im-
ages were captured from sufficiently diversified view-
points, while for articulated objects various geometric
configurations were additionally taken into consider-
ation. Examples of the dataset images (for two differ-
ent objects) are shown in Fig. 4.
Eventually, each class $C_i$ is modeled by
a collection of all binary MSER regions
$\{\mathrm{MSER}_1(C_i), \ldots, \mathrm{MSER}_{n_i}(C_i)\}$ extracted from all
ground-truth examples of the corresponding category.
2.2 Datasets for Rigid-articulated
Categorization
In this experiment, no permanent dataset is used. In-
stead, temporary reference datasets are dynamically
updated from the most recent 5 frames $\{t, \ldots, t-4\}$
containing the object of interest $O$. Binary MSER
shapes (extracted from the object in the same way as
above) are grouped within the corresponding frames,
i.e.
Frame 0: $\{\mathrm{MSER}_1(O_t), \ldots, \mathrm{MSER}_{n_0}(O_t)\}$
...
Frame 4: $\{\mathrm{MSER}_1(O_{t-4}), \ldots, \mathrm{MSER}_{n_4}(O_{t-4})\}$
The union of these groups of shapes is considered the
currently used representation of the analyzed object.
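A temporary reference dataset of this kind can be kept in a fixed-length queue, as in the minimal sketch below. The class name `FrameWindow` and its methods are hypothetical; the shapes are stored as opaque items (here strings) because only the windowing logic is illustrated.

```python
from collections import deque


class FrameWindow:
    """Shapes of the tracked object from the most recent frames."""

    def __init__(self, size=5):
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=size)

    def add_frame(self, shapes):
        """Store the group of MSER shapes extracted from a new frame."""
        self.frames.append(list(shapes))

    def representation(self):
        """Union of the per-frame groups, oldest frame first."""
        return [shape for frame in self.frames for shape in frame]


window = FrameWindow()
for k in range(7):
    window.add_frame([f"frame{k}-outline", f"frame{k}-inner"])
print(len(window.frames), len(window.representation()))
```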
Figure 4: Examples of thermal dataset images for two
classes.
3 OBJECT CLASSIFICATION BY
SHAPE DESCRIPTORS
Since (as discussed in Section 2) the thermal objects
of interest (and classes of objects) are eventually represented
by collections of binary shapes, only binary
shape descriptors can be used for object classification.
Because regions extracted from thermal images
are generally smooth and without intricate shape details,
we preliminarily selected the very popular Hu moment
invariants, e.g. (Sluzek, 1995; Flusser, 2000), to
represent MSER regions by 7D vectors (with some
refinements as specified below):
\[ V7 = \{I_1, \ldots, I_7\} \tag{2} \]
where $I_i$ are the original invariants $\phi_i$ from (Hu,
1962) normalized to have the same mean and standard
deviation values. The normalization was done
on MPEG7 (footnote 1), a popular benchmark dataset of
binary shapes, and the first Hu invariant was used as the
reference.
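For reference, the seven original invariants of (Hu, 1962) can be computed from a binary region as in the sketch below, which assumes the region is given as a set of pixel coordinates. Only the raw invariants are computed; the normalization to a common mean and standard deviation depends on statistics of the benchmark dataset that are not given here and is therefore omitted. The function name and the test shape are hypothetical.

```python
def hu_invariants(pixels):
    """Seven Hu moment invariants of a binary region.

    pixels: iterable of (x, y) coordinates belonging to the region.
    """
    pts = list(pixels)
    m00 = float(len(pts))  # area of the region
    xc = sum(x for x, _ in pts) / m00
    yc = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Central moment, normalized for scale invariance.
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


# An L-shaped region and a copy rotated by 90 degrees and translated.
shape = {(x, y) for x in range(6) for y in range(2)} | {(0, y) for y in range(2, 5)}
moved = {(10 - y, x + 3) for x, y in shape}
for original, transformed in zip(hu_invariants(shape), hu_invariants(moved)):
    print(f"{original:.6e}  {transformed:.6e}")
```

The two printed columns should agree up to rounding, which illustrates the invariance to translation and rotation that motivates the choice of these descriptors.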
It was later experimentally verified that vectors
of invariants with the dimensionality reduced (by using
PCA) to 3D provide practically the same performance,
so that the regions are alternatively represented
by V3 vectors
\[ V3 = \{I^{(PCA)}_1, I^{(PCA)}_2, I^{(PCA)}_3\} \tag{3} \]
Now, similarity between MSER regions (which is
needed for the region-based classification of thermal
objects) can be defined as follows. Given a dataset
region $R_D$ and an object region $R_O$, the level of similarity
between $R_O$ and $R_D$ is straightforwardly defined
as:
\[ Sim(R_O, R_D) = 1 - \frac{\| V7(R_O) - V7(R_D) \|}{\| V7(R_D) \|} \tag{4} \]
or
\[ Sim(R_O, R_D) = 1 - \frac{\| V3(R_O) - V3(R_D) \|}{\| V3(R_D) \|} \tag{5} \]
where $\|\cdot\|$ is the vector norm.
Eventually, we consider two regions similar if
their level of similarity by Eq. 4 or Eq. 5 exceeds the
predefined (established experimentally) threshold.
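The similarity level can be computed as in the following sketch, which reads Eq. 4 and Eq. 5 as one minus the norm of the difference of the two descriptor vectors divided by the norm of the dataset vector. The Euclidean norm and the threshold value 0.9 are assumptions made for the example only; the text does not state which norm or threshold was used.

```python
import math


def similarity(v_obj, v_ref):
    """Similarity of an object region to a dataset (reference) region.

    v_obj, v_ref: descriptor vectors of equal length (7D or 3D).
    The reference vector is assumed to be non-zero.
    """
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_obj, v_ref)))
    ref_norm = math.sqrt(sum(b * b for b in v_ref))
    return 1.0 - diff / ref_norm


def are_similar(v_obj, v_ref, threshold=0.9):
    """True if the similarity exceeds the (hypothetical) threshold."""
    return similarity(v_obj, v_ref) > threshold


print(similarity([3.0, 4.0], [6.0, 8.0]))
print(are_similar([1.0, 0.02], [1.0, 0.0]))
```

Note that the measure is not symmetric: the difference is normalized by the dataset region only, so swapping the arguments generally changes the value.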
Two examples of diversified similarity levels be-
tween binary regions are given in Fig. 5.
1 http://www.dabi.temple.edu/shape/MPEG7/MPEG7dataset.zip
Figure 5: Regions with 0.94 similarity (top row) and with
0.54 similarity (bottom row).
3.1 Multi-class Recognition of Thermal
Objects
Given a thermal object $O$ (represented by a number of
binary MSER shapes $\{\mathrm{MSER}_1(O), \ldots, \mathrm{MSER}_m(O)\}$)
and the class $C_i$ (modeled, as defined in
Subsection 2.1, by binary MSER shapes
$\{\mathrm{MSER}_1(C_i), \ldots, \mathrm{MSER}_{n_i}(C_i)\}$), we assume that
$O$ is similar to class $C_i$ (i.e. it can potentially be a
member of this class) if:
1. Several shapes from the set
$\{\mathrm{MSER}_1(O), \ldots, \mathrm{MSER}_m(O)\}$ are similar
to some of the shapes $\{\mathrm{MSER}_1(C_i), \ldots, \mathrm{MSER}_{n_i}(C_i)\}$.
In practice, a minimum number of similarities
may be required.
2. If object $O$ is similar to too many classes, only the
classes with the highest numbers of inter-shape
similarities (e.g. top three) are eventually accepted.
This assumption is applied in our tests.
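The two rules above can be sketched as follows. The sketch counts, for every class, the object shapes that have at least one similar shape in the class model, which is only one possible reading of "number of inter-shape similarities". The function names and the values of `sim_threshold`, `min_matches` and `top_k` are hypothetical.

```python
import math


def similarity(v_obj, v_ref):
    """One minus the relative Euclidean distance to the reference vector."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_obj, v_ref)))
    return 1.0 - diff / math.sqrt(sum(b * b for b in v_ref))


def candidate_classes(object_shapes, class_models,
                      sim_threshold=0.9, min_matches=2, top_k=3):
    """Classes the object may belong to, best match first.

    object_shapes: descriptor vectors of the object's MSER shapes.
    class_models: dict mapping a class name to its descriptor vectors.
    """
    counts = {}
    for name, model in class_models.items():
        # Rule 1: count object shapes similar to some shape of the class.
        matches = sum(
            1 for v in object_shapes
            if any(similarity(v, ref) > sim_threshold for ref in model)
        )
        if matches >= min_matches:
            counts[name] = matches
    # Rule 2: keep only the classes with the most similarities.
    return sorted(counts, key=counts.get, reverse=True)[:top_k]


# Toy 2D descriptors, for illustration only.
class_models = {
    "cup": [[1.0, 0.0], [0.0, 1.0]],
    "plate": [[5.0, 5.0]],
}
object_shapes = [[1.0, 0.02], [0.01, 1.0]]
print(candidate_classes(object_shapes, class_models))
```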
For the actual tests, we selected only five classes
from the developed dataset, namely glass plant,
woman in abaya, bottle of juice, teddybear and
pot plant (the remaining classes acting as confusion
data only). Fig. 6 shows the confusion matrix of
the classification statistics. Unfortunately, we did
not find any suitable benchmark to compare to, but
the obtained results can be approximately compared
to (Meis et al., 2003) (which is the only similar example
we found of multi-class recognition of thermal
objects), and the outcome should be considered satisfactory.
Figure 6: The confusion matrix for 5-class test results.
3.2 Recognition of Rigid and
Articulated Thermal Objects
In the second experiment, we tested the method’s ability
to distinguish between rigid and articulated objects.
Objects of both categories can move (with respect
to the camera), but over short periods of time only
the articulated objects are expected to significantly
change their shapes in the captured thermal images.
Thus, as explained in Subsection 2.2, the sequence of
the five most recent images (frames) is used to identify
the object category. As shown in Fig. 7, we build a
matrix of inter-frame similarities, where the × marker indicates
that similar MSER regions are found in the
corresponding pair of frames. If fewer than 8 entries
(i.e. 40%) are marked, the object is considered articulated,
and if the number of marked entries exceeds
12 (i.e. 60%), the object is recognized as rigid. Otherwise,
no decision is made.
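The decision rule can be sketched as below. The sketch assumes that the matrix has one entry per ordered pair of distinct frames (20 entries for 5 frames, which is consistent with 8 entries being 40%), and it takes the similarity test as a caller-supplied predicate. The function names and the toy scalar "shapes" are hypothetical.

```python
from itertools import permutations


def count_marked(frames, similar):
    """Number of ordered frame pairs that contain similar regions.

    frames: list of per-frame lists of shape descriptors.
    similar: predicate similar(a, b) deciding if two shapes are similar.
    """
    return sum(
        1 for i, j in permutations(range(len(frames)), 2)
        if any(similar(a, b) for a in frames[i] for b in frames[j])
    )


def categorize(marked, total=20):
    """Apply the 40% / 60% rule to the number of marked entries."""
    if marked * 100 < 40 * total:
        return "articulated"
    if marked * 100 > 60 * total:
        return "rigid"
    return "undecided"


def close(a, b):
    """Toy similarity test on scalar descriptors, for illustration."""
    return abs(a - b) < 0.1


steady = [[1.0], [1.0], [1.0], [1.0], [1.0]]    # shape never changes
changing = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # shape differs every frame
print(categorize(count_marked(steady, close)))
print(categorize(count_marked(changing, close)))
```

Integer comparisons are used for the percentages so that the boundary cases (exactly 8 or exactly 12 marked entries) fall into the "no decision" band, as in the text.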
Figure 7: Exemplary similarity matrix between 5 subse-
quent frames (the content of this matrix represents an ar-
ticulated object).
The performance of this approach is illustrated by
the confusion matrix in Fig. 8. Though some rigid
objects are wrongly classified as articulated (as they
may move/rotate relative to the camera, or the thermal
conditions of the scene may change), we have not
found any case of an articulated object (performing
the actual motion) recognized as rigid. The percentage
of unknown decisions is moderate. Altogether, we
consider these results satisfactory, at least within the
classes of tested objects.
Figure 8: The confusion matrix for the results of rigid-
articulated classification.
Examples of rigid and articulated sequences are
given in Fig. 9, where only the external outline
MSERs are shown.
Figure 9: Examples of sequences showing rigid (two top
rows) and articulated (bottom rows) objects.
4 CONCLUDING REMARKS
The presented work is a preliminary feasibility study
exploring potentials of thermal imaging in more so-
phisticated applications than typical detection and
tracking tasks. In particular, we consider the prospec-
tive needs of visual surveillance and monitoring sys-
tems in environments which should remain dark. In
many such problems, the major task is not to de-
tect the presence of thermally distinctive objects, but
rather to classify them in terms of their identity and/or
behavior (e.g. to identify dangerous or critical scenar-
ios).
Our results indicate that such tasks are feasible
(subject to well-known limitations of thermal imaging)
under some constraints, e.g. with rather limited
numbers of object classes, non-overlapping objects,
etc.
We can also conclude that for thermal images
a suitable representation of objects is, in general
applications, more critical than the specific
features/descriptors. The presented results have been
obtained using (deliberately) simplified shape descriptors.
Because of such a simplification, the presented
algorithms are suitable for low-cost solutions (including
IoT devices, small robotic systems, etc.).
REFERENCES
Christiansen, P., Steen, K. A., Jørgensen, R. N., and Karstoft, H. (2014).
Automated detection and recognition of wildlife using
thermal cameras. Sensors, 14(8):13778–13793.
Fernandez-Caballero, A., Lopez, M., and Serrano-Cuerda,
J. (2014). Thermal-infrared pedestrian roi extraction
through thermal and motion information fusion. Sen-
sors, 14(4):6666–6676.
Flusser, J. (2000). On the independence of rotation moment
invariants. Pattern Recognition, 33:1405–1410.
Gade, R. and Moeslund, T. (2014). Thermal cameras and
applications: a survey. Machine Vision & Applica-
tions, 25(1):245–262.
Ginesu, G., Giusto, D., Margner, V., and Meinlschmidt, P.
(2004). Detection of foreign bodies in food by thermal
image processing. IEEE Trans. on Industrial Elec-
tronics, 51(2):480–490.
Hu, M. (1962). Visual pattern recognition by moment in-
variants. IRE Transactions on Information Theory,
8(2):179–187.
Iwasaki, Y., Misumi, M., and Nakamiya, T. (2013). Ro-
bust vehicle detection under various environmental
conditions using an infrared thermal camera and its
application to road traffic flow monitoring. Sensors,
13(6):7756–7773.
Lahouli, I., Haelterman, R., Chtourou, Z., Cubber, G., and
Attia, R. (2018). Pedestrian detection and tracking in
thermal images from aerial mpeg videos. In Proc. 13th
Int. Joint Conf. VISIGRAPP 2018, volume 5 (VISAPP),
pages 487–495.
Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Ro-
bust wide baseline stereo from maximally stable ex-
tremal regions. In Proc. British Machine Vision Con-
ference, pages 384–393.
Meis, U., Ritter, W., and Neumann, H. (2003). Detection
and classification of obstacles in night vision traffic
scenes based on infrared imagery. In Proc. 2003 IEEE
Int. Conf. on Intelligent Transportation Systems, pages
1140–1144.
Meriaudeau, F., Secades, L., Eren, G., Ercil, A., Truchetet,
F., Aubreto, O., and Fofi, D. (2010). 3-d scanning
of nonopaque objects by means of imaging emitted
structured infrared patterns. IEEE Trans. on Instru-
mentation and Measurement, 59(11):2898–2906.
Ng, Y.-M., Yu, M., Huang, Y., and Du, R. (2007). Diagnosis
of sheet metal stamping processes based on 3-d ther-
mal energy distribution. IEEE Trans. on Automation
Science & Eng., 4(1):22–30.
Nistér, D. and Stewénius, H. (2008). Linear time maximally
stable extremal regions. In Proc. 10th European Conf.
ECCV 2008, pages 183–196.
Puneet, G. and Garg, N. (2013). Binarization techniques
used for grey scale images. Int. Journal of Computer
Applications, 71(1):8–11.
Sezgin, M. and Sankur, B. (2004). Survey over im-
age thresholding techniques and quantitative perfor-
mance evaluation. Journal of Electronic Imaging,
13(1):146–168.
Sirmacek, B., Hoegner, L., and Stilla, U. (2011). Detection
of windows and doors from thermal images by group-
ing geometrical features. In Proc. 2011 Joint Urban
Remote Sensing Event.
Sluzek, A. (1995). Identification and inspection of 2-d objects
using new moment-based shape descriptors. Pattern
Recognition Letters, 16(7):687–697.
Sluzek, A., Saleh, H., Mohammad, B., Al-Qutayri, M., and
Ismail, M. (2019). Mser-in-chip: An efficient vision
tool for iot devices. In Elfadel, I. and Ismail, M., editors,
Innovations in Intelligent Image Analysis, pages
245–259. Springer.
Vidas, S., Moghadam, P., and Bosse, M. (2013). 3d ther-
mal mapping of building interiors using an rgb-d and
thermal camera. In Proc. 2013 IEEE Robotics & Au-
tomation Conf. (ICRA), pages 2311–2318.
Wang, W., Zhang, J., and Shen, C. (2010). Improved hu-
man detection and classification in thermal images. In
Proc. 2010 IEEE ICIP Conference, pages 2313–2316.
Zhou, D., Dillon, M., and Kwon, E. (2009). Tracking-based
deer vehicle collision detection using thermal imag-
ing. In Proc. IEEE Int. Conference on Robotics and
Biomimetics (ROBIO).