MSER-based Framework for Classification of Objects in Thermal Images

Alia Aljasmi and Andrzej Sluzek¹,²,ᵃ

¹ Khalifa University, Abu Dhabi, U.A.E.

² Warsaw University of Life Sciences-SGGW, Warsaw, Poland

Keywords: Thermal Images, MSER, Object Detection, Shape Descriptors, Object Classification.

Abstract:

In this paper, the problem of multi-class object recognition in thermal images is discussed. An alternative model of thermal objects is investigated, where an object is represented by multiple shapes extracted by MSER detectors. The shapes are nested within the largest MSER outlining the object (which might be the actual outline of the object, the outline of its thermal footprint or the outline of its largest prominent fragment). We show, using a multi-class dataset of thermal images captured in indoor environments, that the proposed methodology is a feasible solution for various object classification problems in thermal imaging. In particular, no object-specific algorithms are needed, so the method is applicable to most typical applications of thermal cameras (subject to general limitations of data captured by thermal imaging devices). The presented work is considered a preliminary feasibility study exploring the potential and limits of thermal image classification in more sophisticated machine vision problems.

1 INTRODUCTION

Thermal images are an alternative representation of visually difficult environments (poor illumination, foggy/smoky conditions, confusing patterns/camouflage, etc.) which nevertheless contain objects of distinctive temperature profiles. The most popular applications in visual surveillance and monitoring tasks (see Gade and Moeslund, 2014) include detection and tracking of moving objects such as humans, animals and vehicles (e.g. Wang et al., 2010; Fernandez-Caballero et al., 2014; Zhou et al., 2009; Christiansen et al., 2014; Iwasaki et al., 2013), inspection, security and quality control, and other selected industrial applications (e.g. Sirmacek et al., 2011; Vidas et al., 2013; Ginesu et al., 2004; Ng et al., 2007; Meriaudeau et al., 2010).

However, applications of thermal imaging in typical problems of multi-class object identification are rather limited. This can be attributed to the following factors. First, the spatial resolution of thermal cameras is still low compared to standard cameras. Secondly, the visual distinctiveness in thermal images is rather poor due to heat radiation and dissipation. Therefore, objects in thermal images are typically blurred and poorly contrasted, and regions (often with boundaries only approximately delimited) are the sole available representation of those objects.

ᵃ https://orcid.org/0000-0003-4148-2600

Correspondingly, very few experimental works have been reported on classification of several types of objects within the same task where only thermal imaging is used (e.g. Meis et al., 2003). The majority of thermal imaging applications focus on object detection and subsequent tracking. Not surprisingly, the diversity of features used in such works is also limited (mostly binary regions and/or characteristics of their boundaries) and the reported results are not very impressive, even with features hand-crafted for specific problems and a limited number of considered classes (as in Meis et al., 2003).

In this paper, object classification in thermal imaging is discussed from a more general perspective, even though we (indirectly) focus on indoor tasks (e.g. visual surveillance in dark premises). Primarily, we investigate an alternative model of objects in thermal images, where each instance of an object is represented by multiple regions extracted by MSER detectors (Matas et al., 2002; Nistér and Stewénius, 2008), as explained in Section 2. Subsequently, in Section 3, we use a simple classification method to:

• Identify 3D objects from a range of diversified classes using regions extracted by the MSER detector from thermal images.

• Distinguish (from a sequence of thermal frames) between rigid (fixed-geometry) and articulated (e.g. animals, humans, walking toys, etc.) objects. This is a supplementary objective.

The experimental results are also discussed in Section 3. Finally, the paper is briefly summarized in Section 4.

Aljasmi, A. and Sluzek, A. MSER-based Framework for Classification of Objects in Thermal Images. DOI: 10.5220/0008116105660572. In Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2019), pages 566-572. ISBN: 978-989-758-380-3

2 MSER-BASED MODELS OF THERMAL OBJECTS

In thermal images, even objects with distinctive temperature profiles are normally seen as diluted silhouettes on images of rather poor quality. Such images can be subsequently binarized (into objects and background) for further analysis and processing. Unfortunately, standard image thresholding algorithms (e.g. Sezgin and Sankur, 2004; Puneet and Garg, 2013) cannot handle typical effects of thermal images (as illustrated in Fig. 1), so accurate outlines of the actual objects may not be reliably extracted.

Figure 1: Examples of raw-data thermal images of objects.

Therefore, we propose an alternative approach based on maximally stable extremal regions (MSERs). MSERs are the image fragments which are least sensitive to variations of the binarization threshold (see Matas et al., 2002; Nistér and Stewénius, 2008). Consequently, an MSER detector can identify even poorly contrasted fragments in infrared images, as long as these fragments have the most distinctive thermal profiles within the processed image. Moreover, MSERs can be detected at very small computational costs (including an on-chip implementation of the detector (Sluzek et al., 2019)), which makes them particularly attractive for low-cost systems (e.g. IoT devices).

Formally, binary MSER regions $Q(t)$ (where $t$ indicates the threshold level) are detected as local minima of the growth rate function $q(t)$ defined by the derivative of the region's area over the threshold values:

$$q(t) = \frac{d}{dt}\,|Q(t)|, \tag{1}$$

where $|\cdot|$ represents the area of a region.
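The detection criterion of Eq. 1 can be illustrated with a minimal Python sketch. It assumes that the areas of one nested region have already been measured at consecutive threshold levels (the list `areas` and both function names are hypothetical), approximates the derivative by a central difference, and reports the thresholds at which the growth rate has a local minimum. It is an illustration only, not the detector used in this work.

```python
def growth_rate(areas):
    """Central-difference estimate of the absolute growth rate at interior thresholds.

    areas[t] is the (assumed, pre-computed) area of the region at threshold t.
    """
    return {t: abs(areas[t + 1] - areas[t - 1]) / 2.0
            for t in range(1, len(areas) - 1)}


def stable_thresholds(areas):
    """Thresholds where the growth rate is a local minimum (candidate MSERs)."""
    q = growth_rate(areas)
    return [t for t in sorted(q)
            if t - 1 in q and t + 1 in q          # both neighbours must exist
            and q[t] < q[t - 1] and q[t] <= q[t + 1]]


# Hypothetical area profile: the region barely changes around threshold 4.
areas = [2, 10, 30, 47, 48, 50, 60, 100]
print(stable_thresholds(areas))
```

A practical detector additionally tracks how connected components merge as the threshold changes; the sketch deliberately skips that step.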

The MSER detector has been reported in some applications of thermal images, e.g. (Lahouli et al., 2018). Nevertheless, the most significant advantage of MSERs in thermal images, i.e. their ability to extract not only the outlines of objects but also distinctive internal fragments of objects, their thermal shadows on the surrounding scenes, etc. (as illustrated in the examples in Fig. 2), is apparently not yet fully exploited in the available literature. Therefore, we propose to represent thermal objects by the family of MSERs nested within the largest MSER outlining the object.

This largest MSER can be the actual outline of the object shape, the outline of its thermal footprint (i.e. incorporating the heat radiation effects on the object neighborhood) or the outline of the largest prominent fragment within the object (if the whole object is indistinguishably blended with its background). Certain practical constraints are obviously applied, namely removal of MSERs that are too small or too large from the processed thermal images, and rejection of MSERs which cannot (for various reasons) represent physical objects.
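A minimal Python sketch of this object model is given below. It assumes that every detected MSER is available as a set of pixel coordinates and that the size constraints are simple area bounds; the function names and the bounds `min_area` and `max_area` are hypothetical, and the rejection of physically implausible regions is not modeled.

```python
def block(r0, c0, h, w):
    """Helper for the example: a rectangular region as a set of (row, col) pixels."""
    return frozenset((r, c) for r in range(r0, r0 + h) for c in range(c0, c0 + w))


def nested_family(regions, min_area, max_area):
    """Keep size-plausible regions, then those nested in the largest one."""
    kept = [r for r in regions if min_area <= len(r) <= max_area]
    if not kept:
        return []
    outer = max(kept, key=len)                    # largest MSER outlining the object
    family = [r for r in kept if r <= outer]      # subset test = nesting test
    return sorted(family, key=len, reverse=True)  # outermost shape first


outer = block(0, 0, 6, 6)      # 36 pixels
inner = block(1, 1, 3, 3)      # 9 pixels, inside outer
tiny = block(2, 2, 1, 1)       # 1 pixel, removed as too small
other = block(10, 10, 2, 2)    # 4 pixels, not nested in outer
family = nested_family([inner, other, outer, tiny], min_area=4, max_area=100)
print([len(r) for r in family])
```

Representing regions as pixel sets keeps the nesting test to a single subset comparison; a mask-based implementation would be the more likely choice for real images.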

An example of a simple thermal object represented by two binary shapes of its MSERs is given in Fig. 3.

The practicality of this method has been tested in two experiments:

• In the first experiment, the objective is to classify objects detected in thermal images into a collection of exemplary classes of 3D objects.

• In the second experiment, we attempt to classify moving objects into either the rigid category (bodies of fixed geometry) or the articulated category (i.e. mechanical or biological bodies changing their configuration while in motion).

Figure 2: Examples of MSER detection in thermal images.

Figure 3: Example of a thermal object represented by two shapes of its MSERs.

All dataset and test images for the experiments were captured in natural indoor environments at 640 × 480 resolution, using a FLIR C2 thermal camera.

2.1 Dataset for Multi-class Recognition

For the multi-class recognition experiment, a dataset of over 700 images has been collected for 13 diversified objects, including 8 rigid objects (glass plant, cup, bottle of water, bottle of juice, iron, plate, stapler and kettle) and 5 fully or partially articulated objects (natural flower, pot plant, woman in abaya, bicycle and teddybear). For rigid objects, the images were captured from sufficiently diversified viewpoints, while for articulated objects various geometric configurations were additionally taken into consideration. Examples of the dataset images (for two different objects) are shown in Fig. 4.

Eventually, each class $C_i$ is modeled by a collection of all binary MSER regions $\{\mathrm{MSER}_1(C_i), \ldots, \mathrm{MSER}_{n_i}(C_i)\}$ extracted from all ground-truth examples of the corresponding category.

2.2 Datasets for Rigid-articulated Categorization

In this experiment, no permanent dataset is used. Instead, temporary reference datasets are dynamically updated from the most recent 5 frames $\{t, \ldots, t-4\}$ containing the object of interest $O$. Binary MSER shapes (extracted from the object in the same way as above) are grouped within the corresponding frames, i.e.

$$\text{Frame0} \rightarrow \{\mathrm{MSER}_1(O_t), \ldots, \mathrm{MSER}_{k_0}(O_t)\}$$

$$\ldots$$

$$\text{Frame4} \rightarrow \{\mathrm{MSER}_1(O_{t-4}), \ldots, \mathrm{MSER}_{k_4}(O_{t-4})\}$$

The union of these groups of shapes is considered the currently used representation of the analyzed object.
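The dynamically updated reference dataset can be sketched in Python as a fixed-length window over the most recent frames. The class name `FrameWindow` and its methods are hypothetical, and the shapes are stand-in labels rather than actual binary regions.

```python
from collections import deque


class FrameWindow:
    """Keeps the MSER shapes of the most recent frames (5 by default)."""

    def __init__(self, size=5):
        # A bounded deque silently drops the oldest frame when a new one arrives.
        self.frames = deque(maxlen=size)

    def push(self, shapes):
        """Store the group of shapes extracted from the newest frame."""
        self.frames.append(list(shapes))

    def representation(self):
        """Union of the per-frame groups: the current model of the object."""
        return [shape for frame in self.frames for shape in frame]


window = FrameWindow()
for i in range(7):                       # seven frames arrive, five are retained
    window.push([f"f{i}a", f"f{i}b"])
print(len(window.frames), window.representation()[0])
```

Grouping by frame is preserved inside the window, so the same structure can later be used to compare frames pairwise.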

Figure 4: Examples of thermal dataset images for two classes.

3 OBJECT CLASSIFICATION BY SHAPE DESCRIPTORS

Since (as discussed in Section 2) the thermal objects of interest (and classes of objects) are eventually represented by collections of binary shapes, only binary shape descriptors can be used for object classification. Because regions extracted from thermal images are generally smooth and without intricate shape details, we preliminarily selected the very popular Hu moment invariants, e.g. (Sluzek, 1995; Flusser, 2000), to represent MSER regions by 7D vectors (with some refinements as specified below):

$$V7 = \{I_1, \ldots, I_7\} \tag{2}$$

where $I_1, \ldots, I_7$ are the original invariants $\phi_1, \ldots, \phi_7$ from (Hu, 1962) normalized to have the same mean and standard deviation values. The normalization was done on MPEG7, a popular benchmark dataset of binary shapes, and the first Hu invariant was used as the reference.
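For reference, the seven original invariants of (Hu, 1962) can be computed from a binary region as in the following Python sketch. The region is assumed to be given as a list of pixel coordinates, the function name is hypothetical, and the dataset-based normalization described above is deliberately left out, so the returned values are the raw invariants only.

```python
def hu_invariants(pixels):
    """Raw Hu moment invariants of a binary region given as (x, y) pixels."""
    pts = list(pixels)
    m00 = float(len(pts))                       # area of the region
    cx = sum(x for x, _ in pts) / m00           # centroid
    cy = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Central moment, scale-normalized by the area.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2.0)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


# An L-shaped region and a copy rotated by 90 degrees and shifted.
shape = [(x, 0) for x in range(6)] + [(0, y) for y in range(1, 4)]
moved = [(-y + 7, x + 3) for x, y in shape]
print(hu_invariants(shape)[:2], hu_invariants(moved)[:2])
```

The example prints the first two invariants of both regions, which agree up to rounding because the invariants do not depend on translation or rotation.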

It was later experimentally verified that vectors of invariants with the dimensionality reduced (by using PCA) to 3D provide practically the same performance, so the regions are alternatively represented by $V3$ vectors

$$V3 = \{I^{(PCA)}_1, I^{(PCA)}_2, I^{(PCA)}_3\} \tag{3}$$

Now, similarity between MSER regions (which is needed for the region-based classification of thermal objects) can be defined as follows. Given a dataset region $R_D$ and an object region $R_O$, the level of similarity between $R_D$ and $R_O$ is straightforwardly defined as:

$$Sim(R_D, R_O) = 1 - \frac{\|V7(R_D) - V7(R_O)\|}{\|V7(R_D)\|} \tag{4}$$

$$Sim(R_D, R_O) = 1 - \frac{\|V3(R_D) - V3(R_O)\|}{\|V3(R_D)\|} \tag{5}$$

where $\|\cdot\|$ is the vector norm.

Eventually, we consider two regions similar if their level of similarity by Eq. 4 or Eq. 5 exceeds the predefined (established experimentally) threshold.
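The similarity test can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in the text: the vector norm is taken to be Euclidean, the difference is normalized by the norm of the dataset region's vector, and the default threshold of 0.9 is only a placeholder for the experimentally established value. The function names are hypothetical.

```python
import math


def similarity(v_dataset, v_object):
    """1 - ||v_dataset - v_object|| / ||v_dataset|| (Euclidean norm assumed)."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_dataset, v_object)))
    ref = math.sqrt(sum(a * a for a in v_dataset))   # must be non-zero
    return 1.0 - diff / ref


def are_similar(v_dataset, v_object, threshold=0.9):
    """Two regions count as similar if the similarity exceeds the threshold.

    The default threshold is a placeholder, not the value used in the paper.
    """
    return similarity(v_dataset, v_object) > threshold


print(similarity([3.0, 4.0], [3.0, 4.0]))   # identical descriptors
print(similarity([3.0, 4.0], [3.0, 1.0]))   # difference norm 3, reference norm 5
```

The same functions apply unchanged to 7D and 3D descriptor vectors, since only the vector length differs.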

Two examples of diversified similarity levels between binary regions are given in Fig. 5.

Figure 5: Regions with 0.94 similarity (top row) and with 0.54 similarity (bottom row).

MPEG7 dataset: http://www.dabi.temple.edu/~shape/MPEG7/MPEG7dataset.zip

3.1 Multi-class Recognition of Thermal Objects

Given a thermal object $O$ (represented by a number of binary MSER shapes $\{\mathrm{MSER}_1(O), \ldots, \mathrm{MSER}_k(O)\}$) and the class $C_i$ (modeled, as defined in Subsection 2.1, by binary MSER shapes $\{\mathrm{MSER}_1(C_i), \ldots, \mathrm{MSER}_{n_i}(C_i)\}$), we assume that $O$ is similar to the class $C_i$ (i.e. it can potentially be a member of this class) if:

1. Several shapes from the set $\{\mathrm{MSER}_1(O), \ldots, \mathrm{MSER}_k(O)\}$ are similar to some of the shapes $\{\mathrm{MSER}_1(C_i), \ldots, \mathrm{MSER}_{n_i}(C_i)\}$. In practice, a minimum number of similarities may be required.

2. If the object $O$ is similar to too many classes, only the classes with the highest numbers of inter-shape similarities (e.g. top three) are eventually accepted. This assumption is applied in our tests.
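The two rules above can be sketched in Python as follows. The sketch counts, for each class, the object shapes that have at least one similar class shape, which is only one possible reading of "number of inter-shape similarities". The function names, the `min_matches` parameter and the scalar stand-in descriptors are hypothetical; `top_k=3` mirrors the "top three" example.

```python
def count_matches(object_shapes, class_shapes, is_similar):
    """Number of object shapes that resemble at least one class shape."""
    return sum(1 for s in object_shapes
               if any(is_similar(c, s) for c in class_shapes))


def candidate_classes(object_shapes, class_models, is_similar,
                      min_matches=2, top_k=3):
    """Classes with enough matches, limited to the top_k best-scoring ones."""
    scores = {name: count_matches(object_shapes, shapes, is_similar)
              for name, shapes in class_models.items()}
    accepted = [(name, s) for name, s in scores.items() if s >= min_matches]
    accepted.sort(key=lambda item: item[1], reverse=True)   # stable sort
    return [name for name, _ in accepted[:top_k]]


# Scalars stand in for shape descriptors in this toy example.
close = lambda a, b: abs(a - b) < 0.1
models = {"cup": [1.0, 2.0, 3.0], "kettle": [1.0, 5.0], "plate": [9.0],
          "iron": [1.0, 2.0], "bottle": [2.0, 3.0, 7.0]}
print(candidate_classes([1.0, 2.0, 3.0], models, close))
```

Any similarity predicate with the signature `is_similar(dataset_descriptor, object_descriptor)` can be plugged in, including one based on Eq. 4 or Eq. 5.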

For the actual tests, we selected only five classes from the developed dataset, namely glass plant, woman in abaya, bottle of juice, teddybear and pot plant (the remaining classes acting as confusion data only). Fig. 6 shows the confusion matrix of the classification statistics. Unfortunately, we did not find any suitable benchmark to compare to, but the obtained results can be approximately compared to (Meis et al., 2003), which is the only similar example we found of multi-class recognition of thermal objects, and the outcome should be considered satisfactory.

Figure 6: The confusion matrix for 5-class test results.

3.2 Recognition of Rigid and Articulated Thermal Objects

In the second experiment, we tested the method's ability to distinguish between rigid and articulated objects. Objects of both categories can move (with respect to the camera), but over short periods of time only the articulated objects are expected to significantly change their shapes in the captured thermal images. Thus, as explained in Subsection 2.2, the sequence of the five most recent images (frames) is used to identify the object category. As shown in Fig. 7, we build a matrix of inter-frame similarities, where the × marker indicates that similar MSER regions are found in the corresponding pair of frames. If fewer than 8 entries (i.e. 40%) are marked, the object is considered articulated, and if the number of marked entries exceeds 12 (i.e. 60%), the object is recognized as rigid. Otherwise, no decision is made.
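The decision rule can be sketched in Python as below. The quoted percentages imply 20 entries in total, which matches the off-diagonal entries of a 5 × 5 matrix; counting only those entries is therefore an assumption of this sketch, and the function name is hypothetical.

```python
def classify_motion(sim_matrix, low=8, high=12):
    """Rigid/articulated decision from a boolean inter-frame similarity matrix."""
    n = len(sim_matrix)
    # Count marked entries, skipping the diagonal (a frame compared with itself).
    marked = sum(1 for i in range(n) for j in range(n)
                 if i != j and sim_matrix[i][j])
    if marked < low:
        return "articulated"
    if marked > high:
        return "rigid"
    return "unknown"


all_marked = [[True] * 5 for _ in range(5)]                    # 20 marked entries
half_marked = [[i < j for j in range(5)] for i in range(5)]    # 10 marked entries
print(classify_motion(all_marked), classify_motion(half_marked))
```

The two thresholds leave a band of undecided cases between 8 and 12 marked entries, which corresponds to the "no decision" outcome.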

Figure 7: Exemplary similarity matrix between 5 subsequent frames (the content of this matrix represents an articulated object).

The performance of this approach is illustrated by the confusion matrix in Fig. 8. Though some rigid objects are wrongly classified as articulated (as they may move/rotate relative to the camera, or the thermal conditions of the scene may change), we have not found any case of an articulated object (performing actual motion) recognized as rigid. The percentage of unknown decisions is moderate. Altogether, we consider these results satisfactory, at least within the classes of tested objects.

Figure 8: The confusion matrix for the results of rigid-articulated classification.

Examples of rigid and articulated sequences are given in Fig. 9, where only the external outline MSERs are shown.

Figure 9: Examples of sequences showing rigid (two top rows) and articulated (bottom rows) objects.

4 CONCLUDING REMARKS

The presented work is a preliminary feasibility study exploring the potential of thermal imaging in more sophisticated applications than typical detection and tracking tasks. In particular, we consider the prospective needs of visual surveillance and monitoring systems in environments which should remain dark. In many such problems, the major task is not to detect the presence of thermally distinctive objects, but rather to classify them in terms of their identity and/or behavior (e.g. to identify dangerous or critical scenarios).

Our results indicate that such solutions are practical (subject to well-known limitations of thermal imaging) under some constraints, e.g. with rather limited numbers of object classes, non-overlapping objects, etc.

We can also conclude that, for thermal images, a suitable representation of objects is, in general applications, more critical than the specific features/descriptors. The presented results have been obtained using (deliberately) simplified shape descriptors. Because of this simplification, the presented algorithms are suitable for low-cost solutions (including IoT devices, small robotic systems, etc.).

REFERENCES

Christiansen, P., Steen, K. A., Jørgensen, R. N., and Karstoft, H. (2014). Automated detection and recognition of wildlife using thermal cameras. Sensors, 14(8):13778–13793.

Fernandez-Caballero, A., Lopez, M., and Serrano-Cuerda, J. (2014). Thermal-infrared pedestrian ROI extraction through thermal and motion information fusion. Sensors, 14(4):6666–6676.

Flusser, J. (2000). On the independence of rotation moment invariants. Pattern Recognition, 33:1405–1410.

Gade, R. and Moeslund, T. (2014). Thermal cameras and applications: a survey. Machine Vision & Applications, 25(1):245–262.

Ginesu, G., Giusto, D., Margner, V., and Meinlschmidt, P. (2004). Detection of foreign bodies in food by thermal image processing. IEEE Trans. on Industrial Electronics, 51(2):480–490.

Hu, M. (1962). Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 8(2):179–187.

Iwasaki, Y., Misumi, M., and Nakamiya, T. (2013). Robust vehicle detection under various environmental conditions using an infrared thermal camera and its application to road traffic flow monitoring. Sensors, 13(6):7756–7773.

Lahouli, I., Haelterman, R., Chtourou, Z., Cubber, G., and Attia, R. (2018). Pedestrian detection and tracking in thermal images from aerial MPEG videos. In Proc. 13th Int. Joint Conf. VISIGRAPP 2018, volume 5 (VISAPP), pages 487–495.

Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust wide baseline stereo from maximally stable extremal regions. In Proc. British Machine Vision Conference, pages 384–393.

Meis, U., Ritter, W., and Neumann, H. (2003). Detection and classification of obstacles in night vision traffic scenes based on infrared imagery. In Proc. 2003 IEEE Int. Conf. on Intelligent Transportation Systems, pages 1140–1144.

Meriaudeau, F., Secades, L., Eren, G., Ercil, A., Truchetet, F., Aubreton, O., and Fofi, D. (2010). 3-D scanning of nonopaque objects by means of imaging emitted structured infrared patterns. IEEE Trans. on Instrumentation and Measurement, 59(11):2898–2906.

Ng, Y.-M., Yu, M., Huang, Y., and Du, R. (2007). Diagnosis of sheet metal stamping processes based on 3-D thermal energy distribution. IEEE Trans. on Automation Science & Eng., 4(1):22–30.

Nistér, D. and Stewénius, H. (2008). Linear time maximally stable extremal regions. In Proc. 10th European Conf. ECCV 2008, pages 183–196.

Puneet, G. and Garg, N. (2013). Binarization techniques used for grey scale images. Int. Journal of Computer Applications, 71(1):8–11.

Sezgin, M. and Sankur, B. (2004). Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–168.

Sirmacek, B., Hoegner, L., and Stilla, U. (2011). Detection of windows and doors from thermal images by grouping geometrical features. In Proc. 2011 Joint Urban Remote Sensing Event.

Sluzek, A. (1995). Identification and inspection of 2-D objects using new moment-based shape descriptors. Pattern Recognition Letters, 16(7):687–697.

Sluzek, A., Saleh, H., Mohammad, B., Al-Qutayri, M., and Ismail, M. (2019). MSER-in-chip: An efficient vision tool for IoT devices. In Elfadel, I. and Ismail, M., editors, Innovations in Intelligent Image Analysis, pages 245–259. Springer.

Vidas, S., Moghadam, P., and Bosse, M. (2013). 3D thermal mapping of building interiors using an RGB-D and thermal camera. In Proc. 2013 IEEE Robotics & Automation Conf. (ICRA), pages 2311–2318.

Wang, W., Zhang, J., and Shen, C. (2010). Improved human detection and classification in thermal images. In Proc. 2010 IEEE ICIP Conference, pages 2313–2316.

Zhou, D., Dillon, M., and Kwon, E. (2009). Tracking-based deer vehicle collision detection using thermal imaging. In Proc. IEEE Int. Conference on Robotics and Biomimetics (ROBIO).
