Robust Facial Landmark Detection and Face Tracking in Thermal
Infrared Images using Active Appearance Models
Marcin Kopaczka, Kemal Acar and Dorit Merhof
Institute of Imaging and Computer Vision, RWTH Aachen University, Templergraben 55, Aachen, Germany
Keywords:
Thermal Infrared, Face Tracking, Facial Landmark Detection, Active Appearance Model.
Abstract:
Long wave infrared (LWIR) imaging is an imaging modality currently gaining increasing attention. Facial
images acquired with LWIR sensors can be used for illumination invariant person recognition and the contact-
less extraction of vital signs such as respiratory rate. In order to work properly, these applications require a
precise detection of faces and regions of interest such as eyes or nose. Most current facial landmark detec-
tors in the LWIR spectrum localize single salient facial regions by thresholding. These approaches are not
robust against out-of-plane rotation and occlusion. To address this problem, we introduce an LWIR
face tracking method based on an active appearance model (AAM). The model is trained with a manually
annotated database of thermal face images. Additionally, we evaluate the effect of different methods for AAM
generation and image preprocessing on the fitting performance. The method is evaluated on a set of still im-
ages and a video sequence. Results show that AAMs are a robust method for the detection and tracking of
facial landmarks in the LWIR spectrum.
1 INTRODUCTION
Algorithms for the analysis of face images are a key
research area in computer vision. A large number of
methods for detection, tracking, recognition and ex-
pression analysis of faces have been published in the
last years. While most of the methods introduced in this field aim at regular photographs and
videos, it is known that several frequency ranges out-
side the visual spectrum allow interesting applications
that cannot be realized using visible light. Long wave
infrared (LWIR) imaging is one of the domains that
have gained increased attention in recent years. This
subband is referred to as thermal infrared as the hu-
man body emits most of its heat in this range of the
electromagnetic spectrum. This allows LWIR sensors
to work independently from lighting conditions and
to operate even in complete darkness. Besides being invariant to illumination,
LWIR sensors also allow the extraction of informa-
tion from an image that is not easily detectable by
sensors in the visible domain. In face images they re-
veal information on the subcutaneous vascular struc-
ture (Zhu et al., 2008) or vital signs such as respi-
ratory rate (Lewis et al., 2011) and heart rate (Gault
and Farag, 2013). These key properties form the basis
for two major applications of LWIR imaging: Bio-
metric face recognition for person identification and
the extraction of temperature signals for medical pur-
poses and affective state analysis. A number of publications have addressed both topics in recent years; an overview of each area can be found in the recent
surveys by (Ghiass et al., 2014) and (Lahiri et al.,
2012). However, many of the authors of studies that
included analysis of face images have acquired their
data under strongly controlled conditions that restrict
head movement. The main reason for such controlled
environments is the lack of established and robust
face tracking methods in the thermal infrared. There-
fore, recorded persons are required to minimize head
movement in order to allow undisturbed data extrac-
tion from defined regions of interest (ROIs). The lack
of tracking solutions can be attributed to the fact that
the appearance of faces in the LWIR spectrum differs
strongly from their appearance in the visual domain.
LWIR images generally have lower contrast and do
not reproduce any skin texture, so that many well-
established tracking algorithms developed for the vi-
sual domain do not perform well when applied di-
rectly to LWIR data. Therefore, head movement is
often restricted from the beginning. Methods that at-
tempt face tracking in LWIR data are currently lim-
ited to the tracking of individual salient regions such as the nose or the inner corners of the eyes, which are
usually easy to find due to their temperature signature but offer only limited robustness towards partial occlusion and out-of-plane rota-
tion. To the best of our knowledge, no author has ever
proposed a holistic approach to track whole faces and
a complete set of facial landmarks in the thermal in-
frared.
Real-world scenarios require robust methods for
precise facial landmark detection. Several authors
have mentioned that advanced tracking methods
would increase the range of possible applications of
their LWIR image processing algorithms. Further-
more, it is known from research in the visual domain
that holistic approaches offer wider support and are
therefore more robust to unexpected pose changes,
fast head movement and partial occlusion than single-
ROI trackers. In this work, we therefore introduce
an LWIR face tracker based on an active appearance
model (AAM). We describe the face database used
for tracker training as well as several approaches to
improve fitting precision in low-contrast LWIR data
by applying contrast-enhancing preprocessing to the
images. We extensively evaluate the fitting perfor-
mance of the AAM using different state-of-the-art fit-
ting algorithms together with recent improvements in
feature-based AAM training. To prove the versatility
of our approach, we additionally show that our AAM
can be used to track a previously unseen face in the
LWIR in a video sequence with large head movement.
2 PREVIOUS WORK
Basic methods for face tracking in the LWIR domain
are face detection and segmentation algorithms such
as (Filipe and Alexandre, 2013). They rely on the fact that LWIR sensors measure heat radiation, so faces are usually easy to locate using
thresholding and basic morphological operators. De-
spite their low computational complexity, these algo-
rithms often perform well for face segmentation tasks
in the thermal domain. However, they are not able to
perform precise landmark detection on facial regions
such as eyes or mouth. More advanced approaches
employ a basic segmentation and add a feature de-
tection step to the process. Current algorithms often
perform landmark detection by locating temperature
maxima which can usually be found in the inner cor-
ner of the eyes (Alkali et al., 2014). These methods
generally assume a frontal view of the face and their
performance degrades quickly when confronted with
out-of-plane rotation.
Another group of current trackers suitable for
landmark tracking in the thermal infrared are single-
ROI trackers, either general-purpose algorithms such
as TLD (Kalal et al., 2012) or complex approaches
developed especially for thermal IR tracking, for ex-
ample (Zhou et al., 2013). While showing good per-
formance in scenarios with little movement, the lim-
ited support area of single-ROI trackers is a down-
side that leads to poor tolerance in case of ROI oc-
clusion or fast movement. Only very little research
has been published on the use of multi-point track-
ers that could counter these downsides. (Dowdall
et al., 2007) demonstrated the use of a coalitional
multi-point tracker to track faces in the LWIR domain.
(Ghiass et al., 2013) were the first, and so far the only, authors to use AAMs in the thermal infrared domain, albeit for face recognition rather than for tracking purposes. The research presented there focused on algorithms for increasing image contrast and extracting person-specific biometric informa-
tion from the data. The AAM was trained for recog-
nition tasks on single images, therefore it was not in-
vestigated if it could be used for robust face track-
ing. Furthermore, the proposed methods are focused
on the identification of known persons and therefore
contain no information on the model’s ability to gen-
eralize towards unseen faces. To account for these is-
sues, we show in our work that our AAM is well able
to track faces in a largely unconstrained setting and
that the model can robustly adapt to unseen individu-
als, both of which have not been addressed in existing
publications on thermal AAM so far.
One of the current research areas for thermal
imaging is the extraction of biosignals from face im-
ages. Several methods for measuring respiratory ac-
tion (Lewis et al., 2011), blood vessel location (Zhu
et al., 2008) and cardiac pulse (Gault and Farag, 2013)
have been published in recent years. A common ap-
proach to measure the respiratory rate is temperature
monitoring of the nostril area as respiration-induced
temperature changes can be measured in this region
with great reliability. As in the general case, current
approaches here also either rely on single-ROI track-
ers or on an acquisition protocol that prohibits head
movement.
In the following, we will introduce and evaluate
methods for training AAMs in the thermal infrared
with the focus on robust facial landmark detection
and face tracking in unconstrained video sequences.
In our work we introduce novel image preprocess-
ing approaches and also evaluate the applicability of
state-of-the-art fitting methods and recent advances in
feature-based AAM representations to LWIR images.
We will show that our approach is able to reliably de-
tect and track facial ROIs even in challenging scenar-
ios including fast movement and significant out-of-
plane rotation. The method’s ability to robustly gen-
eralize towards unseen faces will be demonstrated as
well. To the best of our knowledge, our work demonstrates for the first time that AAMs are a viable solution for robust LWIR face tracking, and it is the first solution that allows precise detection of facial landmarks across such a wide range of realistic and arbitrary head poses in thermal infrared face images.
3 METHODS
Generative methods such as AAM require an exten-
sive set of training images to allow modeling of un-
seen faces. In this section, we will therefore first
describe our steps to create a thermal face database
that can be used to train facial landmark detectors.
We then describe how we used the database and ap-
plied dense image features combined with contrast-
enhancing preprocessing to train an AAM-based fa-
cial landmark detection system for the thermal in-
frared domain.
3.1 Thermal Face Database
We have created a face database with LWIR video se-
quences of currently 31 (25 male, 6 female) subjects.
The persons were asked to perform a set of defined
and arbitrary actions that cover a large range of poses
and facial expressions. All videos were taken with
a microbolometer-based LWIR camera with a rela-
tive thermal resolution of 0.03K and acquired at the
sensor’s native spatial resolution of 1024x768 pixels.
Each subject posed for at least two recordings of 40 seconds each. During the first record-
ing the participants followed a defined head move-
ment to cover a wide range of head poses as shown
in Figure 1. In the second recording, the volunteers
were asked to perform arbitrary head movement and
facial expressions (Figure 2). From these videos we
extracted a total of 695 frames. In the next step, all
selected frames were manually annotated with a 68-
point template to precisely indicate the position of fa-
cial regions such as mouth, eyes and nose.
3.2 Image Preprocessing
It has been suggested by (Ghiass et al., 2013) that
contrast-enhancing preprocessing of LWIR images
could have a positive impact on the fitting perfor-
mance of an AAM. The method proposed there
was based on smoothing the input image with an
Figure 1: Posed head pose examples.
Figure 2: Spontaneous head pose and facial expression examples.
anisotropic diffusion filter and subtracting the diffu-
sion result from the original, thereby enhancing the
edges and high-frequency components. We extend
the suggested ideas by implementing and evaluating a
group of sharpening filters based on the unsharp mask
concept.
3.2.1 Unsharp Mask
In unsharp masking (USM), a high-pass filter is implemented by smoothing the image $I(x, y)$ with a low-pass filter $G(x, y)$ and subsequently subtracting the filter result from the original image, leaving only the image's high-frequency components:

$$I_{\mathit{filtered}}(x, y) = I(x, y) - I(x, y) * G(x, y) \qquad (1)$$

Subsequently, the final sharpened image $I_s$ is obtained by adding the filtered image $I_f$ with a weight factor $k$ to the original image $I$:

$$I_s(x, y) = I(x, y) + k \cdot I_f(x, y) \qquad (2)$$

The lowpass is commonly implemented using a Gaussian kernel:

$$G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (3)$$
Since the general concept of unsharp masking is not
restricted to Gaussian kernels, we introduce two addi-
tional kernels based on anisotropic diffusion and bi-
lateral filtering. The kernels are applied to the image
and the results are fed into the USM algorithm ac-
cording to Equations 1 and 2.
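To make this concrete, the following Python sketch implements Equations 1 and 2 with an exchangeable low-pass filter; the helper name usm and all parameter defaults are illustrative choices rather than values used in our experiments:

import cv2

def usm(image, smooth, k=1.5):
    # Unsharp masking (Equations 1 and 2) with a pluggable low-pass filter.
    # image: single-channel float32 array; smooth: callable returning a
    # low-pass filtered copy; k: weight of the re-added high frequencies.
    filtered = image - smooth(image)   # Equation 1: keep high frequencies
    return image + k * filtered        # Equation 2: re-add with weight k

# Traditional USM with a Gaussian kernel (Equation 3):
def gaussian_usm(image, sigma=3.0, k=1.5):
    return usm(image, lambda i: cv2.GaussianBlur(i, (0, 0), sigma), k)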
3.2.2 USM with Anisotropic Diffusion
Anisotropic diffusion filters offer edge-preserving im-
age smoothing by blurring the image along edges. A
commonly applied anisotropic diffusion filter is de-
fined by
$$\frac{\partial I}{\partial t} = \operatorname{div}\left( g(\| \nabla I \|)\, \nabla I \right), \qquad (4)$$

with

$$g(\| \nabla I \|) = e^{-\left( \frac{\| \nabla I \|}{K} \right)^2}, \qquad (5)$$
where K is a parameter controlling the sensitivity to
edges in the image. The smoothed result is then sub-
tracted from the original image as in Equation 1.
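As a sketch of how this diffusion step can be realized, the following explicit Perona-Malik-style update implements Equations 4 and 5; the iteration count, step size dt and edge sensitivity K are assumed example values, not tuned parameters from our pipeline:

import numpy as np

def anisotropic_diffusion(image, n_iter=10, K=0.1, dt=0.2):
    # Explicit Perona-Malik diffusion (Equations 4 and 5): smooths
    # homogeneous regions while preserving edges. The result can be
    # subtracted from the input as in Equation 1.
    img = image.astype(np.float32).copy()
    for _ in range(n_iter):
        # Finite-difference gradients towards the four neighbors.
        d_n = np.roll(img, -1, axis=0) - img
        d_s = np.roll(img, 1, axis=0) - img
        d_e = np.roll(img, -1, axis=1) - img
        d_w = np.roll(img, 1, axis=1) - img
        # Equation 5: edge-stopping function g, evaluated per direction.
        flux = sum(np.exp(-(np.abs(d) / K) ** 2) * d
                   for d in (d_n, d_s, d_e, d_w))
        img += dt * flux               # Equation 4: divergence update
    return img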
3.2.3 USM with Bilateral Filtering
Like anisotropic diffusion, the bilateral filter is an edge-preserving smoothing operation. In contrast to regular (unilateral) Gaussian blurring, the smoothing term of a bilateral filter depends not only on the spatial pixel distance, but also on the intensity difference between pixels. This means that the appearance of the filter kernel depends on the local image content. The filter to compute the new image intensity $I_f$ at pixel coordinate $x$ from the original image $I$ is defined as

$$I_f(x) = \frac{\sum_{x_i \in \Omega} I(x_i)\, G_r(\| I(x_i) - I(x) \|)\, G_s(\| x_i - x \|)}{\sum_{x_i \in \Omega} G_r(\| I(x_i) - I(x) \|)\, G_s(\| x_i - x \|)}, \qquad (6)$$

where $\Omega$ is the kernel window centered at $x$ and $G_r$, $G_s$ are Gaussian filters that are applied in the range and position domain, respectively.
position domain. Again, we apply unsharp masking
by performing the smoothing operation and subtract-
ing the result from the original input image.
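Plugged into the hypothetical usm helper sketched in Section 3.2.1, the bilateral smoothing stage might look as follows; OpenCV's bilateralFilter is one possible implementation of Equation 6, and the kernel diameter and sigma values below are illustrative:

import cv2

def bilateral_usm(image, k=1.5):
    # d is the kernel diameter; sigmaColor and sigmaSpace correspond to
    # the range and spatial Gaussians G_r and G_s from Equation 6.
    # sigmaColor assumes intensities scaled to [0, 1].
    smooth = lambda i: cv2.bilateralFilter(i, d=9, sigmaColor=0.1,
                                           sigmaSpace=5.0)
    return usm(image, smooth, k)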
3.3 Active Appearance Models (AAM)
Active Appearance Models were originally intro-
duced by (Cootes et al., 2001) and substantially ex-
tended in the work by (Matthews and Baker, 2004)
with the introduction of the inverse compositional
(IC) fitting algorithm. AAMs are a state-of-the-art
method for landmark detection and are mostly used to
model faces in photographs or anatomical structures
in medical images. An AAM is a generative method
to model an instance of an object which makes it pos-
sible to detect object landmarks and at the same time
acquire information on the properties of the modeled
object.
AAMs are trained with a manually annotated
database. For facial landmark detection, the database
contains images of persons with added landmarks for
facial regions such as eyes, nose or mouth. In the
training stage, the images are normalized using Pro-
crustes analysis and the key components of shape and
appearance variation are extracted independently us-
ing principal component analysis (PCA). The train-
ing results in mean vectors for shape and appear-
ance as well as vectors describing different orders of
deviation from the mean shape and appearance re-
spectively, described by the eigenvectors gained from
PCA. Using the training data, new faces can be mod-
eled as a linear combination of the mean shape and
appearance combined with weighted shape and ap-
pearance vectors.
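Written out, this linear model takes the following standard form (the symbols below follow common AAM notation and are not taken verbatim from (Cootes et al., 2001)):

$$s = \bar{s} + \sum_{i=1}^{n} p_i\, s_i, \qquad A = \bar{A} + \sum_{j=1}^{m} \lambda_j\, A_j,$$

where $\bar{s}$ and $\bar{A}$ denote the mean shape and appearance, $s_i$ and $A_j$ the shape and appearance eigenvectors obtained from PCA, and $p_i$ and $\lambda_j$ the weights that parametrize a specific face instance.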
3.3.1 AAM Fitting
AAM Fitting is the iterative process of adapting the
model parameters in order to minimize the difference
between the modeled and the target face, a task for
which several methods have been proposed since the
introduction of AAMs. Despite being proposed by
(Matthews and Baker, 2004) over a decade ago, the
Inverse-Compositional (IC) algorithm is still a com-
petitive method and the standard in many current ap-
plications. In their work describing the algorithm,
Matthews and Baker have shown that major parts of
the iterative computation can be moved outside the
loop, allowing a drastic increase in computation speed
without sacrificing fitting precision. We use the IC
algorithm as baseline and compare its fitting perfor-
mance on LWIR face images to two more recent ex-
tensions, namely simultaneous inverse compositional
(SIC) as suggested by (Gross et al., 2005) and the
alternating inverse compositional (AIC) fitting intro-
duced in (Papandreou and Maragos, 2008).
3.3.2 Feature-based AAM
Traditionally, AAM fitting is performed on the unpro-
cessed input images. However, with the introduction
of feature descriptors such as SIFT and HOG as a
powerful tool in image processing, several success-
ful attempts to combine AAMs with descriptors have
been published. When working with a feature-based
AAM, the model is not directly trained on the input
images, but instead on sets of extracted feature matri-
ces with densely extracted features. It has been shown
in (Antonakos et al., 2015) that using features for im-
age description can significantly increase the accu-
racy of AAMs in the visual domain. To analyze the
suitability of feature-based AAMs for face fitting in
LWIR images we have therefore compared the per-
formance of regular models with those trained using
dense SIFT and HOG features.
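To illustrate what densely extracted features are: descriptors are computed at every position of a regular pixel grid instead of at detected interest points. A minimal dense-SIFT sketch using OpenCV follows; the grid step and keypoint size are assumed values, and the input is expected as an 8-bit grayscale image:

import cv2

def dense_sift(gray, step=4, size=8):
    # One fixed-scale SIFT keypoint per grid position instead of
    # detector output; gray is an 8-bit single-channel image.
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), float(size))
                 for y in range(step // 2, gray.shape[0], step)
                 for x in range(step // 2, gray.shape[1], step)]
    keypoints, descriptors = sift.compute(gray, keypoints)
    return descriptors  # one 128-dimensional descriptor per grid point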
3.4 Proposed Processing Pipeline
We have combined the described processing steps into
an image processing pipeline that allows us to evaluate
the performance of different algorithms for each step.
In our implemented pipeline, the database images are
first filtered using one of the preprocessing filters to
enhance image contrast. In the next step, AAMs are
trained using the database images and the manually
annotated landmark positions. The tool chain allows
training of traditional and feature-based AAMs using
dense HOG and SIFT features utilizing functionality
provided by the Menpo software package (Alabort-i-Medina et al., 2014). Fitting of the AAM to test im-
ages can be performed using any of the different IC-
based fitting algorithms described above. The starting
position for the fitting process can be acquired from
a user-selectable bounding box or by placing an ini-
tial shape on the bounding box of the image’s ground
truth landmarks, provided a ground truth is available.
In case a ground truth exists, the software makes it
possible to compare the results of the fitting process to
the ground truth reference visually and quantitatively.
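A compressed sketch of such a pipeline on top of the Menpo packages is given below; class and argument names follow the menpofit API of that period and may differ between versions, and all paths, truncation settings and pyramid parameters are placeholders rather than our actual configuration:

import menpo.io as mio
from menpo.feature import fast_dsift
from menpofit.aam import HolisticAAM, LucasKanadeAAMFitter

# Import annotated training images (landmarks read from sidecar files).
training = [img.as_greyscale() for img in
            mio.import_images('thermal_db/train/')]

# Train a dense-SIFT AAM; diagonal and scales control the image pyramid.
aam = HolisticAAM(training, holistic_features=fast_dsift,
                  diagonal=150, scales=(0.5, 1.0))

# Inverse-compositional fitter with truncated shape/appearance models.
fitter = LucasKanadeAAMFitter(aam, n_shape=[5, 15], n_appearance=0.5)

# Fit a test image starting from a bounding box (user-selected or
# derived from ground truth; accessor details vary between versions).
image = mio.import_image('thermal_db/test/frame_000.png').as_greyscale()
result = fitter.fit_from_bb(image, image.landmarks['PTS'].bounding_box())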
Figure 3: First and third image: Initial landmark loca-
tion. Second and last image: Landmarks after fitting with a
DSIFT-AAM and SIC without preprocessing.
4 RESULTS
In this section, we display quantitative results of the
fitting performance for single frames of both seen and
unseen individuals and an analysis of the different
method’s abilities to track a face in an unconstrained
video session. Finally, vital sign extraction from a
moving person’s face is demonstrated.
We have exhaustively analyzed all possible com-
binations (45 in total) of the following algorithms:
Preprocessing: No preprocessing, anisotropic
diffusion highpass as in (Ghiass et al., 2013),
USM with anisotropic diffusion, USM with bilat-
eral filtering and traditional USM with a Gaussian
kernel.
AAM: Traditional intensity-based AAM, AAM
with dense HOG features, AAM with dense SIFT
features.
Fitting Algorithm: Project-out inverse compo-
sitional (PIC), alternating inverse compositional
(AIC), simultaneous inverse compositional (SIC).
As a first step, we mirrored each image in the database
to increase pose variation, resulting in a total of
1390 images. To evaluate the AAM's ability to model unseen faces, we then split our annotated
database into 1272 images of 28 persons for training
and 118 images of 3 persons for testing. Fitting was
initialized using the AAM’s mean shape, scaled and
translated to fit the bounding box of the ground truth.
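This initialization amounts to an anisotropic scaling and translation of the mean shape; a small sketch of the transformation (a hypothetical helper, with the bounding box given as (x_min, y_min, x_max, y_max)):

import numpy as np

def init_from_bbox(mean_shape, bbox):
    # Map the mean shape's bounding box onto the target box.
    # mean_shape: (N, 2) array of (x, y) landmark coordinates.
    x_min, y_min, x_max, y_max = bbox
    lo, hi = mean_shape.min(axis=0), mean_shape.max(axis=0)
    scale = np.array([x_max - x_min, y_max - y_min]) / (hi - lo)
    return (mean_shape - lo) * scale + np.array([x_min, y_min])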
We performed fitting and compared the fitting result
to the ground truth reference (Figure 3). To quantify
the fitting error, we used the normalized error metric
introduced in (Zhu and Ramanan, 2012) which mini-
mizes the effect of face size and head pose on the final
result, thereby allowing an efficient comparison of er-
rors across different image sets and databases. The
error metric $E_i$ computed for each image $I_i$ is the root mean squared distance in pixels between each horizontal and vertical fitted landmark position $x_{n,f}, y_{n,f}$ after AAM fitting and its corresponding ground truth landmark $x_{n,g}, y_{n,g}$, accumulated across all $N$ landmarks in the image and normalized by the mean of face width $w_i$ and height $h_i$:

$$E_i = N_i \sqrt{\frac{\sum_{n=1}^{N} \left( (x_{n,f} - x_{n,g})^2 + (y_{n,f} - y_{n,g})^2 \right)}{2N}} \qquad (7)$$

with

$$N_i = \left( \frac{1}{2} (w_i + h_i) \right)^{-1} \qquad (8)$$
Figure 4 shows a quantitative analysis of the per-
formance of each combination. It can be seen that
the introduced preprocessing algorithms result in an
improvement of the AAM fitting performance for
intensity-based AAM, especially when an advanced
fitting algorithm such as AIC or SIC is used. Gener-
ally, AIC and SIC show comparable performance on the
tested data and outperform the traditional PIC in all
direct comparisons, i.e. when preprocessing method
and used feature remain unchanged and only the fit-
ting algorithm is varied. At the same time, the two
tested feature-based AAM combinations clearly out-
perform their intensity-based counterparts in all di-
rect comparisons, with DSIFT slightly outperform-
ing HOG in terms of fitting performance and outlier
count. Notably, DSIFT and HOG performance is only
minimally affected by any preprocessing.
Since it has now been shown that preprocess-
ing has only minimal impact on fitting performance
Figure 4: Normalized residual error after performing fitting on a set of 118 unseen images of 3 persons; (a) Simultaneous Inverse Compositional (SIC), (b) Alternating Inverse Compositional (AIC), (c) Project-Out Inverse Compositional (PIC). In each column, the AAM was trained using a different feature extraction method, while the fitting algorithm was different for each row. The figures represent an exhaustive overview of all tested combinations of features, fitting algorithms and preprocessing methods.
of feature-based AAMs, further analysis focuses on
the precision of feature- and intensity-based meth-
ods on unfiltered images. Figure 6 displays the
percentage of test images that meet given preci-
sion requirements, evaluated for PIC/SIC/AIC and
intensity/DSIFT/HOG-based AAMs. Again, it is
shown that feature-based AAMs allow more precise
fitting than intensity-based models. Overall, DSIFT
shows a better convergence behavior than HOG for
LWIR images; the performance difference between
both descriptors is higher than the difference for reg-
ular photographs reported in (Antonakos et al., 2015),
indicating that DSIFT is a particularly well suited de-
scriptor for LWIR face fitting. AIC and SIC perform
comparably well and outperform the traditional PIC
method; especially the original inverse-compositional
AAM approach (intensity-based PIC) by (Matthews
and Baker, 2004) shows lower fitting precision than
all other competing combinations.
Figure 5 shows a performance comparison for
seen and unseen faces. It displays box plots of the
fitting performance of the AAM trained on 28 per-
sons and evaluated on 118 images of 3 unseen per-
sons, and for comparison plots of each fitting algo-
rithms’ performance trained with the same database
and evaluated on a set of 125 seen images from the
training database. For this comparison we used the
well-performing combination of DSIFT with no pre-
processing and compared all three implemented fit-
ting algorithms. The results show that the trained
model performs slightly better when confronted with
seen faces.
AAM performance in face tracking was tested
quantitatively on a 60 second video sequence taken
at 30 fps, where every 5th frame was annotated with
a total of 8 points located at the inner and outer
eye corners, the outer mouth corners and the cen-
ter edges of upper and lower lip, resulting in 359
annotated frames. During the sequence, the person
was performing increasingly fast and complex head
pose changes, starting with a slow controlled left-to-
right movement and ending with fast and arbitrary
head shakes and rotations. The tracker was initial-
ized manually by defining the face’s bounding box in
the first frame. The landmark detection for all sub-
sequent frames was performed fully automatically by
the AAM fitting software using each frame’s final fit-
ting result as initial landmark positions for the next
frame. The normalized error between the 8 ground
truth points and the corresponding points detected by
the AAM was computed using Equation 7. Figure 7
shows the normalized error values as well as the mean
frame-to-frame change of the marker point positions
acquired as normalized error between two consecu-
tive frames to indicate the current movement speed of
the person’s head in each frame. Additionally, actual
fitting results are shown. It can be seen that tracking
precision changes only marginally during slow head
Figure 5: Comparison of the normalized fitting error for images of 3 seen and 3 unseen persons, tested for different fitting algorithms and using DSIFT with no preprocessing to generate the model.
movement regardless of head pose. The tracker per-
formance temporarily degrades in single parts of the
sequence that show very fast head movement; however, the model is able to auto-
matically recover from high misalignment in single
frames.
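In terms of the pipeline sketched in Section 3.4, this frame-chaining strategy reduces to a short loop; fitter and frames are the hypothetical objects from that sketch, and fit_from_shape denotes initialization from a given landmark configuration:

# Initialize from the manually defined bounding box, then chain fits.
shape = fitter.fit_from_bb(frames[0], initial_bbox).final_shape
for frame in frames[1:]:
    result = fitter.fit_from_shape(frame, shape)
    shape = result.final_shape  # becomes the next frame's initialization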
5 DISCUSSION
It can be seen that performing diffusion filtering as a
Figure 6: Percentage of images that fall within given precision requirements, computed for images with no preprocessing applied; (a) DSIFT, (b) HOG, (c) intensity.
Figure 7: Analysis of AAM performance for tracking a face in an unseen video sequence. Top: Normalized error between ground truth and fitting result for each annotated frame. Center: Normalized difference between two consecutive frames to indicate head movement speed in different sections of the video sequence. Bottom: Fitting result examples, sorted by their final normalized error in ascending order. From left to right: best, 2nd percentile, 25th percentile, median, 75th percentile, 98th percentile, worst.
preprocessing step improves the fitting accuracy of
the AAM when the project-out inverse compositional algorithm (PIC) is used, as stated by (Ghiass
et al., 2013). However, by employing the more re-
cently introduced feature-based AAM, even better re-
sults are obtained. The fact that the final fitting errors
of intensity-based AAM cover a large span of val-
ues and that similar USM-based approaches lead to
significantly different results suggests that intensity-
based AAM and the analyzed preprocessing filters
lack robustness and are prone to the bias introduced
by initialization and preprocessing parameters. In contrast, the two analyzed feature-based AAMs proved to be more robust. Using DSIFT and HOG
to train the model drastically improves fitting perfor-
mance regardless of preprocessing; the fact that the
results do not differ significantly for all preprocess-
ing algorithms shows that the extracted features de-
scribe image content very robustly. The results sug-
gest that using preprocessing for feature-based AAM
does not result in a significant performance increase
and that the preprocessing step can be omitted, espe-
cially when considering the additional computational
requirements to run the preprocessing filter.
Although quantitative analysis has shown a mea-
surable difference in fitting performance on seen and
unseen images, the ability of the AAM to model un-
trained faces still allows for precise landmark detec-
tion in unseen images. The model has been shown
to be robust enough to track an unseen face during a
series of challenging head pose changes in a video se-
quence with the ability to recover even after phases of
fast head movement or extreme out-of-plane rotation.
6 CONCLUSION
In this paper we have shown that AAMs are a vi-
able approach for face tracking in the thermal in-
frared domain. Using a suitable database and a
well-performing combination of algorithms compris-
ing DSIFT for modeling and SIC for fitting yields sta-
ble and robust results. It has been shown that AAMs
can be used for robust single-frame initialized LWIR
face tracking.
REFERENCES
Alabort-i-Medina, J., Antonakos, E., Booth, J., Snape, P.,
and Zafeiriou, S. (2014). Menpo: A comprehensive
platform for parametric image alignment and visual
deformable models. In ACM International Confer-
ence on Multimedia, MM ’14, pages 679–682, Or-
lando, Florida, USA. ACM.
Alkali, A. H., Saatchi, R., Elphick, H., and Burke, D.
(2014). Eyes’ corners detection in infrared images for
real-time noncontact respiration rate monitoring. In
WCCAIS 2014, pages 1–5. IEEE.
Antonakos, E., Alabort-i-Medina, J., Tzimiropoulos, G., and Zafeiriou, S. (2015). Feature-based Lucas-Kanade and
active appearance models. IEEE Transactions on Im-
age Processing, 24(9):2617–2632.
Cootes, T. F., Edwards, G. J., and Taylor, C. J. (2001). Ac-
tive appearance models. IEEE PAMI, 23(6):681–685.
Dowdall, J., Pavlidis, I. T., and Tsiamyrtzis, P. (2007).
Coalitional tracking. Comput. Vis. Image Underst.,
106(2-3):205–219.
Filipe, S. and Alexandre, L. A. (2013). Thermal infrared
face segmentation: A new pose invariant method. In
Pattern Recognition and Image Analysis, pages 632–
639. Springer.
Gault, T. R. and Farag, A. A. (2013). A fully automatic
method to extract the heart rate from thermal video.
In CVPRW 2013, pages 336–341. IEEE.
Ghiass, R. S., Arandjelović, O., Bendada, A., and
Maldague, X. (2014). Infrared face recogni-
tion: A comprehensive review of methodologies and
databases. Pattern Recognition, 47(9):2807–2824.
Ghiass, R. S., Arandjelović, O., Bendada, H., and Maldague, X. (2013). Vesselness features and the inverse compositional AAM for robust face recognition using thermal IR. arXiv preprint arXiv:1306.1609.
Gross, R., Matthews, I., and Baker, S. (2005). Generic vs.
person specific active appearance models. Image and
Vision Computing, 23(12):1080–1093.
Kalal, Z., Mikolajczyk, K., and Matas, J. (2012). Tracking-
learning-detection. PAMI, 34(7):1409–1422.
Lahiri, B., Bagavathiappan, S., Jayakumar, T., and Philip,
J. (2012). Medical applications of infrared thermog-
raphy: a review. Infrared Physics & Technology,
55(4):221–235.
Lewis, G. F., Gatto, R. G., and Porges, S. W. (2011). A
novel method for extracting respiration rate and rel-
ative tidal volume from infrared thermography. Psy-
chophysiology, 48(7):877–887.
Matthews, I. and Baker, S. (2004). Active appearance mod-
els revisited. IJCV, 60(2):135–164.
Papandreou, G. and Maragos, P. (2008). Adaptive and con-
strained algorithms for inverse compositional active
appearance model fitting. In CVPR 2008, pages 1–8.
IEEE.
Zhou, Y., Tsiamyrtzis, P., Lindner, P., Timofeyev, I., and
Pavlidis, I. (2013). Spatiotemporal smoothing as a ba-
sis for facial tissue tracking in thermal imaging. IEEE
Trans. Biomed. Engineering, 60(5):1280–1289.
Zhu, X. and Ramanan, D. (2012). Face detection, pose es-
timation, and landmark localization in the wild. In
CVPR 2012, pages 2879–2886. IEEE.
Zhu, Z., Tsiamyrtzis, P., and Pavlidis, I. (2008). The seg-
mentation of the supraorbital vessels in thermal im-
agery. In AVSS 2008, pages 237–244.