Robust Remote Heart Rate Determination for E-Rehabilitation
A Method that Overcomes Motion and Intensity Artefacts
Christian Wiede, Jingting Sun, Julia Richter and Gangolf Hirtz
Department of Electrical Engineering and Information Technology,
Chemnitz University of Technology, Reichenhainer Str. 70, 09126 Chemnitz, Germany
Keywords:
Remote Heart Rate Determination, rPPG, Vital Parameters, E-Rehabilitation.
Abstract:
Due to an increasing demand for post-surgical rehabilitation, the need for e-rehabilitation is continuously rising. In this context, a continuous monitoring of vital parameters, such as the heart rate, could improve the efficiency assessment of training exercises by measuring a patient's physical condition. This study proposes
a robust method to remotely determine a person’s heart rate with an RGB camera. In this approach, we used
an individual, situation-dependent skin colour determination in combination with an accurate tracking.
Furthermore, our method was evaluated by means of twelve different scenarios with 117 videos. Altogether,
the results show that this method performed accurately and robustly for e-rehabilitation applications.
1 INTRODUCTION
In recent years, the demand for rehabilitation as a part of post-surgical care has been continuously rising. Especially after surgeries of the musculoskeletal system, rehabilitation is a key factor for recovery. In order to prevent both insufficient training and over-training, a continuous monitoring of the patient is necessary. One possibility to evaluate a person's physical condition is to measure his or her vital parameters, such as the heart rate, the respiration rate or the oxygen saturation. In this work, we focus on the remote determination of the heart rate by means of an RGB camera.
This contact-less working principle has the advantage that the patients are not required to wear additional devices during the training; wearing such devices is inconvenient for the patients and, in addition to that, increases the effort for the rehabilitation centres. Furthermore, more significant information about the patient's rehabilitation performance can be obtained. For example, a sudden change of a patient's physical condition can be detected by monitoring the heart rate. In that case, the training can be stopped and medical personnel can be informed. Afterwards, the training intensity can be adapted.
However, e-rehabilitation is not the only application field of remote heart rate determination. In the field of ambient assisted living (AAL), such a remote heart rate determination could contribute to a long-term observation of the health status and assure a fast response time in cases of emergencies. Furthermore, such a system can also be applied for monitoring a driver's well-being in the context of autonomous driving and take control in emergency cases, e. g. a heart attack.
For remote heart rate determination, there exist two general principles: intensity-based methods, such as proposed by Poh et al. (Poh et al., 2010), and motion-based methods, as proposed by Balakrishnan et al. (Balakrishnan et al., 2013). There are enhanced approaches as well, which combine the advantages of both principles (Wiede et al., 2016b). However, these methods encounter problems with motion and intensity artefacts, which poses challenges with regard to the application in e-rehabilitation: When a person moves during an exercise, the determined heart rate will be less accurate due to motion artefacts. Similarly, intensity artefacts, such as reflections and shadows, reduce the accuracy as well. In order to overcome these issues, we propose a robust, remote heart rate determination algorithm with an accurate pixel tracking and a situation- and person-dependent skin colour model. A database with reference data was recorded for the evaluation of this method.
This work is structured as follows: In Sect. 2, the related work in the field of remote heart rate determination is outlined and the research gap is highlighted. Based on this, our new method, which overcomes intensity and motion artefacts, is presented in Sect. 3. This is followed by the experimental results with a
variety of evaluated scenarios in Sect. 4, which is accompanied by a discussion. Finally, we summarise our findings and outline future work.
2 RELATED WORK
The development of e-rehabilitation systems is continuously advancing because of a higher demand and a lack of personnel resources. With the release of the Microsoft Kinect, cost-effective depth sensors became affordable and e-rehabilitation applications that employ the Kinect made their breakthrough. In recent years, several Kinect-based e-rehabilitation systems were developed, such as proposed by Su et al. (Su et al., 2014) or Gal et al. (Gal et al., 2015). However, to date there is no study that evaluates a patient's performance during exercises based on remotely determined vital parameters.
There are four main vital parameters, i. e. heart rate, respiration rate, oxygen saturation and blood pressure. In this study, we focus on remote heart rate determination by means of optical sensors using principles of photoplethysmography (PPG). In clinical environments, the heart rate is normally obtained by electrocardiography (ECG) or pulse oximeters. The basics of PPG were first described by Hertzman and Spealman (Hertzman and Spealman, 1937). They measured the volumetric changes of the blood flow with an optical sensor. The light that transmits through thin body parts, such as fingers or earlobes, is received by an optical sensor (Allen, 2007). This method is called transmissive PPG. Next to transmissive PPG, there exists reflective PPG as well, which measures the light reflected from the tissue. Due to the reflection, the signal-to-noise ratio (SNR) for this method is decreased by a factor of ten compared to transmissive PPG. Still, for both of these methods, sensors have to be attached to the body. In order to overcome this issue, Humphreys et al. developed a first concept for remote photoplethysmography (rPPG) (Humphreys et al., 2005). This was followed by first experiments in the infrared spectrum (Garbey et al., 2007) and the visible light spectrum (Verkruysse et al., 2008).
In 2008, Verkruysse et al. recorded probands at a small distance to an RGB camera. These probands were instructed not to move during the recordings in order to avoid motion artefacts. They detected a region of interest (ROI) within a face, performed a spatial averaging of the colour channels and determined the heart rate with the Fast Fourier Transform (FFT). This method was followed by the first automated approach by Poh et al. (Poh et al., 2010; Poh et al., 2011). They used an automated face detection and an independent component analysis (ICA). In order to increase the speed, Lewandowska et al. (Lewandowska et al., 2011) suggested to use a principal component analysis (PCA) instead of an ICA. Further works proposed to improve these methods by using temporal filters (van Gastel et al., 2014), autoregressive models (Tarassenko et al., 2014) or an adaptive filtering (Wiede et al., 2016a). All these approaches belong to the group of so-called intensity-based methods.
A different group of approaches are the so-called motion-based methods, which were first proposed by Balakrishnan et al. (Balakrishnan et al., 2013). They made use of the small head motions caused by the blood flow that each heart beat triggers. By using several distinctive feature points in the person's face, these small head motions can be tracked over time with a Kanade-Lucas-Tomasi (KLT) point tracker. After that, a PCA determined the principal components of the trajectories of the points. At last, the heart rate was obtained by using a peak detection.
As outlined by Wiede et al. (Wiede et al., 2016b), intensity- and motion-based methods have different advantages and disadvantages. Intensity-based methods are less sensitive to motion artefacts, whereas motion-based methods suffer from fast motions. This is because the motion artefacts and the heart-beat-induced motion signal share the same frequency bands. In contrast to that, motion-based methods are less prone to illumination artefacts, such as reflections and shadows. The ratio-based method exploits these facts by using an intensity-based method when fewer intensity artefacts occur and a motion-based method when fewer motion artefacts are present (Wiede et al., 2016b). Consequently, the ratio-based method cannot completely eliminate such artefacts, because it only chooses the method with the smallest amount of artefacts. Thus, the main problems originate from the underlying sources of artefacts. If these sources can be reduced or eliminated, the accuracy will increase significantly. For that, we propose an intensity-based method, which overcomes motion artefacts by an accurate tracking and which significantly reduces intensity artefacts with a skin colour model.
3 METHODS
3.1 Overview
The major steps of the proposed robust remote heart rate determination are shown in Figure 1. After acquiring an RGB image, white balancing was applied
[Figure 1: Overview of the proposed remote heart rate determination algorithm. The pipeline comprises three stages. Skin Colour Determination: (1) fetch first video frame, (2) white balance parameter estimation, (3) face detection and alignment, (4) ROI selection, (5) skin colour estimation from hue and saturation histograms of the face, midface, forehead and cheek regions. Tracking and Skin Pixel Selection: (6) input video frames, (7) auto white balancing and smoothing, (8) face tracking, (9) skin pixel selection, (10) signal extraction of the RGB time signals. Time Signal Processing: (11) signal normalisation, (12) coarse bandpass filtering, (13) independent component analysis, (14) signal separation and adaptive heart rate extraction, (15) output signal.]
to obtain real-world colours. In the first frame of the video, a face detection and alignment was carried out. Based on this, different ROIs within the face were sampled and used to determine a proband's individual skin colour model. In the subsequent frames, we applied an auto white balancing, a face tracking, a skin pixel selection and a time signal extraction. This time signal was normalised and bandpass filtered. An ICA determined its independent components and the heart rate was obtained by means of a frequency analysis. An adaptive filtering assured a stable heart rate over time.
3.2 Skin Colour Determination
Due to the fact that different persons have different skin colours and the lighting conditions depend on the location, an individual skin colour model is necessary. For that, the first frame of the video was analysed and the parameters for the skin colour model were determined.

In a first step, a white balancing was applied to adjust the colours of the images by scaling and shifting the intensities in such a way that real white surfaces are finally represented by equally distributed RGB values. With that preprocessing, a bluish white or a yellowish white, for example, can be corrected. We
implemented a fast auto white balancing algorithm proposed by Garud et al. (Garud et al., 2014), which is based on the source illuminant values $[\iota_r, \iota_g, \iota_b]$. The correlated colour temperature (CCT) is given, and the gain factors $\kappa$ and the offset values $\tau$ can be determined. The gain factors are defined as follows:

$$\kappa_r = \frac{\iota_g}{\iota_r} \,, \quad (1a) \qquad \kappa_g = 1 \,, \quad (1b) \qquad \kappa_b = \frac{\iota_g}{\iota_b} \,, \quad (1c)$$

where $\kappa_r$, $\kappa_g$ and $\kappa_b$ are the gain factors for the red, the green and the blue colour channel, respectively. The offset values are calculated as:

$$\tau_r = \max\left(1, \frac{CCT - CCT_{\mathrm{ref}}}{100}\right) \cdot (\kappa_r - 1) \,, \quad (2a)$$

$$\tau_g = 0 \,, \quad (2b)$$

$$\tau_b = \max\left(1, \frac{CCT_{\mathrm{ref}} - CCT}{100}\right) \cdot (\kappa_b - 1) \,, \quad (2c)$$

where $CCT_{\mathrm{ref}}$ denotes the CCT of the canonical illuminant.
With these factors, the white balanced colour channels $R_{wb}$, $G_{wb}$ and $B_{wb}$ can be determined by the following equation:

$$\begin{pmatrix} R_{wb} \\ G_{wb} \\ B_{wb} \end{pmatrix} = \begin{pmatrix} \kappa_r & 0 & 0 \\ 0 & \kappa_g & 0 \\ 0 & 0 & \kappa_b \end{pmatrix} \cdot \begin{pmatrix} R \\ G \\ B \end{pmatrix} + \begin{pmatrix} \tau_r \\ \tau_g \\ \tau_b \end{pmatrix} \quad (3)$$

$R$, $G$ and $B$ are the original intensity values.
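To make the white balancing step concrete, the following minimal NumPy sketch applies Equations 1 to 3. The illuminant values and colour temperatures below are placeholder assumptions, since their estimation is part of the scheme of Garud et al. (Garud et al., 2014) and not reproduced here.

    import numpy as np

    # Placeholder values; in practice, the source illuminant and the CCT
    # are estimated by the auto white balance scheme of Garud et al.
    iota_r, iota_g, iota_b = 0.9, 1.0, 1.1   # assumed illuminant values
    cct, cct_ref = 4500.0, 6500.0            # assumed CCTs in Kelvin

    # Gain factors according to Eq. (1).
    kappa = np.array([iota_g / iota_r, 1.0, iota_g / iota_b])

    # Offset values according to Eq. (2).
    tau = np.array([
        max(1.0, (cct - cct_ref) / 100.0) * (kappa[0] - 1.0),
        0.0,
        max(1.0, (cct_ref - cct) / 100.0) * (kappa[2] - 1.0),
    ])

    def white_balance(img_rgb):
        """Apply Eq. (3) to an H x W x 3 RGB image given as a float array."""
        return img_rgb * kappa + tau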
In the next step, the person's face was detected in the image. A common approach for this is the Viola and Jones face detector (Viola and Jones, 2004). However, this approach is not accurate enough for this application, so that the face detector by Zhu and Ramanan (Zhu and Ramanan, 2012) was used instead. This detector provides 68 facial landmarks in real-world cluttered images. The provided bounding box is located very robustly around the face. However, we had to adjust the bounding box to our requirements to include the forehead region and to exclude the neck region. For that purpose, the bounding box was enlarged at the left and the right boundary by 10 %, at the upper boundary by 30 % and reduced at the lower boundary by 10 %, as sketched below.
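A sketch of this adjustment, assuming a box given as its top-left corner (x, y) with width w and height h and image coordinates growing to the right and downwards:

    def adjust_bounding_box(x, y, w, h):
        """Widen the detected face box by 10 % on the left and right,
        extend it by 30 % at the top (forehead) and cut 10 % at the
        bottom (neck), as described above."""
        x_new = x - 0.10 * w              # 10 % further to the left
        w_new = 1.20 * w                  # 10 % added on both sides
        y_new = y - 0.30 * h              # 30 % further up
        h_new = (1.00 + 0.30 - 0.10) * h  # plus top margin, minus neck
        return x_new, y_new, w_new, h_new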
For the skin colour model, there are regions in the face that are certainly skin pixels and that are not covered by hair or other interfering objects. Under the condition that the face was captured frontally, the regions of the forehead, the two cheeks and the nose were selected by their relative positions with regard to the total face bounding box. One selection of these ROIs is shown in Figure 2. These four ROIs were taken for the following skin colour estimation.
Figure 2: ROIs of the face regions selected for the skin colour model, i. e. forehead, nose and the two cheeks.
The RGB colour space is not suited for determining a skin colour model, because the distribution of the skin pixels does not follow any linear or concentrated coherency. Therefore, a conversion to a different colour space that separates brightness and chrominance is necessary. The HSV colour space containing the hue H, the saturation S and the value V is convenient for this task. In accordance with the conversion rules from Smith (Smith, 1978), the brightness value V can be calculated by:

$$V = \max(R, G, B) \,. \quad (4)$$

The auxiliary variable C, which stands for the chroma value, can be determined as follows:

$$C = V - \min(R, G, B) \,. \quad (5)$$

With these values, the saturation S can be calculated by:

$$S = \begin{cases} 0, & \text{if } V = 0 \,, \\ \dfrac{C}{V}, & \text{otherwise} \,. \end{cases} \quad (6)$$

The hue H is given by:

$$H = \begin{cases} \text{undefined}, & \text{if } C = 0 \,, \\ 60^{\circ} \cdot \left(\dfrac{G - B}{C}\right), & \text{if } V = R \,, \\ 60^{\circ} \cdot \left(\dfrac{B - R}{C} + 2\right), & \text{if } V = G \,, \\ 60^{\circ} \cdot \left(\dfrac{R - G}{C} + 4\right), & \text{if } V = B \,. \end{cases} \quad (7)$$
The HSV colour space represents a cylindrical colour space. For the further consideration of the skin pixels, the hue-saturation plane is relevant. In order to define a region in this plane which represents the skin colour of a certain person, thresholds for the hue and the saturation have to be determined. At this point, an adaption has to be made for the hue: Red is the dominant colour of the face. Since the hue values for the red pixels are in a range around zero, the hue H was shifted by 120 degrees, as shown in the following equation:

$$H' = \begin{cases} H + 240^{\circ}, & \text{if } H \leq 120^{\circ} \,, \\ H - 120^{\circ}, & \text{otherwise} \,. \end{cases} \quad (8)$$
As shown in Figure 3, the values for hue and saturation of the previously defined ROIs are located in the same area.
[Figure 3: Illustration of the ROI pixels (forehead, nose region, cheeks) in the shifted hue-saturation plane, with hue in ° and saturation in %, in comparison to the other pixels in the face and in the image.]
According to the overall human skin model, with red as the dominant colour, the threshold values for the shifted hue $H'$ and the saturation $S$ are as follows:

$$H' \in [186^{\circ}, 294^{\circ}] \,, \quad S \in [20\,\%, 100\,\%] \,. \quad (9)$$

In our work, however, we applied different thresholds depending on the person's specific skin colour and the lighting conditions. In Figure 4, the selected skin pixels that were chosen according to the adapted thresholds defined in Equation 10 are shown:

$$H' \in [252^{\circ}, 259^{\circ}] \,, \quad S \in [45\,\%, 56\,\%] \,. \quad (10)$$

It can be seen that the regions of the eyes, hairs, lips, glasses, nostrils, shadows and reflections do not belong to the skin colour model.
Figure 4: Skin pixels that were selected with specific hue and saturation thresholds within the face bounding box. Unselected pixels are masked with black.
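The skin pixel selection can be sketched as follows; the OpenCV colour conversion and the threshold values of Equation 10 are used here for illustration, while in practice the thresholds are estimated per person from the ROIs described above.

    import cv2
    import numpy as np

    def skin_mask(face_bgr, h_range=(252.0, 259.0), s_range=(45.0, 56.0)):
        """Select skin pixels via the shifted hue H' (Eq. 8) and
        person-specific thresholds, here the example values of Eq. (10)."""
        hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV_FULL)
        h = hsv[..., 0].astype(np.float32) * 360.0 / 255.0  # hue in degrees
        s = hsv[..., 1].astype(np.float32) * 100.0 / 255.0  # saturation in %
        # Shift the hue by 120 degrees so that the dominant red tones
        # are no longer split around the 0/360 degree wrap-around.
        h_shifted = np.where(h <= 120.0, h + 240.0, h - 120.0)
        return ((h_shifted >= h_range[0]) & (h_shifted <= h_range[1]) &
                (s >= s_range[0]) & (s <= s_range[1]))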
3.3 Tracking and Skin Pixel Selection
Once the skin colour model was determined based on the first frame, a continuous tracking of the face bounding box and a skin pixel selection were conducted during the following frame sequence. The tracking is necessary to be invariant against different motion artefacts. Our tracking method is based on the optical flow principle. The optical flow method estimates the motion between two consecutive frames at the times $t$ and $t + \Delta t$. This results in the general optical flow equation:

$$I_x V_x + I_y V_y = -I_t \,, \quad (11)$$
where $I_x$, $I_y$ and $I_t$ are the partial derivatives of the image at the position $(x, y)$ at time $t$, and $V_x$, $V_y$ are the $x$ and $y$ components of the velocity, i. e. the optical flow, of $I(x, y, t)$. This equation contains two unknowns and cannot be solved directly. A solution for this is the KLT tracking algorithm (Tomasi and Kanade, 1991). It follows the assumption that the motion is constant in a local neighbourhood of an image patch. For $n$ different patterns in the image, we obtain $n$ equations:
$$\begin{aligned} I_x(p_1)V_x + I_y(p_1)V_y &= -I_t(p_1) \,, \\ I_x(p_2)V_x + I_y(p_2)V_y &= -I_t(p_2) \,, \\ &\;\;\vdots \\ I_x(p_n)V_x + I_y(p_n)V_y &= -I_t(p_n) \,, \end{aligned} \quad (12)$$
where $p_1, p_2, \ldots, p_n$ are the pixels inside the image patch. These equations can be written in matrix form $Av = b$, where:
$$A = \begin{pmatrix} I_x(p_1) & I_y(p_1) \\ I_x(p_2) & I_y(p_2) \\ \vdots & \vdots \\ I_x(p_n) & I_y(p_n) \end{pmatrix} \,, \quad (13a) \qquad v = \begin{pmatrix} V_x \\ V_y \end{pmatrix} \,, \quad (13b) \qquad b = \begin{pmatrix} -I_t(p_1) \\ -I_t(p_2) \\ \vdots \\ -I_t(p_n) \end{pmatrix} \,. \quad (13c)$$
That equation system can be solved by the least squares principle:

$$A^{T} A v = A^{T} b \,. \quad (14)$$
As features, the minimum eigenvalue features proposed by Shi and Tomasi (Shi and Tomasi, 1993) were selected, because they proved to be very robust. However, because of projective distortions in the image region of the face, feature points can vanish over time. A solution is to re-detect a subject's face. For that, we used the normalised pixel difference (NPD) face detector proposed by Liao et al. (Liao et al., 2016). The face detector learns NPD features by a classifier with a quadratic tree structure
with a depth of eight. Should the NPD detector fail to detect a subject's face, the KLT tracker is able to track the pixels for a while.
The 2-D geometric transform from one frame to the next frame can be estimated by using the tracked pixels and can be applied in the same manner to the face bounding box to follow the head motion. By combining the NPD face detector and the 2-D geometric transform estimation, the subject's face region can be accurately tracked even in case of complex head motions.
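The tracking step can be sketched with OpenCV primitives: cv2.goodFeaturesToTrack with useHarrisDetector=False yields the minimum eigenvalue features of Shi and Tomasi, and cv2.calcOpticalFlowPyrLK implements the pyramidal KLT tracker. The NPD re-detection is not available in OpenCV and is therefore omitted here; cv2.estimateAffinePartial2D stands in for the 2-D geometric transform estimation.

    import cv2
    import numpy as np

    def init_features(gray, box_mask):
        """Minimum eigenvalue (Shi-Tomasi) features inside the face box;
        box_mask is a uint8 mask that is non-zero inside the box."""
        return cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=5,
                                       mask=box_mask,
                                       useHarrisDetector=False)

    def track_step(prev_gray, gray, points, box_corners):
        """Propagate feature points with the KLT tracker and transfer the
        estimated 2-D transform to the face box (4 x 2 float32 corners)."""
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                      points, None)
        ok = status.ravel() == 1
        src, dst = points[ok], new_pts[ok]
        # Similarity transform (rotation, scale, translation) between the
        # tracked point sets; applying it to the box follows the head.
        M, _ = cv2.estimateAffinePartial2D(src, dst)
        new_box = cv2.transform(box_corners.reshape(-1, 1, 2),
                                M).reshape(-1, 2)
        return dst, new_box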
Assuming that the lighting conditions do not change completely from the first frame on, the skin colour model can be applied to the total frame sequence. In the tracked face bounding box, all pixels that match the thresholds of the skin colour model were selected. In order to improve the reliability of the skin pixel selection, a distance threshold D was defined. For every skin pixel, the distance to the closest non-skin pixel was calculated. If this distance was smaller than D, this skin pixel was rejected. This procedure is equivalent to an erosion.
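A sketch of this rejection step, using the distance transform of OpenCV, which returns for every skin pixel the distance to the closest non-skin pixel:

    import cv2
    import numpy as np

    def reject_boundary_pixels(mask, D=3.0):
        """Reject every skin pixel whose distance to the closest non-skin
        pixel is smaller than D; this is equivalent to an erosion."""
        dist = cv2.distanceTransform(mask.astype(np.uint8),
                                     cv2.DIST_L2, maskSize=5)
        return dist >= D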
For the time signal extraction, all remaining skin pixels were taken into consideration. They were averaged for each frame for all three colour channels R, G and B. Please note that we operate in the discrete time domain and use $n$ instead of the continuous variable $t$:

$$\bar{R}(n) = \frac{1}{L} \sum_{l=1}^{L} R_l(n) \,, \quad (15a)$$

$$\bar{G}(n) = \frac{1}{L} \sum_{l=1}^{L} G_l(n) \,, \quad (15b)$$

$$\bar{B}(n) = \frac{1}{L} \sum_{l=1}^{L} B_l(n) \,. \quad (15c)$$

$R_l$, $G_l$ and $B_l$ denote the $l$-th selected skin pixel in the frame and $L$ is the number of all selected skin pixels in this frame. $\bar{R}(n)$, $\bar{G}(n)$ and $\bar{B}(n)$ represent the mean value of the facial skin colour for a certain frame $n$. As a result, we obtained a time-varying signal for the skin colour.
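In code, the spatial averaging of Equation 15 amounts to one mean per frame and colour channel; a minimal sketch:

    import numpy as np

    def extract_time_signal(frames, masks):
        """Average the L selected skin pixels per frame (Eq. 15); the
        result has shape (N, 3) with columns R_bar, G_bar, B_bar.
        frames: list of H x W x 3 RGB arrays; masks: boolean skin masks."""
        return np.array([frame[mask].mean(axis=0)
                         for frame, mask in zip(frames, masks)])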
3.4 Time Signal Processing
In order to remove remaining noise sources, the time-varying colour signal has to be further processed to increase the SNR and to obtain a robust heart rate signal.

The first step of the time signal processing was to normalise the signal to attain a zero mean and a standard deviation of one:

$$\hat{R}(n) = \frac{1}{\sigma_R} (\bar{R}(n) - \mu_R) \,, \quad (16a)$$

$$\hat{G}(n) = \frac{1}{\sigma_G} (\bar{G}(n) - \mu_G) \,, \quad (16b)$$

$$\hat{B}(n) = \frac{1}{\sigma_B} (\bar{B}(n) - \mu_B) \,, \quad (16c)$$

where $\hat{R}$, $\hat{G}$ and $\hat{B}$ refer to the normalised colour channels. $\mu_C$ is the mean value and $\sigma_C$ is the standard deviation of the corresponding colour channel $C \in \{R, G, B\}$:

$$\mu_C = \frac{1}{N} \sum_{n=1}^{N} \bar{C}(n) \,, \quad (17)$$

$$\sigma_C = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(\bar{C}(n) - \mu_C\right)^2} \,, \quad (18)$$

where $\bar{C}(n)$ represents the original colour channels and $N$ is the sequence length of the colour signal for a single channel.
This was followed by a bandpass filter BP, which excludes implausible frequencies, see Equation 19. Frequencies lower than 0.7 Hz and higher than 4 Hz were cut off. For this implementation, an FIR filter with an order of 128 was chosen to ensure a constant group delay. The filtered colour channels are then denoted as $R_{BP}$, $G_{BP}$ and $B_{BP}$:

$$R_{BP}(n) = BP(n) * \hat{R}(n) \,, \quad (19a)$$

$$G_{BP}(n) = BP(n) * \hat{G}(n) \,, \quad (19b)$$

$$B_{BP}(n) = BP(n) * \hat{B}(n) \,. \quad (19c)$$
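The normalisation and the bandpass filtering can be sketched with SciPy; firwin designs a linear-phase FIR filter, which gives the constant group delay mentioned above. The frame rate of 10 FPS from Section 4 is assumed as the sampling frequency.

    import numpy as np
    from scipy.signal import firwin, lfilter

    FS = 10.0  # sampling frequency: frame rate of the recordings in Hz

    def normalise(signal):
        """Zero mean and unit standard deviation per channel (Eq. 16-18)."""
        return (signal - signal.mean(axis=0)) / signal.std(axis=0)

    def bandpass(signal, order=128, f_low=0.7, f_high=4.0, fs=FS):
        """FIR bandpass of order 128 with 0.7 Hz and 4 Hz cut-offs (Eq. 19)."""
        taps = firwin(order + 1, [f_low, f_high], pass_zero=False, fs=fs)
        return lfilter(taps, 1.0, signal, axis=0)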
Even now, the three filtered colour channels can still contain noise sources. In order to separate the wanted pulse signal from the noise sources, a decomposition of the colour channels by an ICA was applied. The goal is to determine three new independent components $IC_1$, $IC_2$ and $IC_3$:

$$\begin{pmatrix} R_{BP}(n) \\ G_{BP}(n) \\ B_{BP}(n) \end{pmatrix} = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix} \cdot \begin{pmatrix} IC_1(n) \\ IC_2(n) \\ IC_3(n) \end{pmatrix} \,. \quad (20)$$

In our implementation, we used the FastICA approach of Hyvärinen (Hyvärinen, 1999).
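A sketch of the decomposition, using the scikit-learn implementation of FastICA, which follows Hyvärinen's fixed-point algorithm:

    from sklearn.decomposition import FastICA

    def decompose(bp_signal):
        """Split the bandpassed (N, 3) colour signal into three
        independent components IC_1, IC_2, IC_3 (Eq. 20)."""
        ica = FastICA(n_components=3)
        return ica.fit_transform(bp_signal)  # shape (N, 3)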
In the next step, the independent component that contains the wanted signal should be selected for the further processing. We assume that the independent component with the highest periodicity $p$ is most likely the one that contains the pulse signal. The periodicity $p$ of a signal is defined as the ratio between the accumulated coefficients in a range of 0.05 Hz around the dominant frequency $f_d$ and the accumulated coefficients of the total power spectrum, see Equation 21, where $f_s$ is the sampling frequency:

$$p = \frac{\sum_{k = f_d - 0.025}^{f_d + 0.025} \hat{S}^{av}_{xx}(k)}{\sum_{k = 0}^{f_s} \hat{S}^{av}_{xx}(k)} \,. \quad (21)$$
In order to calculate $p$, the spectrum of each independent component has to be obtained. One possibility to do this is the Welch estimate of the power spectral density (PSD) $\hat{S}^{av}_{xx}(k)$:

$$\hat{S}^{av}_{xx}(k) = \frac{1}{N} \sum_{n=1}^{N} \hat{P}_n(k) \,. \quad (22)$$

Thereby, $\hat{P}_n(k)$ denotes the periodogram and $k$ is the discrete iterator in the frequency domain, used instead of the continuous variable $f$. The periodogram uses a Hamming window for each segment.
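A sketch of the component selection, using the Welch PSD estimate of SciPy; the segment length of the estimator is an assumption here.

    import numpy as np
    from scipy.signal import welch

    def periodicity(component, fs=10.0, bandwidth=0.05):
        """Eq. (21): power in a 0.05 Hz band around the dominant
        frequency divided by the total power of the Welch PSD."""
        f, psd = welch(component, fs=fs, window="hamming", nperseg=256)
        f_d = f[np.argmax(psd)]                    # dominant frequency
        band = np.abs(f - f_d) <= bandwidth / 2.0  # +/- 0.025 Hz
        return psd[band].sum() / psd.sum()

    # The component with the highest periodicity is kept, e.g.:
    # i = int(np.argmax([periodicity(ics[:, j]) for j in range(3)]))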
After selecting the best independent component $IC_i$, this component was split into segments of 10 s with an overlap of 90 %. This small segment size guarantees flexibility when the heart rate changes rapidly, for example during a training exercise. The dominant frequency $f_{FFT}$ for each segment $k$ was determined by calculating the FFT for this segment and by determining the maximum in the spectrum:

$$f_{FFT}(k) = \underset{f}{\arg\max} \left( \left| \mathrm{FFT}(IC_i) \right| \right) \,. \quad (23)$$
In the presence of strong motion artefacts, other high peaks can appear in the spectrum. They can be misinterpreted as the real heart rate signal. In order to avoid this, an adaptive filtering is introduced. We assume that the heart rate does not change by more than 15 BPM (0.25 Hz) between two adjacent segments. The mean value of the estimated heart rates in the two previous segments was defined as the guide frequency $f_{gui}$. As shown in Figure 5, only the part of the spectrum within $f_{gui} \pm 0.25$ Hz was taken into consideration for the final heart rate HR. To obtain the final heart rate in beats per minute (BPM), the frequency $f_{FFT}$ has to be multiplied by 60:

$$HR(k) = f_{FFT}(k) \cdot 60 \ \mathrm{[BPM]} \,. \quad (24)$$
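The segment-wise frequency analysis and the adaptive filtering can be sketched as follows; the guide frequency is assumed to be passed in Hz, computed as the mean of the two previous heart rate estimates divided by 60.

    import numpy as np

    FS = 10.0  # frame rate in Hz; a 10 s segment spans 100 samples

    def heart_rate(segment, f_gui=None, tol=0.25, fs=FS):
        """Dominant frequency of one segment (Eq. 23/24). If a guide
        frequency f_gui is given, only spectral peaks within
        f_gui +/- 0.25 Hz are considered."""
        spec = np.abs(np.fft.rfft(segment))
        freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
        if f_gui is not None:
            spec = np.where(np.abs(freqs - f_gui) <= tol, spec, 0.0)
        return 60.0 * freqs[np.argmax(spec)]  # heart rate in BPM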
4 RESULTS AND DISCUSSION
4.1 Setting
As a basis for our evaluation, we created a database
of eleven probands with 117 different videos in total. The probands are of different gender, age and
[Figure 5: Spectrum of $IC_i$ with PSD over frequency in Hz. For the adaptive filtering, only frequency peaks that are in the range of $f_{gui} \pm 0.25$ Hz were considered. $f_{gui}$ represents the guide frequency, HR the selected heart rate and $HR_{ref}$ the corresponding reference heart rate.]
skin colour to guarantee a high variability. In total, twelve different scenarios were considered: Starting from a control scenario without any noise sources, we recorded scenarios with illumination artefacts caused by a lighting source placed above the face or placed at one side of the face, which results in different kinds of shadows. Moreover, the probands had to perform different motions to obtain scenarios with translations and rotations of the head (pitch, yaw and roll), scaling as well as non-rigid movements to represent motion artefacts. Furthermore, we combined motion artefacts and intensity artefacts in one scenario. In order to generate videos with a varying heart rate, which is natural in the context of rehabilitation exercises, videos after sport and during cycling exercises were recorded. For all recordings, an industrial camera, i. e. an Allied Manta G201c, was chosen. The automatic exposure time control and the automatic white balancing were disabled in order not to influence the measurements. The video sequences had a length of 1,000 frames and were recorded with a fixed frame rate of 10 FPS.
A Polar FT1 heart rate monitor was used as a reference system. This system measures the heart rate by means of a chest strap and displays it. This display was visible in all recorded videos, so that a reference value for the heart rate could be obtained for every frame.
4.2 Accuracy
The evaluation criterion that we have chosen for the accuracy analysis is the root-mean-square error (RMSE) for a sequence $m$, see Equation 25:

$$\mathrm{RMSE}_m = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left| HR(n) - HR_{\mathrm{ref}}(n) \right|^2} \,. \quad (25)$$
In this equation, $HR$ is the estimated heart rate and $HR_{\mathrm{ref}}$ the reference heart rate. For the single scenarios, we calculated the mean value $\overline{\mathrm{RMSE}}$ over all $M$ sequences:

$$\overline{\mathrm{RMSE}} = \frac{1}{M} \sum_{m=1}^{M} \mathrm{RMSE}_m \,. \quad (26)$$

Every video consists of 91 segments, which results in 10,647 evaluated segments in total for 117 videos. This outlines the extent of the database and its statistical relevance.
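In code, Equations 25 and 26 reduce to a few NumPy operations:

    import numpy as np

    def rmse(hr_est, hr_ref):
        """Eq. (25): RMSE between estimated and reference HR of one sequence."""
        return np.sqrt(np.mean(np.abs(hr_est - hr_ref) ** 2))

    def mean_rmse(sequences):
        """Eq. (26): mean RMSE over all M sequences of one scenario."""
        return np.mean([rmse(est, ref) for est, ref in sequences])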
In Table 1, the results for the single scenarios are presented. As expected, the control scenario without any challenges shows the best RMSE with 1.19 BPM. Since the error of the reference system can be quantified with ±1 BPM, this result proves to be of high quality.

The scenarios with illumination artefacts show shadows and reflections. This causes the RMSE to increase to 1.38 BPM for the side illumination and 1.49 BPM for the upper illumination, which is still accurate. Illumination artefacts alone thus do not have a large impact on the proposed algorithm.
While the determined heart rate for translation can be rated as accurate as well, the error increases for the scaling and rotation scenarios. This can be explained by a more challenging tracking and therefore larger changes in the size of the bounding box. The non-rigid movements show the largest RMSE of the motion scenarios with 2.46 BPM. This is logical: due to the change of the shape of the face, which is a result of speaking and facial expressions, the size and the location of the bounding box are influenced.

When motion and illumination artefacts are combined in one scenario, the RMSE increases up to 2.93 BPM. The scenarios after sport and during cycling showed an increased RMSE of 1.53 BPM and 2.11 BPM, respectively. Especially the heart rate determination during cycling is very challenging because of its periodic motions. However, all scenarios showed an RMSE below 3 BPM, which proves to be sufficiently accurate for the e-rehabilitation use case.
4.3 Robustness
For the evaluation, not only mean values of a complete sequence, such as the RMSE, are relevant. It is also of high importance that the differences between the estimated heart rate and the reference heart rate are not too high for single segments. This criterion is referred to as robustness. In Figure 6, for example, the reference heart rate and the estimated heart rate are shown in one plot for a video after sport. It can be seen that the estimated heart rate is very close
Table 1: RMSE for all scenarios in BPM.
Evaluated Scenario RMSE
Control 1.19
Upper illumination 1.49
Side illumination 1.38
Translation 1.36
Yaw 1.70
Pitch 1.93
Roll 1.86
Scaling 1.81
Non-rigid motion 2.46
Motion and illumination 2.93
After sport 1.53
During cycling 2.11
to the reference heart rate for the majority of the segments. For some segments, however, this difference is slightly higher.
[Figure 6: Comparison of the computed heart rate (blue dots) and the reference heart rate (black curve) after a sport exercise; HR in BPM over the frame number n.]
In order to perform a more detailed analysis of the single differences, the amount of segments $\Phi$ that have a difference $d$ below a certain value is plotted over the difference using all scenarios, as shown in Figure 7. The difference $d$ is calculated as follows:

$$d(\varphi) = HR(\varphi) - HR_{\mathrm{ref}}(\varphi) \,, \quad (27)$$

where $\varphi$ denotes the segment number.
In Figure 7, it can be seen that 98.3 % of the segments in the control sequences have a difference below 4 BPM, for example. For the upper illumination, 97 % and for the side illumination 97.5 % of the segments have a difference smaller than 4 BPM. For the rigid motions, 93.3 % and for the non-rigid motions 88.8 % of the segments show a maximum difference of 4 BPM. In the case where strong motions and intensity artefacts occur together, this rate drops to 80.9 %.
[Figure 7: The overall results, visualised as the amount of segments Φ in % over the difference d between estimated HR and reference HR in BPM, per scenario (control, translation, scaling, pitch, yaw, roll, non-rigid motion, upper illumination, side illumination, motion and illumination). The y-axis indicates how many percent of the measured data points show a difference smaller than a certain value in BPM.]
If all scenarios are considered, 90 % of the segments have a difference that is smaller than 4 BPM. This robustness is regarded as sufficient for the field of e-rehabilitation. In this application, it is not of high importance whether the heart rate at a certain time is exactly 120 BPM or 122 BPM, for example. The detection of relative changes or of the velocity of heart rate changes within or after an exercise is more important.
5 CONCLUSIONS
In this study, we presented a new method for remote heart rate determination, which is robust against intensity and motion artefacts. This method consists of an accurate tracking and an individual, situation-dependent skin colour determination. That is accompanied by a bandpass filtering, an ICA and a frequency determination.

For the evaluation, the accuracy was calculated by means of a reference system. With an RMSE below 3 BPM, this method provides a good basis for an application in e-rehabilitation. Even in the scenarios during sport activities, this method demonstrated robustness.

In the future, we plan to evaluate this method in a field study in rehabilitation facilities. Furthermore, we intend to extend the algorithms for the use of thermal cameras. Finally, it is planned to evaluate this method in other application fields, such as AAL or driver monitoring.
ACKNOWLEDGEMENTS
This project is funded by the European Social Fund
(ESF). We thank all volunteers who took part in the
recordings.
REFERENCES
Allen, J. (2007). Photoplethysmography and its application in clinical physiological measurement. Physiological Measurement, 28(3):R1–R39.

Balakrishnan, G., Durand, F., and Guttag, J. (2013). Detecting Pulse from Head Motions in Video. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3430–3437.

Gal, N., Andrei, D., Nemeş, D. I., Nădăşan, E., and Stoicu-Tivadar, V. (2015). A Kinect based intelligent e-rehabilitation system in physical therapy. Digital Healthcare Empowering Europeans, pages 489–493.

Garbey, M., Sun, N., Merla, A., and Pavlidis, I. (2007). Contact-Free Measurement of Cardiac Pulse Based on the Analysis of Thermal Imagery. Biomedical Engineering, IEEE Transactions on, 54(8):1418–1426.

Garud, H., Ray, A. K., Mahadevappa, M., Chatterjee, J., and Mandal, S. (2014). A fast auto white balance scheme for digital pathology. In 2014 IEEE-EMBS International Conference on Biomedical and Health Informatics, BHI 2014, pages 153–156.

Hertzman, A. B. and Spealman, C. R. (1937). Observations on the finger volume pulse recorded photoelectrically. American Journal of Physiology, 119:334–335.

Humphreys, K., Markham, C., and Ward, T. (2005). A CMOS camera-based system for clinical photoplethysmographic applications. In Proceedings of SPIE, volume 5823, pages 88–95.

Hyvärinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. Neural Networks, IEEE Transactions on, 10(3):626–634.

Lewandowska, M., Ruminski, J., Kocejko, T., and Nowak, J. (2011). Measuring pulse rate with a webcam - a non-contact method for evaluating cardiac activity. In Computer Science and Information Systems (FedCSIS), 2011 Federated Conference on, pages 405–410.

Liao, S., Jain, A. K., and Li, S. Z. (2016). A Fast and Accurate Unconstrained Face Detector. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):211–223.

Poh, M.-Z., McDuff, D., and Picard, R. (2010). Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 18(10):10762–10774.

Poh, M.-Z., McDuff, D., and Picard, R. (2011). Advancements in Noncontact, Multiparameter Physiological Measurements Using a Webcam. Biomedical Engineering, IEEE Transactions on, 58(1):7–11.

Shi, J. and Tomasi, C. (1993). Good Features to Track. Technical report, Cornell University, Ithaca, NY, USA.

Smith, A. R. (1978). Color gamut transform pairs. ACM SIGGRAPH Computer Graphics, 12(3):12–19.

Su, C.-J., Chiang, C.-Y., and Huang, J.-Y. (2014). Kinect-enabled home-based rehabilitation system using Dynamic Time Warping and fuzzy logic. Applied Soft Computing, 22:652–666.

Tarassenko, L., Villarroel, M., Guazzi, A., Jorge, J., Clifton, D. A., and Pugh, C. (2014). Non-contact video-based vital sign monitoring using ambient light and auto-regressive models. Physiological Measurement, 35(5):807–831.

Tomasi, C. and Kanade, T. (1991). Detection and Tracking of Point Features. Technical report, Carnegie Mellon University.

van Gastel, M., Zinger, S., Kemps, H., and de With, P. (2014). e-health video system for performance analysis in heart revalidation cycling. In Consumer Electronics Berlin (ICCE-Berlin), 2014 IEEE Fourth International Conference on, pages 31–35.

Verkruysse, W., Svaasand, L. O., and Nelson, J. S. (2008). Remote plethysmographic imaging using ambient light. Optics Express, 16(26):21434–21445.

Viola, P. and Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154.

Wiede, C., Richter, J., Apitzsch, A., KhairAldin, F., and Hirtz, G. (2016a). Remote Heart Rate Determination in RGB Data. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods, pages 240–246, Rome.

Wiede, C., Richter, J., and Hirtz, G. (2016b). Signal fusion based on intensity and motion variations for remote heart rate determination. In 2016 IEEE International Conference on Imaging Systems and Techniques (IST), pages 526–531.

Zhu, X. and Ramanan, D. (2012). Face detection, pose estimation, and landmark localization in the wild. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2879–2886.