the skin pixels by their relative importance. In doing
so, we explicitly favor areas of the face where there is
more information.
In this paper, we propose a model-based ROI segmentation that explicitly favors the most informative facial regions. A model describing the spatial distribution of rPPG information on the face was trained on ten videos. This model is then used to weight the pixels during spatial averaging. The approach has been validated on our in-house, publicly available video dataset (Bobbia et al., 2017), called UBFC-RPPG, which is specifically geared towards research on rPPG techniques. We show that this modification of how the spatial average of the ROI pixels is computed can significantly improve the final heart rate estimation compared with state-of-the-art methods such as face detection, skin classification, and landmark detection.
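The weighted spatial averaging at the core of this approach can be sketched as follows. This is a minimal NumPy illustration: the weight map is assumed to come from the trained spatial model, which is not reproduced here, and the function name is ours.

```python
import numpy as np

def weighted_spatial_average(frame, weights):
    """Weighted spatial average of ROI pixels.

    frame   : H x W x 3 RGB image.
    weights : H x W map of per-pixel importance (e.g. produced by a
              model of the spatial distribution of rPPG information).
    Returns one weighted mean per color channel.
    """
    w = weights.astype(np.float64)
    w_sum = w.sum()
    if w_sum == 0:
        raise ValueError("weight map is all zeros")
    # Weighted mean per channel: sum_i(w_i * p_i) / sum_i(w_i)
    return (frame.reshape(-1, 3) * w.reshape(-1, 1)).sum(axis=0) / w_sum
```

With a uniform weight map this reduces to the classical unweighted spatial average; a non-uniform map shifts the estimate towards the favored regions.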
In Section 2, state-of-the-art methods are briefly introduced. The proposed ROI segmentation method is explained in detail in Section 3. The experiment is described in Section 4, and the conclusion is presented in Section 5.
2 STATE OF THE ART OF ROI
SEGMENTATION METHODS
Most ROI segmentation techniques are based on the result of classical face detection and tracking algorithms. The ROI is then possibly refined by skin pixel classification or by a more precise ROI definition based on a set of landmarks. In this section, we present several state-of-the-art ROI segmentation techniques.
Since all of the video datasets are recordings of heads and faces, the most straightforward way to detect the ROI is to use a face detector and tracker (later called face). This can be implemented with the classical Viola-Jones face detector (Viola and Jones, 2001) and the Kanade-Lucas-Tomasi (KLT) tracking algorithm (Lucas et al., ) (cf. Fig. 1(a)).
Since the rPPG information is only present in skin pixels, skin/non-skin classification (later called skin) is a popular refinement over classical face detection and tracking. For instance, some researchers (Macwan et al., 2017) used the algorithm of (Conaire et al., 2007) in their rPPG work. This skin detection is performed by thresholding a non-parametric histogram trained on manually labeled skin/non-skin pixels. A significant advantage of this algorithm is its speed, since classification reduces to a single Look-Up Table (LUT) access per pixel. Fig. 1(b) shows an example of this method.
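A histogram-LUT skin classifier of this kind could be sketched as follows. This is a simplified illustration, not the exact cited implementation: the bin count, the likelihood-ratio threshold, and the function names are assumptions.

```python
import numpy as np

BINS = 32  # bins per channel; the LUT has BINS**3 entries

def _hist(pixels):
    # Normalized 3D color histogram over N x 3 uint8 pixels, flattened.
    idx = (pixels.astype(np.int64) * BINS) // 256
    flat = (idx[:, 0] * BINS + idx[:, 1]) * BINS + idx[:, 2]
    h = np.bincount(flat, minlength=BINS ** 3).astype(np.float64)
    return h / max(h.sum(), 1.0)

def train_lut(skin_pixels, nonskin_pixels, threshold=1.0):
    # LUT[b] is True where P(color|skin) / P(color|non-skin) > threshold,
    # estimated from manually labeled skin / non-skin training pixels.
    eps = 1e-9
    return _hist(skin_pixels) / (_hist(nonskin_pixels) + eps) > threshold

def classify(image, lut):
    # One LUT access per pixel: returns a boolean H x W skin mask.
    idx = (image.astype(np.int64) * BINS) // 256
    flat = (idx[..., 0] * BINS + idx[..., 1]) * BINS + idx[..., 2]
    return lut[flat]
```

Training is done once offline; at run time the per-pixel cost is a single table lookup, which explains the speed advantage noted above.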
The ROI can also be segmented by defining a facial contour from a set of landmarks (later called landmarks). For this, the method proposed by Kazemi and Sullivan can be used for implementation (Kazemi and Sullivan, 2014). An example is shown in Fig. 1(c).
The rPPG signal is not distributed homogeneously over the skin: some skin regions carry more rPPG signal than others. For example, it has been shown that the SNR of rPPG signals extracted from the forehead or the cheekbones is significantly higher than that of other facial regions. This observation has already been exploited by different ROI segmentation techniques. In some works, only the cheeks and forehead were selected (Lewandowska et al., 2011). In a previous study, an ROI segmentation based on temporal superpixels implicitly favored regions where the pulse trace is more prominent (Bobbia et al., 2017). However, this data-driven method is very sensitive to motion, and errors in superpixel tracking induce incorrect segmentation.
3 MODEL BASED ROI
SEGMENTATION
In this paper, we propose an effective technique for explicitly favoring certain areas of the face during the spatial averaging of RGB pixels. The model that encapsulates the spatial distribution of rPPG information was trained on an in-house database of 10 videos recorded under very favorable conditions. For this experiment, we used an EO-23121C camera recording 1024 × 768 uncompressed images at 30 fps. Each video is about one minute long. Subjects sat on a chair with back support. To keep the face fixed in a specified position, we used a shelf and asked the volunteers to rest their heads on it. Fig. 2 shows two sample images from the dataset.
The face sequence is then aligned based on the location of the eyes, and the video frames are filtered by a 25 × 25 averaging filter to decrease quantization noise.
The rPPG signal is extracted using the chrominance-based method (De Haan and Jeanne, 2013), which is very fast and has very low computational complexity. It linearly combines the RGB channels by projecting them onto two orthogonal chrominance vectors:
X(t) = 3y_R(t) − 2y_G(t),
Y(t) = 1.5y_R(t) + y_G(t) − 1.5y_B(t),        (1)

where y_c(t) is the filtered RGB signal, c ∈ {R, G, B} are the color channels, and X and Y are
VISAPP 2019 - 14th International Conference on Computer Vision Theory and Applications