Improving the Accuracy of Face Detection for Damaged Video and
Distant Targets
Jun-Horng Chen
Department of Communication Engineering, Oriental Institute of Technology, New Taiepi City, Taiwan
Keywords:
Error Concealment, Face Detection, Super-resolution.
Abstract:
This work aims at improving the accuracy of face detection in two scenarios, when the video quality is de-
teriorated by the transmission link and when the target is far away from the camera. In block based coding,
the packet loss inevitably makes the corrupted face image lacks some blocks. This work proposes the sparse
modeling error concealment can coarsely recover the lost blocks, the fine texture can be obtained by dimin-
ishing the edge discontinuity, and a satisfied result for face detection can thus be recovered. Furthermore, this
work utilizes the relationship learning super-resolution method to enhance the resolution in the case of face
image taken from a long distance. Experimental results demonstrate that the proposed approach can effectively
increase the accuracy of face detection for severely degraded and low resolution face images.
1 INTRODUCTION
As the continuous growth of ubiquitously installed
cameras, the applications of computer vision tech-
niques are rapidly developed. Over the past decades,
face recognition has become one of the most popu-
lar biometric applications. The widespread surveil-
lance systems encourage the development and estab-
lishment of face recognition in public area. Gener-
ally, face recognition systems are composed of two
stages: detection stage and recognition stage, and are
analyzed separately (Marciniak et al., 2013). That is,
if the face can not be detected at the first stage, system
with high accuracy of recognition will not function
expectedly.
However, in some surveillance systems, the video
signal is fed into the recognition system via trans-
mission link. Therefore, the image quality is in-
evitably degraded by imperfect transmission, and the
degraded face video definitely diminishes the accu-
racy of recognition. Generally in video communica-
tion, the error concealment technique which recovers
the corrupted Macroblocks(MB) at the decoder site is
proposed for maintenance of the visual quality. The
sparse modeling error concealment (Lakshman et al.,
2010) has been proven to be an effective way to en-
hance the visual quality. Accordingly, this work will
utilize sparse modeling error concealment to recover
the corrupted face images so that the face detection
accuracy can thus be improved.
Furthermore, the impressive performance of face
recognition system is usually measured in controlled
conditions, such as ambient illumination, pose, res-
olution, etc. For example, in FRVT 2006 (Phillips
et al., 2007) , the interpupillary distance (IPD) of
some experiments can be as high as 400 pixels. It
is the main reason for some deployments (Bonner,
2001)(Dempsey and Forst, 2010) did not meet the
required accuracy. As for some successful deploy-
ments, the subject’s cooperation and the controlled
conditions are required and expected. Since the super-
resolution (SR) process is proposed to enhance reso-
lution image from one or multiple low resolution im-
ages, this work will utilize an effective SR approach
to estimate a high resolution image from a very low
resolution image which is taken by a camera located
at a long distance away from target.
2 SPARSE MODELING ERROR
CONCEALMENT
The sparse modeling error concealment technique
which recovers the corrupted or lost blocks at the
decoder site is proposed for maintenance of the im-
age visual quality in imperfect transmission link. In
contrast to the traditional error resilience techniques
e.g. FEC and ARQ, the error concealment is ex-
pected to diminish the channeleffect without the over-
351
Chen J..
Improving the Accuracy of Face Detection for Damaged Video and Distant Targets.
DOI: 10.5220/0005161603510355
In Proceedings of the International Conference on Neural Computation Theory and Applications (NCTA-2014), pages 351-355
ISBN: 978-989-758-054-3
Copyright
c
2014 SCITEPRESS (Science and Technology Publications, Lda.)
head bandwidth. The authors of (Kaup et al., 2005)
(Lakshman et al., 2010) used sparse modeling to ex-
trapolated corrupted image data. By referring to the
available neighbor data, the recovered data is a lin-
ear combination of a set of basis functions. The cri-
terion of MMSE (minimizing mean-squared-error) of
the available image data is used to determine the co-
efficient of each basis iteratively. Let x R
N
be
an interested region, which contains a known part
x
a
and an unknown part x
b
, x can be represented
as a linear combination of a linearly independent set
Φ = { u
1
, u
2
, ··· , u
N
}. That is, in the n-th iteration,
the approximated ˜x
(n)
will be given by
˜x
(n)
=
u
k
Φ
(n)
c
k
u
k
. (1)
Accordingly, in the (n + 1)-th iteration, Φ
(n+1)
=
Φ
(n)
S
n
u
(n+1)
o
, where u
(n+1)
is the new chosen ba-
sis, and
˜x
(n+1)
= ˜x
(n)
+ c
n+1
u
(n+1)
, (2)
where c
n+1
is determined by minimizing the error,
E
(n+1)
= k˜x
(n+1)
a
x
a
k
2
. (3)
In (Chen, 2011), c
n+1
can be determined by
c
n+1
=
h
˜x
(n+1)
a
x
a
i
t
· u
(n+1)
a
h
u
(n+1)
a
i
t
· u
(n+1)
a
, (4)
and u
(n+1)
can be determined by
u
(n+1)
= argmax
u
k
Φ
h
E
(n+1)
E
(n)
i
. (5)
(a) Corrupted Image. (b) Recovered Image.
Figure 1: (a) The face in corrupted image can not be de-
tected. (b) The face in sparse modeling recovered image
can be detected.
Figure 1 shows the sparse modeling error conceal-
ment can improve the detection accuracy when im-
ages are corrupted by transmission link. The exam-
ple image is drawn from the MUCT face database
(Milborrow et al., 2010). The corrupted image as
shown in Fig. 1(a) is the simulation result of the im-
age coded in H.264/AVC with Flexible Macroblock
Ordering(FMO) error resilience technique suffers
packet loss during transmission. The face detection is
conducted by the oft-used Haar Cascade Classifier.
Although the pixels inside the corrupted image
block can be estimated by sparse modeling, it is no-
ticed that the edge across the boundary may not be
continued in Fig. 1(b). This is because the MMSE
approach solves Eq. (4) and (5) without considera-
tion of edge continuity. It is proven (Chen, 2011) that
the edge can be well extended across the boundary by
minimizing a pre-defined cost function of discontinu-
ity.
In (Chen, 2011), the parametric cost function was
defined by the absolute magnitude difference of the
gradient across the block boundary in four directions,
and represented by the product of a sparse matrix D
and a target region which is represented by a column
vector. That is,
J(c) = k D· ˆxk
2
, (6)
where
ˆ
x is the column vector form of estimated image
vector by sparse modeling error concealment. The co-
efficient vector c is the projection vector of
ˆ
x on a set
of linearly independent basis vectors Φ, that is,
ˆ
x =
N
i=1
c
i
u
i
= Φ· c . (7)
In this work, the 2D-DCT kernel functions are used
as the basis functions. Then, by the steepest descent
approach iteratively, the coefficient vector c is moved
towards
ˆ
c = c + δ∆c such that the cost function J in
Eq. (6) has a maximum reduction. That is, the moving
vector c is
c = J(c) , (8)
and can be analytically determined by (Chen, 2011)
c = 2(D·
N
i=1
c
i
u
i
)
t
· D· Φ . (9)
3 RELATIONSHIP LEARNING
BASED SUPER-RESOLUTION
In surveillance systems, the face image might be im-
poverished because the object is located at a large
distance away from the camera. Generally, the oft-
used face detection approach can not detect such low
resolution face image whose resolution is lower than
20 × 20. The super-resolution based image inpaint-
ing (Meur and Guillemot, 2012) has been proven to
NCTA2014-InternationalConferenceonNeuralComputationTheoryandApplications
352
(a) LR Image. (b) HR Image.
Figure 2: (a) The LR face image with size of 9× 9. (b) The
estimated HR image with size of 50× 50 .
be an effective way to estimate the missing region
in an image. This work propose that the face super-
resolution methods can be utilized to enhance the res-
olution of the images, and hopefully the detection ac-
curacy can be accordingly improved for long distant
targets. In (Wilman and Yuen, 2010), the authors im-
prove the existing learning-based super-resolution ap-
proachesby modeling the super-resolution problem as
a regression problem. By optimizing the constraint on
high resolution image space, the proposed relation-
ship learning based super-resolution provides more
detailed and discriminative information, which makes
the resulting face image can be more accurately de-
tected by Haar Cascade Classifier.
In relationship learning super-resolution, a set of
training HR and LR images pairs are used to de-
termine the relationship between HR and LR pairs
at the training stage. At the query stage, the re-
lationship is accordingly used to estimate an HR
face image from a given LR image. Let S =
(x
L
1
, x
H
1
), (x
L
2
, x
H
2
), ··· , (x
L
N
, x
H
N
)
be a training set,
and R R
m×n
be the relationship matrix, the regres-
sion model can be represented as:
x
H
i
= R· x
L
i
+ e , for i = 1, 2, ··· , N , (10)
where m and n are the dimensionalities of HR and
LR images respectively, and e is the regression noise.
Therefore, the relationship matrix R can be deter-
mined by minimizing the regression error,
R = argmin
R
N
i=1
kx
H
i
R· x
L
i
k
2
. (11)
Equation (11) can be iteratively solved by gradient
descent approach. In each iteration, the the relation-
ship matrix R can be determined by
R
(n+1)
= R
(n)
δ
N
i=1
R
kx
H
i
R
(n)
· x
L
i
k
2
, (12)
where δ is the adjustment step size and the gradient of
regression error can be given by
N
i=1
R
kx
H
i
R· x
L
i
k
2
=
N
i=1
2(x
H
i
R· x
L
i
)
x
L
i
t
.
(13)
As shown in Fig. 2, the estimated HR face im-
age can be detected by Haar Cascade Classifier. In
(Wilman and Yuen, 2012), the authors demonstrate
the relationship based super-resolution outperforms
the existing super-resolution algorithms in terms of
visual quality and recognitionperformance. However,
the region of interest should be located before reso-
lution enhancement. Since the relationship matrix is
trained from face images pairs, even non-face images
may be mapped to face-like images. This work pro-
poses an inverse verification process to filter out non-
face images,
kT
R· x
L
x
L
k
2
< ε , (14)
where T is the down-sampling process which reduces
the resolution of the estimated HR image, and ε is a
preset threshold which controls the tolerable error.
4 EXPERIMENT RESULTS
4.1 Face Detection For Corrupted Face
Image
In order to verify the performance of sparse model-
ing error concealment, this work assumes the images
in the MUCT face database (Milborrow et al., 2010)
are compressed in H.264/AVC with Flexible Mac-
roblock Ordering(FMO) error resilience technique.
When some packets are lost, the image will lack some
blocks with size of 16×16, as shown in Fig. 3(a)-3(d).
It can be seen that some of corrupted face images can
not be detected, the detection accuracy is 49.9% in
(a) (b) (c) (d)
(e) (f) (g) (h)
Figure 3: (a)-(d) The corrupted images. (b) The recovered
images.
ImprovingtheAccuracyofFaceDetectionforDamagedVideoandDistantTargets
353
0 50 100 150 200 250
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Average Face Width
Detection Accurcy
Figure 4: The accuracy of face detection with various sizes
of face width.
this experiments. However, with the help of sparse
modeling error concealment, the detection accuracy
can be up to 99.8%.
4.2 Face Detection For Low Resolution
Face Image
As shown in Fig. 4, the face images with resolution
lower than 20×20 are difficult to be detected by Haar
Cascade Classifier. Therefore, this work uses the im-
ages in the MUCT face database (Milborrow et al.,
2010) to build the LH and HR training images pairs.
Each face image detected in the original image is re-
sized to 50 × 50 as the HR images, and is resized to
5× 5 and 9 × 9 as the LR images, as shown in Fig. 5
and 6 respectively. The experimentalresults show that
there is no face image with resolution 5 × 5 or 9 × 9
can be detected by Haar Cascade Classifier. However,
by using the proposed approach, the detection accu-
racy can be improved to 57.26% and 97.87% respec-
tively.
(a) (b) (c) (d) (e)
(f) (g) (h) (i) (j)
Figure 5: (a)-(e) The LR face images with size of 5× 5. (b)
The estimated HR images with size of 50× 50.
5 CONCLUSIONS
This work proposes the approaches to improve the ac-
(a) (b) (c) (d) (e)
(f) (g) (h) (i) (j)
Figure 6: (a)-(e) The LR face images with size of 9× 9. (b)
The estimated HR images with size of 50× 50.
curacy of face detection in two scenarios, corrupted
images and low resolution images. By sparse model-
ing error concealment, the face images of which some
blocks are lost during transmission can be also de-
tected. The experimental results demonstrate the ac-
curacy of detection can be significantly improved to
99.8%. Furthermore, this work proposed the relation
learning super-resolution with inverse verification can
effectively improve the face detection for face images
with very low resolution when the objects are located
in a long distance away from the camera. The ex-
perimental results demonstrate the proposed approach
makes the low resolution face images be detectable
with 57.26% and 97.87% of detection accuracy for
face images with sizes of 5× 5 and 9× 9 respectively.
REFERENCES
Bonner, J. (2001). Looking for faces in the super bowl
crowd. Access Control & Security System.
Chen, J.-H. (2011). An improved error concealment by di-
minishing the edge discontinuity. In Proceedings of
the IEEE International Conference on Image Process-
ing (ICIP), pages 2213–2216.
Dempsey, J. S. and Forst, L. S. (2010). An Introduction to
Policing. DELMAR CENGAGE Learning, 5 edition.
Kaup, A., Meisinger, K., and Aach, T. (2005). Frequency
selective signal extrapolation with applications to er-
ror concealment in image communication. AEUE -
International Journal of Electronics and Communica-
tions, 59:147–156.
Lakshman, H., K¨oppel, M., Ndjiki-Nya, P., and Wiegand,
T. (2010). Image recovery using sparse reconstruction
based texture refinement. In Proceedings of the Inter-
national Conference on Acoustics, Speech and Signal
Processing (ICASSP), pages 786–789.
Marciniak, T., Chmielewska, A., Weychan, R., Parzych, M.,
and Dabrowski, A. (2013). Influence of low resolution
of images on reliability of face detection and recogni-
tion. Multimedia Tools and Applications.
Meur, O. L. and Guillemot, C. (2012). Super-resolution-
based inpainting. Lecture Notes in Computer Science,
7577:554–567.
NCTA2014-InternationalConferenceonNeuralComputationTheoryandApplications
354
Milborrow, S., Morkel, J., and Nicolls, F. (2010).
The MUCT Landmarked Face Database. Pat-
tern Recognition Association of South Africa.
http://www.milbo.org/muct.
Phillips, J., Scruggs, W. T., OToole, A. J., Flynn, P. J.,
Bowyer, K. W., Schott, C. L., and Sharpe, M. (2007).
Frvt 2006 and ice 2006 large-scale results. NISTIR
7408.
Wilman, W. Z. and Yuen, P. C. (2010). Very low resolution
face recognition problem. In Proceedings of Fourth
IEEE International Conference on Biometrics: The-
ory Applications and Systems, pages 1–4.
Wilman, W. Z. and Yuen, P. C. (2012). Very low resolu-
tion face recognition problem. IEEE Transactions on
Image Processing, 21.
ImprovingtheAccuracyofFaceDetectionforDamagedVideoandDistantTargets
355