Improving the Accuracy of Face Detection for Damaged Video and Distant Targets

Jun-Horng Chen

Department of Communication Engineering, Oriental Institute of Technology, New Taipei City, Taiwan

Keywords:

Error Concealment, Face Detection, Super-resolution.

Abstract:

This work aims to improve the accuracy of face detection in two scenarios: when the video quality is degraded by the transmission link, and when the target is far from the camera. In block-based coding, packet loss inevitably causes the corrupted face image to lack some blocks. This work proposes that sparse modeling error concealment can coarsely recover the lost blocks, that the fine texture can be obtained by diminishing the edge discontinuity, and that a satisfactory result for face detection can thus be recovered. Furthermore, this work uses the relationship learning super-resolution method to enhance the resolution of face images taken from a long distance. Experimental results demonstrate that the proposed approach can effectively increase the accuracy of face detection for severely degraded and low-resolution face images.

1 INTRODUCTION

With the continuous growth of ubiquitously installed cameras, applications of computer vision techniques are developing rapidly. Over the past decades, face recognition has become one of the most popular biometric applications. Widespread surveillance systems encourage the development and deployment of face recognition in public areas. Generally, face recognition systems are composed of two stages, a detection stage and a recognition stage, which are analyzed separately (Marciniak et al., 2013). That is, if the face cannot be detected at the first stage, a system with high recognition accuracy will not function as expected.

However, in some surveillance systems, the video signal is fed into the recognition system via a transmission link. The image quality is therefore inevitably degraded by imperfect transmission, and the degraded face video diminishes the accuracy of recognition. In video communication, error concealment techniques, which recover the corrupted macroblocks (MBs) at the decoder, are generally used to maintain visual quality. Sparse modeling error concealment (Lakshman et al., 2010) has been shown to be an effective way to enhance visual quality. Accordingly, this work uses sparse modeling error concealment to recover the corrupted face images so that the face detection accuracy can be improved.

Furthermore, the impressive performance of face recognition systems is usually measured under controlled conditions, such as ambient illumination, pose, and resolution. For example, in FRVT 2006 (Phillips et al., 2007), the interpupillary distance (IPD) in some experiments can be as high as 400 pixels. This is the main reason why some deployments (Bonner, 2001; Dempsey and Forst, 2010) did not meet the required accuracy. Successful deployments, in turn, require the subject's cooperation and controlled conditions. Since super-resolution (SR) is designed to produce a higher-resolution image from one or multiple low-resolution images, this work uses an effective SR approach to estimate a high-resolution image from a very low-resolution image taken by a camera located far from the target.

2 SPARSE MODELING ERROR CONCEALMENT

The sparse modeling error concealment technique, which recovers the corrupted or lost blocks at the decoder, is used to maintain the visual quality of images sent over an imperfect transmission link. In contrast to traditional error resilience techniques, e.g. FEC and ARQ, error concealment is expected to diminish the channel effect without bandwidth overhead.

Chen J.

Improving the Accuracy of Face Detection for Damaged Video and Distant Targets.

DOI: 10.5220/0005161603510355

In Proceedings of the International Conference on Neural Computation Theory and Applications (NCTA-2014), pages 351-355

ISBN: 978-989-758-054-3

© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

The authors of (Kaup et al., 2005) and (Lakshman et al., 2010) used sparse modeling to extrapolate corrupted image data. By referring to the available neighboring data, the recovered data is expressed as a linear combination of a set of basis functions. The minimum mean-squared-error (MMSE) criterion on the available image data is used to determine the coefficient of each basis function iteratively. Let $x \in \mathbb{R}^{N}$ be a region of interest, which contains a known part and an unknown part. $x$ can be represented as a linear combination of a linearly independent set $\Phi = \{u_1, u_2, \cdots, u_N\}$. That is, in the $n$-th iteration, the approximation $\tilde{x}^{(n)}$ is given by

$$\tilde{x}^{(n)} = \sum_{u_i \in \Phi^{(n)}} c_i u_i . \tag{1}$$

Accordingly, in the $(n+1)$-th iteration, $\Phi^{(n+1)} = \Phi^{(n)} \cup \{u^{(n+1)}\}$, where $u^{(n+1)}$ is the newly chosen basis function, and

$$\tilde{x}^{(n+1)} = \tilde{x}^{(n)} + c_{n+1} u^{(n+1)} , \tag{2}$$

where $c_{n+1}$ is determined by minimizing the error,

$$E^{(n+1)} = \| \tilde{x}^{(n+1)} - x \|^2 . \tag{3}$$

In (Chen, 2011), $c_{n+1}$ can be determined by

$$c_{n+1} = - \frac{ ( \tilde{x}^{(n)} - x ) \cdot u^{(n+1)} }{ u^{(n+1)} \cdot u^{(n+1)} } , \tag{4}$$

and $u^{(n+1)}$ can be determined by

$$u^{(n+1)} = \arg\max_{u \in \Phi} \left| E^{(n+1)} - E^{(n)} \right| . \tag{5}$$
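The greedy selection in Eqs. (2)-(5) can be sketched for a one-dimensional signal as follows. This is a minimal illustration, not the implementation used in this work: the 1-D DCT-II basis, the function names `dct_basis`, `masked_dot` and `sparse_extrapolate`, and the stopping tolerance are assumptions, and the error is evaluated only on the known samples, in line with the MMSE criterion on the available data. Setting the derivative of the error with respect to the new coefficient to zero gives the projection used for `coeff`.

```python
import math


def dct_basis(n):
    """Return n unnormalised DCT-II basis vectors, each of length n."""
    return [[math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)]
            for k in range(n)]


def masked_dot(a, b, idx):
    """Inner product restricted to the known sample positions idx."""
    return sum(a[t] * b[t] for t in idx)


def sparse_extrapolate(x, known, n_iter=4, tol=1e-12):
    """Greedily approximate x using only samples where known[t] is True.

    The returned approximation is defined on all samples, so it also
    fills in the unknown positions.
    """
    n = len(x)
    basis = dct_basis(n)
    idx = [t for t in range(n) if known[t]]
    approx = [0.0] * n
    for _ in range(n_iter):
        residual = [x[t] - approx[t] for t in range(n)]
        if masked_dot(residual, residual, idx) < tol:
            break  # known samples are already matched
        best = None  # (error reduction, coefficient, basis vector)
        for u in basis:
            energy = masked_dot(u, u, idx)
            if energy < tol:
                continue
            proj = masked_dot(residual, u, idx)
            gain = proj * proj / energy  # drop in squared error for this u
            if best is None or gain > best[0]:
                best = (gain, proj / energy, u)
        if best is None:
            break
        _, coeff, u = best
        approx = [approx[t] + coeff * u[t] for t in range(n)]
    return approx
```

For a signal that is a single basis function with two samples missing, one iteration is enough to recover the missing samples; real image blocks need a 2-D basis and many more iterations.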

Figure 1: (a) Corrupted image, in which the face cannot be detected. (b) Image recovered by sparse modeling, in which the face can be detected.

Figure 1 shows that sparse modeling error concealment can improve the detection accuracy when images are corrupted by the transmission link. The example image is drawn from the MUCT face database (Milborrow et al., 2010). The corrupted image shown in Fig. 1(a) simulates an image that is coded in H.264/AVC with the Flexible Macroblock Ordering (FMO) error resilience technique and suffers packet loss during transmission. Face detection is conducted by the widely used Haar Cascade Classifier.

Although the pixels inside the corrupted image block can be estimated by sparse modeling, the edges across the block boundary may not be continuous, as can be seen in Fig. 1(b). This is because the MMSE approach solves Eqs. (4) and (5) without considering edge continuity. It is shown in (Chen, 2011) that edges can be well extended across the boundary by minimizing a pre-defined cost function of discontinuity.

In (Chen, 2011), the parametric cost function is defined by the absolute magnitude difference of the gradient across the block boundary in four directions, and is represented by the product of a sparse matrix $D$ and the target region, which is written as a column vector. That is,

$$J(c) = \| D \cdot \hat{x} \|^2 , \tag{6}$$

where $\hat{x}$ is the column-vector form of the image estimated by sparse modeling error concealment. The coefficient vector $c$ is the projection of $\hat{x}$ onto a set of linearly independent basis vectors $\Phi$, that is,

$$\hat{x} = \sum_{i=1}^{N} c_i u_i = \Phi \cdot c . \tag{7}$$

In this work, the 2D-DCT kernel functions are used as the basis functions. Then, by iterative steepest descent, the coefficient vector $c$ is moved to $\hat{c} = c + \delta \Delta c$ such that the cost function $J$ in Eq. (6) has a maximum reduction. That is, the moving vector $\Delta c$ is

$$\Delta c = -\nabla J(c) , \tag{8}$$

and can be analytically determined by (Chen, 2011)

$$\Delta c = -2 \Big( D \cdot \sum_{i=1}^{N} c_i u_i \Big)^{T} \cdot D \cdot \Phi . \tag{9}$$
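The step from Eq. (8) to Eq. (9) can be checked with a short derivation. It assumes that the norm in Eq. (6) is the squared Euclidean norm and that $\Phi$ is the matrix whose columns are the basis vectors $u_i$; both are assumptions made here for illustration.

```latex
\begin{align*}
J(c) &= \| D \Phi c \|^2 = c^{T} \Phi^{T} D^{T} D \Phi c, \\
\nabla J(c) &= 2\, \Phi^{T} D^{T} D \Phi c = 2\, (D \Phi)^{T} (D \hat{x}), \\
\Delta c &= -\nabla J(c) = -2\, (D \Phi)^{T} (D \hat{x}).
\end{align*}
```

Transposing the last line gives the row vector $-2 (D \hat{x})^{T} D \Phi$, which has the same structure as Eq. (9) with $\hat{x} = \sum_i c_i u_i$.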

3 RELATIONSHIP LEARNING BASED SUPER-RESOLUTION

In surveillance systems, the face image may be of poor quality because the subject is located far from the camera. Generally, the widely used face detection approach cannot detect low-resolution face images whose resolution is lower than 20 × 20. Super-resolution-based image inpainting (Meur and Guillemot, 2012) has been shown to be an effective way to estimate the missing region in an image. This work proposes that face super-resolution methods can be used to enhance the resolution of the images, so that the detection accuracy for distant targets can be improved accordingly. In (Wilman and Yuen, 2010), the authors improve the existing learning-based super-resolution approaches by modeling the super-resolution problem as a regression problem. By optimizing the constraint on the high-resolution image space, the proposed relationship learning based super-resolution provides more detailed and discriminative information, which allows the resulting face image to be detected more accurately by the Haar Cascade Classifier.

Figure 2: (a) The LR face image with size of 9 × 9. (b) The estimated HR image with size of 50 × 50.

In relationship learning super-resolution, a set of training HR and LR image pairs is used to determine the relationship between HR and LR images at the training stage. At the query stage, the relationship is used to estimate an HR face image from a given LR image. Let $S = \{ (x_1^{h}, x_1^{l}), (x_2^{h}, x_2^{l}), \cdots, (x_N^{h}, x_N^{l}) \}$ be a training set, where the superscripts $h$ and $l$ denote HR and LR images, and let $R \in \mathbb{R}^{m \times n}$ be the relationship matrix. The regression model can be represented as

$$x_i^{h} = R \cdot x_i^{l} + e , \quad \text{for } i = 1, 2, \cdots, N , \tag{10}$$

where $m$ and $n$ are the dimensionalities of the HR and LR images respectively, and $e$ is the regression noise. Therefore, the relationship matrix $R$ can be determined by minimizing the regression error,

$$R = \arg\min_{R} \sum_{i=1}^{N} \| x_i^{h} - R \cdot x_i^{l} \|^2 . \tag{11}$$

Equation (11) can be solved iteratively by gradient descent. In each iteration, the relationship matrix $R$ is updated by

$$R^{(n+1)} = R^{(n)} - \delta \sum_{i=1}^{N} \nabla \| x_i^{h} - R^{(n)} \cdot x_i^{l} \|^2 , \tag{12}$$

where $\delta$ is the adjustment step size and the gradient of the regression error is given by

$$\sum_{i=1}^{N} \nabla \| x_i^{h} - R \cdot x_i^{l} \|^2 = \sum_{i=1}^{N} -2 \left( x_i^{h} - R \cdot x_i^{l} \right) \left( x_i^{l} \right)^{T} . \tag{13}$$
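The update in Eqs. (12) and (13) can be sketched on toy data as follows. Plain Python lists stand in for vectorized images; the function names, the step size, the iteration count, and the zero initialization are illustrative assumptions rather than the settings used in the experiments.

```python
def matvec(R, x):
    """Multiply the m-by-n matrix R (list of rows) by the vector x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]


def learn_relationship(pairs, m, n, step=0.1, n_iter=100):
    """Estimate R by gradient descent on the summed squared regression error.

    pairs is a list of (hr_vector, lr_vector) tuples with lengths m and n.
    """
    R = [[0.0] * n for _ in range(m)]
    for _ in range(n_iter):
        # Gradient of sum_i ||hr_i - R lr_i||^2, i.e. sum_i -2 (hr_i - R lr_i) lr_i^T
        grad = [[0.0] * n for _ in range(m)]
        for hr, lr in pairs:
            pred = matvec(R, lr)
            for a in range(m):
                err = hr[a] - pred[a]
                for b in range(n):
                    grad[a][b] += -2.0 * err * lr[b]
        # Descent step with step size delta = step
        R = [[R[a][b] - step * grad[a][b] for b in range(n)]
             for a in range(m)]
    return R
```

The step size must be small relative to the energy of the LR training vectors, otherwise the iteration diverges; in practice it would be tuned for the training set at hand.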

As shown in Fig. 2, the estimated HR face image can be detected by the Haar Cascade Classifier. In (Wilman and Yuen, 2012), the authors demonstrate that relationship-based super-resolution outperforms the existing super-resolution algorithms in terms of visual quality and recognition performance. However, the region of interest should be located before resolution enhancement. Since the relationship matrix is trained from pairs of face images, even non-face images may be mapped to face-like images. This work proposes an inverse verification process to filter out non-face images,

$$\| T( R \cdot x ) - x \| < \varepsilon , \tag{14}$$

where $x$ is the given LR image, $T$ is the down-sampling process which reduces the resolution of the estimated HR image, and $\varepsilon$ is a preset threshold which controls the tolerable error.
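The inverse verification in Eq. (14) can be sketched as follows. The form of T is not specified above, so block averaging by an integer factor is assumed here, and the Euclidean norm is assumed for the error; `downsample` and `passes_inverse_verification` are hypothetical names.

```python
import math


def downsample(img, factor):
    """Block-average a 2-D image (list of rows) by an integer factor.

    This is an assumed stand-in for the down-sampling process T.
    """
    rows = len(img) // factor
    cols = len(img[0]) // factor
    area = factor * factor
    return [[sum(img[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / area
             for c in range(cols)]
            for r in range(rows)]


def passes_inverse_verification(lr, hr_estimate, factor, eps):
    """Return True when the down-sampled HR estimate stays within eps of lr."""
    back = downsample(hr_estimate, factor)
    err = math.sqrt(sum((b - l) ** 2
                        for brow, lrow in zip(back, lr)
                        for b, l in zip(brow, lrow)))
    return err < eps
```

An HR estimate that is consistent with its LR input passes the check, while an estimate that does not reduce back to the input is rejected. A non-integer size ratio, such as 50 × 50 to 9 × 9, would need an interpolating resize in place of `downsample`.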

4 EXPERIMENTAL RESULTS

4.1 Face Detection For Corrupted Face Image

In order to verify the performance of sparse modeling error concealment, this work assumes that the images in the MUCT face database (Milborrow et al., 2010) are compressed in H.264/AVC with the Flexible Macroblock Ordering (FMO) error resilience technique. When some packets are lost, the image lacks some blocks of size 16 × 16, as shown in Fig. 3(a)-3(d). Some of the corrupted face images cannot be detected, and the detection accuracy is 49.9% in this experiment. However, with the help of sparse modeling error concealment, the detection accuracy can reach 99.8%.

Figure 3: (a)-(d) The corrupted images. (e)-(h) The recovered images.

Figure 4: The accuracy of face detection with various sizes of face width (horizontal axis: average face width, 0 to 250; vertical axis: detection accuracy, 0.1 to 0.9).
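One way to mimic the block loss described above is sketched below. The FMO slice-group map and the loss pattern are not specified in this section, so a checkerboard map with two slice groups, one of which is lost entirely, is assumed; `drop_macroblocks` is a hypothetical helper operating on a grayscale image stored as a list of rows.

```python
def drop_macroblocks(img, block=16, lost_group=0):
    """Zero every block that belongs to the lost checkerboard slice group.

    Returns the damaged copy and the top-left corners (row, col) of the
    lost blocks, which a concealment routine would then have to fill in.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    lost = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            # Assumed checkerboard assignment of macroblocks to two groups
            if ((by // block) + (bx // block)) % 2 != lost_group:
                continue
            lost.append((by, bx))
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    out[y][x] = 0
    return out, lost
```

With this pattern every lost block keeps intact neighbors on its four sides, so the available neighboring data needed by the extrapolation is always present.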

4.2 Face Detection For Low Resolution Face Image

As shown in Fig. 4, face images with resolution lower than 20 × 20 are difficult for the Haar Cascade Classifier to detect. Therefore, this work uses the images in the MUCT face database (Milborrow et al., 2010) to build the LR and HR training image pairs. Each face image detected in the original image is resized to 50 × 50 as the HR image, and to 5 × 5 and 9 × 9 as the LR images, as shown in Figs. 5 and 6 respectively. The experimental results show that no face image with resolution 5 × 5 or 9 × 9 can be detected by the Haar Cascade Classifier. However, by using the proposed approach, the detection accuracy improves to 57.26% and 97.87% respectively.
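The construction of one HR/LR training pair can be sketched as follows. The resizing filter is not stated above, so area averaging is assumed, and `area_weights`, `resize_area` and `make_pair` are hypothetical names. For grayscale images the flattened vectors have m = 2500 and n = 81 (or n = 25) entries, which fixes the shape of the relationship matrix R.

```python
def area_weights(src, dst):
    """weights[j][i]: share of output sample j that comes from input sample i."""
    scale = src / dst
    weights = []
    for j in range(dst):
        lo, hi = j * scale, (j + 1) * scale
        # Overlap of the output interval [lo, hi) with input pixel [i, i + 1)
        weights.append([max(0.0, min(hi, i + 1) - max(lo, i)) / scale
                        for i in range(src)])
    return weights


def resize_area(img, out_h, out_w):
    """Resize a 2-D image (list of rows) by separable area averaging."""
    wy = area_weights(len(img), out_h)
    wx = area_weights(len(img[0]), out_w)
    # Resize along the horizontal direction first, then the vertical one
    tmp = [[sum(w * v for w, v in zip(wrow, row)) for wrow in wx]
           for row in img]
    return [[sum(wy[j][i] * tmp[i][c] for i in range(len(img)))
             for c in range(out_w)]
            for j in range(out_h)]


def make_pair(face, hr_size=50, lr_size=9):
    """Return the flattened (HR, LR) vectors for one detected face image."""
    hr = resize_area(face, hr_size, hr_size)
    lr = resize_area(face, lr_size, lr_size)
    return ([v for row in hr for v in row],
            [v for row in lr for v in row])
```

Calling `make_pair(face, lr_size=5)` gives the pair for the 5 × 5 case in the same way.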

Figure 5: (a)-(e) The LR face images with size of 5 × 5. (f)-(j) The estimated HR images with size of 50 × 50.

Figure 6: (a)-(e) The LR face images with size of 9 × 9. (f)-(j) The estimated HR images with size of 50 × 50.

5 CONCLUSIONS

This work proposes approaches to improve the accuracy of face detection in two scenarios: corrupted images and low-resolution images. With sparse modeling error concealment, face images in which some blocks are lost during transmission can still be detected. The experimental results demonstrate that the detection accuracy can be significantly improved, to 99.8%. Furthermore, this work proposes that relationship learning super-resolution with inverse verification can effectively improve face detection for very low-resolution face images when the subjects are located far from the camera. The experimental results demonstrate that the proposed approach makes low-resolution face images detectable, with detection accuracies of 57.26% and 97.87% for face images of size 5 × 5 and 9 × 9 respectively.

REFERENCES

Bonner, J. (2001). Looking for faces in the Super Bowl crowd. Access Control & Security System.

Chen, J.-H. (2011). An improved error concealment by diminishing the edge discontinuity. In Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 2213–2216.

Dempsey, J. S. and Forst, L. S. (2010). An Introduction to Policing. Delmar Cengage Learning, 5th edition.

Kaup, A., Meisinger, K., and Aach, T. (2005). Frequency selective signal extrapolation with applications to error concealment in image communication. AEUE - International Journal of Electronics and Communications, 59:147–156.

Lakshman, H., Köppel, M., Ndjiki-Nya, P., and Wiegand, T. (2010). Image recovery using sparse reconstruction based texture refinement. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 786–789.

Marciniak, T., Chmielewska, A., Weychan, R., Parzych, M., and Dabrowski, A. (2013). Influence of low resolution of images on reliability of face detection and recognition. Multimedia Tools and Applications.

Meur, O. L. and Guillemot, C. (2012). Super-resolution-based inpainting. Lecture Notes in Computer Science, 7577:554–567.

Milborrow, S., Morkel, J., and Nicolls, F. (2010). The MUCT Landmarked Face Database. Pattern Recognition Association of South Africa. http://www.milbo.org/muct.

Phillips, J., Scruggs, W. T., O'Toole, A. J., Flynn, P. J., Bowyer, K. W., Schott, C. L., and Sharpe, M. (2007). FRVT 2006 and ICE 2006 large-scale results. NISTIR 7408.

Wilman, W. Z. and Yuen, P. C. (2010). Very low resolution face recognition problem. In Proceedings of Fourth IEEE International Conference on Biometrics: Theory Applications and Systems, pages 1–4.

Wilman, W. Z. and Yuen, P. C. (2012). Very low resolution face recognition problem. IEEE Transactions on Image Processing, 21.

ImprovingtheAccuracyofFaceDetectionforDamagedVideoandDistantTargets

355