resolution reconstruction; the other is deep
learning-based face detection.
Generally speaking, super resolution reconstruction
is a family of restoration techniques, comprising
frequency domain algorithms and time domain algorithms,
that recover the original high resolution image from
multiple low resolution frames (Zhang, 2010). All the low
resolution images are captured in the same scene as
the original high resolution image and differ from it
only slightly. If only one low resolution image is
available, the ordinary method to obtain the high
resolution image is interpolation.
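As an illustration of the single-image interpolation baseline, the following is a minimal NumPy sketch of bilinear upscaling for a grayscale image. The function name and the corner-aligned sampling grid are assumptions made for this example; the text does not state which interpolation kernel is used.

```python
import numpy as np

def bilinear_upscale(img, out_h, out_w):
    """Resize a 2-D grayscale image to (out_h, out_w) by bilinear interpolation."""
    img = np.asarray(img, dtype=np.float64)
    in_h, in_w = img.shape
    # Sample positions in input coordinates; the corner pixels of the
    # output coincide with the corner pixels of the input.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]   # vertical blend weights
    wx = (xs - x0)[None, :]   # horizontal blend weights
    # Blend horizontally on the upper and lower rows, then vertically.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

small = np.array([[0.0, 2.0], [4.0, 6.0]])
large = bilinear_upscale(small, 3, 3)
```

Each output pixel is a weighted average of at most four input pixels, which is why interpolation alone cannot restore detail that is absent from the low resolution image.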
For the case of only one low resolution image,
and different from the traditional interpolation method,
the authors of (Luo, 2011) proposed a deep
learning-based strategy for single image
super-resolution. With a lightweight deep
convolutional neural network (CNN), this method
directly learns an end-to-end mapping between the
low and high resolution images. They also showed that
sparse-coding-based SR can be viewed as a
convolutional neural network. The work reported
state-of-the-art performance and is suitable for
online usage.
 
Figure 4: Given a low resolution image Y, the first
convolution layer extracts a set of feature maps. The
second layer maps these feature maps nonlinearly to high
resolution patch representations. The last layer combines
the predictions within a spatial neighbourhood to produce
the final high resolution image F(Y).
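The three stages named in the caption can be sketched as a forward pass. This is a minimal NumPy sketch with random, untrained weights. The filter sizes (9, 1, 5), the channel counts (64, 32), the zero padding and all function names are illustrative assumptions, not values stated in the text. The sketch also assumes the input Y has already been resized to the target size, so the network only refines it.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Zero-padded 'same' cross-correlation.

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, width = x.shape[1:]
    out = np.zeros((c_out, h, width))
    # Accumulate one kernel tap at a time over shifted views of the input.
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j],
                             xp[:, i:i + h, j:j + width])
    return out + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def init_params(rng, n1=64, n2=32, f1=9, f2=1, f3=5):
    """Random weights for illustration only; a real model would be trained."""
    shapes = [(n1, 1, f1, f1), (n2, n1, f2, f2), (1, n2, f3, f3)]
    return [(rng.normal(0.0, 1e-3, s), np.zeros(s[0])) for s in shapes]

def sr_forward(y, params):
    """Map an (H, W) image Y to an (H, W) output F(Y)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    features = relu(conv2d_same(y[None, :, :], w1, b1))  # feature extraction
    mapped = relu(conv2d_same(features, w2, b2))          # nonlinear mapping
    return conv2d_same(mapped, w3, b3)[0]                 # reconstruction

rng = np.random.default_rng(0)
y = rng.random((16, 20))
f_y = sr_forward(y, init_params(rng))
print(f_y.shape)
```

With untrained weights the output is meaningless as an image; the point is only the data flow from Y through the three convolution layers to F(Y).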
In the above-mentioned work, the network takes
the low resolution image as input and outputs the
high resolution one. To enhance image quality
with this deep learning-based method,
the training stage must be carried out prior to the
output stage. Following this method, we use over
5000 pairs of LR and HR images, of
126*102 pixels and 441*358 pixels
respectively, as the training dataset.
Fig.4  shows  the  schematic  diagram  of  the  deep 
learning-based SR. 
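The image sizes of the training pairs imply the magnification the network has to learn. The short sketch below works this out, assuming the sizes are given as width*height; the function name is hypothetical.

```python
def scale_factors(lr_size, hr_size):
    """Return (horizontal, vertical) magnification for (width, height) sizes."""
    (lr_w, lr_h), (hr_w, hr_h) = lr_size, hr_size
    return hr_w / lr_w, hr_h / lr_h

sx, sy = scale_factors((126, 102), (441, 358))
print(f"{sx:.3f} {sy:.3f}")
```

Both factors come out at roughly 3.5, so each LR pixel corresponds to about twelve HR pixels.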
Face detection in complex scenes is an
essential but rather tough task. For a fixed
surveillance camera, the field of view (FOV) is
constant. In this scene, the face region in the frame
image is big enough for face detection to be executed.
But in ordinary surveillance scenes, for
people far away from the fixed-focus camera, the
face region may be too small to be detected. In this
case, pedestrian detection should be used to
detect the person concerned and track this person
until his approach makes the face region big enough
to be detected. This strategy was proposed in our
previous work (Yan, 2014) and proved to be
effective and efficient.
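The switch from pedestrian tracking to face detection can be sketched as a small per-person state update. This is only an illustration of the strategy: the minimum face size, the head-to-body ratio used to estimate the face size from the pedestrian box, and all names are hypothetical and are not taken from (Yan, 2014).

```python
from dataclasses import dataclass

MIN_FACE_PX = 24       # hypothetical smallest face height the detector handles
HEAD_TO_BODY = 1 / 8   # rough assumption: head height relative to body height

@dataclass
class Track:
    mode: str = "pedestrian"   # "pedestrian" or "face"

def estimated_face_height(body_box_height):
    """Estimate the face height in pixels from the pedestrian box height."""
    return body_box_height * HEAD_TO_BODY

def update(track, body_box_height):
    """Switch to face detection once the estimated face is large enough."""
    if (track.mode == "pedestrian"
            and estimated_face_height(body_box_height) >= MIN_FACE_PX):
        track.mode = "face"
    return track.mode

# A person approaching the camera: the pedestrian box grows frame by frame.
track = Track()
modes = [update(track, h) for h in (80, 120, 160, 200, 240)]
print(modes)
```

In this run the track stays in pedestrian mode for the first three frames and switches to face mode once the box height reaches 200 pixels.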
Considering the complexity of face detection in
ordinary surveillance scenes, the researchers
presented a new state-of-the-art approach in (Chen,
2014). They observed that aligned face shapes
provide better features for face classification. To
combine face alignment and detection more
effectively, they learned these two tasks in the same
cascade. By exploiting this joint learning, both the
detection capability of the cascade and its real-time
performance reach a satisfactory level.
 
Figure 5: The key point annotation on face shape. 
As shown in Fig.5, we use 38 key points to
describe the face shape: 10 points for the face contour, 6
points for the eyebrows, 10 points for the eyes, 7 points for
the nose and 5 points for the lips.
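The key point budget above can be written down as a simple layout table. The sketch below checks that the groups sum to 38 and assigns each group a contiguous index range in a shape vector; the ordering of the groups and the names are assumptions for illustration, since the text only gives the counts.

```python
# Number of key points per facial part, as listed in the text.
LANDMARK_GROUPS = {
    "contour": 10,
    "eyebrows": 6,
    "eyes": 10,
    "nose": 7,
    "lips": 5,
}

def landmark_slices(groups):
    """Assign each group a contiguous index range; return (slices, total)."""
    slices = {}
    start = 0
    for name, count in groups.items():
        slices[name] = slice(start, start + count)
        start += count
    return slices, start

slices, total = landmark_slices(LANDMARK_GROUPS)
print(total, slices["nose"])
```

With this layout a face shape is an array of 38 (x, y) pairs, and a slice such as `slices["eyes"]` picks out the points of one facial part.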
We purchased a face image dataset from the web, consisting of
about 20,000 face images and 20,000 natural scene
images without faces. All face images are
converted into grayscale images. After all the face
images are labelled, the dataset is used to train the
classification/regression tree.
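The grayscale conversion and the face/non-face labelling can be sketched as follows. The luma weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 choice and are an assumption here, as the text does not say how the conversion is performed; the function names and the 1/0 label convention are likewise illustrative.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB image to an (H, W) grayscale image."""
    weights = np.array([0.299, 0.587, 0.114])   # assumed BT.601 luma weights
    return np.asarray(rgb, dtype=np.float64) @ weights

def make_labels(n_faces, n_backgrounds):
    """Label face images 1 and face-free scene images 0."""
    return np.concatenate([np.ones(n_faces, dtype=int),
                           np.zeros(n_backgrounds, dtype=int)])

white = np.full((2, 3, 3), 255.0)
gray = to_grayscale(white)
labels = make_labels(20000, 20000)
```

The key point annotations of the face images would be stored alongside these labels as regression targets.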
3  EXPERIMENT AND CONCLUSION
We combined the on-site
surveillance camera with an RFID reader to realize the
self-service passenger pass. The key techniques
are effective on-site face detection and
online SR reconstruction of the low resolution
electronic ID photos. As a comparison, we also directly
ISME 2015 - Information Science and Management Engineering III
ISME 2015 - International Conference on Information System and Management Engineering