related literature is reviewed in section 2, followed by
a detailed description of our approach in section 3.
We describe the experiments and results in section 4
and finally conclude in section 5.
2 RELATED WORK
In this section, we first review related work on
transfer learning and deep learning. We then present
the Faster R-CNN deep network (Ren et al., 2015)
and the SMC scene specialization algorithm
(Maâmatou et al., 2016), which are the two main
references for our paper.
Transfer learning addresses the problem that arises
when the distribution of the training data from
the source domain differs from that of the
target domain. Over the past decades, numerous
transfer learning methods have been proposed for
pedestrian detection (Maâmatou et al., 2016), (Wang
et al., 2014). Recently, addressing this problem
with deep neural networks has gained increasing
attention. Some deep models have been investigated
in the unsupervised and transfer learning challenge
(Guyon et al., 2011). Transfer learning using deep
models has proved effective in some
challenges (Mesnil et al., 2012), (Goodfellow et al.,
2012), for example traffic-object detection (Zeng et al.,
2014), (Li et al., 2015) and sentiment analysis (Glorot
et al., 2011).
In addition, deep learning has led to strong
performance on a variety of computer vision
problems such as vehicle detection (Li et al., 2015), action
recognition (Will Y. Zou, 2011), face recognition
(Huang et al., 2012), (Taigman et al., 2014) and image
classification (Duan et al., 2009). Among the various
types of deep neural networks, convolutional neural
networks (Jia et al., 2014), (LeCun et al., 1998)
have proved highly successful in machine
learning and computer vision applications.
In this paper, we propose a new framework based
on an SMC filter to specialize a recent deep detector,
the Faster R-CNN (Ren et al., 2015), for pedestrian
detection.
The Faster R-CNN (Ren et al., 2015) was
put forward to accurately detect general objects in
images. It achieved a state-of-the-art 73.2%
mean average precision on the PASCAL VOC 2007
dataset (Everingham et al., 2010) by combining a
region proposal network for the localization task and
a detector network into one large network.
The Faster R-CNN is composed of two
modules. The first module is a Region Proposal
Network (RPN) that provides a set of rectangular
object proposals from an input image. The second
is the Fast R-CNN network, which takes this set of
object proposals as input and uses them
for detection. The entire system is a single, unified
network for object detection.
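The two-module composition described above can be sketched as a simple function pipeline. This is only an illustrative sketch, not the actual Faster R-CNN implementation: the names `two_stage_detect`, `propose_regions`, `classify_regions` and `score_threshold` are hypothetical, the two stages are passed in as plain callables, and the sketch does not model the fact that both modules belong to one unified network.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), hypothetical layout
Detection = Tuple[Box, str, float]        # (box, class label, confidence score)


def two_stage_detect(
    image: object,
    propose_regions: Callable[[object], List[Box]],
    classify_regions: Callable[[object, List[Box]], List[Detection]],
    score_threshold: float = 0.5,
) -> List[Detection]:
    """Run a proposal stage, then a detection stage on its proposals."""
    # Stage 1 (RPN-like): rectangular object proposals from the input image.
    proposals = propose_regions(image)
    # Stage 2 (Fast R-CNN-like): score and label each proposal.
    detections = classify_regions(image, proposals)
    # Assumption: keep only detections at or above a confidence threshold.
    return [d for d in detections if d[2] >= score_threshold]
```

Any pair of functions with these signatures can be plugged in, which makes the data flow between the two modules explicit: the detector never sees the image without the proposals produced by the first stage.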
In this paper, the proposed method builds
on the SMC framework (Maâmatou et al.,
2016) because of its efficiency and performance in
traffic object detection. Maâmatou et al. (2016)
put forward a transductive transfer
learning method based on an SMC filter to iteratively
build a new specialized dataset that is used to
train a new specialized pedestrian detector. This
newly produced dataset is composed of both source
and target samples and approximates the unknown target
distribution. The specialization algorithm was applied
to a HOG-SVM detector. The general framework
presented in this paper is inspired by this work. We
propose a transfer learning framework based on the
SMC filter to specialize the Faster R-CNN detector
to a target scene. The specialization framework
presented in this paper differs in several respects
from the SMC framework proposed in (Maâmatou
et al., 2016). The main difference is that we remove
the sampling step of the SMC filter and keep only the
prediction and update steps. The aim of this change is
to optimize the specialization chain.
In this section, we propose an approach based
on the SMC filter to automatically specialize the
Faster R-CNN deep model to a target scene for
pedestrian detection. The block diagram of our
proposed specialized Faster R-CNN is given in
Fig. 1. At the first iteration, we fine-tune a public
ImageNet-pre-trained model (VGG 16) (Simonyan
and Zisserman, 2014) on the Pascal VOC dataset to
create a generic pedestrian detector. This detector is
used in the first SMC step, "prediction", to
propose samples from the target scene. We then
apply the likelihood function in the update step to
weight the samples from the specific scene
and determine the relevant ones for the specialization
process. A new specialized detector is fine-tuned on
the specialized dataset in the fine-tuning step, and it
becomes the input of the prediction step in the
next iteration. The prediction, update and fine-tuning
steps are repeated until a stopping criterion is reached,
for example a fixed number of iterations.
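The iteration described above can be sketched as a short loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `specialize`, `predict`, `likelihood`, `fine_tune`, `n_iterations` and `min_weight` are hypothetical, the three steps are supplied as callables, selecting relevant samples by thresholding their weight is an assumption, and so is fine-tuning the current detector rather than the generic one at each iteration.

```python
from typing import Callable, List, TypeVar

Detector = TypeVar("Detector")
Sample = TypeVar("Sample")


def specialize(
    generic_detector: Detector,
    predict: Callable[[Detector], List[Sample]],
    likelihood: Callable[[Sample], float],
    fine_tune: Callable[[Detector, List[Sample]], Detector],
    n_iterations: int = 3,
    min_weight: float = 0.5,
) -> Detector:
    """Iterate prediction, update and fine-tuning for a fixed number of rounds."""
    detector = generic_detector
    # Stopping criterion: a fixed number of iterations.
    for _ in range(n_iterations):
        # Prediction step: the current detector proposes target-scene samples.
        proposals = predict(detector)
        # Update step: weight each proposed sample with the likelihood function.
        weighted = [(sample, likelihood(sample)) for sample in proposals]
        # Assumption: a sample is relevant if its weight reaches min_weight.
        specialized_dataset = [s for s, w in weighted if w >= min_weight]
        # Fine-tuning step: the result feeds the next prediction step.
        detector = fine_tune(detector, specialized_dataset)
    return detector
```

Because the sampling step is absent, each iteration reduces to three calls, and the detector returned by one fine-tuning step is the only state carried into the next iteration.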
In what follows, we first describe the adaptation
of the two SMC steps to the Faster R-CNN model,
and then describe the fine-tuning step.
VISAPP 2017 - International Conference on Computer Vision Theory and Applications