work. However, for the re-identification task, a positive sample
(true match) consists of images of the same person, so the number
of people in the dataset restricts the number of positive samples.
As a result, positive samples are usually far fewer than negative
samples. In the training phase, the number of negative samples
should therefore be limited to avoid overfitting.
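The imbalance described above can be illustrated with a minimal sketch. It enumerates all image pairs, separates positives from negatives by person ID, and keeps only a bounded random subset of the negatives. The function name `build_pairs` and the cap `neg_per_pos` are assumptions made for illustration; the ratio actually used in our experiments is not implied by this example.

```python
import itertools
import random


def build_pairs(labels, neg_per_pos=2, seed=0):
    """Return (positive_pairs, negative_pairs) as lists of image-index pairs.

    labels[i] is the person ID of image i. neg_per_pos is a hypothetical
    cap on the number of negatives kept per positive pair.
    """
    rng = random.Random(seed)
    positives, negatives = [], []
    for i, j in itertools.combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            positives.append((i, j))   # same person: true match
        else:
            negatives.append((i, j))   # different people: false match
    # Keep only a bounded random subset of the negatives.
    max_neg = min(len(negatives), neg_per_pos * len(positives))
    negatives = rng.sample(negatives, max_neg)
    return positives, negatives


# Toy single-shot setting: 4 people with 2 images each.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
pos, neg = build_pairs(labels, neg_per_pos=2)
print(len(pos), len(neg))
```

In this toy setting only 4 of the 28 possible pairs are positive, so without the cap the negatives would outnumber the positives six to one.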
A multiple-shot re-identification dataset is a good choice for
solving this problem. As shown in the experimental results,
creating more positive samples across multiple images makes the
performance of the network on CAVIAR4REID better than that on
VIPeR. During the experiments, however, we observed that the
performance of the network on some multiple-shot datasets, such
as ETHZ (Ess et al., 2007) and Person Re-ID 2011
(Hirzer et al., 2011), is not impressive. On further inspection of
these datasets, we found that the multiple images of each person
are extracted from video sequences. The differences between these
images are small, so they contribute little to training.
From another point of view, similar to the denoising
autoencoder, it is possible to increase the number of positive
samples by applying transformations to the original images, such
as partially corrupting the input image pairs or extracting random
patches from the images. With such a strategy, we can not only
increase the number of positive samples to combat overfitting, but
also improve the robustness of the network against noise.
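A minimal sketch of such an augmentation strategy is given below, assuming images are stored as NumPy arrays of shape (height, width, channels). The function names and the parameters `drop_prob` and `patch_frac` are hypothetical choices for illustration, not settings evaluated in this paper. Partial corruption is implemented here as masking noise, which is one common choice for denoising autoencoders.

```python
import numpy as np


def corrupt(image, drop_prob, rng):
    """Masking noise: set a random fraction of pixels to zero."""
    keep = rng.random(image.shape[:2]) >= drop_prob
    return image * keep[..., None]


def random_patch(image, patch_h, patch_w, rng):
    """Crop a patch of size (patch_h, patch_w) at a random location."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - patch_h + 1)
    left = rng.integers(0, w - patch_w + 1)
    return image[top:top + patch_h, left:left + patch_w]


def augment_positive_pair(img_a, img_b, n_copies, rng,
                          drop_prob=0.1, patch_frac=0.9):
    """Generate extra positive pairs from one true-match pair.

    Assumes both images have the same size.
    """
    h, w = img_a.shape[:2]
    ph, pw = int(h * patch_frac), int(w * patch_frac)
    pairs = []
    for _ in range(n_copies):
        a = corrupt(random_patch(img_a, ph, pw, rng), drop_prob, rng)
        b = corrupt(random_patch(img_b, ph, pw, rng), drop_prob, rng)
        pairs.append((a, b))
    return pairs


rng = np.random.default_rng(0)
img_a = rng.random((128, 48, 3))  # toy stand-ins for two views of one person
img_b = rng.random((128, 48, 3))
extra = augment_positive_pair(img_a, img_b, n_copies=5, rng=rng)
print(len(extra), extra[0][0].shape)
```

Each generated pair still shows the same person, so it remains a valid positive sample. In practice the cropped patches would likely need to be resized back to the input size expected by the network.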
4 CONCLUSIONS
Finding a good feature representation that bridges
the “gap” between appearances of the same person
is a very challenging task. Existing methods either
employ hand-crafted features or apply machine learning
methods to existing features to form a specific
representation. However, these methods carry a lot of
uncertainty due to human factors and specific
applications. Deep learning, with its ability to learn a
proper feature representation directly from raw images,
seems to be a promising solution for people
re-identification tasks.
In this paper, we utilize a deep convolutional neural
network to solve the people re-identification problem.
We integrate feature learning and re-identification
into one framework, and accomplish learning and
re-identification simultaneously. To deal with
the ranking-like comparison problem, we introduce
a linear support vector machine to replace the softmax
layer for measuring the similarity of the compared
images. Since the network has a large number of parameters
to be estimated while only a small amount of training
data is available, unsupervised pre-training and the
dropout technique are used to reduce overfitting.
Although the proposed method is quite simple, it still
achieves very encouraging performance compared with
baseline methods, which gives us great confidence.
Compared with the state-of-the-art methods, however, our
performance needs to be further improved. A careful
analysis of the results suggests that serious overfitting
caused by the lack of positive training samples
is the reason. Addressing this is our future work.
ACKNOWLEDGEMENTS
This research was supported by the National Institute
of Information and Communication Technology (NICT),
and by the Strategic Information and
Communications R&D Promotion Programme (No.
131306004). Yu Wang is supported by a Grant-in-Aid
for Japan Society for the Promotion of Science, and
Guanwen Zhang is also supported by the Fund of the
China Scholarship Council.
REFERENCES
Bazzani, L., Cristani, M., Perina, A., and Murino, V. (2012).
Multiple-shot Person Re-identification by Chromatic
and Epitomic Analyses, volume 33.
Cheng, D. S., Cristani, M., Stoppa, M., Bazzani, L., and
Murino, V. (2011). Custom Pictorial Structures for
Re-identification.
Dikmen, M., Akbas, E., Huang, T. S., and Ahuja, N. (2010).
Pedestrian Recognition with a Learned Metric. Proc.
Asia Conf. Computer Vision, pages 501–512.
Ess, A., Leibe, B., and van Gool, L. (2007). Depth and
Appearance for Mobile Scene Analysis. Proc. Int’l
Conf. Computer Vision, pages 1–8.
Farenzena, M., Bazzani, L., Perina, A., Murino, V., and
Cristani, M. (2010). Person Re-Identification by
Symmetry-Driven Accumulation of Local Features.
Gray, D., Brennan, S., and Tao, H. (2007). Evaluating ap-
pearance models for recognition, reacquisition, and
tracking. In 10th IEEE Int’l Workshop on Perfor-
mance Evaluation of Tracking and Surveillance.
Gray, D. and Tao, H. (2008). Viewpoint Invariant Pedestrian
Recognition with an Ensemble of Localized Features.
Proc. European Conf. Computer Vision, pages 262–
275.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing
the dimensionality of data with neural networks, vol-
ume 313.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I.,
and Salakhutdinov, R. (2012). Improving neural net-
works by preventing co-adaptation of feature detec-
tors, volume abs/1207.0580.
VISAPP 2014 - International Conference on Computer Vision Theory and Applications