domain, then we fine tune our CNN in these pseudo
labels and learn specific characteristics from the tar-
get domain. As the training is performed with the ac-
tual target domain images we expect to increase even
more our performance, even though the pseudo label
generate a noisy label space for the target domain.
In our experiments, we evaluated the CNN perfor-
mance in the target domain (direct transfer), we eval-
uated the same CNN trained with the dataset adapted
by a cycleGAN to the target distribution and we also
evaluated the same CNN trained in the target domain
using our pseudo label method. Our method surpasses
the baseline accuracy in all test cases. All of that is
achieved by replacing step 3 by our technique and will
be explained in more details in sections 3 and 4.
In addition, we observed that the highly unbal-
anced nature of the person re-identification problem
means that training batches may be heavily biased to-
wards negative samples. To deal with that, we use a
batch scheduler algorithm that allows to train a CNN
with a triplet loss in cases where the data is noisy.
Next section discusses related work. Section 3
presents our method and Section 4 presents experi-
ments and results. This paper concludes in Section 5.
2 RELATED WORK
The state-of-art on person re-identification follows
a pattern of using either attention-based neural net-
works (Liu et al., 2017), factorization neural networks
(Chang et al., 2018) or body parts detection (Zhao
et al., 2017). The common point in these works is try-
ing to disregard the background information, so they
can give the proper weight for the image areas where
the person is visible. These methods achieve great re-
sults, but have a high complexity, as they are based
on combinations of several elements. However, dif-
ferent datasets have different characteristics and cer-
tain combination of methods may not work across all
datasets. In this paper, our focus is on the exploitation
of domain adaptation for this application. To design
more controlled experiments, we use a relatively sim-
ple end-to-end system based on the ResNet-50 (He
et al., 2016) as a backbone.
Typically, the person re-identification challenge
is approached as a metric learning task (Zhao et al.,
2017; Deng et al., 2018). But it can also be ap-
proached as a classification task where each person
from the dataset is a class (Liu et al., 2017; Chang
et al., 2018). The problem of the classification-based
approach is that the space of labels is fixed and has
a large cardinality. Such methods are rarely applica-
ble in practice, unless the set of identities of people
who transit through a set of environments is always
the same. Our target application is public spaces,
therefore it is not possible to restrict the set of labels.
Therefore we approach this as a metric learning chal-
lenge. Further to being applicable to public spaces,
the task of comparing samples is the same across dif-
ferent domains. This enables the application of un-
supervised domain adaptation methods to adapt the
marginal distribution of the data.
Recently, some works presented domain adapta-
tions techniques for person re-identification. (Zhao
et al., 2017) created a new dataset to evaluate the gen-
eralization capacity of his model. Their CNN was
evaluated in it without further training. (Zhong et al.,
2018) used a cycleGAN to approximate the camera
views in a dataset trying to learn a camera latent space
metric. (Xiao et al., 2016) trained his CNN with a
super dataset created concatenating multiple datasets.
They proposed a domain guided dropout to further
specialize their CNN for each dataset. In this work,
we consider that the target domains have no labeled
data, then we cannot use the approaches of (Zhong
et al., 2018) or (Xiao et al., 2016). The approach of
(Zhao et al., 2017) can be called direct transfer, be-
cause it just evaluates a CNN on a target domain. We
shall demonstrate that our method outperforms direct
transfer.
3 PROPOSED METHOD
Our technique is based on training a CNN to learn a
metric, so we can ensure that distinct domains will
have the same task. Therefore, we train a ResNet-50
(Section 3.1) with the triplet loss (Section 3.2) to learn
the desired metric in an Euclidean vector space. The
core of the domain adaptation method is based in a
cycleGAN that will perform an image-image transla-
tion to approximate source and target domains (Sec-
tion 3.3). Then, we use the CNN trained in the in-
termediate dataset to extract the features of the tar-
get domain images and use a clustering algorithm to
generate pseudo-labels for the target domain (Section
3.4).
3.1 Baseline CNN
As said in Section 2, the state-of-art in person re-
identification use techniques that exploit information
from CNNs at multiple levels, bringing multiple se-
mantic levels to the final features. Those semantic
levels may carry specific person attributes like gen-
der, clothing, textures and clothing, which are impor-
tant for matching people across views.
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications
696