Automatic Face Corpus Creation
Ladislav Lenc and Pavel Král
Department of Computer Science and Engineering, University of West Bohemia, Plzeň, Czech Republic
Keywords:
Automatic Face Recognition, Czech News Agency, Scale Invariant Feature Transform, Corpus Creation.
Abstract:
This paper deals with the automatic creation of a real-world face corpus. The main contribution is the
proposal and evaluation of an automatic face corpus creation algorithm. Next, we statistically analyse the
structure of the face corpus created by this algorithm. We further compare the recognition accuracy of our
previously developed face recognition approach on this corpus using datasets of different size and quality.
We show that manual verification of the corpus is not necessary. We therefore conclude that the proposed
algorithm is suitable for further use by the Czech News Agency, our commercial partner.
1 INTRODUCTION
In this paper, we focus on the automatic labeling
of people in a database containing a huge number
of real-world photographs. A certain portion of the
pictures is labeled (information about the person's
identity is available). The rest is unlabeled. In this
case, recognition is tightly connected with the face
detection and extraction steps because the pictures
do not contain only faces.
Automatic corpus creation methods have been
developed and evaluated particularly in the speech
processing domain (Chen and Nie, 2000; Tomás et al.,
2001). Unfortunately, to the best of our knowledge,
there is very little work on automatic corpus
creation in the face recognition field. All the well-known
face databases have been created manually. However,
manual labeling is a very time-consuming and
expensive task.
The main goal of this paper is thus the creation
of a huge real-world face database. The creation
process must be as automatic as possible. The labeled
examples will be used for this task. The main
contribution of this work is the proposal and evaluation
of an automatic face corpus creation algorithm.
Another contribution of this paper is the statistical
analysis of the face corpus produced by the automatic
creation algorithm. The newly created face corpus
will be used to evaluate face recognition approaches,
which represents the next contribution of this paper.
We also compare the face recognition accuracy using
datasets of different size and quality. The results of
this work will be used by the Czech News Agency (ČTK).
For face recognition, we use the adapted Scale
Invariant Feature Transform (SIFT)-based Kepenekci
method (Lenc and Král, 2012), which has shown very
good recognition accuracy on standard datasets (e.g.
ORL). It is based on the SIFT algorithm proposed by
David Lowe (Lowe, 2004). The SIFT algorithm is
used for feature extraction and the matching scheme
proposed by Kepenekci (Kepenekci, 2001) is used for
face comparison.
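
The division of labour between feature extraction and face comparison can be illustrated with a minimal sketch. It is not the implementation used in this paper. It assumes that each face is already represented by a list of local descriptors (short toy vectors here, standing in for SIFT descriptors), that two descriptors are compared by cosine similarity, and that the face-level score is the average of the best per-descriptor similarities. The function names `cosine_similarity`, `face_similarity` and `recognize` are hypothetical, and any spatial constraints of the original matching scheme are omitted.

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def cosine_similarity(a: Descriptor, b: Descriptor) -> float:
    """Cosine of the angle between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def face_similarity(probe: List[Descriptor], gallery: List[Descriptor]) -> float:
    """Assumed scoring: for every probe descriptor take the best match
    in the gallery face, then average these best similarities."""
    if not probe or not gallery:
        return 0.0
    best = [max(cosine_similarity(p, g) for g in gallery) for p in probe]
    return sum(best) / len(best)


def recognize(probe: List[Descriptor],
              gallery_faces: Dict[str, List[Descriptor]]) -> str:
    """Return the label of the gallery face most similar to the probe."""
    return max(gallery_faces,
               key=lambda label: face_similarity(probe, gallery_faces[label]))


# Toy data: two labeled faces and one unlabeled probe (hypothetical values).
gallery_faces = {
    "person_a": [[1.0, 0.0], [0.0, 2.0]],
    "person_b": [[1.0, 1.0]],
}
probe = [[1.0, 0.0], [0.0, 1.0]]
predicted = recognize(probe, gallery_faces)
```

In a real pipeline the descriptor lists would come from a SIFT extractor applied to the detected face region; the nearest-face decision in `recognize` is likewise only one possible choice.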
The following section describes the proposed
corpus creation algorithm and the structure of the created
face corpus. The next section contains the face
recognition results on the created corpus. Finally,
Section 4 summarizes the results and gives some ideas
for future research.
2 AUTOMATIC CORPUS CREATION
We used the ČTK photo-database for all experiments.
Every picture contains the face of one known person
(with the label). Unfortunately, the photos do not
contain only the face itself. They may also show other
people, background objects, etc.
2.1 Proposed Algorithm
Therefore, we propose an algorithm in order to detect
and extract the faces from the pictures and to create
Lenc L. and Král P.
Automatic Face Corpus Creation.
DOI: 10.5220/0004333305820586
In Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART-2013), pages 582-586
ISBN: 978-989-8565-39-6
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)