Table 1: Mean accuracies and 95% confidence intervals of the logistic regression method trained separately (first row of each block) and of our shared logistic approach (second row), for increasing numbers of simultaneously trained verification tasks. When more than 4 verification tasks are trained jointly, the shared approach reaches higher accuracy (i.e., lower error rates).

Tasks            1           2           3           4           5
Logistic         68.1 ± 8.2  65.5 ± 6.4  69.5 ± 5.3  68.2 ± 4.2  69.8 ± 3.9
Shared Logistic  —           59.4 ± 4.2  64.2 ± 5.4  67.9 ± 5.2  71.3 ± 5.2

Tasks            6           7           8           9           10
Logistic         68.2 ± 3.6  68.6 ± 3.3  70.4 ± 3.3  69.8 ± 3.0  70.4 ± 2.9
Shared Logistic  72.8 ± 4.2  76.4 ± 3.1  78.2 ± 2.8  82.5 ± 2.4  84.6 ± 2.3
the improvement becomes more significant as the number of jointly trained verification tasks increases, with accuracy approximately 15% higher in the case of 10 simultaneous verifications.
The probabilistic modelling presented in this paper suggests new lines of future research. In our first formulation, knowledge sharing is imposed by constraining the parameter space of the classifiers across the multiple tasks. Other approaches could be followed, such as a more complex formulation in which a hidden model generates the parameter space.
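As an illustration of this kind of parameter-space coupling, the following sketch jointly trains several logistic classifiers whose weight vectors share a common component. The additive decomposition w_t = w0 + v_t, the penalty weight, and all function names are our own assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_shared_logistic(tasks, dim, lr=0.1, lam=1.0, epochs=500):
    """Jointly train T logistic classifiers with weights w_t = w0 + v_t.

    Knowledge sharing is imposed by penalizing the task-specific
    deviations v_t, which pulls every task toward the shared w0.
    (Illustrative decomposition; not necessarily the paper's model.)
    """
    T = len(tasks)
    w0 = np.zeros(dim)      # shared component, learned from all tasks
    V = np.zeros((T, dim))  # per-task deviations from w0
    for _ in range(epochs):
        g0 = np.zeros(dim)
        for t, (X, y) in enumerate(tasks):
            p = sigmoid(X @ (w0 + V[t]))
            g = X.T @ (p - y) / len(y)      # logistic-loss gradient
            g0 += g
            V[t] -= lr * (g + lam * V[t])   # penalize deviation from w0
        w0 -= lr * g0 / T                   # shared update from all tasks
    return w0, V

# Toy usage: two related binary tasks drawn from nearby hyperplanes.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
tasks = []
for t in range(2):
    X = rng.normal(size=(200, 2))
    y = (X @ (w_true + 0.1 * rng.normal(size=2)) > 0).astype(float)
    tasks.append((X, y))
w0, V = train_shared_logistic(tasks, dim=2)
acc = np.mean((sigmoid(tasks[0][0] @ (w0 + V[0])) > 0.5) == tasks[0][1])
```

Because only the deviations v_t are penalized, the fewer the examples a task has, the more its classifier falls back on the shared component learned from the whole pool.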
Moreover, the addition of extra related tasks from different domains, such as gender or ethnicity recognition, could be studied. Enlarging the task pool should increase the amount of information shared between the related tasks, mitigating the effects of the small sample size problem in face verification.
ACKNOWLEDGEMENTS
This work is supported by MEC grant TIN2006-15308-C02-01, Ministerio de Ciencia y Tecnología, Spain.