system is evaluated on the MOBIO corpus. The results show that full video processing yields a relative performance improvement of around 10% for the face system. In addition, fusing the face and speaker systems improves performance by 30 to 60% relative to the best uni-modal system. Full video processing also reduces the number of media files in which no face could be found, from 25 to 0 for the enrolment data and from 118 to 9 for the test data. This potentially allows more media to be used for score fusion and thus improves overall system performance.
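The relative figures above can be made concrete with a short sketch. The snippet below is a minimal illustration that makes three assumptions. First, performance is measured by an error rate such as the EER (lower is better). Second, fusion is a weighted sum of z-normalised face and speaker scores. Third, the helper names (`relative_improvement`, `z_normalise`, `fuse`) and the equal weights are hypothetical and are not necessarily the fusion method used in this work.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def relative_improvement(baseline_err: float, new_err: float) -> float:
    """Relative reduction of an error rate (e.g. EER) with respect to a baseline."""
    if baseline_err <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_err - new_err) / baseline_err


def z_normalise(scores: Sequence[float]) -> List[float]:
    """Map scores to zero mean and unit variance (statistics taken from the scores themselves)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: nothing to rescale.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse(face_scores: Sequence[float],
         speaker_scores: Sequence[float],
         w_face: float = 0.5) -> List[float]:
    """Hypothetical weighted-sum fusion of per-trial face and speaker scores."""
    if len(face_scores) != len(speaker_scores):
        raise ValueError("both systems must score the same trials")
    face_z = z_normalise(face_scores)
    speaker_z = z_normalise(speaker_scores)
    return [w_face * f + (1.0 - w_face) * s for f, s in zip(face_z, speaker_z)]


if __name__ == "__main__":
    # Illustrative numbers only, not results reported in this paper.
    print(relative_improvement(10.0, 7.0))
    print(fuse([0.0, 1.0], [0.0, 1.0]))
```

For instance, with illustrative (not measured) error rates of 10% for the best uni-modal system and 7% after fusion, `relative_improvement(10.0, 7.0)` gives 0.3, i.e. a 30% relative improvement. For brevity, the normalisation statistics here are computed from the trial scores themselves. In practice they would normally be estimated on a separate development set.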
After validating our bi-modal system on the publicly available mobile data corpus (MOBIO), we developed a bi-modal biometric prototype application for the iPad (a demo video is available at https://vid.me/wPJk). This application performs bi-modal biometric user verification in around 3 s.
Future work will be dedicated to conducting more experiments on bi-modal verification with full video processing on a mobile device, using the existing iPad prototype application. Moreover, different quality measures for selecting a subset of frames (such as “mouth closed” and “eyes open”) will be tested to improve the verification results.
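The planned frame-subset selection could look roughly like the sketch below. It assumes, hypothetically, that detectors for the “eyes open” and “mouth closed” conditions have already given each video frame two confidence values in [0, 1]. The product rule that combines them, the `FrameQuality` type, and the `select_frames` helper are illustrative assumptions, not components of the existing system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameQuality:
    index: int           # position of the frame in the video
    eyes_open: float     # hypothetical detector confidence in [0, 1]
    mouth_closed: float  # hypothetical detector confidence in [0, 1]


def select_frames(frames: List[FrameQuality], k: int,
                  min_quality: float = 0.0) -> List[int]:
    """Return indices of up to k frames with the highest combined quality,
    sorted back into temporal order."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Assumed combination rule: product of the two confidences.
    scored = [(f.eyes_open * f.mouth_closed, f.index) for f in frames]
    kept = [item for item in scored if item[0] >= min_quality]
    # Best quality first; ties broken by earlier frame index.
    kept.sort(key=lambda item: (-item[0], item[1]))
    return sorted(idx for _, idx in kept[:k])
```

Returning the selected indices in temporal order makes it simple to pass only those frames on to face verification. The `min_quality` threshold lets a caller discard poor frames even when fewer than `k` would remain.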
ACKNOWLEDGEMENTS
This work is partially supported by the FUI 15
Equip’Age and H2020-DS-02-2014 SpeechXRays
projects.
REFERENCES
Bonastre, J.-F., Scheffer, N., Matrouf, D., Fredouille, C., Larcher, A., Preti, A., Pouchoulin, G., Evans, N., Fauve, B., and Mason, J. (2008). ALIZE/SpkDet: a state-of-the-art open source software for speaker recognition. In Odyssey: The Speaker and Language Recognition Workshop.
Cootes, T. F., Taylor, C. J., Cooper, D. H., and Graham, J. (1995). Active shape models: their training and application. Computer Vision and Image Understanding, 61(1):38–59.
Gravier, G. (2009). SPro: Speech signal processing toolkit, release 4.1.
Khoury, E., Vesnicer, B., Franco-Pedroso, J., Violato, R. P., Boulkenafet, Z., Mazaira Fernandez, L. M., Diez, M., Kosmala, J., Khemiri, H., Cipr, T., Saeidi, R., Gunther, M., Zganec-Gros, J., Zazo Candil, R., Simoes, F., Bengherabi, M., Alvarez Marquina, A., Penagarikano, M., Abad, A., Boulaymen, M., Schwarz, P., van Leeuwen, D., Gonzalez-Dominguez, J., Uliani Neto, M., Boutellaa, E., Gomez Vilda, P., Varona, A., Petrovska-Delacrétaz, D., Matejka, P., Gonzalez-Rodriguez, J., De Freitas Pereira, T., Harizi, F., Rodriguez-Fuentes, L. J., El Shafey, L., Angeloni, M., Bordel, G., Chollet, G., and Marcel, S. (2013). The 2013 speaker recognition evaluation in mobile environment. In ICB '13: The 6th IAPR International Conference on Biometrics.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
McCool, C., Marcel, S., Hadid, A., Pietikainen, M., Matejka, P., Cernocky, J., Poh, N., Kittler, J., Larcher, A., Levy, C., Matrouf, D., Bonastre, J.-F., Tresadern, P., and Cootes, T. (2012). Bi-modal person recognition on a mobile phone: using mobile phone data. In 2012 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pages 635–640.
Petrovska-Delacrétaz, D., Chollet, G., and Dorizzi, B.
(2009). Guide to Biometric Reference Systems and
Performance Evaluation. Springer Verlag.
Reynolds, D., Quatieri, T., and Dunn, R. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1–3):19–41.
Stegmann, M. B., Ersbøll, B. K., and Larsen, R. (2003). FAME: a flexible appearance modelling environment. IEEE Transactions on Medical Imaging, 22(10):1319–1331.
Zhou, D., Petrovska-Delacrétaz, D., and Dorizzi, B.
(2009). Automatic landmark location with a combined
active shape model. In International Conference on
Biometrics: Theory, Applications, and Systems, pages
1–7.
MacLean, K. (2012). VoxForge. [Online]. Available: http://www.voxforge.org/home.
Phillips, P. J., Flynn, P. J., Scruggs, T., Bowyer, K. W., Chang, J., Hoffman, K., ..., and Worek, W. (2005). Overview of the face recognition grand challenge. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 947–954.