possible to determine the position of the eyes, and
therefore to find the iris. Eye detection can be
based on (among other methods) template matching,
appearance classification, or feature detection. In
template matching methods, a generic eye model is
first created based on the eye shape, and a template
matching process is then used to search for eyes in the
image. Appearance-based methods detect eyes by their
appearance, using a classifier trained on a large
number of image patches representing the eyes
of several users under different orientations and illu-
mination conditions. Feature detection methods
exploit the visual characteristics of the eyes (such as
edges, iris intensity, or color distributions) to identify
distinctive features around the eyes.
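As an illustration of the appearance-based approach, the following minimal sketch uses OpenCV's pretrained Haar cascade eye classifier (an example of a detector trained on many labeled eye patches); the function name and parameters are illustrative assumptions, not part of our system.

import cv2

# Minimal appearance-based eye detection sketch using OpenCV's pretrained
# Haar cascade (illustrative only; parameters are assumed, not tuned).
def detect_eyes(image_path):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")
    # Each detection is a bounding box (x, y, w, h) around a candidate eye;
    # the iris can then be located within these boxes.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)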
9 CONCLUSIONS
One of the most difficult problems in content-based
image indexing and retrieval is the automatic identi-
fication of regions of interest in images. The diffi-
culty is related to the subjective semantics associated
with each region. This difficulty is the main reason to
consider a semi-automatic approach as the most real-
istic way to extract regions of interest from im-
ages for indexing and retrieval. The more image regions
are semantically labeled, the better the quality of the index-
ing. The semi-automatic approach requires users to collab-
orate and cooperate with algorithms to determine
regions of interest. Generally, experts use graphical
tools to delimit these regions. This task, although
widespread, is time-consuming when considering huge
quantities of images.
Exploiting the information carried in natural hu-
man gaze is an interesting approach to efficiently deter-
mine potential semantic regions with almost no hu-
man effort. Our model is based on the succes-
sive fixations and saccades of people watching the
media. It processes these data in order to determine
clusters of gaze points, and to extract several metrics for
estimating the importance of areas in the image. The
metrics are the cardinality, the variance, the surface,
the time-weighted visit count, and the revisit count.
All these metrics are combined into a sin-
gle estimator of the importance of image regions, in
a human-centered fashion. The use of this natural in-
formation is an effective way of dealing with the large
number of images in searched collections, which is a
crucial issue in indexing.
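As a rough sketch of how such metrics can be combined, the snippet below computes cluster-level metrics from fixation points and durations and merges them with a weighted sum; the helper names, the bounding-box approximation of the surface, and the weights are illustrative assumptions and do not reproduce the exact combination used in our model.

import numpy as np

# Hedged sketch: per-cluster gaze metrics combined into an importance score.
def cluster_metrics(points, durations):
    """points: (N, 2) fixation coordinates of one cluster,
    durations: (N,) fixation durations in seconds."""
    cardinality = len(points)                        # number of fixations
    variance = float(np.var(points, axis=0).sum())   # spatial spread
    extent = points.max(axis=0) - points.min(axis=0)
    surface = float(extent[0] * extent[1])           # bounding-box area (assumed)
    time_weighted_visits = float(np.sum(durations))  # total dwell time
    return np.array([cardinality, variance, surface, time_weighted_visits])

def importance(points, durations, revisit_count,
               weights=(0.3, 0.1, 0.1, 0.3, 0.2)):   # assumed weights
    features = np.append(cluster_metrics(points, durations), revisit_count)
    return float(np.dot(weights, features))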
Although we have concentrated our study on static im-
ages/keyframes, video indexing and retrieval, where the
quantity of data is even higher, can also benefit from
this approach. Extending our work to the specificities of
video will require facing other challenging problems
related to the temporal nature of the data.