curve is the percentage of the image pixels classified,
while the y-axis is the percentage of true fixations that
are classified. We obtained an ROC curve for each user
in a group and averaged the results within the group.
For the Gaussian blob prediction, we take the accu-
mulated fixation map of the users in a group and fit a
circularly symmetric two-dimensional Gaussian dis-
tribution to it by matching the mean and variance.
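The procedure above can be sketched as follows. This is our reading of the paper's protocol, not its actual code: a predictor saliency map is swept over decreasing thresholds, recording the fraction of pixels classified as salient (x-axis) against the fraction of true fixations captured (y-axis), and the Gaussian blob is fit by moment matching. All function and parameter names are illustrative.

```python
import numpy as np

def fixation_roc(saliency, fixations, n_thresholds=100):
    """ROC for a saliency-map predictor.
    saliency: 2-D array; fixations: list of (row, col) true fixations.
    Returns (xs, ys): fraction of pixels classified vs. fraction of
    fixations falling inside the classified region."""
    fix_vals = np.array([saliency[r, c] for r, c in fixations])
    thresholds = np.linspace(saliency.max(), saliency.min(), n_thresholds)
    xs, ys = [], []
    for t in thresholds:
        xs.append(np.mean(saliency >= t))  # % of image pixels classified
        ys.append(np.mean(fix_vals >= t))  # % of true fixations captured
    return np.array(xs), np.array(ys)

def gaussian_blob(shape, fixations):
    """Fit a circularly symmetric 2-D Gaussian to accumulated fixations
    by matching the mean and a shared (isotropic) variance."""
    pts = np.asarray(fixations, dtype=float)
    mu = pts.mean(axis=0)
    var = pts.var(axis=0).mean()  # average the two axis variances
    rr, cc = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (rr - mu[0]) ** 2 + (cc - mu[1]) ** 2
    return np.exp(-d2 / (2 * var))
```

The area under the resulting curve then summarizes how well one predictor (a human's fixation map, or the fitted blob) anticipates another user's fixations.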
Figure 8 shows the prediction performance, as ROC
curves, for both novices and experts. The dashed
curves show the human prediction performance,
while the solid curves show the Gaussian blob pre-
diction performance. Clearly, the fixation locations
of one novice predict those of another novice much
better than one expert's fixations predict another ex-
pert's; that is, the eye fixations of novices are more
consistent than those of experts. This is perhaps be-
cause expert photographers bring their training and
experience to the appreciation of photographs,
whereas novices tend to look at the obvious in pho-
tographs. The Gaussian blob prediction result shows
that the fixation maps of novices are concentrated
more centrally, usually at the location of foreground
objects. By the ROC curves, both human prediction
and Gaussian blob prediction perform better on
novices than on experts.
As mentioned previously, the most salient regions
for novices usually intersect with the experts' most
salient regions. To examine this effect statistically,
we used each individual user in one group to predict
each user in the other group. The averaged inter-
group prediction ROC curves are shown in Figure 9.
They show that an expert's fixation locations predict
a novice's fixation locations much better than the re-
verse. This result may be understood by noting that
the salient region in a photograph for a novice is usu-
ally the foreground region, whereas an expert consid-
ers both the foreground and background regions.
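The inter-group comparison reuses one user's gaze plot as a saliency predictor for another user. A minimal sketch of one plausible way to do this, assuming each fixation is smoothed with an isotropic Gaussian (the smoothing width `sigma` is an illustrative choice, not the paper's):

```python
import numpy as np

def fixation_map(shape, fixations, sigma=15.0):
    """Turn one user's fixation locations into a smooth saliency map
    (a sum of isotropic Gaussians, normalized to peak at 1), which can
    then be scored against another user's fixations via the ROC metric."""
    rr, cc = np.mgrid[0:shape[0], 0:shape[1]]
    m = np.zeros(shape)
    for r, c in fixations:
        m += np.exp(-((rr - r) ** 2 + (cc - c) ** 2) / (2 * sigma ** 2))
    return m / m.max()
```

Averaging the resulting per-pair ROC curves over all expert-to-novice (and novice-to-expert) pairs yields the inter-group curves of Figure 9.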
5 CONCLUSIONS
This paper presents two data-driven approaches to
understanding photographic composition. The first
is a feature-based method, in which we trained a
GMM on SIFT features extracted from monochrome
photographs by master photographers. The similar-
ity of each image pair is measured by evaluating the
gradient of the log-likelihood of the GMM, weighted
by the Fisher matrix. A photographer relationship
graph is then obtained by multi-dimensional scaling.
In the second approach, we used gaze plots measured
with eye-tracking equipment. On that data, the pre-
diction performance of both humans and Gaussian
blobs is evaluated using the ROC metric. We find
that the eye fixations of novices are much more con-
sistent than those of expert photographers, and that
experts predict novices much better than the reverse.
In future work, we will examine whether SIFT fea-
tures are the best choice for understanding composi-
tion, and study eye tracking with “bad” compositions
by novices, as judged by experts.
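The first approach's similarity measure can be sketched as a Fisher-vector computation under a GMM with diagonal covariances (our assumption; the paper does not state the covariance structure). The gradient of the log-likelihood with respect to the GMM means, normalized by an approximate diagonal Fisher matrix, gives a fixed-length descriptor per image, and image-pair similarity is the inner product of these descriptors. All names here are illustrative.

```python
import numpy as np

def fisher_vector(descriptors, weights, means, variances):
    """descriptors: (N, D) local features (e.g. SIFT); GMM parameters:
    weights (K,), means (K, D), variances (K, D) (diagonal)."""
    N, D = descriptors.shape
    K = weights.shape[0]
    # Log of weighted component densities, (N, K).
    log_p = np.stack([
        -0.5 * np.sum((descriptors - means[k]) ** 2 / variances[k]
                      + np.log(2 * np.pi * variances[k]), axis=1)
        + np.log(weights[k])
        for k in range(K)], axis=1)
    # Posteriors (responsibilities), computed stably.
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradient w.r.t. the means, with diagonal Fisher normalization.
    return np.concatenate([
        (gamma[:, k:k + 1] * (descriptors - means[k])
         / np.sqrt(variances[k])).sum(axis=0) / (N * np.sqrt(weights[k]))
        for k in range(K)])

def similarity(fv_a, fv_b):
    """Fisher-kernel similarity of two images: inner product of FVs."""
    return float(fv_a @ fv_b)
```

The resulting pairwise similarities form the distance matrix fed to multi-dimensional scaling to produce the photographer relationship graph.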
VISAPP 2010 - International Conference on Computer Vision Theory and Applications