Figure 6: Log-likelihoods of the candidate structures for one individual from the
face data. A constant has been added along the y-axis so that the
worst-performing structure receives a score close to zero.
The best-performing structure is marked by an asterisk.
Figure 7: The solid line represents the chain structure learned for
one individual from the face dataset. Each node is represented
by one of the faces assigned to it. The dashed lines correspond to the
extra edges that the meta structure creates.
entation of faces. We use the Sheffield (previously
UMIST) Face Database (Graham and Allinson, 1998)
which consists of 564 images of 20 individuals. The
poses range from profile to frontal views.
We find that the chain structure is the most probable,
as shown for one individual in figure 6.
The chain structure organizes the faces perfectly
according to their orientation (figure 7). Notably,
the meta structure creates exactly the same clustering
as the chain and attains a quite similar likelihood;
only a few extra edges have been added to an otherwise
pure chain structure. This shows the capability of the
meta structure to adapt to the natural structure of the data.
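The comparison between the chain and the meta structure can be illustrated with a minimal sketch. It assumes that both structures are given as undirected edges over cluster indices; the function names, the five-cluster ordering, and the single extra edge are hypothetical and are not taken from the experiments.

```python
def chain_edges(order):
    """Return the undirected edges of a chain that visits clusters in `order`."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def extra_edges(graph_edges, order):
    """Return the edges of a graph that are not part of the chain over `order`."""
    normalized = {frozenset(edge) for edge in graph_edges}
    return normalized - chain_edges(order)


# Hypothetical example: five pose clusters ordered from profile (0) to frontal (4).
order = [0, 1, 2, 3, 4]
# Hypothetical meta structure: the same chain plus one additional edge.
meta_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]

extras = extra_edges(meta_edges, order)
print(sorted(tuple(sorted(edge)) for edge in extras))
```

In this toy case the only edge left after removing the chain is the one between clusters 1 and 3, which plays the role of a dashed edge in figure 7.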
This example also demonstrates (figure 6) that
tree-shaped structures cannot reveal the natural
organization of face orientations. This applies
not only to the structures presented in this paper but
likely to all hierarchical organizations.
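The way the scores in figure 6 are displayed can be sketched as follows. This is an illustration only: the structure names and log-likelihood values are made up, and the constant is chosen here so that the worst structure lands exactly at zero, which is an assumption (the caption only states "close to zero").

```python
def shift_scores(log_likelihoods):
    """Add a constant so that the worst-scoring structure receives zero."""
    worst = min(log_likelihoods.values())
    return {name: value - worst for name, value in log_likelihoods.items()}


def best_structure(log_likelihoods):
    """Return the name of the structure with the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


# Hypothetical log-likelihoods for one individual (not values from the paper).
scores = {"partition": -1250.0, "chain": -1100.0, "tree": -1180.0, "meta": -1105.0}

shifted = shift_scores(scores)
for name, value in shifted.items():
    marker = " *" if name == best_structure(scores) else ""
    print(f"{name}: {value:.1f}{marker}")
```

Adding a constant does not change the ranking, so the structure marked with an asterisk is the same before and after the shift.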
5 CONCLUSIONS
We have presented a generic, unsupervised way to
find a structure that describes image data. Previous
methods in image categorization are able to create
only an instance of a single, predefined form, usually
a tree. Kemp’s algorithm used in this paper
defines a more generic view of finding the underlying
structure in data. We have suggested how to apply the
algorithm to visual objects and shown how this might
help to find a more natural organization of a set of
unlabeled images. In addition, we proposed our prototype
for the most generic structure, the meta structure.
This creates graphs which can capture the relations in
data even more accurately and can adapt to any un-
derlying structure. The categorization or classifica-
tion results are competitive with topic discovery mod-
els (LDA, hLDA). Moreover, the way we can present
image categories and the relations between categories
seems to be more natural and definitely more flexible
than in the state-of-the-art methods.
REFERENCES
Ahuja, N. and Todorovic, S. (2007). Learning the taxonomy
and models of categories present in arbitrary images.
In Proc. ICCV.
Bart, E., Porteous, I., Perona, P., and Welling, M. (2008).
Unsupervised learning of visual taxonomies. In Proc.
ICPR.
Blei, D., Ng, A., and Jordan, M. (2003). Latent Dirichlet
allocation. Journal of Machine Learning Research,
3:993–1022.
Bosch, A., Zisserman, A., and Muñoz, X. (2006). Scene
classification via pLSA. In Proc. ECCV.
Graham, D. and Allinson, N. (1998). Characterizing vir-
tual eigensignatures for general purpose face recogni-
tion. Face Recognition: From Theory to Applications,
NATO ASI Series F, Computer and Systems Sciences,
163:446–456.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The
Elements of Statistical Learning. Springer, New York.
Hofmann, T. (2001). Unsupervised learning by probabilistic
latent semantic analysis. Machine Learning,
42(1):177–196.
Kemp, C. and Tenenbaum, J. (2008). The discovery of
structural form. In Proceedings of the National Academy
of Sciences. http://www.psy.cmu.edu/~ckemp/.
Lowe, D. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Com-
puter Vision, 60(2):91–110.
Marszalek, M. and Schmid, C. (2008). Constructing cat-
egory hierarchies for visual recognition. In Proc.
ECCV.
Parikh, D. and Chen, T. (2007). Unsupervised learning
of hierarchical semantics of objects (hSOs). In Proc.
CVPR.
Shi, J. and Malik, J. (2002). Normalized cuts and image
segmentation. IEEE TPAMI, 22(8):888–905.
Sivic, J., Russell, B., Efros, A., Zisserman, A., and Free-
man, W. (2005). Discovering object categories in im-
age collections. In Proc. ICCV.
Sivic, J., Russell, B., Zisserman, A., Freeman, W., and
Efros, A. (2008). Unsupervised discovery of visual
object class hierarchies. In Proc. CVPR.
van de Sande, K., Gevers, T., and Snoek, C. (2010). Eval-
uating color descriptors for object and scene recogni-
tion. IEEE TPAMI, (in press).
Winn, J., Criminisi, A., and Minka, T. (2005). Object cat-
egorization by learned universal visual dictionary. In
Proc. ICCV.
VISAPP 2010 - International Conference on Computer Vision Theory and Applications