[Figure 6: panels (a) and (b)]
Figure 6: Two selected examples of the winning model instantiation for the second version of the experiment (occlusion by random noise); the colour interpretation is the same as in Figure 5.
all images that were correctly labelled, the models were also correctly located, achieving 86.7% accuracy without any fine-tuning of the parameters. The complete confusion matrix is given in Table 2, and two illustrative results are shown in Figure 6.
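The accuracy quoted above is the fraction of test images that fall on the diagonal of the confusion matrix (trace divided by total count). The sketch below illustrates only that arithmetic; the function name `accuracy` and the 3×3 matrix are hypothetical and are not the values from Table 2.

```python
def accuracy(confusion):
    """Overall accuracy from a square confusion matrix.

    confusion[i][j] is the number of images of true class i that were
    labelled as class j, so correct labels lie on the diagonal.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts for three classes (NOT the values from Table 2).
toy = [
    [9, 1, 0],
    [0, 7, 3],
    [1, 0, 9],
]
print(f"{accuracy(toy):.1%}")  # 83.3%
```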
7 CONCLUSIONS
In this paper, a new hierarchical probabilistic model for modelling object appearance is introduced and an efficient method for its inference is described. The main novel features of this approach are the following. First, the model is acyclic by definition, which allows for precise computation of probabilities using a very simple version of belief propagation. Second, thanks to the partitioning of layers, it is easy to compute the probabilities of the model conditioned on the observed data, which is very useful for maximum-likelihood learning of parameters. This partitioning also prevents combinatorial explosion in the bottom-up generation of hypotheses and allows for parallel processing. The model and the method are also robust to occlusions.
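Why the absence of cycles matters can be illustrated with generic sum-product belief propagation on a tree-structured model: one upward (leaves-to-root) pass and one downward pass give exact marginals, which the sketch checks against brute-force enumeration. This is a textbook pairwise tree model swapped in for illustration, not the paper's hierarchical model or its inference method; the tree, the potentials `phi` and `psi`, and the function names are all hypothetical.

```python
import itertools

import numpy as np


def sum_product_tree(children, root, phi, psi):
    """Exact marginals of a tree-structured pairwise model by sum-product.

    children: dict node -> list of child nodes (every node is a key)
    phi:      dict node -> unary potential, shape (K_v,)
    psi:      dict (parent, child) -> pairwise potential, shape (K_parent, K_child)
    """
    up = {}    # up[c]: message from child c to its parent, indexed by parent state
    down = {}  # down[c]: message from c's parent to c, indexed by c's state

    def collect(p):
        # Upward pass (leaves to root): message from each child c of p to p.
        for c in children[p]:
            collect(c)
            belief = phi[c].copy()
            for g in children[c]:
                belief = belief * up[g]
            up[c] = psi[(p, c)] @ belief  # sums over the states of c

    def distribute(p, from_parent):
        # Downward pass (root to leaves): message from p to each child c,
        # combining everything p knows except what came from c itself.
        for c in children[p]:
            b = phi[p] * from_parent
            for s in children[p]:
                if s != c:
                    b = b * up[s]
            down[c] = psi[(p, c)].T @ b  # sums over the states of p
            distribute(c, down[c])

    collect(root)
    distribute(root, np.ones_like(phi[root]))

    marginals = {}
    for v in children:
        m = phi[v] * (down[v] if v != root else 1.0)
        for c in children[v]:
            m = m * up[c]
        marginals[v] = m / m.sum()
    return marginals


def brute_force(phi, psi):
    """Reference marginals by enumerating every joint state (small models only)."""
    nodes = sorted(phi)
    marg = {v: np.zeros(len(phi[v])) for v in nodes}
    for states in itertools.product(*(range(len(phi[v])) for v in nodes)):
        x = dict(zip(nodes, states))
        w = np.prod([phi[v][x[v]] for v in nodes])
        w *= np.prod([pot[x[p], x[c]] for (p, c), pot in psi.items()])
        for v in nodes:
            marg[v][x[v]] += w
    return {v: m / m.sum() for v, m in marg.items()}


# Hypothetical 5-node tree with binary variables and random positive potentials.
rng = np.random.default_rng(0)
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
phi = {v: rng.random(2) + 0.1 for v in children}
psi = {(p, c): rng.random((2, 2)) + 0.1 for p in children for c in children[p]}

bp = sum_product_tree(children, 0, phi, psi)
bf = brute_force(phi, psi)
print(all(np.allclose(bp[v], bf[v]) for v in children))  # True
```

On a graph with cycles the same message updates would only be an approximation (loopy belief propagation), which is why acyclicity is what makes this simple two-pass scheme exact.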
Experiments on a controlled dataset of letter images with a hand-made hierarchical model indicate that the proposed approach is usable for visual data in general. The dataset is characterized by small rotations and scale changes, as well as occlusions of 25% of the image (either by removing that part or by replacing it with random noise), none of which causes significant difficulties for the inference algorithm.
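The two occlusion variants can be sketched as follows. This is only an illustration under assumptions the text does not state: the occluded region is taken to be a single axis-aligned rectangle covering 25% of the image area at a random position, "non-presence" is modelled as setting pixels to an assumed background value of 0, and the noise is uniform over the image's value range. The function `occlude` and its parameters are hypothetical.

```python
import numpy as np


def occlude(image, fraction=0.25, mode="remove", rng=None):
    """Occlude one axis-aligned rectangle covering about `fraction` of a 2-D image.

    mode="remove": pixels set to 0 (assumed background value).
    mode="noise":  pixels replaced by uniform noise in the image's value range.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Scaling both sides by sqrt(fraction) gives an area of about fraction * h * w.
    bh = int(round(h * np.sqrt(fraction)))
    bw = int(round(w * np.sqrt(fraction)))
    top = rng.integers(0, h - bh + 1)   # random placement inside the image
    left = rng.integers(0, w - bw + 1)
    out = image.astype(float)  # float copy, so noise values are not truncated
    if mode == "remove":
        out[top:top + bh, left:left + bw] = 0.0
    elif mode == "noise":
        lo, hi = float(image.min()), float(image.max())
        out[top:top + bh, left:left + bw] = rng.uniform(lo, hi, size=(bh, bw))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return out


rng = np.random.default_rng(1)
img = np.ones((100, 100))
removed = occlude(img, mode="remove", rng=rng)
print((removed == 0).mean())  # 0.25
```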
As for future directions, we plan to use the framework in a completely unsupervised structural learning method and to compare it with state-of-the-art methods in structural learning and recognition.
VISAPP 2014 - International Conference on Computer Vision Theory and Applications