Laura Antanas, Martijn van Otterlo, José Oramas, Tinne Tuytelaars, Luc De Raedt


Understanding images in terms of hierarchical and logical structures is crucial for many semantic tasks, including image retrieval, scene understanding and robot vision. This paper combines compositional hierarchies, qualitative spatial relations, relational instance-based learning and robust feature extraction in one framework. At each layer of the hierarchy, substructures in the image are detected and classified, then passed one layer up, where qualitative spatial relations between them are used to obtain higher-level semantic structures. The approach is applied to street view images. We employ a four-layer hierarchy in which corners, windows and doors, and individual houses are detected in succession.
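As a rough illustration of one step up such a hierarchy, the sketch below turns lower-layer detections into qualitative spatial facts and labels a new candidate by its nearest stored example. It is only a sketch under stated assumptions: the `Part` type, the four-relation vocabulary, the Jaccard distance and the toy coordinates are all hypothetical stand-ins chosen for brevity, not the relational distance or the features used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """A lower-layer detection (hypothetical type, not from the paper)."""
    label: str  # class assigned at the lower layer, e.g. "window"
    x: float    # bounding-box centre
    y: float    # image coordinates: y grows downwards


def relation(a: Part, b: Part) -> str:
    """Coarse qualitative relation of a with respect to b (assumed vocabulary)."""
    dx, dy = b.x - a.x, b.y - a.y
    if abs(dx) >= abs(dy):
        return "left_of" if dx > 0 else "right_of"
    return "above" if dy > 0 else "below"


def describe(parts):
    """Turn detected parts into a set of facts (label, relation, label)."""
    return frozenset(
        (a.label, relation(a, b), b.label)
        for a in parts
        for b in parts
        if a is not b
    )


def distance(f1, f2):
    """Jaccard distance between fact sets; a simple stand-in for a relational distance."""
    union = f1 | f2
    if not union:
        return 0.0
    return 1.0 - len(f1 & f2) / len(union)


def classify(query_facts, examples):
    """1-nearest-neighbour: return the label of the closest stored example."""
    return min(examples, key=lambda ex: distance(query_facts, ex[0]))[1]


# Toy labelled examples for the higher layer (made-up coordinates).
house = [Part("window", 1, 1), Part("window", 3, 1), Part("door", 2, 3)]
non_house = [Part("window", 1, 1), Part("window", 1, 3)]
examples = [(describe(house), "house"), (describe(non_house), "not_house")]

# A new candidate with the same qualitative layout at a different position and scale.
query = [Part("window", 10, 5), Part("window", 14, 5), Part("door", 12, 9)]
print(classify(describe(query), examples))
```

Because only qualitative relations enter the description, the candidate matches the stored house example even though its absolute coordinates and scale differ, which is the kind of invariance a relational representation is meant to provide.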



Paper Citation

in Harvard Style

Antanas L., van Otterlo M., Oramas J., Tuytelaars T. and De Raedt L. (2012). A RELATIONAL DISTANCE-BASED FRAMEWORK FOR HIERARCHICAL IMAGE UNDERSTANDING. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-8425-99-7, pages 206-218. DOI: 10.5220/0003779702060218

in Bibtex Style

author={Laura Antanas and Martijn van Otterlo and José Oramas and Tinne Tuytelaars and Luc De Raedt},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},

in EndNote Style

JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
SN - 978-989-8425-99-7
AU - Antanas L.
AU - van Otterlo M.
AU - Oramas J.
AU - Tuytelaars T.
AU - De Raedt L.
PY - 2012
SP - 206
EP - 218
DO - 10.5220/0003779702060218