Monocular Depth Ordering using Perceptual Occlusion Cues

Babak Rezaeirowshan, Coloma Ballester, Gloria Haro


In this paper we propose a method to estimate a global depth order between the objects of a scene using information from a single image taken by an uncalibrated camera. The method stems from early vision cues such as occlusion and convexity and uses them to infer both a local and a global depth order. Monocular occlusion cues, namely T-junctions and convexities, contain information suggesting a local depth order between neighbouring objects. Combining these cues is preferable because, although the information conveyed by T-junctions is perceptually stronger, T-junctions are less prevalent than convexity cues in natural images. We propose a novel convexity detector that also establishes a local depth order. At T-junctions, the partial order is extracted using a curvature-based multi-scale feature. Finally, a global depth order is computed, i.e., a full order of all shapes that is as consistent as possible with the computed partial orders while tolerating conflicts among them. An integration scheme based on a Markov chain approximation of the rank aggregation problem is used for this purpose. The experiments conducted show that the proposed method compares favorably with the state of the art.
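To make the final integration step more concrete, the following is a minimal sketch of how conflicting pairwise depth cues could be merged into one global order with a Markov chain, in the spirit of Markov chain heuristics from the rank aggregation literature. It is not the scheme used in the paper: the function name `aggregate_depth_order`, the `(front, back, weight)` cue format, the cue weights, and the `damping` term are all assumptions made for illustration.

```python
import numpy as np

def aggregate_depth_order(n_objects, local_orders, damping=0.05,
                          iters=1000, tol=1e-12):
    """Hypothetical helper: merge pairwise cues (front, back, weight)
    into a global front-to-back order via a Markov chain."""
    # votes[i, j] = total weight of cues saying object j is in front of i.
    votes = np.zeros((n_objects, n_objects))
    for front, back, weight in local_orders:
        votes[back, front] += weight

    # Row-normalise: from object i the chain moves towards objects that
    # the cues place in front of i, so probability mass drifts to the front.
    P = np.zeros_like(votes)
    for i in range(n_objects):
        total = votes[i].sum()
        if total > 0:
            P[i] = votes[i] / total
        else:
            P[i, i] = 1.0  # nothing is in front of i: stay in place

    # Assumed uniform teleportation so the stationary distribution is unique.
    P = (1.0 - damping) * P + damping / n_objects

    # Power iteration for the stationary distribution pi = pi P.
    pi = np.full(n_objects, 1.0 / n_objects)
    for _ in range(iters):
        nxt = pi @ P
        converged = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if converged:
            break

    # Higher stationary mass is read as "closer to the viewer".
    order = [int(i) for i in np.argsort(-pi)]
    return order, pi

# Three objects; one weak cue (2 in front of 1) conflicts with a
# stronger cue (1 in front of 2).
cues = [(0, 1, 1.0), (1, 2, 1.0), (2, 1, 0.2)]
order, pi = aggregate_depth_order(3, cues)
print(order)
```

Because the chain only needs relative vote weights, a conflicting cue does not make the problem infeasible; it merely shifts some probability mass, which is one way to read the tolerance to conflicting partial orders described above.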



Paper Citation

in Harvard Style

Rezaeirowshan B., Ballester C. and Haro G. (2016). Monocular Depth Ordering using Perceptual Occlusion Cues. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 431-441. DOI: 10.5220/0005726404310441

in Bibtex Style

author={Babak Rezaeirowshan and Coloma Ballester and Gloria Haro},
title={Monocular Depth Ordering using Perceptual Occlusion Cues},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},

in EndNote Style

JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Monocular Depth Ordering using Perceptual Occlusion Cues
SN - 978-989-758-175-5
AU - Rezaeirowshan B.
AU - Ballester C.
AU - Haro G.
PY - 2016
SP - 431
EP - 441
DO - 10.5220/0005726404310441