Figure 2: "Uli" images and difference images: (a)-(c) images from views 6, 7, and 8, respectively; (d)-(e) the difference images between the actual views and the views synthesized with the estimated global depth, (d) for view 6 and (e) for view 8.
For simplicity, we use only views 6, 7, and 8 in the experiment, as shown in Fig. 2. With the given camera parameters, the initial depth value is obtained by computing the convergence point of the cameras through solving linear equations. The computed initial depth for view 7 is 3030.7 mm, while the ground-truth depth of the bright reflection on the glasses over the left eye is 3076.2 mm, computed from the provided scene point M_W = [35.07, 433.93, -1189.78] in world coordinates. Since the depth variation of the "Uli" scene is small, the estimated initial depth is reasonable.
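As a minimal sketch of this step, the convergence point can be taken as the least-squares intersection of the cameras' optical axes, which reduces to a small linear system; the camera centers, axis directions, and the function name below are our assumptions, since the paper does not give its exact formulation.

import numpy as np

def convergence_point(centers, directions):
    """Least-squares intersection of the cameras' optical axes.
    centers:    (N, 3) camera centers in world coordinates
    directions: (N, 3) optical-axis direction vectors
    Solves the 3x3 linear system minimizing the summed squared
    distance from the point to all N axes."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(np.asarray(centers, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)        # unit axis direction
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the axis
        A += P
        b += P @ c
    return np.linalg.solve(A, b)

# The initial depth for the middle view is then the depth of this point in
# that camera's frame, e.g. z0 = (R7 @ M + t7)[2] for hypothetical extrinsics
# R7, t7 of view 7 and M = convergence_point(centers, directions).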
After obtaining the initial depth, we can set an appropriate search range and step size, so that the global depth can be estimated efficiently with little computation. In the experiment, the search range is set to ±20% of the initial depth, and the step size to 1% of the initial depth. When a projected pixel in the synthesized view does not fall on an integer grid point, it is rounded to the nearest integer pixel. If a projected pixel falls outside the image, its value is taken from the nearest image border pixel. The difference between the synthesized view and the actual view is then summed over the whole image using all three RGB components. The final estimated global depth for view 7 is 3151.9 mm.
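A sketch of the search itself is given below, following the rules just stated (nearest-pixel rounding, clamping to the nearest border pixel, and a sum of absolute differences over all RGB components). The plane-warp formulation, the camera matrices K_ref, K_tgt, R, t, and the function names are our assumptions; the original may organize the projection differently.

import numpy as np

def warp_plane(tgt_img, K_ref, K_tgt, R, t, z):
    """Synthesize the reference view from tgt_img, assuming every scene
    point lies at constant depth z in the reference camera. R, t map
    reference-camera coordinates to target-camera coordinates."""
    H, W, _ = tgt_img.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    X_ref = z * (np.linalg.inv(K_ref) @ pix)   # back-project at depth z
    X_tgt = R @ X_ref + t.reshape(3, 1)        # into the target camera frame
    p = K_tgt @ X_tgt
    ut = np.rint(p[0] / p[2]).astype(int)      # round to nearest grid point
    vt = np.rint(p[1] / p[2]).astype(int)
    ut = np.clip(ut, 0, W - 1)                 # clamp to nearest border pixel
    vt = np.clip(vt, 0, H - 1)
    return tgt_img[vt, ut].reshape(H, W, 3)

def global_depth_search(ref_img, tgt_img, K_ref, K_tgt, R, t, z0):
    """Search z0 * (1 ± 0.20) in steps of 1% of z0, keeping the depth
    that minimizes the sum of absolute RGB differences."""
    best_z, best_cost = z0, np.inf
    for k in np.arange(-0.20, 0.20 + 1e-9, 0.01):
        z = z0 * (1.0 + k)
        synth = warp_plane(tgt_img, K_ref, K_tgt, R, t, z)
        cost = np.abs(synth.astype(np.int32) - ref_img.astype(np.int32)).sum()
        if cost < best_cost:
            best_z, best_cost = z, cost
    return best_z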
To show the estimation performance of the global depth intuitively, we present the difference images between the actual views and the views synthesized with the estimated global depth. From Fig. 2 (d) and (e), we see that the differences in most areas are close to zero (black areas), which shows that the estimated global depth is reasonable.
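For completeness, a difference image like Fig. 2 (d)-(e) can be produced as below; averaging the absolute RGB differences into a grayscale image is one plausible display choice, not necessarily the one used for the figure.

import numpy as np

def difference_image(actual, synth):
    """Per-pixel absolute difference between the actual view and the
    synthesized view, averaged over RGB for grayscale display; black
    (near-zero) regions indicate a good global-depth fit."""
    d = np.abs(actual.astype(np.int16) - synth.astype(np.int16))
    return d.mean(axis=2).astype(np.uint8)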