5.4 Qualitative Results
To illustrate the shape estimation results, we present
the landmark localization results and the 3D car shapes
with the estimated camera poses in Fig. 6. The examples
cover a variety of viewpoints and car types. The
landmarks localized by the proposed method are visually
almost identical to those of the direct 2D regression
method (SDM2D), while the 3D shape and camera pose
are also estimated well.
6 CONCLUSIONS
In this paper, we have proposed a method for joint 3D
shape reconstruction and landmark localization.
Representing the 3D shape as a linear combination of
a set of shape bases, we built a cascaded framework
that regresses the global geometric structure and the
object pose. We also proposed a new objective for
training the regressors that minimizes the appearance
and shape differences simultaneously, which overcomes
the ambiguity of the landmark descriptions in feature
space. Experimental results showed competitive
performance on shape and pose estimation without
degrading localization performance, compared with
previous methods.
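As an illustration of the shape representation summarized above, the following minimal Python sketch forms a 3D shape as a weighted sum of shape bases and projects it to 2D landmarks. The weak-perspective camera, the function names (combine_bases, rotation_y, project_weak_perspective), and the toy three-point bases are assumptions made for this example rather than the implementation used in the experiments; the cascaded regressors that estimate the coefficients and the pose are not shown.

```python
import math

def combine_bases(bases, coeffs):
    """Return the shape sum_k coeffs[k] * bases[k].

    Each basis is a list of (x, y, z) points with the same length.
    """
    num_points = len(bases[0])
    return [
        tuple(sum(c * basis[i][d] for c, basis in zip(coeffs, bases))
              for d in range(3))
        for i in range(num_points)
    ]

def rotation_y(angle):
    """3x3 rotation matrix about the vertical (y) axis; angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project_weak_perspective(shape, rotation, scale, translation):
    """Rotate each 3D point, drop its depth, then scale and translate in 2D.

    The weak-perspective model is an assumption of this sketch.
    """
    landmarks = []
    for p in shape:
        x = sum(rotation[0][d] * p[d] for d in range(3))
        y = sum(rotation[1][d] * p[d] for d in range(3))
        landmarks.append((scale * x + translation[0],
                          scale * y + translation[1]))
    return landmarks

# Hypothetical toy data: two bases over three points.
bases = [
    [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    [(0, 0, 0), (1, 0, 0), (0, 0, 0)],
]
coeffs = [1.0, 0.5]

shape = combine_bases(bases, coeffs)
landmarks = project_weak_perspective(shape, rotation_y(0.0), 2.0, (10.0, 20.0))
print(landmarks)
```

In this sketch the shape coefficients and the pose parameters (rotation, scale, translation) are the quantities that a regressor would update at each cascade stage.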
ACKNOWLEDGEMENTS
This work was supported in part by the National
Basic Research Project of China (973) under Grant
2013CB329006 and in part by the National Natural
Science Foundation of China under Grant 61622110,
Grant 61471220, and Grant 91538107.
REFERENCES
Andriluka, M., Roth, S., and Schiele, B. (2009). Pictorial
structures revisited: People detection and articulated
pose estimation. In Proc. CVPR, pages 1014–1021.
IEEE.
Boumal, N., Mishra, B., Absil, P.-A., and Sepulchre, R.
(2014). Manopt, a Matlab toolbox for optimization
on manifolds. J. Mach. Learn. Res., 15:1455–1459.
Cao, X., Wei, Y., Wen, F., and Sun, J. (2014). Face align-
ment by explicit shape regression. Int. J. Comput. Vis.,
107(2):177–190.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library
for support vector machines. ACM Trans. Intell. Syst.
Technol., 2:27:1–27:27.
Cootes, T. F., Edwards, G. J., and Taylor, C. J. (2001). Ac-
tive appearance models. IEEE Trans. Pattern Anal.
Mach. Intell., 23(6):681–685.
Cristinacce, D. and Cootes, T. F. (2007). Boosted regression
active shape models. In BMVC, volume 1, page 7.
Dalal, N. and Triggs, B. (2005). Histograms of oriented gra-
dients for human detection. In Proc. CVPR, volume 1,
pages 886–893. IEEE.
Dollár, P., Welinder, P., and Perona, P. (2010). Cascaded
pose regression. In Proc. CVPR, pages 1078–1085.
IEEE.
Dryden, I. and Mardia, K. (1998). Statistical analysis of
shape. John Wiley & Sons.
Eldén, L. and Park, H. (1999). A Procrustes problem on the
Stiefel manifold. Numer. Math., 82(4):599–619.
Engø, K. (2001). On the BCH-formula in so(3). BIT
Numerical Mathematics, 41(3):629–632.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and
Ramanan, D. (2010). Object detection with discrimi-
natively trained part-based models. IEEE Trans. Pat-
tern Anal. Mach. Intell., 32(9):1627–1645.
Ferrari, C., Lisanti, G., Berretti, S., and Del Bimbo, A.
(2017). A dictionary learning based 3D morphable
shape model. IEEE Trans. Multimedia.
Guo, Y., Sohel, F., Bennamoun, M., Wan, J., and Lu, M.
(2014). An accurate and robust range image registration
algorithm for 3D object modeling. IEEE Trans.
Multimedia, 16(5):1377–1390.
Hartley, R. and Zisserman, A. (2003). Multiple view
geometry in computer vision. Cambridge University Press.
Hejrati, M. and Ramanan, D. (2012). Analyzing 3D objects
in cluttered images. In NIPS, pages 593–601.
Ho, C.-H. and Lin, C.-J. (2012). Large-scale linear support
vector regression. J. Mach. Learn. Res., 13(1):3323–
3348.
Kazemi, V. and Sullivan, J. (2014). One millisecond face
alignment with an ensemble of regression trees. In
Proc. CVPR, pages 1867–1874.
Leotta, M. J. and Mundy, J. L. (2011). Vehicle surveillance
with a generic, adaptive, 3D vehicle model. IEEE
Trans. Pattern Anal. Mach. Intell., 33(7):1457–1469.
Li, Y., Gu, L., and Kanade, T. (2011). Robustly aligning
a shape model and its application to car alignment of
unknown pose. IEEE Trans. Pattern Anal. Mach. In-
tell., 33(9):1860–1876.
Lin, Y.-L., Morariu, V. I., Hsu, W., and Davis, L. S.
(2014). Jointly optimizing 3D model fitting and
fine-grained classification. In Proc. ECCV, pages 466–480.
Springer.
Marsden, J. E. and Ratiu, T. (1999). Introduction to me-
chanics and symmetry: a basic exposition of classical
mechanical systems. Springer-Verlag.
Miao, Y., Tao, X., and Lu, J. (2016). Robust monocular 3D
car shape estimation from 2d landmarks. IEEE Trans.
Circuits Syst. Video Technol.
Nair, P. and Cavallaro, A. (2009). 3-D face detection,
landmark localization, and registration using a point
distribution model. IEEE Trans. Multimedia,
11(4):611–623.
Joint Monocular 3D Car Shape Estimation and Landmark Localization via Cascaded Regression