Figure 11: Outputs of the segmentation networks (left column) and corresponding estimated poses (right column) in the scene clouds.
deployed on a mobile platform. We show that the pose of the object can be estimated by using the centroid of the presegmented cloud as an initial pose guess, followed by ICP refinement, thereby avoiding reliance on a more sophisticated pose estimation algorithm. We show that the accuracy of the resulting pose estimate is within the constraints of the use case.
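The two-step scheme above (centroid as initial guess, then ICP) can be sketched as follows. This is an illustrative point-to-point ICP in plain NumPy, assuming the presegmented scene cloud and the object model are given as N×3 arrays; it is not the implementation used in the paper, and the brute-force nearest-neighbor search and function names are placeholders for whatever registration library is actually deployed.

```python
import numpy as np

def centroid_init(model, scene):
    """Initial guess: translate the model so its centroid coincides
    with the centroid of the presegmented scene cloud."""
    return scene.mean(axis=0) - model.mean(axis=0)

def icp(model, scene, init_t, iters=30):
    """Minimal point-to-point ICP refining the centroid-based guess.
    model: (N, 3) object model points; scene: (M, 3) segmented cloud.
    Returns the model points aligned to the scene."""
    src = model + init_t
    for _ in range(iters):
        # Brute-force nearest scene point for each model point.
        d = np.linalg.norm(src[:, None, :] - scene[None, :, :], axis=2)
        tgt = scene[d.argmin(axis=1)]
        # Kabsch/SVD: best rigid transform mapping src onto tgt.
        cs, ct = src.mean(axis=0), tgt.mean(axis=0)
        H = (src - cs).T @ (tgt - ct)
        U, _, Vt = np.linalg.svd(H)
        sign = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid reflections
        R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
        t = ct - R @ cs
        src = src @ R.T + t
    return src
```

As noted above, the centroid guess removes the gross translation, so a basic ICP only has to recover the remaining small rotation and residual offset; a production system would typically use a k-d tree for correspondence search instead of the O(NM) distance matrix shown here.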
REFERENCES
Besl, P. J. and McKay, N. D. (1992). A method for regis-
tration of 3-d shapes. IEEE Transactions on Pattern
Analysis and Machine Intelligence.
Do, T.-T., Pham, T., Cai, M., and Reid, I. D. (2018). Lienet:
Real-time monocular object instance 6d pose estima-
tion. In BMVC.
Dwibedi, D., Misra, I., and Hebert, M. (2017). Cut, paste
and learn: Surprisingly easy synthesis for instance de-
tection. CoRR, abs/1708.01642.
Fischler, M. A. and Bolles, R. C. (1981). Random sample
consensus: A paradigm for model fitting with appli-
cations to image analysis and automated cartography.
Commun. ACM.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial networks. Ad-
vances in Neural Information Processing Systems.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). Mobilenets: Efficient convolutional neu-
ral networks for mobile vision applications. CoRR,
abs/1704.04861.
Johnson-Roberson, M., Barto, C., Mehta, R., Nittur Sridhar,
S., and Vasudevan, R. (2016). Driving in the matrix:
Can virtual worlds replace human-generated annota-
tions for real world tasks? CoRR, abs/1610.01983.
Juel, W. K., Haarslev, F., Ramírez, E. R., Marchetti, E., Fischer, K., Shaikh, D., Manoonpong, P., Hauch, C., Bodenhagen, L., and Krüger, N. (2019). Smooth robot: Design for a novel modular welfare robot. Journal of Intelligent & Robotic Systems.
Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. CoRR, abs/1405.0312.
Pérez, P., Gangnet, M., and Blake, A. (2003). Poisson image editing. ACM Trans. Graph.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-
stein, M., Berg, A. C., and Fei-Fei, L. (2015). Ima-
genet large scale visual recognition challenge. Inter-
national Journal of Computer Vision.
Russell, B. C., Torralba, A., Murphy, K. P., and Freeman,
W. T. (2008). Labelme: A database and web-based
tool for image annotation. Int. Journal of Computer
Vision.
Stein, G. J. and Roy, N. (2018). Genesis-rt: Generating syn-
thetic images for training secondary real-world tasks.
Int. Conf. on Robotics and Automation (ICRA).
von Ahn, L. and Dabbish, L. (2004). Labeling images with
a computer game. Proceedings of the SIGCHI Con-
ference on Human Factors in Computing Systems.
Wong, J. M., Kee, V., Le, T., Wagner, S., Mariottini, G., Schneider, A., Hamilton, L., Chipalkatty, R., Hebert, M., Johnson, D. M. S., Wu, J., Zhou, B., and Torralba, A. (2017). Segicp: Integrated deep semantic segmentation and pose estimation. In Int. Conf. on Intelligent Robots and Systems (IROS).
Xiang, Y., Schmidt, T., Narayanan, V., and Fox, D. (2017).
Posecnn: A convolutional neural network for 6d
object pose estimation in cluttered scenes. CoRR,
abs/1711.00199.
Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang,
N. (2018). Bisenet: Bilateral segmentation net-
work for real-time semantic segmentation. CoRR,
abs/1808.00897.
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017).
Unpaired image-to-image translation using cycle-
consistent adversarial networks. Int. Conf. on Com-
puter Vision (ICCV).
Synthetic Ground Truth for Presegmentation of Known Objects for Effortless Pose Estimation