3D model of an object is not necessary when generating
synthetic training images, as long as a set of varying
3D models is used. Validation on the Pascal3D dataset
has shown that this method also generalizes to domains
other than tools.
ACKNOWLEDGEMENTS
This study was supported by the Special Research
Fund (BOF) of Hasselt University. The mandate ID
is BOF20OWB24. Research was done in alignment
with Flanders Make’s PILS and FAMAR projects.
Real-time Detection of 2D Tool Landmarks with Synthetic Training Data