Figure 9: Difference in recognition performance due to parameter settings. Each image shows the recognition result for the same
scene with different parameters (each image has been blurred). The parameters that correctly capture the boundaries of large boxed
items often differ from those that detect very small areas, such as plastic bottle caps, and in some cases no single parameter
setting allows all objects to be recognized simultaneously.
We are also grateful to Mr. Daisuke Katsumata and
Mr. Tsubasa Watanabe for their cooperation in obtaining
the experimental data.