
tice flownet for scene flow estimation on large-scale
point clouds. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition
(CVPR).
Hermes, N., Bigalke, A., and Heinrich, M. P. (2023). Point
cloud-based scene flow estimation on realistically
deformable objects: A benchmark of deep learning-based
methods. Journal of Visual Communication and
Image Representation, page 103893.
Hermes, N., Hansen, L., Bigalke, A., and Heinrich, M. P.
(2022). Support point sets for improving contactless
interaction in geometric learning for hand pose
estimation. pages 89–94.
Hu, T., Lin, G., Han, Z., and Zwicker, M. (2021).
Learning to generate dense point clouds with textures on
multiple categories. In 2021 IEEE Winter Conference
on Applications of Computer Vision (WACV), pages
2169–2178, Los Alamitos, CA, USA. IEEE Computer
Society.
Kingma, D. P. and Ba, J. (2014). Adam: A method for
stochastic optimization.
Li, R., Lin, G., He, T., Liu, F., and Shen, C. (2021).
Hcrf-flow: Scene flow from point clouds with continuous
high-order crfs and position-aware flow embedding.
In 2021 IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR), pages 364–373, Los
Alamitos, CA, USA. IEEE Computer Society.
Li, S. and Lee, D. (2019). Point-to-pose voting based hand
pose estimation using residual permutation equivariant
layer. In Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition, pages
11927–11936.
Liu, X., Qi, C. R., and Guibas, L. J. (2019). Flownet3d:
Learning scene flow in 3d point clouds. In Proceedings
of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR).
Liu, X., Yu, S.-y., Flierman, N. A., Loyola, S., Kamermans,
M., Hoogland, T. M., and De Zeeuw, C. I. (2021).
Optiflex: Multi-frame animal pose estimation combining
deep learning with optical flow. Frontiers in Cellular
Neuroscience, 15.
Ma, Y., Mao, Z.-H., Jia, W., Li, C., Yang, J., and Sun,
M. (2011). Magnetic hand tracking for
human-computer interface. IEEE Transactions on Magnetics,
47(5):970–973.
Mittal, H., Okorn, B., and Held, D. (2020). Just go with the
flow: Self-supervised scene flow estimation. In
Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR).
Moon, G., Chang, J., and Lee, K. M. (2018).
V2v-posenet: Voxel-to-voxel prediction network for
accurate 3d hand and human pose estimation from a single
depth map. In The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
Nielsen, M., Störring, M., Moeslund, T. B., and Granum, E.
(2004). A procedure for developing intuitive and
ergonomic gesture interfaces for hci. In Camurri, A.
and Volpe, G., editors, Gesture-Based Communication
in Human-Computer Interaction, pages 409–420,
Berlin, Heidelberg. Springer Berlin Heidelberg.
Puy, G., Boulch, A., and Marlet, R. (2020). FLOT: Scene
Flow on Point Clouds Guided by Optimal Transport.
In European Conference on Computer Vision.
Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2016). Pointnet:
Deep learning on point sets for 3d classification and
segmentation. arXiv preprint arXiv:1612.00593.
Qi, C. R., Yi, L., Su, H., and Guibas, L. J. (2017).
Pointnet++: Deep hierarchical feature learning on point sets
in a metric space. arXiv preprint arXiv:1706.02413.
Rezaei, M., Rastgoo, R., and Athitsos, V. (2023).
Trihorn-net: A model for accurate depth-based 3d hand pose
estimation. Expert Systems with Applications, page
119922.
Schwarz, L. A., Mkhitaryan, A., Mateus, D., and Navab,
N. (2012). Human skeleton tracking from depth data
using geodesic distances and optical flow. Image and
Vision Computing, 30(3):217–226. Best of Automatic
Face and Gesture Recognition 2011.
Sun, X., Wei, Y., Liang, S., Tang, X., and Sun, J. (2015).
Cascaded hand pose regression. In 2015 IEEE
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 824–832.
Tang, D., Chang, H. J., Tejani, A., and Kim, T.-K. (2014).
Latent regression forest: Structured estimation of 3d
articulated hand posture. In 2014 IEEE Conference
on Computer Vision and Pattern Recognition, pages
3786–3793.
Tompson, J., Stein, M., Lecun, Y., and Perlin, K. (2014).
Real-time continuous pose recovery of human hands
using convolutional networks. ACM Transactions on
Graphics, 33.
Wan, C., Probst, T., Gool, L. V., and Yao, A. (2018).
Dense 3d regression for hand pose estimation. In 2018
IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 5147–5156.
Wang, H., Pang, J., Lodhi, M. A., Tian, Y., and Tian, D.
(2021). Festa: Flow estimation via spatial-temporal
attention for scene point clouds. In Proceedings of
the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), pages 14173–14182.
Wang, R. Y. and Popović, J. (2009). Real-time
hand-tracking with a color glove. ACM Trans. Graph.,
28(3).
Wang, Y., Sun, Y., Liu, Z., Sarma, S. E., Bronstein, M. M.,
and Solomon, J. M. (2018). Dynamic graph CNN for
learning on point clouds. CoRR, abs/1801.07829.
Xiong, F., Zhang, B., Xiao, Y., Cao, Z., Yu, T., Zhou, J. T.,
and Yuan, J. (2019). A2j: Anchor-to-joint regression
network for 3d articulated pose estimation from a
single depth image. In Proceedings of the IEEE/CVF
International Conference on Computer Vision
(ICCV).
Zhang, Z., Xie, S., Chen, M., and Zhu, H. (2020).
Handaugment: A simple data augmentation method for
depth-based 3d hand pose estimation. arXiv, pages
arXiv–2001.
VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications