REFERENCES
Ballan, L., Taneja, A., Gall, J., Van Gool, L., & Pollefeys,
M. (2012). Motion capture of hands in action using
discriminative salient points. Proceedings of European
Conference on Computer Vision, 7577 LNCS(PART 6),
640–653. https://doi.org/10.1007/978-3-642-33783-
3_46
Barsoum, E. (2016). Articulated Hand Pose Estimation
Review. 1–50. http://arxiv.org/abs/1604.06195
Erol, A., Bebis, G., Nicolescu, M., Boyle, R. D., &
Twombly, X. (2007). Vision-based hand pose
estimation: A review. Computer Vision and Image
Understanding, 108(1–2), 52–73. https://doi.org/
10.1016/j.cviu.2006.10.012
Ge, L., Ren, Z., Li, Y., Xue, Z., Wang, Y., Cai, J., & Yuan,
J. (2019). 3D hand shape and pose estimation from a
single RGB image. Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern
Recognition, 2019-June, 10825–10834. https://doi.org/
10.1109/CVPR.2019.01109
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. Proceedings of the IEEE
Computer Society Conference on Computer Vision and
Pattern Recognition, 580–587. https://doi.org/
10.1109/CVPR.2014.81
Gkioxari, G., Girshick, R., Dollár, P., & He, K. (2018).
Detecting and Recognizing Human-Object Interactions.
Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, 1(c),
8359–8367. https://doi.org/10.1109/CVPR.2018.00872
Hamer, H., Gall, J., Weise, T., & Van Gool, L. (2010). An
object-dependent hand pose prior from sparse training
data. Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern
Recognition, 671–678. https://doi.org/10.1109/CVP
R.2010.5540150
He, K., Gkioxari, G., Dollar, P., & Girshick, R. (2017).
Mask R-CNN. Proceedings of the IEEE International
Conference on Computer Vision, 2017-Octob, 2980–
2988. https://doi.org/10.1109/ICCV.2017.322
He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial
pyramid pooling in deep convolutional networks for
visual recognition. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 8691 LNCS(PART
3), 346–361. https://doi.org/10.1007/978-3-319-10578-
9_23
Hei Law, Yun Teng, Olga Russakovsky, J. D. (2019).
CornerNet-Lite : Efficient Keypoint-Based Object
Detection.
Iasonas Oikonomidis, Nikolaos Kyriazis, and A. A. A.
(2011). Markerless and Efficient 26-DOF Hand Pose
Recovery. Proceedings of the 10th Asian Conference
on Computer Vision, 6978 LNCS(PART 1), 365–373.
https://doi.org/10.1007/978-3-642-24085-0_38
Law, H., & Deng, J. (2018). CornerNet. European
Conference on Computer Vision(ECCV), 765–781.
Mehta, D., Rhodin, H., Casas, D., Fua, P., Sotnychenko, O.,
Xu, W., & Theobalt, C. (2018). Monocular 3D human
pose estimation in the wild using improved CNN
supervision. Proceedings - 2017 International
Conference on 3D Vision, 3DV 2017, 506–516.
https://doi.org/10.1109/3DV.2017.00064
Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Elgharib,
M., Fua, P., Seidel, H. P., Rhodin, H., Pons-Moll, G., &
Theobalt, C. (2020). XNect: Real-time Multi-Person
3D Motion Capture with a Single RGB Camera. ACM
Transactions on Graphics, 39(4), 1–24.
https://doi.org/10.1145/3386569.3392410
Moon, G., Chang, J. Y., & Lee, K. M. (2019). Camera
distance-aware top-down approach for 3D multi-person
pose estimation from a single RGB image. Proceedings
of the IEEE International Conference on Computer
Vision, 2019-Octob, 10132–10141. https://doi.org/
10.1109/ICCV.2019.01023
Mueller, F., Mehta, D., Sotnychenko, O., Sridhar, S., Casas,
D., & Theobalt, C. (2017). Real-Time Hand Tracking
under Occlusion from an Egocentric RGB-D Sensor.
Proceedings of the IEEE International Conference on
Computer Vision, 2017-Octob, 1163–1172.
https://doi.org/10.1109/ICCV.2017.131
Oikonomidis, I., Kyriazis, N., & Argyros, A. (2011).
Efficient model-based 3D tracking of hand
articulations using Kinect. June 2014, 101.1-101.11.
https://doi.org/10.5244/c.25.101
Pavllo, D., Feichtenhofer, C., Grangier, D., & Auli, M.
(2019). 3D human pose estimation in video with
temporal convolutions and semi-supervised training.
Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, 2019-
June, 7745–7754. https://doi.org/10.1109/CVPR.201
9.00794
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016).
You only look once: Unified, real-time object detection.
Proceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, 2016-
Decem, 779–788. https://doi.org/10.1109/CVPR.20
16.91
Redmon, J., & Farhadi, A. (2018). YOLOv3: An
Incremental Improvement. http://arxiv.org/abs/1804.0
2767
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-
CNN: Towards Real-Time Object Detection with
Region Proposal Networks. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 39(6),
1137–1149. https://doi.org/10.1109/TPAMI.2016.257
7031
Sharp, T., Keskin, C., Robertson, D., Taylor, J., Shotton, J.,
Kim, D., Rhemann, C., Leichter, I., Vinnikov, A., Wei,
Y., Freedman, D., Kohli, P., Krupka, E., Fitzgibbon, A.,
& Izadi, S. (2015). Accurate, robust, and flexible
realtime hand tracking. Conference on Human Factors
in Computing Systems - Proceedings
, 2015-April,
3633–3642. https://doi.org/10.1145/2702123.2702179
Sridhar, S., Mueller, F., Zollhöfer, M., Casas, D.,
Oulasvirta, A., & Theobalt, C. (2016). Real-time joint
tracking of a hand manipulating an object from RGB-D
input. International Journal of Computer Vision, 9906