handle more complex interactions, such as those involved
in procedural medical skills, as part of an automated
instructional training system.
ACKNOWLEDGEMENTS
This research has been conducted under an Irish Research
Council Enterprise Partnership Scholarship with
Intel Ireland.
Recognising Actions for Instructional Training using Pose Information: A Comparative Evaluation