
points would be beneficial to obtain more statistical information. Alternatively, thresholds between correct and incorrect executions could be determined by a human expert.
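To make the distinction concrete, the sketch below contrasts a simple data-derived threshold with one supplied directly by a coach. The helper names, the midpoint rule, and the example angle are illustrative assumptions, not details of the system described here.

```python
import numpy as np

def automatic_threshold(correct_values, incorrect_values):
    # A simple data-derived rule (an assumption, not the method used here):
    # place the cut-off midway between the means of the two labelled groups.
    return (np.mean(correct_values) + np.mean(incorrect_values)) / 2.0

def is_correct(param_value, threshold, correct_below=True):
    # Hypothetical helper: flag a single execution by comparing one motion
    # parameter (e.g. a joint angle) against the chosen threshold.
    return param_value < threshold if correct_below else param_value >= threshold

# Expert-based alternative: a coach supplies the cut-off directly, e.g. a
# maximum acceptable front-knee angle in degrees (illustrative value only).
expert_threshold = 150.0
print(is_correct(142.3, expert_threshold))  # True: below the expert cut-off
```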
We analyzed several cases in which the classification of correct and incorrect variants failed, in order to investigate possible reasons. One relevant source of errors is how each variant was performed by each person. In some cases, the difference between incorrect and correct action performance was relatively small. While human experts are able to distinguish these variants, the differences in execution were not always captured by the automatic system. For some parameters, we expect that additional per-user calibration may be needed, as inter-person differences may be larger than the differences between correct and incorrect variants. Occasionally, problems stemmed from incorrect pose estimation. MediaPipe is robust to varying environmental conditions; moreover, applying a moving average filters out most outliers. However, some errors in pose estimation still occur, which may be particularly problematic for parameters based on minimum or maximum values.
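The sketch below illustrates this kind of smoothing and why minimum/maximum-based parameters remain vulnerable to residual errors; the window size and landmark values are assumptions chosen for illustration.

```python
import numpy as np

def moving_average(signal, window=5):
    # Smooth a 1-D landmark trajectory (e.g. the y-coordinate of a MediaPipe
    # hip landmark over time) with a simple moving average; the window size
    # is an assumed value, not one reported here.
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="valid")  # edge frames are trimmed

# Averaging attenuates isolated spikes, but a parameter defined as a minimum
# or maximum can still be dominated by any outlier that survives smoothing.
hip_y = np.array([0.52, 0.53, 0.51, 0.90, 0.52, 0.53])  # 0.90 is an outlier
smoothed = moving_average(hip_y, window=3)
print(round(smoothed.max(), 3))  # 0.65: attenuated, yet still the extremum
```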
6 CONCLUSIONS
In this work, we addressed the problem of qualitative evaluation of actions in fencing footwork, including assessment of technical skill and physical performance. The goal was to provide relevant information for fencing training evaluation.
In cooperation with fencing experts, we designed and recorded a novel dataset including sequences of fencing footwork practice as well as 40 variants of actions per person (28 of which are incorrect execution variants to be recognized). The dataset includes manual labels for actions and variants, provided by fencing experts. To the best of our knowledge, this is currently by far the most detailed dataset of fencing actions. The employed method for temporal segmentation and action classification is sufficiently effective to be used in practical applications. We designed and evaluated specific methods for measuring motion parameters relevant to each variant of incorrect execution. The results indicate that, in most cases, our system can provide relevant feedback to fencers.
In future work, we intend to focus on improving the recognition of several action variants by including user-specific calibration or adaptation mechanisms. Moreover, we plan to investigate the idea of using expert-based thresholds instead of automatic ones. Finally, the proposed solution is currently being implemented in a mobile application. Therefore, we expect to validate our approach during fencing training sessions and gather valuable feedback from fencers and coaches.
ACKNOWLEDGEMENTS
The research presented in this paper was supported by the National Centre for Research and Development (NCBiR) under Grant No. LIDER/37/0198/L-12/20/NCBR/2021. Data acquisition and expert consultations were carried out in cooperation with Aramis Fencing School (aramis.pl).
REFERENCES
Barandas, M., Folgado, D., Fernandes, L., Santos, S., Abreu, M., Bota, P., Liu, H., Schultz, T., and Gamboa, H. (2020). TSFEL: Time series feature extraction library. SoftwareX, 11:100456.
Bazarevsky, V., Grishchenko, I., Raveendran, K., Zhu, T., Zhang, F., and Grundmann, M. (2020). BlazePose: On-device real-time body pose tracking. arXiv preprint arXiv:2006.10204.
Beddiar, D. R., Nini, B., Sabokrou, M., and Hadid, A. (2020). Vision-based human activity recognition: a survey. Multimedia Tools and Applications, 79(41-42):30509–30555.
Feichtenhofer, C., Pinz, A., and Wildes, R. P. (2017). Spatiotemporal multiplier networks for video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4768–4777.
Frère, J., Göpfert, B., Nüesch, C., Huber, C., Fischer, M., Wirz, D., and Friederich, N. (2010). Kinematical and EMG-classifications of a fencing attack. International Journal of Sports Medicine, pages 28–34.
Fu, X., Zhang, K., Wang, C., and Fan, C. (2020). Multiple
player tracking in basketball court videos. Journal of
Real-Time Image Processing, 17:1811–1828.
Google (2023). MediaPipe.
Han, J., Shao, L., Xu, D., and Shotton, J. (2013). Enhanced computer vision with Microsoft Kinect sensor: A review. IEEE Transactions on Cybernetics, 43(5):1318–1334.
Host, K. and Ivašić-Kos, M. (2022). An overview of human action recognition in sports based on computer vision. Heliyon.
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1725–1732.
Kendall, A., Grimes, M., and Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 2938–2946.