Experiments comparing six-class classification with the proposed hierarchical classification were conducted, where the former directly classifies the six classes: Front A-pass, Front B-pass, Front C-pass, Back A-pass, Back B-pass, and Back C-pass. Two TimeSformer models were used as video classification models: TimeSformer-L and TimeSformer-HR. Two datasets of different lengths were also used: Dataset1 consists of long videos covering the sequence of serve, reception, set, attack, and the action following the attack, while Dataset2 consists of short videos cut from Dataset1 that contain only the reception and set. To improve accuracy, the optimum sampling rate was set for each combination of dataset and model, and training, validation, and testing were carried out under these conditions.
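The role of the sampling rate can be sketched as follows: a rate of r keeps every r-th frame of a clip before it is fed to the model. This is a minimal illustration, not the paper's actual preprocessing code; the function name, clip length, and padding strategy are assumptions.

```python
def sample_frames(num_frames: int, rate: int, clip_len: int = 8) -> list[int]:
    """Pick every `rate`-th frame index from a video of `num_frames` frames,
    truncated (or padded by repetition) to the clip length the model expects."""
    idx = list(range(0, num_frames, rate))[:clip_len]
    # Pad by repeating the last index if the video is too short.
    while len(idx) < clip_len:
        idx.append(idx[-1])
    return idx
```

A larger rate covers a longer time span with the same number of frames, which is why the optimal rate differs between the long Dataset1 clips and the short Dataset2 clips.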
The best accuracy achieved by six-class classification was only 72.5%.
In contrast, in the hierarchical classification, Step 1, which classifies Front and Back data, achieved 100% accuracy. Step 2, which distinguishes (A/B)-pass from C-pass, achieved accuracies of 94.94% and 93.06% on Front and Back data, respectively, with Dataset2, the HR model, and a sampling rate of 6. Step 3, which distinguishes A-pass from B-pass, achieved accuracies of 76.56% and 79.34% on Front and Back data, respectively, with Dataset2, the L model, and a sampling rate of 1.
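The three-step decision flow can be sketched as follows. This is a minimal illustration of the routing logic only: each `step_*` argument stands for a trained classifier (e.g. a TimeSformer wrapper) returning a label, and the function names and label strings are assumptions.

```python
def classify_hierarchical(video, step1, step2_front, step2_back,
                          step3_front, step3_back) -> str:
    """Route a video through the three hierarchical steps and
    return one of the six reception-quality labels."""
    # Step 1: Front vs. Back.
    side = step1(video)                               # "Front" or "Back"
    # Step 2: (A/B)-pass vs. C-pass, with a per-side model.
    step2 = step2_front if side == "Front" else step2_back
    if step2(video) == "C":
        return f"{side} C-pass"
    # Step 3: A-pass vs. B-pass, again with a per-side model.
    step3 = step3_front if side == "Front" else step3_back
    return f"{side} {step3(video)}-pass"
```

Because each step is a separate binary (or near-binary) problem, the dataset, model variant, and sampling rate can be tuned independently per step, as reported above.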
The hierarchical classification produced no misclassifications between Front and Back data. It also reduced the number of misclassifications and the error rate in all cases except one in Step 2. Thus, hierarchical classification is superior to six-class classification in terms of reducing the risk of misclassification. Another advantage over six-class classification is that the hierarchical approach allows the task and characteristics of each step to be addressed separately, successfully subdividing the problem of automating the assessment of reception quality.
In future work, we will focus on developing a method for classifying A- and B-passes, for which the current accuracies are relatively low. In particular, we aim to improve accuracy through a method that effectively uses local information, such as the setter's movement, which is considered important in assessing reception quality.
Furthermore, hierarchical classification is expected to be applicable to videos of other sports as well.
ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods