
Table 5: Comparison of the performance of the model using skeleton detection after 30 epochs and 50 epochs, trained only with generated data and tested on real data.

Metric      30 epochs   50 epochs
Accuracy    0.672       0.602
Precision   0.693       0.635
Recall      0.672       0.6019
F1 Score    0.676       0.606
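The metrics reported in Table 5 can be computed from per-image predictions. A minimal sketch using scikit-learn follows; the paper does not specify its evaluation code, so the library choice and the toy labels below are assumptions, not the authors' pipeline:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels for a hypothetical 3-class canine-action task (illustrative only)
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

# Weighted averaging aggregates per-class precision/recall by class frequency,
# a common choice for multi-class evaluation
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, average="weighted")
recall = recall_score(y_true, y_pred, average="weighted")
f1 = f1_score(y_true, y_pred, average="weighted")
print(f"Accuracy {accuracy:.3f}  Precision {precision:.3f}  Recall {recall:.3f}  F1 {f1:.3f}")
```

With weighted averaging, recall over all classes coincides with accuracy, which matches the identical Accuracy and Recall values in the 30-epoch column of Table 5.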
6 CONCLUSIONS
Action recognition in animals presents a complex challenge due to the variability in movement and the diverse environments in which animals operate. Creating a high-quality dataset for training deep learning models involved several challenges, particularly the lack of annotated data for this specific task. We therefore investigated existing datasets and assessed their suitability for our needs; as an alternative approach, we also generated image data with deep learning models.
Comparing the two approaches to action recognition revealed that the keypoint-based model performed marginally better than the non-keypoint model. However, after including real data in the training set, we found that the model without keypoint detection achieved significantly better results on real photographs. This suggests that non-keypoint models hold strong potential for canine action recognition and may adapt more readily to the variability and complexity of real-world images. We believe that augmenting the training set with generated data, combined with real photographs, can substantially aid in training models that require large datasets.
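The augmentation strategy suggested above can be sketched as a simple dataset-mixing step. The function, the ratio cap, and the file names below are illustrative assumptions rather than the paper's actual pipeline:

```python
import random

def mix_datasets(real, synthetic, synthetic_ratio=0.5, seed=0):
    """Combine real and generated samples into one shuffled training list.

    synthetic_ratio caps the fraction of synthetic samples in the result,
    so generated images augment, rather than dominate, the real photographs.
    """
    rng = random.Random(seed)
    # Number of synthetic samples that keeps their share at synthetic_ratio
    n_syn = min(len(synthetic), int(len(real) * synthetic_ratio / (1 - synthetic_ratio)))
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed

train = mix_datasets([f"real_{i}.jpg" for i in range(6)],
                     [f"gen_{i}.jpg" for i in range(10)])
```

With the default 0.5 ratio, the six real images are joined by six sampled synthetic ones, giving an evenly mixed, shuffled training list.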
ACKNOWLEDGEMENTS
This work was supported by the grant KEGA 004UK-
4/2024 “DICH: Digitalization of Cultural Heritage”.
Canine Action Recognition: Exploring Keypoint and Non-Keypoint Approaches Enhanced by Synthetic Data