5 CONCLUSION AND FUTURE DIRECTIONS
In this paper, we have shown that coupling a sliding window approach with Spatial-Temporal Graph Convolutional Networks (ST-GCN) is advantageous: the graph convolutional network exploits the temporal information of the skeleton and can characterize the noise around an action, allowing it to determine the correct action within the sliding window. We have also shown that a sliding window is a good approach for online, real-time action recognition on continuous data streams. Unlike pipelines that require two algorithms, one for data-stream segmentation and a second for action recognition, our method uses a single algorithm and does not require a powerful processor; it can therefore be embedded in a small Electronic Control Unit (ECU) to provide fast inference of the current action.
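To make this single-algorithm design concrete, the sketch below shows a minimal sliding-window inference loop; the names `WINDOW_SIZE`, `STRIDE`, `frame_stream`, and `model` are illustrative placeholders standing in for the real stream and the trained ST-GCN classifier, not symbols from the paper.

```python
from collections import deque

import numpy as np

# Illustrative parameters (not from the paper): a 60-frame window
# advanced one frame at a time over the continuous stream.
WINDOW_SIZE = 60
STRIDE = 1

def online_recognition(frame_stream, model):
    """Yield an action label for each position of the sliding window.

    `frame_stream` yields skeleton frames of shape (C, V)
    (channels, joints); `model` maps a batched (1, C, T, V) window
    to per-class scores. Both are placeholders.
    """
    window = deque(maxlen=WINDOW_SIZE)
    for i, frame in enumerate(frame_stream):
        window.append(frame)
        # Infer only once the window is full, every STRIDE frames.
        if len(window) == WINDOW_SIZE and i % STRIDE == 0:
            clip = np.stack(window, axis=1)      # (C, T, V)
            scores = model(clip[np.newaxis])     # (1, n_classes)
            yield int(np.argmax(scores))         # current action label
```

Because one network both localizes and classifies the ongoing action, there is no separate segmentation stage to run, which is what keeps the compute budget within reach of a small ECU.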
One limitation is that the sizes of the sliding and effective windows must be set from the known average duration of the actions. We validated our method on two state-of-the-art datasets containing common real-time motion actions and showed good performance.
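As a rough illustration of this sizing constraint (the numbers below are our assumptions, not values from the paper), the window length follows directly from the average action duration and the capture rate:

```python
FPS = 30                # assumed skeleton capture rate (illustrative)
AVG_ACTION_S = 2.0      # assumed average action duration (illustrative)

# Choose the window so that it roughly covers one full action.
window_size = round(AVG_ACTION_S * FPS)   # -> 60 frames
```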
Our future work will focus on a variable sliding window that can handle several actions of different lengths. The main challenge is the amount of data: the InHARD dataset (Dallel et al., 2020) corresponds to our aim of action recognition in industrial sites, but it needs much more data to generalize action detection, and enlarging this dataset will be part of our work. Our method can also be improved by adopting the improved ST-GCN variants reported in surveys on ST-GCN.
REFERENCES
Dallel, M., Havard, V., Baudry, D., and Savatier, X. (2020). InHARD - Industrial human action recognition dataset in the context of industrial collaborative robotics. In IEEE International Conference on Human-Machine Systems (ICHMS).
Datar, M., Gionis, A., Indyk, P., and Motwani, R. (2002).
Maintaining stream statistics over sliding windows.
SIAM Journal on Computing, 31(6):1794–1813.
Dehghani, A., Sarbishei, O., Glatard, T., and Shihab, E.
(2019). A quantitative comparison of overlapping
and non-overlapping sliding windows for human ac-
tivity recognition using inertial sensors. Sensors,
19(22):5026.
Du, Y., Fu, Y., and Wang, L. (2015). Skeleton based ac-
tion recognition with convolutional neural network. In
2015 3rd IAPR Asian Conference on Pattern Recogni-
tion (ACPR), pages 579–583. IEEE.
Eickeler, S., Kosmala, A., and Rigoll, G. (1998). Hidden Markov model based continuous online gesture recognition. In Proceedings of the Fourteenth International Conference on Pattern Recognition (Cat. No. 98EX170), volume 2, pages 1206–1208. IEEE.
Laguna, J. O., Olaya, A. G., and Borrajo, D. (2011). A
dynamic sliding window approach for activity recog-
nition. In International Conference on User Model-
ing, Adaptation, and Personalization, pages 219–230.
Springer.
Lara, O. D. and Labrador, M. A. (2012). A survey on human activity recognition using wearable sensors. IEEE Communications Surveys & Tutorials, 15(3):1192–1209.
Lei, P. and Todorovic, S. (2018). Temporal deformable
residual networks for action segmentation in videos.
In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 6742–6751.
Li, M., Chen, S., Chen, X., Zhang, Y., Wang, Y., and
Tian, Q. (2019). Actional-structural graph convolu-
tional networks for skeleton-based action recognition.
In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 3595–3603.
Li, W., Zhang, Z., and Liu, Z. (2010). Action recognition
based on a bag of 3d points. In 2010 IEEE Computer
Society Conference on Computer Vision and Pattern
Recognition-Workshops, pages 9–14. IEEE.
Li, Y., Lan, C., Xing, J., Zeng, W., Yuan, C., and Liu, J.
(2016). Online human action detection using joint
classification-regression recurrent neural networks. In
European Conference on Computer Vision, pages
203–220. Springer.
Liu, J., Shahroudy, A., Wang, G., Duan, L.-Y., and Kot, A. C. (2019a). Skeleton-based online action prediction using scale selection network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(6):1453–1467.
Liu, J., Shahroudy, A., Xu, D., Kot, A. C., and Wang, G. (2017a). Skeleton-based action recognition using spatio-temporal LSTM network with trust gates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):3007–3021.
Liu, J., Wang, G., Hu, P., Duan, L.-Y., and Kot, A. C. (2017b). Global context-aware attention LSTM networks for 3d action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1647–1656.
Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao,
J., and Han, J. (2019b). On the variance of the
adaptive learning rate and beyond. arXiv preprint
arXiv:1908.03265.
Luzhnica, G., Simon, J., Lex, E., and Pammer, V. (2016).
A sliding window approach to natural hand gesture
recognition using a custom data glove. In 2016 IEEE
Symposium on 3D User Interfaces (3DUI), pages 81–
90. IEEE.
Ma, C., Li, W., Cao, J., Du, J., Li, Q., and Gravina, R.
(2020). Adaptive sliding window based activity recog-
nition for assisted livings. Information Fusion, 53:55–
65.
Miranda, L., Vieira, T., Martínez, D., Lewiner, T., Vieira, A. W., and Campos, M. F. (2014). Online gesture recognition from pose kernel learning and decision forests. Pattern Recognition Letters, 39:65–73.