the jump class pattern (Figure 6(d)), the whole body moves, so the edges are concentrated at the center of the body and on the right arm.
Figure 7 shows examples of the acquired connection patterns rendered on the dataset videos. Figure 7(a) shows that in the throwing class, the right arm moves substantially and the edges are concentrated on the right arm. Similarly, in the kicking class (Figure 7(b)), the right leg moves substantially and the edges are concentrated on the right leg. In addition, when throwing, the participant looks at the target, and when kicking, the participant looks at the object being kicked. In the jump class (Figure 7(c)), the arms move together with the whole body, so the edges are concentrated on the center of the body and the right arm. These results show that the edges concentrate on the nodes with large movements in each action class, and that connection patterns specific to each action class were successfully acquired.
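As a rough illustration of how a rendering like Figure 7 could be produced, the following sketch overlays the learned edges on a video frame, drawing each edge with a width proportional to its learned weight. The function name, the joint-coordinate format, and the pruning threshold are our assumptions for illustration, not the exact rendering code used in the experiments.

```python
# Minimal visualization sketch (hypothetical helper, not the authors' code):
# overlay a learned, class-specific connection pattern on a video frame,
# drawing each edge with a width proportional to its learned weight.
import numpy as np
import matplotlib.pyplot as plt

def render_connection_pattern(frame, joints, adjacency, threshold=0.05):
    """frame: H x W x 3 image; joints: (V, 2) 2D joint positions
    (e.g., from a pose estimator); adjacency: (V, V) learned edge weights."""
    plt.imshow(frame)
    n_joints = joints.shape[0]
    for i in range(n_joints):
        for j in range(i + 1, n_joints):
            w = adjacency[i, j]
            if w < threshold:
                continue  # skip weak edges to keep the rendering readable
            (x1, y1), (x2, y2) = joints[i], joints[j]
            # Thicker, more opaque lines mark more important connections.
            plt.plot([x1, x2], [y1, y2], color="red",
                     linewidth=4.0 * w, alpha=min(1.0, 0.3 + float(w)))
    plt.scatter(joints[:, 0], joints[:, 1], s=10, color="yellow")
    plt.axis("off")
    plt.show()
```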
5 CONCLUSIONS
In this work, we proposed an action recognition method that considers connection patterns specific to action classes. In the proposed method, features unique to each action class are acquired by introducing multitask learning, and the optimal connections for each action class are obtained by updating the adjacency matrix on the basis of a weight matrix, learned during training, that indicates the importance of each edge. Evaluation experiments demonstrated that the proposed method improved classification accuracy over ST-GCN in all evaluated action classes. Moreover, by visualizing the connection patterns, we confirmed that patterns specific to each action class were generated. Future work includes developing learning methods that can determine the optimal number of edges for each action class.
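To make the adjacency update concrete, the following is a minimal sketch of one way the class-specific update could be realized, assuming a binary skeleton adjacency A, a learned edge-importance matrix M for one action class, and a fixed pruning ratio. The function name and the top-k pruning rule are illustrative assumptions, not the paper's exact procedure; in particular, the paper leaves the choice of edge count to future work.

```python
# Illustrative sketch (our reading, not the authors' exact procedure):
# derive a class-specific adjacency matrix by weighting the skeleton
# edges with learned importances, pruning weak edges, and renormalizing.
import numpy as np

def update_adjacency(A, M, keep_ratio=0.5):
    """A: (V, V) binary skeleton adjacency (with self-loops);
    M: (V, V) learned edge-importance weights for one action class;
    keep_ratio: assumed fraction of weighted edges to retain."""
    W = A * M                              # weight the existing edges
    nonzero = np.sort(W[W > 0])            # ascending nonzero weights
    if nonzero.size:
        cutoff = nonzero[int((1.0 - keep_ratio) * (nonzero.size - 1))]
        W = np.where(W >= cutoff, W, 0.0)  # prune the weakest edges
    # Symmetric normalization D^{-1/2} W D^{-1/2}, as in graph
    # convolutional networks (Kipf and Welling, 2017).
    d = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
```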
REFERENCES
Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. (2014). Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations.
Cao, Z., Simon, T., Wei, S.-E., and Sheikh, Y. (2017). Realtime multi-person 2D pose estimation using part affinity fields. In Computer Vision and Pattern Recognition.
Defferrard, M., Bresson, X., and Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Neural Information Processing Systems.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). Neural message passing for quantum chemistry. In International Conference on Machine Learning.
Kipf, T. N. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
Li, C., Cui, Z., Zheng, W., Xu, C., Ji, R., and Yang, J. (2018). Action-attending graphic neural network. IEEE Transactions on Image Processing, 27(7):3657–3670.
Lin, M., Chen, Q., and Yan, S. (2014). Network in network. In International Conference on Learning Representations.
Liu, M., Liu, H., and Chen, C. (2017). Enhanced skeleton visualization for view invariant human action recognition. Pattern Recognition, 68:346–362.
Shahroudy, A., Liu, J., Ng, T.-T., and Wang, G. (2016). NTU RGB+D: A large scale dataset for 3D human activity analysis. In Computer Vision and Pattern Recognition.
Simonyan, K. and Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In Neural Information Processing Systems.
Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In International Conference on Computer Vision.
Vemulapalli, R., Arrate, F., and Chellappa, R. (2014). Human action recognition by representing 3D skeletons as points in a Lie group. In Computer Vision and Pattern Recognition.
Wang, H. and Wang, L. (2017). Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. In Computer Vision and Pattern Recognition.
Wang, J., Liu, Z., Wu, Y., and Yuan, J. (2012). Mining actionlet ensemble for action recognition with depth cameras. In Computer Vision and Pattern Recognition.
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016). Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision.
Yan, S., Xiong, Y., and Lin, D. (2018). Spatial temporal graph convolutional networks for skeleton-based action recognition. In Association for the Advancement of Artificial Intelligence.