Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition
Authors: Dichao Liu 1; Yu Wang 2 and Jien Kato 3

Affiliations: 1 Graduate School of Informatics, Nagoya University, Nagoya City, Japan; 2 Graduate School of International Development, Nagoya University, Nagoya City, Japan; 3 College of Information Science and Engineering, Ritsumeikan University, Kusatsu City, Japan

Keyword(s): Action Recognition, Video Understanding, Attention, Fine-grained, Deep Learning.

Related Ontology Subjects/Areas/Topics: Computer Vision, Visualization and Computer Graphics; Image and Video Analysis; Visual Attention and Image Saliency

Abstract: We aim to propose more effective attentional regions that can help develop better fine-grained action recognition algorithms. Building on the spatial transformer networks' capability to perform spatial manipulation inside the network, we propose an extension model, the Supervised Spatial Transformer Networks (SSTNs). This model first supervises the spatial transformers to capture the same regions as hard-coded attentional regions at certain scale levels. The supervision can then be turned off, and the model adjusts the learned regions in both location and scale. Because this adjustment is conditioned on the classification loss, it is optimized directly for better recognition results. With this model, we are able to capture attentional regions of different levels within the networks. To evaluate SSTNs, we construct a six-stream SSTN model that exploits spatial and temporal information at three levels (general, middle and detail). The results show that the deep-learned attentional regions captured by SSTNs outperform hard-coded attentional regions. Moreover, the features learned by the different streams of SSTNs are complementary to each other, and a better result is obtained by fusing the features.
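As a rough illustration of the mechanism the abstract describes, the PyTorch sketch below shows a spatial transformer whose localization network can first be supervised toward a hard-coded attentional region, and later be driven by the classification loss alone. The module shapes, the scale-plus-translation parameterization, and the MSE form of the supervision term are assumptions made for illustration, not the paper's exact design; the full model stacks six such streams over spatial and temporal inputs.

# Minimal sketch of a "supervised" spatial transformer, assuming a
# scale-plus-translation affine transform and an MSE supervision loss.
# Names and hyperparameters here are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedSTN(nn.Module):
    """Spatial transformer whose localization net can first be supervised
    to mimic a hard-coded attentional region, then fine-tuned by the
    classification loss alone."""

    def __init__(self, in_channels: int):
        super().__init__()
        # Localization network: predicts 4 parameters of a
        # scale+translation affine transform (no rotation/shear),
        # a common STN simplification.
        self.loc = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, 4),  # (sx, sy, tx, ty)
        )
        # Initialize to the identity transform.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1.0, 1.0, 0.0, 0.0]))

    def forward(self, x: torch.Tensor):
        sx, sy, tx, ty = self.loc(x).unbind(dim=1)
        zeros = torch.zeros_like(sx)
        # Build the (B, 2, 3) affine matrices expected by affine_grid.
        theta = torch.stack(
            [torch.stack([sx, zeros, tx], dim=1),
             torch.stack([zeros, sy, ty], dim=1)], dim=1)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False), theta


def region_supervision_loss(theta, target_theta):
    """Phase-1 loss pushing the transformer toward a hard-coded
    attentional region (target_theta); disabled in phase 2 so only
    the classification loss shapes the region."""
    return F.mse_loss(theta, target_theta)

In a first training phase, one would add region_supervision_loss to the classification loss; switching it off afterwards leaves the region parameters free to be adjusted in location and scale purely for recognition accuracy, mirroring the two-phase scheme described in the abstract.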

CC BY-NC-ND 4.0


Paper citation in several formats:
Liu, D.; Wang, Y. and Kato, J. (2019). Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019) - Volume 4: VISAPP; ISBN 978-989-758-354-4; ISSN 2184-4321, SciTePress, pages 311-318. DOI: 10.5220/0007257803110318

@conference{visapp19,
author={Dichao Liu and Yu Wang and Jien Kato},
title={Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition},
booktitle={Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019) - Volume 4: VISAPP},
year={2019},
pages={311-318},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007257803110318},
isbn={978-989-758-354-4},
issn={2184-4321},
}

TY - CONF
JO - Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019) - Volume 4: VISAPP
TI - Supervised Spatial Transformer Networks for Attention Learning in Fine-grained Action Recognition
SN - 978-989-758-354-4
IS - 2184-4321
AU - Liu, D.
AU - Wang, Y.
AU - Kato, J.
PY - 2019
SP - 311
EP - 318
DO - 10.5220/0007257803110318
PB - SciTePress
ER -