Skeleton-based Online Sign Language Recognition using Monotonic Attention

Natsuki Takayama, Gibran Benitez-Garcia, Hiroki Takahashi

2022

Abstract

Sequence-to-sequence models have been successfully applied to improve continuous sign language word recognition in recent years. Although various methods for continuous sign language word recognition have been proposed, they assume offline recognition, and online and streaming situations have not been sufficiently investigated. In this study, skeleton-based continuous sign language word recognition for online situations was investigated. A combination of spatial-temporal graph convolutional networks and recurrent neural networks with soft attention was employed as the base model. Further, three types of monotonic attention techniques were applied to extend the base model for online recognition: hard monotonic attention, monotonic chunkwise attention, and monotonic infinite lookback attention. The performance of the proposed models was evaluated in offline and online recognition settings. A conventional Japanese sign language video dataset, including 275 types of isolated word videos and 113 types of sentence videos, was utilized to evaluate the proposed models. The results showed the effectiveness of monotonic attention for online continuous sign language word recognition.
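
As a rough illustration of why monotonic attention suits online recognition, the following is a minimal sketch of test-time hard monotonic attention in its commonly described form: for each output step, the decoder scans the input frames forward from the previously attended position and stops at the first frame whose selection probability reaches a threshold. It is not taken from the paper. The function names, the precomputed `energy_rows` table, and the 0.5 threshold are assumptions made for illustration; in a real model the energies would be computed from the decoder state and the encoder outputs at each step.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping an energy to a selection probability."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_monotonic_step(
    energies: Sequence[float], start: int, threshold: float = 0.5
) -> Optional[int]:
    """Scan input positions from `start` and return the first one selected.

    Returns None when no remaining position reaches the threshold, which
    means the available input was consumed without attending to a frame.
    """
    for j in range(start, len(energies)):
        if sigmoid(energies[j]) >= threshold:
            return j
    return None


def hard_monotonic_decode(
    energy_rows: Sequence[Sequence[float]],
) -> List[Optional[int]]:
    """Return the attended input index for each output step.

    `energy_rows[i][j]` is a hypothetical precomputed energy between
    output step i and input frame j (an assumption of this sketch).
    """
    positions: List[Optional[int]] = []
    start = 0
    for row in energy_rows:
        j = hard_monotonic_step(row, start)
        positions.append(j)
        if j is None:
            break  # no frame selected; stop decoding in this sketch
        start = j  # the next step resumes from the same frame, never earlier
    return positions


# Toy energies: 3 output steps over 4 input frames.
toy_energies = [
    [-2.0, 1.0, -1.0, 3.0],
    [-1.0, -3.0, 2.0, 0.5],
    [-0.5, -0.5, -4.0, -1.0],
]
print(hard_monotonic_decode(toy_energies))  # [1, 2, None]
```

Because each step only looks at frames from the previous attention position onward, the scan never needs frames that have not yet arrived, in contrast to soft attention, which weights the entire input sequence.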


Paper Citation


in Harvard Style

Takayama N., Benitez-Garcia G. and Takahashi H. (2022). Skeleton-based Online Sign Language Recognition using Monotonic Attention. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 5: VISAPP; ISBN 978-989-758-555-5, SciTePress, pages 601-608. DOI: 10.5220/0010899400003124


in Bibtex Style

@conference{visapp22,
author={Natsuki Takayama and Gibran Benitez-Garcia and Hiroki Takahashi},
title={Skeleton-based Online Sign Language Recognition using Monotonic Attention},
booktitle={Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 5: VISAPP},
year={2022},
pages={601-608},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010899400003124},
isbn={978-989-758-555-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 5: VISAPP
TI - Skeleton-based Online Sign Language Recognition using Monotonic Attention
SN - 978-989-758-555-5
AU - Takayama N.
AU - Benitez-Garcia G.
AU - Takahashi H.
PY - 2022
SP - 601
EP - 608
DO - 10.5220/0010899400003124
PB - SciTePress