Authors:
Ryota Sato 1; Suzana Beleza 1; Erica Shimomoto 2; Matheus Silva de Lima 1; Nobuko Kato 3 and Kazuhiro Fukui 1
Affiliations:
1 University of Tsukuba, Department of Computer Science, Tsukuba, Ibaraki, Japan
2 National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
3 Tsukuba University of Technology, Faculty of Industrial Technology, Tsukuba, Ibaraki, Japan
Keyword(s):
Sign Language Recognition, 3D Fast Fourier Transform, Product Grassmann Manifold, Subspace-Based Methods.
Abstract:
This paper proposes a subspace-based method for sign language recognition in videos. Typical subspace-based methods represent a video as a low-dimensional subspace generated by applying principal component analysis (PCA) to a set of images from the video. Such a representation is compact and practical for motion recognition when little training data is available. However, given the complex motion and structure of sign languages, the performance of subspace-based methods is limited because they do not consider temporal information such as the order of frames. To address this issue, we propose processing time-domain information in the frequency domain by applying the three-dimensional fast Fourier transform (3D-FFT) to sign videos. A sign video is thereby represented as a 3D amplitude spectrum tensor, which is invariant to deviations of target objects in the spatial and temporal directions. Further, the 3D amplitude spectrum tensor is regarded as one point on the Product Grassmann Manifold (PGM). Because the tensor is unfolded along all three dimensions, the PGM representation can account for temporal information. Finally, we calculate the similarity between two videos from the distance between their corresponding points on the PGM. The effectiveness of the proposed method is demonstrated on private and public sign language recognition datasets, showing a significant performance improvement over conventional subspace-based methods.
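The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using NumPy, not the authors' implementation: the function names, the subspace dimension `dim`, the use of an uncentered SVD to obtain each subspace, and the choice of a geodesic-style distance built from canonical angles are all assumptions made for the example. The abstract does not specify these details.

```python
import numpy as np


def amplitude_spectrum(video):
    """Amplitude of the 3D FFT of a (frames, height, width) video tensor."""
    return np.abs(np.fft.fftn(video, axes=(0, 1, 2)))


def unfold(tensor, mode):
    """Mode-n unfolding: the chosen axis becomes the rows,
    and the remaining axes are flattened into the columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_subspaces(tensor, dim):
    """One orthonormal basis per mode, spanned by the top `dim`
    left singular vectors of that mode's unfolding (assumed construction)."""
    bases = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(u[:, :dim])
    return bases


def pgm_distance(bases_a, bases_b):
    """Distance between two points on a product of Grassmann manifolds.

    Assumed form: square root of the sum, over all modes, of the squared
    canonical angles between the corresponding subspaces.
    """
    total = 0.0
    for a, b in zip(bases_a, bases_b):
        # Singular values of A^T B are the cosines of the canonical angles.
        cosines = np.linalg.svd(a.T @ b, compute_uv=False)
        angles = np.arccos(np.clip(cosines, 0.0, 1.0))
        total += np.sum(angles ** 2)
    return np.sqrt(total)
```

For two videos `x` and `y` of equal shape, `pgm_distance(mode_subspaces(amplitude_spectrum(x), 3), mode_subspaces(amplitude_spectrum(y), 3))` gives their dissimilarity under these assumptions. Note that the amplitude spectrum is exactly unchanged only under circular shifts (for example, `np.roll` along any axis); for real videos, where content enters and leaves the frame, the invariance is approximate.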