Shuffle Mixing: An Efficient Alternative to Self Attention
Ryouichi Furukawa, Kazuhiro Hotta
2023
Abstract
In this paper, we propose ShuffleFormer, which replaces the Transformer's Self Attention with the proposed shuffle mixing. ShuffleFormer can be flexibly incorporated as a backbone for conventional visual recognition, dense prediction, and other tasks. Self Attention learns globally and dynamically, whereas shuffle mixing employs Depth Wise Convolution to learn locally and statically. Depth Wise Convolution does not consider relationships between channels because the convolution is applied to each channel individually. Shuffle mixing therefore obtains information from different channels, without increasing the computational cost, by inserting a shift operation that moves channel components in the spatial direction. However, because of the shift operation, the amount of spatial information obtained is smaller than that of plain Depth Wise Convolution. ShuffleFormer compensates by using overlapped patch embedding, with a kernel larger than the stride, to reduce the resolution; extracting more features in the spatial direction eliminates the disadvantage of the shift operation. We evaluated ShuffleFormer on ImageNet-1K image classification and ADE20K semantic segmentation, where it achieves superior results compared to Swin Transformer. In particular, ShuffleFormer-Base/Light outperforms Swin-Base in accuracy at about two-thirds of the computational cost.
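To make the shift idea concrete, below is a minimal NumPy sketch of one common way such a zero-cost spatial shift is realized: the channels are split into groups and each group is displaced by one pixel in a different direction, so that a subsequent per-channel (depthwise) convolution sees context from neighbouring channels. This is an illustrative assumption on my part, not the authors' exact implementation; the function name `spatial_shift` and the four-way grouping are hypothetical.

```python
import numpy as np

def spatial_shift(x, shift=1):
    """Shift channel groups spatially (hypothetical sketch).

    x: feature map of shape (C, H, W). Channels are split into four
    groups, shifted left/right/up/down by `shift` pixels; positions
    moved out of bounds are zero-filled. The operation adds no
    parameters and no multiply-accumulates.
    """
    c, h, w = x.shape
    out = np.zeros_like(x)
    g = c // 4  # channels per direction group
    out[:g, :, :-shift] = x[:g, :, shift:]            # shift left
    out[g:2 * g, :, shift:] = x[g:2 * g, :, :-shift]  # shift right
    out[2 * g:3 * g, :-shift, :] = x[2 * g:3 * g, shift:, :]  # shift up
    out[3 * g:, shift:, :] = x[3 * g:, :-shift, :]    # shift down
    return out
```

In a shuffle-mixing-style block, a depthwise convolution applied after such a shift mixes information across channels for free, which is the trade-off the abstract describes: cheaper than Self Attention, but with slightly less spatial coverage per channel.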
Paper Citation
in Harvard Style
Furukawa R. and Hotta K. (2023). Shuffle Mixing: An Efficient Alternative to Self Attention. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP; ISBN 978-989-758-634-7, SciTePress, pages 700-707. DOI: 10.5220/0011720200003417
in Bibtex Style
@conference{visapp23,
author={Ryouichi Furukawa and Kazuhiro Hotta},
title={Shuffle Mixing: An Efficient Alternative to Self Attention},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP},
year={2023},
pages={700-707},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011720200003417},
isbn={978-989-758-634-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP
TI - Shuffle Mixing: An Efficient Alternative to Self Attention
SN - 978-989-758-634-7
AU - Furukawa R.
AU - Hotta K.
PY - 2023
SP - 700
EP - 707
DO - 10.5220/0011720200003417
PB - SciTePress