Shuffle Mixing: An Efficient Alternative to Self Attention

Ryouichi Furukawa, Kazuhiro Hotta

2023

Abstract

In this paper, we propose ShuffleFormer, which replaces the Transformer's Self Attention with the proposed shuffle mixing. ShuffleFormer can be flexibly incorporated as a backbone for conventional visual recognition, dense prediction, and other tasks. Self Attention learns global, dynamic relationships, whereas shuffle mixing employs Depth Wise Convolution to learn local, static ones. Because Depth Wise Convolution applies a separate filter to each channel, it does not model relationships between channels. Shuffle mixing therefore obtains information from different channels, without changing the computational cost, by inserting a shift operation that moves channel components in the spatial direction. However, the shift operation captures fewer spatial components than plain Depth Wise Convolution. To eliminate this disadvantage, ShuffleFormer reduces resolution with overlapped patch embedding, whose kernel is larger than the stride width, thereby extracting more features in the spatial direction. We evaluated ShuffleFormer on ImageNet-1K image classification and ADE20K semantic segmentation. ShuffleFormer achieves superior results compared to Swin Transformer. In particular, ShuffleFormer-Base/Light outperforms Swin-Base in accuracy at about two-thirds of the computational cost.
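The core idea described above can be sketched in a few lines: shift groups of channels by different spatial offsets so that each position mixes information across channels, then apply a per-channel (depthwise) convolution. The sketch below is a minimal NumPy illustration under assumptions not taken from the paper: the equal-size channel-group split, the use of a circular shift (`np.roll`), and the function names are all hypothetical simplifications, not the authors' implementation.

```python
import numpy as np

def shuffle_mix(x, kernels, group_shifts):
    """Illustrative shuffle mixing: spatially shift channel groups,
    then apply a depthwise (one-kernel-per-channel) convolution.

    x:            (C, H, W) feature map
    kernels:      (C, k, k) depthwise kernels, one per channel
    group_shifts: list of (dy, dx) offsets, one per channel group
                  (equal-size groups; a simplifying assumption)
    """
    c, h, w = x.shape
    g = len(group_shifts)
    size = c // g

    # Shift each channel group by its own spatial offset. A circular
    # shift is used here for brevity; zero-padded shifts would also work.
    shifted = x.copy()
    for i, (dy, dx) in enumerate(group_shifts):
        shifted[i * size:(i + 1) * size] = np.roll(
            x[i * size:(i + 1) * size], (dy, dx), axis=(1, 2))

    # Depthwise convolution: each channel is convolved only with its
    # own kernel, so no cross-channel mixing happens in this step --
    # the shift above is what lets channels exchange information.
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(shifted, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for ch in range(c):
        for y in range(h):
            for xx in range(w):
                out[ch, y, xx] = np.sum(
                    padded[ch, y:y + k, xx:xx + k] * kernels[ch])
    return out
```

With zero shifts and an identity kernel (a 1 at the center of a 3x3 kernel), the operation reduces to the identity, which makes the behavior of each piece easy to verify in isolation.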


Paper Citation


in Harvard Style

Furukawa R. and Hotta K. (2023). Shuffle Mixing: An Efficient Alternative to Self Attention. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP; ISBN 978-989-758-634-7, SciTePress, pages 700-707. DOI: 10.5220/0011720200003417


in Bibtex Style

@conference{visapp23,
author={Ryouichi Furukawa and Kazuhiro Hotta},
title={Shuffle Mixing: An Efficient Alternative to Self Attention},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP},
year={2023},
pages={700-707},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011720200003417},
isbn={978-989-758-634-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 5: VISAPP
TI - Shuffle Mixing: An Efficient Alternative to Self Attention
SN - 978-989-758-634-7
AU - Furukawa R.
AU - Hotta K.
PY - 2023
SP - 700
EP - 707
DO - 10.5220/0011720200003417
PB - SciTePress