Authors:
Emilie Mathian 1,2; Huidong Liu 3; Lynnette Fernandez-Cuesta 1; Dimitris Samaras 4; Matthieu Foll 1 and Liming Chen 2
Affiliations:
1 International Agency for Research on Cancer (IARC-WHO), Lyon, France; 2 Ecole Centrale de Lyon, Ecully, France; 3 Amazon, WA, U.S.A.; 4 Stony Brook University, New York, U.S.A.
Keyword(s):
Anomaly Detection, HaloNet, Transformer, Auto-Encoder.
Abstract:
Unsupervised anomaly detection and localization is a crucial task in many applications, e.g., defect detection in industry and cancer localization in medicine, and requires both local and global information, as enabled by self-attention in Transformers. However, brute-force adaptation of Transformers, e.g., ViT, suffers from two issues: 1) high computational complexity, making it hard to deal with high-resolution images; and 2) patch-based tokens, which are inappropriate for pixel-level dense prediction tasks, e.g., anomaly localization, and ignore intra-patch interactions. We present HaloAE, the first auto-encoder based on a local 2D version of the Transformer with HaloNet, allowing intra-patch correlation computation with a receptive field covering 25% of the input image. HaloAE combines convolution and local 2D block-wise self-attention layers and performs anomaly detection and segmentation through a single model. Moreover, because the loss function is generally a weighted sum of several losses, we also introduce a novel dynamic weighting scheme to better optimize the learning of the model. The competitive results on the MVTec dataset suggest that vision models incorporating Transformers could benefit from a local computation of the self-attention operation, given its very low computational cost, and pave the way for applications on very large images.
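To make the "local 2D block-wise self-attention" concrete, below is a minimal PyTorch sketch of a HaloNet-style attention layer, illustration only and not the authors' code: queries come from non-overlapping blocks, while keys and values come from the same blocks extended by a halo of neighbouring pixels, so every pixel attends within a local 2D window rather than globally. The function name halo_attention, the block and halo sizes, and the omission of learned q/k/v projections are all simplifying assumptions.

import torch
import torch.nn.functional as F

def halo_attention(x, block_size=8, halo=3, heads=4):
    """Sketch of HaloNet-style local 2D block-wise self-attention.

    x: (B, C, H, W), with C divisible by heads and H, W divisible by
    block_size. For brevity, q/k/v are taken from x directly; a real
    layer would apply learned linear projections.
    """
    B, C, H, W = x.shape
    assert C % heads == 0 and H % block_size == 0 and W % block_size == 0
    d = C // heads
    n_blocks = (H // block_size) * (W // block_size)

    # Queries: non-overlapping blocks -> (B, n_blocks, block_size**2, C).
    q = F.unfold(x, kernel_size=block_size, stride=block_size)
    q = q.transpose(1, 2).reshape(B, n_blocks, C, -1).transpose(2, 3)

    # Keys/values: the same blocks padded with a halo of neighbours,
    # window size block_size + 2*halo -> (B, n_blocks, window**2, C).
    k_size = block_size + 2 * halo
    kv = F.unfold(F.pad(x, [halo] * 4), kernel_size=k_size, stride=block_size)
    kv = kv.transpose(1, 2).reshape(B, n_blocks, C, -1).transpose(2, 3)

    # Split heads: (B, n_blocks, heads, tokens, d).
    q = q.reshape(B, n_blocks, -1, heads, d).transpose(2, 3)
    kv = kv.reshape(B, n_blocks, -1, heads, d).transpose(2, 3)

    # Scaled dot-product attention within each local window.
    attn = (q @ kv.transpose(-2, -1)) * d ** -0.5
    out = attn.softmax(dim=-1) @ kv  # values == keys in this sketch

    # Merge heads and fold the non-overlapping blocks back to (B, C, H, W).
    out = out.transpose(2, 3).reshape(B, n_blocks, -1, C).transpose(2, 3)
    out = out.reshape(B, n_blocks, C * block_size ** 2).transpose(1, 2)
    return F.fold(out, (H, W), kernel_size=block_size, stride=block_size)

# Usage: y = halo_attention(torch.randn(2, 32, 64, 64))  # -> (2, 32, 64, 64)

Because each block only attends over its haloed neighbourhood, the cost grows linearly with the number of blocks rather than quadratically with the number of pixels, which is what makes high-resolution inputs tractable.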
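The abstract does not specify the dynamic weighting scheme itself. The snippet below is a generic, hypothetical sketch of one common approach, a softmax over detached loss magnitudes that up-weights the currently hardest terms, and is not the paper's actual formulation; the function name dynamic_weights, the temperature parameter, and the example loss names are all assumptions.

import torch

def dynamic_weights(losses, temperature=1.0):
    # Hypothetical scheme (NOT the paper's): weight each loss term by a
    # softmax over the detached loss values, so larger losses get larger
    # weights and no gradient flows through the weights themselves.
    stacked = torch.stack([loss.detach() for loss in losses])
    weights = torch.softmax(stacked / temperature, dim=0)
    total = sum(w * loss for w, loss in zip(weights, losses))
    return total, weights

# Example (loss names assumed):
# total, w = dynamic_weights([l2_loss, ssim_loss, perceptual_loss])

Detaching the weights keeps the optimization of the weighted sum well-posed: the model cannot reduce the total loss by shrinking a weight instead of improving the corresponding reconstruction term.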