APPENDIX
Table 2: Encoder and decoder architecture used for our synthetic datasets (d = 16) and the industrial datasets. n_c = 3 for the surface normal images of material 1 and n_c = 1 for the albedo images of material 1 and gradient and curvature images of material 2.

Layer | Resolution (in / out) | Channels (in / out) | Input | Activ. func.

Synthetic dataset

Encoder
conv1 | W × H / W × H | 1/16 | Image | Leaky ReLU
conv2 | W/2 × H/2 / W/2 × H/2 | 16/32 | conv1 | Leaky ReLU
conv3 | W/4 × H/4 / W/4 × H/4 | 32/64 | conv2 | Leaky ReLU
conv4 | W/4 × H/4 / W/d × H/d | 64/64 | conv3 | Leaky ReLU
conv5 | W/d × H/d / W/d × H/d | 64/16 | conv4 | Leaky ReLU

Bottleneck

Decoder
deconv1 | W/d × H/d / W/d × H/d | 16/64 | lin4 | Leaky ReLU
deconv2 | W/d × H/d / W/4 × H/4 | 64/64 | deconv1 | Leaky ReLU
deconv3 | W/4 × H/4 / W/2 × H/2 | 64/32 | deconv2 | Leaky ReLU
deconv4 | W/2 × H/2 / W × H | 32/16 | deconv3 | Leaky ReLU
deconv5 | W × H / W × H | 16/1 | deconv4 | Sigmoid

Industrial dataset

Encoder
conv1 | W × H / W × H | n_c/16 | Image | Leaky ReLU
conv2 | W × H / W/2 × H/2 | 16/32 | conv1 | Leaky ReLU
conv3 | W/2 × H/2 / W/4 × H/4 | 32/64 | conv2 | Leaky ReLU
conv4 | W/4 × H/4 / W/8 × H/8 | 64/64 | conv3 | Leaky ReLU
conv5 | W/8 × H/8 / W/16 × H/16 | 64/64 | conv4 | Leaky ReLU
conv6 | W/16 × H/16 / W/16 × H/16 | 64/16 | conv5 | Leaky ReLU

Bottleneck

Decoder
deconv1 | W/16 × H/16 / W/16 × H/16 | 16/64 | lin4 | Leaky ReLU
deconv2 | W/16 × H/16 / W/8 × H/8 | 64/64 | deconv1 | Leaky ReLU
deconv3 | W/8 × H/8 / W/4 × H/4 | 64/64 | deconv2 | Leaky ReLU
deconv4 | W/4 × H/4 / W/2 × H/2 | 64/32 | deconv3 | Leaky ReLU
deconv5 | W/2 × H/2 / W × H | 32/16 | deconv4 | Leaky ReLU
deconv6 | W × H / W × H | 16/n_c | deconv5 | Sigmoid
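The following is a minimal PyTorch sketch of the industrial-dataset encoder and decoder from Table 2, intended only as an illustration of the listed resolutions and channel counts. PyTorch itself, the kernel sizes, strides, padding, and the Leaky ReLU slope are not specified in the table and are assumptions here (3×3 kernels, stride 2 for the resolution-halving layers, stride-2 transposed convolutions for the upsampling layers).

```python
# Sketch of the industrial encoder/decoder in Table 2 (assumed layer
# hyperparameters; only channels, resolutions, and activations come
# from the table).
import torch
import torch.nn as nn


def conv(c_in, c_out, down):
    # Stride-2 convolution halves W and H; stride 1 keeps the resolution.
    return nn.Conv2d(c_in, c_out, kernel_size=3, stride=2 if down else 1, padding=1)


def deconv(c_in, c_out, up):
    # Stride-2 transposed convolution doubles W and H; otherwise keep the resolution.
    if up:
        return nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1)
    return nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1)


class IndustrialEncoder(nn.Module):
    def __init__(self, n_c=1):
        super().__init__()
        act = nn.LeakyReLU(0.2)
        self.net = nn.Sequential(
            conv(n_c, 16, down=False), act,  # conv1: W x H    -> W x H
            conv(16, 32, down=True), act,    # conv2: W x H    -> W/2 x H/2
            conv(32, 64, down=True), act,    # conv3: W/2      -> W/4
            conv(64, 64, down=True), act,    # conv4: W/4      -> W/8
            conv(64, 64, down=True), act,    # conv5: W/8      -> W/16
            conv(64, 16, down=False), act,   # conv6: W/16     -> W/16
        )

    def forward(self, x):
        return self.net(x)


class IndustrialDecoder(nn.Module):
    def __init__(self, n_c=1):
        super().__init__()
        act = nn.LeakyReLU(0.2)
        self.net = nn.Sequential(
            deconv(16, 64, up=False), act,   # deconv1: W/16 -> W/16
            deconv(64, 64, up=True), act,    # deconv2: W/16 -> W/8
            deconv(64, 64, up=True), act,    # deconv3: W/8  -> W/4
            deconv(64, 32, up=True), act,    # deconv4: W/4  -> W/2
            deconv(32, 16, up=True), act,    # deconv5: W/2  -> W
            deconv(16, n_c, up=False), nn.Sigmoid(),  # deconv6: W -> W
        )

    def forward(self, z):
        return self.net(z)
```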
Table 3: Bottleneck architectures used for our synthetic datasets (conv_b = conv5, r = 1024, n_z = 100) and the industrial datasets (conv_b = conv6, r = 1024, n_z = 100).

AE type | Layer | Resolution (in / out) | Input | Activ. func.

CAE/AAE
lin1 | 1 × (W/16 ∗ H/16 ∗ 16) / 1 × 1024 | conv_b (flattened) | Leaky ReLU
lin2 | 1 × r / 1 × n_z | lin1 | -
lin3 | 1 × n_z / 1 × r | lin2 | Leaky ReLU
lin4 | 1 × r / 1 × (W/16 ∗ H/16 ∗ 16) | lin3 | Leaky ReLU

VAE
lin1 | 1 × (W/16 ∗ H/16 ∗ 16) / 1 × 1024 | conv_b (flattened) | Leaky ReLU
lin21 | 1 × r / 1 × n_z | lin1 | -
lin22 | 1 × r / 1 × n_z | lin1 | -
lin3 | 1 × n_z / 1 × r | lin21, lin22 (reparam.) | Leaky ReLU
lin4 | 1 × r / 1 × (W/16 ∗ H/16 ∗ 16) | lin3 | Leaky ReLU
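A possible PyTorch reading of the Table 3 bottlenecks is sketched below. The flattened size (W/16 ∗ H/16 ∗ 16) is passed in as `flat`; r = 1024 and n_z = 100 as in the table. Interpreting lin21 and lin22 as the VAE mean and log-variance heads, and the Leaky ReLU slope, are assumptions not fixed by the table.

```python
# Sketch of the CAE/AAE and VAE bottlenecks from Table 3.
import torch
import torch.nn as nn


class CAEBottleneck(nn.Module):
    def __init__(self, flat, r=1024, n_z=100):
        super().__init__()
        act = nn.LeakyReLU(0.2)
        self.lin1 = nn.Sequential(nn.Linear(flat, r), act)
        self.lin2 = nn.Linear(r, n_z)                    # latent code, no activation
        self.lin3 = nn.Sequential(nn.Linear(n_z, r), act)
        self.lin4 = nn.Sequential(nn.Linear(r, flat), act)

    def forward(self, h):                                # h: flattened conv_b output
        z = self.lin2(self.lin1(h))
        return self.lin4(self.lin3(z)), z


class VAEBottleneck(nn.Module):
    def __init__(self, flat, r=1024, n_z=100):
        super().__init__()
        act = nn.LeakyReLU(0.2)
        self.lin1 = nn.Sequential(nn.Linear(flat, r), act)
        self.lin21 = nn.Linear(r, n_z)                   # mean head
        self.lin22 = nn.Linear(r, n_z)                   # log-variance head (assumed)
        self.lin3 = nn.Sequential(nn.Linear(n_z, r), act)
        self.lin4 = nn.Sequential(nn.Linear(r, flat), act)

    def forward(self, h):
        h1 = self.lin1(h)
        mu, logvar = self.lin21(h1), self.lin22(h1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.lin4(self.lin3(z)), mu, logvar
```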
Table 4: Architecture of the C_AAE used for all our datasets.

Layer | Resolution (in / out) | Channels | Input | Activ. func.
lind1 | 1 × n_z / 1 × 10 | - | lin2 | Leaky ReLU
lind2 | 1 × 10 / 1 × 1 | - | lind1 | Leaky ReLU
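A corresponding sketch of C_AAE follows: two linear layers mapping the n_z-dimensional code (the output of lin2) to a single score. Reading C_AAE as the latent-space discriminator of the adversarial autoencoder, and the Leaky ReLU slope, are assumptions; the layer sizes and activations come from Table 4.

```python
# Sketch of the two-layer C_AAE network from Table 4.
import torch.nn as nn


class LatentDiscriminator(nn.Module):
    def __init__(self, n_z=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_z, 10), nn.LeakyReLU(0.2),  # lind1: 1 x n_z -> 1 x 10
            nn.Linear(10, 1), nn.LeakyReLU(0.2),    # lind2: 1 x 10  -> 1 x 1
        )

    def forward(self, z):
        return self.net(z)
```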