Kingma, D. P. and Welling, M. (2014). Auto-encoding
variational Bayes. In International Conference on
Learning Representations (ICLR).
Ko, J. H., Mudassar, B., Na, T., and Mukhopadhyay, S.
(2017). Design of an energy-efficient accelerator for
training of convolutional neural networks using
frequency-domain computation. In 54th ACM/EDAC/IEEE
Design Automation Conference (DAC).
Kosiorek, A. R., Kim, H., Posner, I., and Teh, Y. W. (2018).
Sequential attend, infer, repeat: Generative modelling
of moving objects. In International Conference on
Neural Information Processing Systems (NeurIPS).
Kosiorek, A. R., Sabour, S., Teh, Y. W., and Hinton, G. E.
(2019). Stacked capsule autoencoders. In International
Conference on Neural Information Processing Systems
(NeurIPS).
Kumar, N., Verma, R., and Sethi, A. (2017). Convolutional
neural networks for wavelet domain super resolution.
Pattern Recognition Letters, 90:65–71.
Kurin, V., Nowozin, S., Hofmann, K., Beyer, L., and
Leibe, B. (2017). The Atari grand challenge dataset.
arXiv:1705.10998.
Lee, A. B., Mumford, D., and Huang, J. (2001). Occlusion
models for natural images: A statistical study of a
scale-invariant dead leaves model. International Journal
of Computer Vision, 41(1):35–59.
Lin, C.-H., Yumer, E., Wang, O., Shechtman, E., and Lucey,
S. (2018). ST-GAN: Spatial transformer generative
adversarial networks for image compositing. In IEEE
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 9455–9464.
Lin, Z., Wu, Y.-F., Peri, S. V., Sun, W., Singh, G., Deng, F.,
Jiang, J., and Ahn, S. (2020). SPACE: Unsupervised
object-oriented scene representation via spatial
attention and decomposition. In International Conference
on Learning Representations (ICLR).
Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S.,
Schölkopf, B., and Bachem, O. (2019). Challenging
common assumptions in the unsupervised learning of
disentangled representations. In International
Conference on Machine Learning (ICML), pages 4114–4124.
Locatello, F., Weissenborn, D., Unterthiner, T., Mahendran,
A., Heigold, G., Uszkoreit, J., Dosovitskiy, A., and
Kipf, T. (2020). Object-centric learning with slot
attention. In International Conference on Neural
Information Processing Systems (NeurIPS).
Matheron, G. (1968). Schéma booléen séquentiel de
partition aléatoire. N-83 CMM, Paris School of Mines
publications.
Mathieu, M., Henaff, M., and LeCun, Y. (2014). Fast
training of convolutional networks through FFTs. In
International Conference on Learning Representations
(ICLR).
Monnier, T., Groueix, T., and Aubry, M. (2020). Deep
transformation-invariant clustering. In International
Conference on Neural Information Processing Systems
(NeurIPS).
Monnier, T., Vincent, E., Ponce, J., and Aubry, M. (2021).
Unsupervised layered image decomposition into object
prototypes. In IEEE/CVF International Conference on
Computer Vision (ICCV).
Paschalidou, D., Gool, L. V., and Geiger, A. (2020).
Learning unsupervised hierarchical part decomposition of
3D objects from a single RGB image. In IEEE/CVF
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 1060–1070.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E.,
DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and
Lerer, A. (2017). Automatic differentiation in PyTorch.
In International Conference on Neural Information
Processing Systems Workshops (NeurIPS-W).
Proakis, J. G. and Manolakis, D. G. (2004). Digital signal
processing. PHI Publication: New Delhi, India.
Reddy, B. S. and Chatterji, B. N. (1996). An FFT-based
technique for translation, rotation, and scale-invariant
image registration. IEEE Transactions on Image
Processing, 5(8):1266–1271.
Rudin, L. I. and Osher, S. (1994). Total variation based
image restoration with free local constraints. In
Proceedings of 1st International Conference on Image
Processing, volume 1, pages 31–35. IEEE.
Sbai, O., Couprie, C., and Aubry, M. (2020). Unsupervised
image decomposition in vector layers. In IEEE
International Conference on Image Processing (ICIP),
pages 1576–1580. IEEE.
Stanic, A., Van Steenkiste, S., and Schmidhuber, J. (2021).
Hierarchical relational inference. In Conference on
Artificial Intelligence (AAAI), pages 9730–9738.
Veerapaneni, R., Co-Reyes, J. D., Chang, M., Janner, M.,
Finn, C., Wu, J., Tenenbaum, J., and Levine, S.
(2020). Entity abstraction in visual model-based
reinforcement learning. In Conference on Robot Learning
(CoRL), pages 1439–1456. PMLR.
Villar-Corrales, A., Schirrmacher, F., and Riess, C.
(2021). Deep learning architectural designs for
super-resolution of noisy images. In IEEE International
Conference on Acoustics, Speech and Signal Processing
(ICASSP), pages 1635–1639.
Weis, M. A., Chitta, K., Sharma, Y., Brendel, W., Bethge,
M., Geiger, A., and Ecker, A. S. (2021). Benchmarking
unsupervised object representations for video
sequences. Journal of Machine Learning Research,
volume 22, pages 1–61.
Wolter, M., Yao, A., and Behnke, S. (2020).
Object-centered Fourier motion estimation and
segment-transformation prediction. In European Symposium
on Artificial Neural Networks, Computational
Intelligence and Machine Learning (ESANN).
Xu, K., Qin, M., Sun, F., Wang, Y., Chen, Y.-K., and Ren,
F. (2020). Learning in the frequency domain. In
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), pages 1740–1749.
Yang, J., Kannan, A., Batra, D., and Parikh, D. (2017).
LR-GAN: Layered recursive generative adversarial
networks for image generation. arXiv:1703.01560.
Zhang, Y., Tsang, I. W., Luo, Y., Hu, C.-H., Lu, X., and Yu,
X. (2020). Copy and paste GAN: Face hallucination
from shaded thumbnails. In IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 7355–7364.
Unsupervised Image Decomposition with Phase-Correlation Networks