CoRR, abs/2010.11882.
Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142.
Farabet, C., Couprie, C., Najman, L., and LeCun, Y.
(2013). Learning hierarchical features for scene la-
beling. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 35(8):1915–1929.
Florack, L. M., ter Haar Romeny, B. M., Koenderink, J. J.,
and Viergever, M. A. (1992). Scale and the differen-
tial structure of images. Image and Vision Comput-
ing, 10(6):376–388. Information Processing in Medi-
cal Imaging.
Ghosh, R. and Gupta, A. K. (2019). Scale steerable filters
for locally scale-invariant convolutional neural net-
works. CoRR, abs/1906.03861.
Kanazawa, A., Sharma, A., and Jacobs, D. W. (2014). Lo-
cally scale-invariant convolutional neural networks.
CoRR, abs/1412.5104.
Kullback, S. and Leibler, R. A. (1951). On Information and
Sufficiency. The Annals of Mathematical Statistics,
22(1):79–86.
Lindeberg, T. (2020). Scale-covariant and scale-invariant
gaussian derivative networks. CoRR, abs/2011.14759.
Lindeberg, T. and Eklundh, J.-O. (1992). Scale-space pri-
mal sketch: construction and experiments. Image and
Vision Computing, 10(1):3–18.
Marcos, D., Kellenberger, B., Lobry, S., and Tuia, D.
(2018). Scale equivariance in cnns with vector fields.
CoRR, abs/1807.11783.
Naderi, H., Goli, L., and Kasaei, S. (2020). Scale equiv-
ariant cnns with scale steerable filters. In 2020 In-
ternational Conference on Machine Vision and Image
Processing (MVIP), pages 1–5.
Pintea, S. L., Tomen, N., Goes, S. F., Loog, M., and van
Gemert, J. C. (2021). Resolution learning in deep con-
volutional networks using scale-space theory. CoRR,
abs/2106.03412.
Saldanha, N., Pintea, S. L., van Gemert, J. C., and Tomen,
N. (2021). Frequency learning for structured cnn fil-
ters with gaussian fractional derivatives. BMVC.
Sosnovik, I., Moskalev, A., and Smeulders, A. W. M.
(2021). DISCO: accurate discrete scale convolutions.
CoRR, abs/2106.02733.
Sosnovik, I., Szmaja, M., and Smeulders, A. W. M.
(2019). Scale-equivariant steerable networks. CoRR,
abs/1910.11093.
Sun, Z. and Blu, T. (2023). Empowering networks with
scale and rotation equivariance using a similarity con-
volution.
Tomen, N., Pintea, S.-L., and Van Gemert, J. (2021). Deep
continuous networks. In International Conference on
Machine Learning, pages 10324–10335. PMLR.
Worrall, D. and Welling, M. (2019). Deep scale-spaces:
Equivariance over scale. In Wallach, H., Larochelle,
H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and
Garnett, R., editors, Advances in Neural Information
Processing Systems 32, pages 7364–7376. Curran As-
sociates, Inc.
Xu, Y., Xiao, T., Zhang, J., Yang, K., and Zhang, Z. (2014a).
Scale-invariant convolutional neural networks. CoRR,
abs/1411.6369.
Xu, Y., Xiao, T., Zhang, J., Yang, K., and Zhang, Z.
(2014b). Scale-invariant convolutional neural net-
works. CoRR, abs/1411.6369.
Yang, Y., Dasmahapatra, S., and Mahmoodi, S. (2023).
Scale-equivariant unet for histopathology image seg-
mentation.
Zhu, W., Qiu, Q., Calderbank, A. R., Sapiro, G., and
Cheng, X. (2019). Scale-equivariant neural net-
works with decomposed convolutional filters. CoRR,
abs/1909.11193.
APPENDIX
Dataset Description
Dynamic Scale MNIST. The Dynamic Scale MNIST dataset pads the original 28x28 images from the MNIST dataset (Deng, 2012) to 168x168 pixels; on initialisation of the dataset, an independent scale for each sample is drawn from the chosen scale distribution. Only scales larger than 1 are sampled during training, to avoid the information loss that downsampling would introduce. Since each digit is upsampled on access, no additional storage is needed to use this dataset with various scale distributions. After initialisation, the dataset is normalised.
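A minimal sketch of this construction in PyTorch is given below. The class name, interpolation mode, and centre-cropping behaviour are illustrative assumptions rather than the authors' implementation, and the final normalisation step is omitted.

```python
# Hedged sketch of a Dynamic Scale MNIST-style dataset; names and
# interpolation/cropping details are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset
from torchvision import datasets, transforms


class DynamicScaleMNIST(Dataset):
    def __init__(self, root, train=True, min_scale=1.0, max_scale=4.0):
        self.base = datasets.MNIST(root, train=train, download=True,
                                   transform=transforms.ToTensor())
        # One independent scale > 1 per sample, drawn once at initialisation.
        self.scales = min_scale + (max_scale - min_scale) * torch.rand(len(self.base))

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, label = self.base[idx]                # 1 x 28 x 28
        img = F.pad(img, (70, 70, 70, 70))         # zero-pad to 1 x 168 x 168
        s = float(self.scales[idx])
        # Upsample on access (no extra storage), then centre-crop to 168 x 168.
        up = F.interpolate(img.unsqueeze(0), scale_factor=s,
                           mode='bilinear', align_corners=False)[0]
        h, w = up.shape[-2:]
        top, left = (h - 168) // 2, (w - 168) // 2
        return up[:, top:top + 168, left:left + 168], label
```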
Additionally, this dataset can be used to evaluate across a range of scales by sampling each test digit individually at multiple scales. The scales to evaluate are rounded to the nearest half-octave of 2, and the number of evaluation scales is determined by the range in octaves times 10. Thus, for Fig. 4, 45 scales are sampled logarithmically between $2^{-0.5}$ and $2^{3.5}$. The underlying MNIST dataset (Deng, 2012) is split into 10k training samples, 5k validation samples, and 50k test samples, and 3 different realisations are generated and fixed.
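For reference, an evaluation grid of this form can be generated in a few lines of NumPy; this is a sketch under the stated range only, and the rounding to half-octaves is not reproduced.

```python
import numpy as np

# 45 log-spaced scales between 2^-0.5 and 2^3.5 (roughly 0.71 to 11.3).
eval_scales = 2.0 ** np.linspace(-0.5, 3.5, num=45)
```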
MNIST-Scale. The images in the MNIST-Scale dataset are rescaled versions of the MNIST digits (Deng, 2012). The scales are sampled from a uniform distribution in the range 0.3-1.0 of the original size, and the rescaled digits are padded back to the original resolution of 28x28 pixels. The dataset is split into 10k training samples, 2k validation samples, and 50k test samples, and 6 realisations are generated.
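A comparable sketch of the MNIST-Scale rescaling step is shown below; the function name and the bilinear interpolation are assumptions for illustration, not the original generation script.

```python
import torch
import torch.nn.functional as F


def rescale_digit(img: torch.Tensor) -> torch.Tensor:
    """Downscale a 1 x 28 x 28 digit by a factor drawn from U(0.3, 1.0)
    and zero-pad it back to 28 x 28."""
    s = 0.3 + 0.7 * torch.rand(1).item()
    small = F.interpolate(img.unsqueeze(0), scale_factor=s,
                          mode='bilinear', align_corners=False)[0]
    pad_h, pad_w = 28 - small.shape[-2], 28 - small.shape[-1]
    return F.pad(small, (pad_w // 2, pad_w - pad_w // 2,
                         pad_h // 2, pad_h - pad_h // 2))
```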