Figure 10: Change of accuracy when input images are transformed by background replacement and puzzle. First, the non-object parts of the images are replaced by black. Afterwards, the puzzle transformation is applied with puzzle resolutions ranging from 1 (original image) to 4 (4 × 4 puzzle).
rotations are widely used as data augmentation methods. On the other hand, this observation is consistent with the results of (Engstrom et al., 2018), although they evaluated much smaller rotation angles. Given the relatively good performance of the evaluated networks at multiples of ±90°, one might suspect that the lack of blank image areas at these angles is the underlying cause of these irregularities. However, the results obtained from the combination of rotation and background replacement contradict this assumption.
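To make the blank-area argument concrete, the following minimal sketch (ours, not the original evaluation code; it assumes Pillow and a square network input image) shows that a 90° rotation keeps every pixel inside the frame, whereas an arbitrary angle forces the library to fill the resulting corner regions with a constant color:

```python
from PIL import Image

# Hypothetical square input image, e.g. a 224 x 224 network input.
img = Image.open("example.jpg")

# A 90-degree rotation of a square image maps pixels onto pixels:
# no undefined regions appear.
rot_90 = img.rotate(90)

# An arbitrary angle leaves the corners undefined; Pillow fills them
# with a constant color (black here), producing the blank image areas
# discussed above.
rot_45 = img.rotate(45, fillcolor=(0, 0, 0))
```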
Regarding the results obtained in the collage transformation experiment, ResNeXt's high accuracy suggests that this model has an improved ability to recognize the patterns of an object independently of the object's environment or background. This is consistent with ResNeXt's good performance in the background replacement experiment. AlexNet, on the other hand, is hardly able to detect more than one object class simultaneously. This might indicate that AlexNet depends more heavily on patterns present in the image background to perform a correct classification.
With regard to the puzzle transformation experiment, the more recent models show higher robustness as well. This may be related to the fact that they are already more robust against translations of objects in the image plane and therefore have fewer problems with a changed spatial arrangement of the image. Yet, this could also indicate a detection behavior that is more specialized in local patterns. For further insights, puzzles with an even finer fragmentation of the input images could be constructed, while simultaneously controlling the extent of fragmentation of the object itself.
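As an illustration, one plausible implementation of such a puzzle transformation with a freely adjustable resolution (splitting the image into an n × n grid and randomly rearranging the tiles) could look as follows; this is a sketch assuming a square image whose side length is divisible by n, not the exact code used for the experiments:

```python
import numpy as np

def puzzle(image: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Split an H x W x C image into an n x n grid of tiles and shuffle them."""
    h, w = image.shape[:2]
    th, tw = h // n, w // n                      # tile size; assumes h and w are divisible by n
    tiles = [image[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(n) for j in range(n)]
    rng.shuffle(tiles)                           # random spatial rearrangement of the tiles
    rows = [np.concatenate(tiles[k * n:(k + 1) * n], axis=1) for k in range(n)]
    return np.concatenate(rows, axis=0)

# Example: a 4 x 4 puzzle of a 224 x 224 RGB image; larger n yields a finer fragmentation.
rng = np.random.default_rng(0)
shuffled = puzzle(np.zeros((224, 224, 3), dtype=np.uint8), n=4, rng=rng)
```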
Further, more extensive experiments could use reduced step sizes for the transformation parameters to facilitate a more fine-grained comparison with previous publications. For instance, (Azulay and Weiss, 2018) have shown that translations of even a few pixels can lead to significant performance drops in some architectures. Therefore, using smaller step sizes for the transformations evaluated in our work may enable more detailed reasoning about the robustness of the networks.
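As a sketch of such an evaluation (assuming torchvision, a pretrained ResNeXt, and a small labelled image folder whose class indices match the model's ImageNet indices; this is not the evaluation code used in this work), accuracy could be swept over rotation angles in 1° steps as follows:

```python
import torch
import torchvision.transforms.functional as TF
from torchvision import datasets, models, transforms

model = models.resnext50_32x4d(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Hypothetical validation subset; folders are assumed to map to ImageNet class indices.
loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("val_subset", transform=preprocess), batch_size=32)

@torch.no_grad()
def accuracy_at_angle(angle: float) -> float:
    correct, total = 0, 0
    for images, labels in loader:
        rotated = TF.rotate(images, angle)        # rotate the whole batch by a small angle
        preds = model(rotated).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# 1-degree steps over a small range instead of coarse increments.
sweep = {angle: accuracy_at_angle(float(angle)) for angle in range(-10, 11)}
```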
6 CONCLUSION
Our results show that the more recent architectures VGG19 and ResNeXt appear to have an increased robustness against many kinds of image transformations, including color changes and background replacement. However, invariance towards rotational transformations of the input appears to remain problematic, as these transformations cause a significant decrease in the recognition performance of all evaluated models.
The collage and puzzle transformations introduced here appear to be suitable benchmarks for further investigating the strengths and weaknesses of CNNs, as they were able to reveal markedly different abilities among the evaluated architectures.
REFERENCES
Azulay, A. and Weiss, Y. (2018). Why do deep convolutional networks generalize so poorly to small image transformations? arXiv preprint arXiv:1805.12177.
Barbu, A., Mayo, D., Alverio, J., Luo, W., Wang, C., Gutfreund, D., Tenenbaum, J., and Katz, B. (2019). ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. In Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F. d., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems 32, pages 9448–9458. Curran Associates, Inc.
Bay, H., Tuytelaars, T., and Van Gool, L. (2006). SURF: Speeded up robust features. In Leonardis, A., Bischof, H., and Pinz, A., editors, Computer Vision – ECCV 2006, pages 404–417, Berlin, Heidelberg. Springer Berlin Heidelberg.
Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., and Madry, A. (2018). A rotation and a translation suffice: Fooling CNNs with simple transformations. arXiv preprint arXiv:1712.02779.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
Hendrycks, D. and Dietterich, T. (2019). Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations.