5 CONCLUSION
We present generalized dilation neural networks (GDNNs), a sub-module within the CNN architecture augmented with dilated filters. The generalization of the existing framework consists of two extensions. First, the fixed dilation filters are made learnable by introducing an alternative continuous representation of the dilation operation using masking vectors or matrices, which can be trained with standard gradient descent optimizers. To this end, we introduce a novel barrier function approach together with a suitable initialization scheme to account for the constraints imposed on the masking parameters. To the best of our knowledge, this is the first study that uses such techniques for constraining the dilation operation in a CNN. Second, we generalize the fixed structure of dilation kernels to arbitrary structures, allowing arbitrary coverage of the input space. We provide experimental evidence by testing the proposed architecture on two benchmark image recognition datasets.
The learned masking maps and their distributions indicate that these inherently discrete structural parameters can be optimized effectively with continuous gradient descent methods.
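To make the mechanism concrete, the following is a minimal sketch of one way such a masked, barrier-constrained dilation layer could be implemented in PyTorch. It is an illustration under our own assumptions, not the exact formulation of this paper: the names (MaskedDilationConv2d, max_span, mu) are hypothetical, and the logarithmic barrier is one choice consistent with the constraints described above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedDilationConv2d(nn.Module):
        """Convolution whose dilation structure is selected by a learnable
        spatial mask, kept strictly inside (0, 1) by a log-barrier penalty."""

        def __init__(self, in_ch, out_ch, max_span=7, mu=1e-3):
            super().__init__()
            # Dense weights over the maximal spatial span the mask may
            # select from (a max_span x max_span window).
            self.weight = nn.Parameter(
                0.01 * torch.randn(out_ch, in_ch, max_span, max_span))
            # Initialize every mask entry at 0.5, the interior optimum of
            # the barrier, so early gradients are well-conditioned.
            self.mask = nn.Parameter(torch.full((max_span, max_span), 0.5))
            self.mu = mu  # barrier strength

        def barrier_penalty(self):
            # -mu * sum(log m + log(1 - m)) diverges at 0 and 1, keeping
            # the mask parameters inside the feasible interval (0, 1).
            m = self.mask
            return -self.mu * (torch.log(m) + torch.log1p(-m)).sum()

        def forward(self, x):
            # The mask gates each spatial position of the kernel; a
            # near-binary mask recovers a discrete, possibly irregular,
            # dilation pattern over the receptive field.
            return F.conv2d(x, self.weight * self.mask, padding="same")

In training, the task loss would be augmented with the sum of barrier_penalty() over all such layers; annealing mu toward zero would push the mask entries toward the boundary of the interval and hence toward a discrete dilation structure.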
Although not presented in the results, the generalized dilations can also be applied to the whole input image rather than to a receptive field, leading to a form of barrier function based attention: used as the very first layer of a CNN, with a receptive field size equal to the input size, the layer is forced to select only certain discriminative pixels from the actual input. Also part of future work is the use of the barrier function for selecting a certain number of channels in the convolution layers, where complete channels are masked out during training, leading to a sparse network; a rough sketch of this idea is given below. On the application side, we will experiment with GDNNs on domains such as image segmentation, object detection, and sequential modelling. Such tasks are suitable because the networks need to produce even denser predictions, e.g., a prediction for each pixel.
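As an illustration of the channel-selection idea (again a hypothetical sketch, with all names our own and the penalty assumed to mirror the spatial case), the same masking-plus-barrier mechanism could gate whole feature channels:

    import torch
    import torch.nn as nn

    class BarrierChannelMask(nn.Module):
        """Per-channel gate with the same log-barrier constraint, so that
        entries driven toward zero would mask out complete channels."""

        def __init__(self, num_channels, mu=1e-3):
            super().__init__()
            self.mask = nn.Parameter(torch.full((num_channels,), 0.5))
            self.mu = mu

        def barrier_penalty(self):
            m = self.mask
            return -self.mu * (torch.log(m) + torch.log1p(-m)).sum()

        def forward(self, x):
            # x: (N, C, H, W); broadcast the channel mask over the
            # spatial dimensions.
            return x * self.mask.view(1, -1, 1, 1)

Channels whose mask values collapse toward zero could then be pruned after training, yielding the sparse network mentioned above.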