
architecture works as intended. Although we
outperformed the backbone networks in almost every
case, there is still room for improvement.
One improvement would be a more sophisticated
grouping algorithm. Our current algorithm often
produces group trees in which, for example, one group
has 40 classes while others have only a few. A
completely balanced tree is probably impossible to
construct, because some classes are quite distinct
while a large set of other classes are similar to one
another, but we could extend the algorithm to take
into account how balanced the hierarchy tree is.
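One way to make "how balanced the hierarchy tree is" measurable is to score the distribution of group sizes. The following is a minimal sketch under stated assumptions: the function name `group_balance` and the class-to-group dictionary format are hypothetical and are not part of the HierNet implementation. It reports the normalized entropy of the group sizes (1.0 for equally sized groups, lower for skewed ones) and the ratio of the largest group to the mean group size.

```python
import math
from collections import Counter


def group_balance(assignments):
    """Score how evenly classes are spread over groups.

    assignments: dict mapping class label -> group id
                 (hypothetical format, chosen for illustration).
    Returns (normalized_entropy, max_to_mean_ratio).
    """
    # Number of classes assigned to each group.
    sizes = list(Counter(assignments.values()).values())
    total = sum(sizes)
    k = len(sizes)
    if k <= 1:
        # A single group is trivially balanced.
        return 1.0, 1.0
    # Shannon entropy of the group-size distribution, divided by log(k)
    # so that equally sized groups give exactly the maximum value 1.0.
    entropy = -sum((s / total) * math.log(s / total) for s in sizes)
    return entropy / math.log(k), max(sizes) / (total / k)


# Example: one group with 40 classes and six groups with 10 classes each.
skewed = {c: 0 for c in range(40)}
skewed.update({40 + c: 1 + c // 10 for c in range(60)})
print(group_balance(skewed))
```

Such a score could, for instance, be added as a penalty term when the grouping algorithm compares candidate groupings; how to weight it against class similarity is left open here.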
Regarding training, we used the same optimizer,
learning rate schedule, and weight decay as for the
backbone models. What works for the baseline models is
very likely not optimal for HierNet, so we could also
investigate the training settings further.
Finally, it might be useful to investigate which
features are extracted by the shared edges and which
features are extracted by the edges of the individual
groups. We could visualize this with an approach
similar to the one described by Zeiler and Fergus
(2014).
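As one concrete starting point, the occlusion sensitivity experiment from Zeiler and Fergus (2014) can be sketched without any access to network internals. The code below is an illustrative sketch, not our implementation: `occlusion_map` and `score_fn` are hypothetical names, and `score_fn` is assumed to be any callable that maps an image array of shape (H, W, C) to a scalar, such as the mean activation of one feature map in a shared edge or in a group edge.

```python
import numpy as np


def occlusion_map(image, score_fn, patch=8, stride=8, fill=0.5):
    """Measure how much a score drops when each image region is occluded.

    image:    float array of shape (H, W, C).
    score_fn: hypothetical callable, image -> scalar (e.g. a feature
              activation or a class probability).
    Returns a 2-D array; large values mark regions the score depends on.
    """
    h, w = image.shape[:2]
    base = score_fn(image)  # score on the unmodified image
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols), dtype=np.float64)
    for i in range(rows):
        for j in range(cols):
            y, x = i * stride, j * stride
            occluded = image.copy()
            # Replace one square patch with a constant fill value.
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = base - score_fn(occluded)
    return heat


# Toy usage with a stand-in score that only looks at the top-left corner.
img = np.ones((32, 32, 3))
heat = occlusion_map(img, lambda a: float(a[:8, :8].mean()), fill=0.0)
print(heat.shape)
```

Computing such maps once with a score taken from a shared edge and once with a score taken from a group edge would give a first indication of whether the two respond to different image regions.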
REFERENCES
Clevert, D.-A., Unterthiner, T., and Hochreiter, S.
(2015). Fast and accurate deep network learning
by exponential linear units (ELUs). arXiv preprint
arXiv:1511.07289.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep
Residual Learning for Image Recognition. In
Proceedings of 2016 IEEE Conference on Computer
Vision and Pattern Recognition, CVPR ’16, pages
770–778. IEEE.
Ioffe, S. and Szegedy, C. (2015). Batch normalization:
Accelerating deep network training by reducing internal
covariate shift. In International conference on
machine learning, pages 448–456. PMLR.
Ji, R., Wen, L., Zhang, L., Du, D., Wu, Y., Zhao, C., Liu,
X., and Huang, F. (2020). Attention convolutional
binary neural tree for fine-grained visual categorization.
In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pages
10468–10477.
Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple
layers of features from tiny images.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).
ImageNet classification with deep convolutional
neural networks. In Pereira, F., Burges, C. J. C., Bottou,
L., and Weinberger, K. Q., editors, Advances in
Neural Information Processing Systems 25, pages
1097–1105. Curran Associates, Inc.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998).
Gradient-based learning applied to document
recognition. Proceedings of the IEEE,
86(11):2278–2324.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh,
S., Ma, S., Huang, Z., Karpathy, A., Khosla, A.,
Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015).
ImageNet Large Scale Visual Recognition Challenge.
International Journal of Computer Vision (IJCV),
115(3):211–252.
Shah, A., Kadam, E., Shah, H., Shinde, S., and Shingade,
S. (2016). Deep residual networks with exponential
linear unit. In Proceedings of the third international
symposium on computer vision and the internet, pages
59–65.
Simonyan, K. and Zisserman, A. (2015). Very deep
convolutional networks for large-scale image recognition.
In Bengio, Y. and LeCun, Y., editors, 3rd
International Conference on Learning Representations, ICLR
2015, San Diego, CA, USA, May 7-9, 2015,
Conference Track Proceedings.
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model
scaling for convolutional neural networks. In
Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of
the 36th International Conference on Machine
Learning, volume 97 of Proceedings of Machine Learning
Research, pages 6105–6114. PMLR.
Tanno, R., Arulkumaran, K., Alexander, D., Criminisi, A.,
and Nori, A. (2019). Adaptive neural trees. In
International Conference on Machine Learning, pages
6166–6175. PMLR.
Yan, Z., Zhang, H., Piramuthu, R., Jagadeesh, V., DeCoste,
D., Di, W., and Yu, Y. (2015). HD-CNN: Hierarchical
deep convolutional neural networks for large scale
visual recognition. In 2015 IEEE International
Conference on Computer Vision (ICCV), pages 2740–2748.
Zeiler, M. D. and Fergus, R. (2014). Visualizing and
understanding convolutional networks. In European
conference on computer vision, pages 818–833. Springer.
Zhu, X. and Bain, M. (2017). B-CNN: Branch convolutional
neural network for hierarchical classification. arXiv
preprint arXiv:1709.09890.
HierNet: Image Recognition with Hierarchical Convolutional Networks