posed technique of traffic sign classification with existing region proposal networks (RPNs) to perform efficient real-time detection of traffic signs. We also intend to fine-tune this method to classify Indian traffic signs.
APPENDIX
HOG and SURF features are computed using the OpenCV library. The parameters used for HOG computation are window size = (32, 32), block size = (8, 8), block stride = (4, 4), cell size = (4, 4), and number of bins = 9. All experiments were performed using TensorFlow. Dropout was used in all convolutional layers with a probability of 0.6. The Adam optimizer was used with an initial learning rate of 1e-4 and default values for its remaining parameters.
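A minimal sketch of this feature extraction step, using the OpenCV Python bindings, is given below; the input image, its resizing to the HOG window, and the SURF Hessian threshold are illustrative assumptions rather than values taken from our experiments.

```python
import cv2

# HOG descriptor with the parameters listed above:
# window 32x32, block 8x8, block stride 4x4, cell 4x4, 9 bins.
hog = cv2.HOGDescriptor((32, 32), (8, 8), (4, 4), (4, 4), 9)

# Illustrative input: a single grayscale sign crop resized to the HOG window.
img = cv2.imread("sign_crop.png", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (32, 32))

hog_features = hog.compute(img)  # flat vector of HOG responses

# SURF lives in the contrib module (cv2.xfeatures2d); the Hessian threshold
# below is a common default, not a value used in this work.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
keypoints, surf_descriptors = surf.detectAndCompute(img, None)
```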
Basic Architecture
The basic architecture was implemented entirely in TensorFlow. With reference to Figure 6, Conv C K S refers to a convolution layer with C output channels, a KxK kernel, and an SxS stride. All pooling layers are max-pool layers with a 2x2 kernel and a 2x2 stride, which reduces the spatial size of the input by a factor of 2 at each stage. A ReLU activation is applied after each block. Batch Normalization is applied after each convolutional block, using the default scale and shift parameters in TensorFlow; it is not applied to the fully connected layers. The output of the network is the softmax probability distribution over the 43 classes of the GTSRB dataset.
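A minimal sketch of one such block, written with tf.keras layers, is given below; the ordering of batch normalization, ReLU, pooling and dropout inside the block, and the example values of C, K and S, are assumptions made only for illustration.

```python
import tensorflow as tf

def conv_block(x, channels, kernel, stride):
    """One 'Conv C K S' block: strided convolution, Batch Normalization,
    ReLU, 2x2 max-pooling and dropout (the dropout probability of 0.6
    reported above is used directly as the drop rate here)."""
    x = tf.keras.layers.Conv2D(channels, kernel, strides=stride, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)   # default scale and shift parameters
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)(x)  # halves spatial size
    x = tf.keras.layers.Dropout(0.6)(x)
    return x

# Example: an illustrative 32x32 RGB input, one block and a 43-way softmax head.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = conv_block(inputs, channels=32, kernel=3, stride=1)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(43, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```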
The loss function used is:

L(y, y') = −∑ y' log(y)    (5)

where y' are the one-hot class labels and y are the predicted class probabilities (the softmax of the network's logits). This is the standard cross-entropy loss for multiple classes.
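As a sketch, the same loss can be computed in TensorFlow directly from the network's logits, paired with the Adam settings reported above; the one-hot labels and logits below are an illustrative stand-in for a real mini-batch.

```python
import tensorflow as tf

# Illustrative mini-batch: one-hot labels y' over the 43 GTSRB classes
# and raw network outputs (logits).
labels = tf.one_hot([3, 17], depth=43)
logits = tf.random.normal([2, 43])

# Equation (5): -sum(y' * log(y)), where y = softmax(logits) is applied internally.
per_example_loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(per_example_loss)

# Adam with the reported initial learning rate of 1e-4 and default remaining parameters.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
```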
Branched CNN Architecture
The Branched Architecture was implemented in the same way as the Basic Architecture, with similar default parameters. As before, Batch Normalization was applied only after each convolutional block and not to any fully connected layer.