dealing with a more challenging dataset and supported
it with further experiments. The extension is to feed M
feature vectors of size p, instead of a single feature
vector, into the recurrent structure. The proposed
framework can readily be adapted to other scenarios, such
as multi-label image classification, without adding extra
layers to the network architecture.
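As an illustration of this extension, the following minimal sketch runs one recurrent cell over either a single feature vector or M feature vectors of size p. It assumes a plain tanh recurrent cell written in pure Python; the function names, weight initialisation, and the sizes M, p, and hidden are hypothetical and are not taken from the architecture or experiments reported in this paper.

```python
import math
import random


def rnn_forward(features, w_in, w_rec, bias):
    """Run a plain tanh recurrent cell over a sequence of feature vectors.

    features: list of vectors, each of length p (one recurrent step each)
    w_in:     hidden x p input weights
    w_rec:    hidden x hidden recurrent weights
    bias:     vector of length hidden
    Returns the final hidden state, a list of length hidden.
    """
    hidden = len(bias)
    h = [0.0] * hidden
    for x in features:
        # The new state is built entirely from the previous state h.
        h = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * h[k] for k in range(hidden))
                + bias[i]
            )
            for i in range(hidden)
        ]
    return h


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical helper: small uniform random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
M, p, hidden = 4, 8, 3  # illustrative sizes only, not values from the paper
w_in = random_matrix(hidden, p, rng)
w_rec = random_matrix(hidden, hidden, rng)
bias = [0.0] * hidden

# One feature vector versus M feature vectors, fed to the same cell.
single = [[rng.uniform(-1, 1) for _ in range(p)]]
multiple = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(M)]

h_single = rnn_forward(single, w_in, w_rec, bias)
h_multi = rnn_forward(multiple, w_in, w_rec, bias)
```

In this sketch the same weights handle both inputs, and only the number of recurrent steps changes. This is consistent with the statement above that no extra layers are required.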
ACKNOWLEDGEMENTS
This work has been partially supported by the Spanish
project TIN2016-74946-P (MINECO/FEDER, UE) and the CERCA
Programme / Generalitat de Catalunya.
REFERENCES
Bengio, Y., Courville, A., and Vincent, P. (2013).
Representation learning: A review and new perspectives. IEEE
Trans. Pattern Anal. Mach. Intell., 35(8):1798–1828.
Buch, N., Orwell, J., and Velastin, S. A. (2009). 3D extended
histogram of oriented gradients (3DHOG) for classification
of road users in urban scenes. In Proceedings of the British
Machine Vision Conference, pages 15.1–15.11. BMVA Press.
doi:10.5244/C.23.15.
Chen, Z. and Ellis, T. (2011). Multi-shape descriptor vehicle
classification for urban traffic. In 2011 International
Conference on Digital Image Computing: Techniques and
Applications, pages 456–461.
Cohen, J. (1960). A coefficient of agreement for nominal
scales. Educational and Psychological Measurement,
20(1):37–46.
Dalal, N. and Triggs, B. (2005). Histograms of oriented
gradients for human detection. In Proceedings of the 2005
IEEE Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR’05) - Volume 1, CVPR ’05, pages
886–893, Washington, DC, USA. IEEE Computer Society.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and
Fei-Fei, L. (2009). ImageNet: A Large-Scale Hierarchical
Image Database. In CVPR09.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N.,
Tzeng, E., and Darrell, T. (2014). DeCAF: A deep
convolutional activation feature for generic visual
recognition. In International Conference on Machine
Learning (ICML).
Dong, Z., Pei, M., He, Y., Liu, T., Dong, Y., and Jia, Y.
(2014). Vehicle type classification using unsupervised
convolutional neural network. In 2014 22nd International
Conference on Pattern Recognition, pages 172–177.
Fei-Fei, L., Fergus, R., and Perona, P. (2007). Learning
generative visual models from few training examples: An
incremental Bayesian approach tested on 101 object
categories. Comput. Vis. Image Underst., 106(1):59–70.
Gupte, S., Masoud, O., Martin, R. F., and Papanikolopoulos,
N. P. (2002). Detection and classification of vehicles.
Trans. Intell. Transport. Sys., 3(1):37–47.
Hasegawa, O. and Kanade, T. (2005). Type classification,
color estimation, and specific target detection of moving
targets on public streets. Machine Vision and
Applications, 16(2):116–121.
He, D., Lang, C., Feng, S., Du, X., and Zhang, C. (2015).
Vehicle detection and classification based on convolutional
neural network. In Proceedings of the 7th International
Conference on Internet Multimedia Computing and Service,
ICIMCS ’15, pages 3:1–3:5, New York, NY, USA. ACM.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep
residual learning for image recognition. In The IEEE
Conference on Computer Vision and Pattern Recognition
(CVPR).
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural Comput., 9(8):1735–1780.
Hsieh, J.-W., Yu, S.-H., Chen, Y.-S., and Hu, W.-F. (2006).
Automatic traffic surveillance system for vehicle tracking
and classification. IEEE Transactions on Intelligent
Transportation Systems, 7(2):175–187.
Huang, D., Shan, C., Ardabilian, M., Wang, Y., and Chen,
L. (2011). Local binary patterns and its application to
facial image analysis: A survey. IEEE Transactions on
Systems, Man, and Cybernetics, Part C (Applications
and Reviews), 41(6):765–781.
Huo, Z., Xia, Y., and Zhang, B. (2016). Vehicle type
classification and attribute prediction using multi-task
RCNN. In 2016 9th International Congress on Image and
Signal Processing, BioMedical Engineering and Informatics
(CISP-BMEI), pages 564–569.
Ioffe, S. and Szegedy, C. (2015). Batch normalization:
Accelerating deep network training by reducing internal
covariate shift. In Proceedings of the 32nd International
Conference on Machine Learning, ICML 2015, Lille, France,
6-11 July 2015, pages 448–456.
Jiang, C. and Zhang, B. (2016). Weakly-supervised vehicle
detection and classification by convolutional neural
network. In 2016 9th International Congress on Image and
Signal Processing, BioMedical Engineering and Informatics
(CISP-BMEI), pages 570–575.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).
ImageNet classification with deep convolutional neural
networks. In Pereira, F., Burges, C. J. C., Bottou, L., and
Weinberger, K. Q., editors, Advances in Neural Information
Processing Systems 25, pages 1097–1105. Curran Associates,
Inc.
Lai, A. H. S., Fung, G. S. K., and Yung, N. H. C. (2001).
Vehicle type classification from visual-based dimension
estimation. In ITSC 2001. 2001 IEEE Intelligent
Transportation Systems. Proceedings (Cat. No.01TH8585),
pages 201–206.
Li, X., Zhao, F., and Guo, Y. (2014). Multi-label image
classification with a probabilistic label enhancement
model. In Proceedings of the Thirtieth Conference on
Uncertainty in Artificial Intelligence, UAI’14, pages
430–439, Arlington, Virginia, United States. AUAI
Press.
CRN: End-to-end Convolutional Recurrent Network Structure Applied to Vehicle Classification