gent vision systems. In Proceedings of the 14th International Conference, ACIVS 2012.
Csurka, G., Bray, C., Dance, C., and Fan, L. (2004). Visual categorization with bags of keypoints. In Proceedings of ECCV Workshop on Statistical Learning in Computer Vision, pages 1–22.
Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 886–893.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2010). The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proceedings of IEEE International Conference on Computer Vision, pages 1150–1157.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
Mikolajczyk, K. and Schmid, C. (2004). Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1):63–86.
Inoue, N., Dang, T. H., R. Y., and Shinoda, K. (2015). TokyoTech at TRECVID 2015. In TRECVID 2015.
Ojala, T., Pietikäinen, M., and Harwood, D. (1994). Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In Proceedings of the IAPR International Conference on Pattern Recognition, volume 1, pages 582–585.
Over, P., Awad, G., Michel, M., Fiscus, J., Kraaij, W., Smeaton, A. F., Quénot, G., and Ordelman, R. (2015). TRECVID 2015 – An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of TRECVID 2015. NIST, USA.
Ren, S., He, K., Girshick, R. B., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. CoRR, abs/1506.01497.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252.
Sánchez, J., Perronnin, F., Mensink, T., and Verbeek, J. (2013). Image Classification with the Fisher Vector: Theory and practice. International Journal of Computer Vision, 105(3):222–245.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of CVPR 2006, pages 2169–2178.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556.
Smeaton, A. F., Over, P., and Kraaij, W. (2006). Evaluation campaigns and TRECVid. In MIR ’06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, pages 321–330, New York, NY, USA. ACM Press.
Snoek, C. G. M., Cappallo, S., van Gemert, J., Habibian, A., Mensink, T., Mettes, P., Tao, R., Koelma, D. C., and Smeulders, A. W. M. (2015). Qualcomm Research and University of Amsterdam at TRECVID 2015: Recognizing Concepts, Objects, and Events in Video. In TRECVID 2015.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going deeper with convolutions. CoRR, abs/1409.4842.
Ueki, K. and Kobayashi, T. (2015). Waseda at TRECVID 2015: Semantic Indexing. In TRECVID 2015.
Varma, M. and Ray, D. (2007). Learning the discriminative power-invariance trade-off. In Proceedings of the IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil.
Zeiler, M. D. and Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR, abs/1311.2901.