In Computer Vision (ICCV), 2011 IEEE International
Conference on, pages 786–793.
Amer, M. R., Xie, D., Zhao, M., Todorovic, S., and Zhu, S.-
C. (2012). Cost-sensitive top-down/bottom-up infer-
ence for multiscale activity recognition. In Computer
Vision–ECCV 2012, pages 187–200. Springer.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library
for support vector machines. ACM Transactions on
Intelligent Systems and Technology, 2:27:1–27:27.
Choi, W. and Savarese, S. (2012). A unified framework
for multi-target tracking and collective activity recog-
nition. In Computer Vision–ECCV 2012, pages 215–
230. Springer.
Choi, W., Shahid, K., and Savarese, S. (2009). What are
they doing?: Collective activity classification using
spatio-temporal relationship among people. In
Computer Vision Workshops (ICCV Workshops), 2009
IEEE 12th International Conference on, pages 1282–
1289.
Choi, W., Shahid, K., and Savarese, S. (2011). Learning
context for collective activity recognition. In Proceed-
ings of the IEEE International Conference on Com-
puter Vision and Pattern Recognition.
Dieleman, S., Schlüter, J., Raffel, C., Olson, E., Sønderby,
S. K., Nouri, D., Maturana, D., Thoma, M., Bat-
tenberg, E., Kelly, J., Fauw, J. D., Heilman, M.,
diogo149, McFee, B., Weideman, H., takacsg84, pe-
terderivaz, Jon, instagibbs, Rasul, D. K., CongLiu,
Britefury, and Degrave, J. (2015). Lasagne: First re-
lease.
Ess, A., Leibe, B., Schindler, K., and Van Gool, L. (2008).
A mobile vision system for robust multi-person track-
ing. In Computer Vision and Pattern Recognition,
2008. CVPR 2008. IEEE Conference on, pages 1–8.
Glorot, X. and Bengio, Y. (2010). Understanding the diffi-
culty of training deep feedforward neural networks. In
International conference on artificial intelligence and
statistics, pages 249–256.
Hasan, M. and Roy-Chowdhury, A. K. (2014). Contin-
uous learning of human activity models using deep
nets. In Computer Vision–ECCV 2014, pages 705–
720. Springer.
Ji, S., Xu, W., Yang, M., and Yu, K. (2013). 3D convolutional
neural networks for human action recognition.
Pattern Analysis and Machine Intelligence, IEEE
Transactions on, 35(1):221–231.
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Suk-
thankar, R., and Fei-Fei, L. (2014). Large-scale video
classification with convolutional neural networks. In
Computer Vision and Pattern Recognition (CVPR),
2014 IEEE Conference on, pages 1725–1732. IEEE.
Lan, T., Wang, Y., Yang, W., Robinovitch, S. N., and Mori,
G. (2012). Discriminative latent models for recog-
nizing contextual group activities. Pattern Analy-
sis and Machine Intelligence, IEEE Transactions on,
34(8):1549–1562.
Laptev, I., Marszałek, M., Schmid, C., and Rozenfeld,
B. (2008a). Learning realistic human actions from
movies. In Computer Vision and Pattern Recognition,
2008. CVPR 2008. IEEE Conference on, pages 1–8.
IEEE.
Laptev, I., Marszałek, M., Schmid, C., and Rozenfeld,
B. (2008b). Learning realistic human actions from
movies. In Computer Vision and Pattern Recognition,
2008. CVPR 2008. IEEE Conference on, pages 1–8.
Nesterov, Y. et al. (2007). Gradient methods for minimizing
composite objective function. Technical report, UCL.
Schuldt, C., Laptev, I., and Caputo, B. (2004). Recognizing
human actions: a local SVM approach. In Pattern
Recognition, 2004. ICPR 2004. Proceedings of
the 17th International Conference on, volume 3, pages
32–36.
Tran, K. N., Bedagkar-Gala, A., Kakadiaris, I. A., and Shah,
S. K. (2013). Social cues in group formation and local
interactions for collective activity analysis. In VIS-
APP, pages 539–548.
Tran, K. N., Kakadiaris, I. A., and Shah, S. K. (2012).
Part-based motion descriptor image for human action
recognition. Pattern Recognition, 45(7):2562–2572.
Tran, K. N., Yan, X., Kakadiaris, I. A., and Shah, S. K.
(2015). A group contextual model for activity recog-
nition in crowded scenes. In VISAPP.
Wang, X. and Ji, Q. (2015). Video event recognition with
deep hierarchical context model. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pages 4418–4427.
Wąs, J., Gudowski, B., and Matuszyk, P. J. (2006). Social
distances model of pedestrian dynamics. In Cellular
Automata, pages 492–501. Springer.
Wei, L. and Shah, S. K. (2015). Subject centric group fea-
ture for person re-identification. In 2015 IEEE Con-
ference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pages 28–35.
Wei, L. and Shah, S. K. (2016). Person re-identification
with spatial appearance group feature. In 2016 IEEE
Symposium on Technologies for Homeland Security
(HST), pages 1–6.
Weinland, D., Ronfard, R., and Boyer, E. (2011). A sur-
vey of vision-based methods for action representation,
segmentation and recognition. Computer Vision and
Image Understanding, 115(2):224–241.
Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., and Oliva,
A. (2014). Learning deep features for scene recog-
nition using places database. In Advances in Neural
Information Processing Systems, pages 487–495.
Human Activity Recognition using Deep Neural Network with Contextual Information