Giese, M. A., & Poggio, T. (2003). Cognitive neuroscience: Neural mechanisms for the recognition of biological movements. Nature Reviews Neuroscience, 4(3), 179-192.
Gkioxari, G., & Malik, J. (2015). Finding action tubes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 759-768).
Gorban, A., Idrees, H., Jiang, Y. G., Zamir, A. R., Laptev, I., Shah, M., & Sukthankar, R. (2015). THUMOS challenge: Action recognition with a large number of classes.
Grundmann, M., Kwatra, V., Han, M., & Essa, I. (2010, June). Efficient hierarchical graph-based video segmentation. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2141-2148). IEEE.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
Horn, B. K., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17(1-3), 185-203.
Jain, M., Van Gemert, J., Jégou, H., Bouthemy, P., & Snoek, C. G. (2014). Action localization with tubelets from motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 740-747).
Jain, S. D., & Grauman, K. (2014, September). Supervoxel-consistent foreground propagation in video. In European Conference on Computer Vision (pp. 656-671). Springer, Cham.
Jiang, Y. G., Liu, J., Zamir, A. R., Toderici, G., Laptev, I., Shah, M., & Sukthankar, R. (2014). THUMOS challenge: Action recognition with a large number of classes.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Kuehne, H., Jhuang, H., Stiefelhagen, R., & Serre, T. (2013). HMDB51: A large video database for human motion recognition. In High Performance Computing in Science and Engineering ’12 (pp. 571-582). Springer, Berlin, Heidelberg.
Lee, Y. J., Kim, J., & Grauman, K. (2011, November). Key-segments for video object segmentation. In 2011 IEEE International Conference on Computer Vision (ICCV) (pp. 1995-2002). IEEE.
Lu, Z. L., & Sperling, G. (1995). The functional architecture of human visual motion perception. Vision Research, 35(19), 2697-2722.
Narayana, M., Hanson, A., & Learned-Miller, E. (2013). Coherent motion segmentation in moving camera videos using optical flow orientations. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1577-1584).
Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 427-436).
Oh, S., Hoogs, A., Perera, A., Cuntoor, N., Chen, C. C., Lee, J. T., ... & Swears, E. (2011, June). A large-scale benchmark dataset for event recognition in surveillance video. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3153-3160). IEEE.
Oram, M. W., & Perrett, D. I. (1996). Integration of form and motion in the anterior superior temporal polysensory area (STPa) of the macaque monkey. Journal of Neurophysiology, 76(1), 109-129.
Pathak, D., Girshick, R. B., Dollár, P., Darrell, T., & Hariharan, B. (2017, July). Learning features by watching objects move. In CVPR (Vol. 1, No. 2, p. 7).
Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., & Sorkine-Hornung, A. (2016). A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 724-732).
Polana, R., & Nelson, R. (1994, November). Low level recognition of human motion (or how to get your man without finding his body parts). In Proceedings of the 1994 IEEE Workshop on Motion of Non-Rigid and Articulated Objects (pp. 77-82). IEEE.
Poppe, R. (2010). A survey on vision-based human action recognition. Image and Vision Computing, 28(6), 976-990.
Ray, L., & Miao, T. (2016, June). Towards real-time detection, tracking and classification of natural video. In 2016 13th Conference on Computer and Robot Vision (CRV) (pp. 236-241). IEEE.
Roth, S., Lempitsky, V., & Rother, C. (2009). Discrete-continuous optimization for optical flow estimation. In Statistical and Geometrical Approaches to Visual Motion Analysis (pp. 1-22). Springer, Berlin, Heidelberg.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905.
Shou, Z., Wang, D., & Chang, S. F. (2016). Temporal action localization in untrimmed videos via multi-stage CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1049-1058).
Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems (pp. 568-576).
Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402.
Tokmakov, P., Schmid, C., & Alahari, K. (2017). Learning to segment moving objects. arXiv preprint arXiv:1712.01127.
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4489-4497).
ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods