flow with convolutional networks. arXiv preprint
arXiv:1504.06852.
Fragkiadaki, K., Arbeláez, P., Felsen, P., and Malik, J.
(2015). Learning to segment moving objects in
videos. In 2015 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 4083–4090.
IEEE.
Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready
for autonomous driving? the kitti vision benchmark
suite. In Conference on Computer Vision and Pattern
Recognition (CVPR).
Glorot, X. and Bengio, Y. (2010). Understanding the
difficulty of training deep feedforward neural networks.
In Aistats, volume 9, pages 249–256.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J.,
Girshick, R., Guadarrama, S., and Darrell, T. (2014).
Caffe: Convolutional architecture for fast feature
embedding. arXiv preprint arXiv:1408.5093.
Karpathy, A., Toderici, G., Shetty, S., Leung, T.,
Sukthankar, R., and Fei-Fei, L. (2014). Large-scale video
classification with convolutional neural networks. In
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 1725–1732.
Krähenbühl, P. and Koltun, V. (2011). Efficient inference
in fully connected crfs with gaussian edge potentials.
Adv. Neural Inf. Process. Syst.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard,
R. E., Hubbard, W., and Jackel, L. D. (1989).
Backpropagation applied to handwritten zip code
recognition. Neural computation, 1(4):541–551.
Lin, G., Shen, C., Reid, I., et al. (2015). Efficient
piecewise training of deep structured models for semantic
segmentation. arXiv preprint arXiv:1504.01013.
Liu, Z., Li, X., Luo, P., Loy, C.-C., and Tang, X. (2015).
Semantic image segmentation via deep parsing network.
In Proceedings of the IEEE International Conference
on Computer Vision, pages 1377–1385.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully
convolutional networks for semantic segmentation. In
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 3431–3440.
Park, E., Han, X., Berg, T. L., and Berg, A. C. (2016).
Combining multiple sources of knowledge in deep cnns for
action recognition. In 2016 IEEE Winter Conference
on Applications of Computer Vision (WACV), pages
1–8. IEEE.
Reddy, N. D., Singhal, P., and Krishna, K. M. (2014).
Semantic motion segmentation using dense crf
formulation. In Proceedings of the 2014 Indian Conference
on Computer Vision Graphics and Image Processing,
page 56. ACM.
Rozantsev, A., Lepetit, V., and Fua, P. (2014). Flying
objects detection from a single moving camera. arXiv
preprint arXiv:1411.7715.
Russell, C., Kohli, P., Torr, P. H., et al. (2009). Associative
hierarchical crfs for object class image segmentation.
In 2009 IEEE 12th International Conference on
Computer Vision, pages 739–746. IEEE.
Shotton, J., Johnson, M., and Cipolla, R. (2008). Semantic
texton forests for image categorization and
segmentation. In Computer vision and pattern recognition,
2008. CVPR 2008. IEEE Conference on, pages 1–8.
IEEE.
Simonyan, K. and Zisserman, A. (2014a). Two-stream
convolutional networks for action recognition in videos.
In Advances in Neural Information Processing
Systems, pages 568–576.
Simonyan, K. and Zisserman, A. (2014b). Very deep
convolutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Tokmakov, P., Alahari, K., and Schmid, C. (2016).
Weakly-supervised semantic segmentation using motion cues.
arXiv preprint arXiv:1603.07188.
Tourani, S. and Krishna, K. M. (2016). Using in-frame
shear constraints for monocular motion segmentation
of rigid bodies. Journal of Intelligent & Robotic
Systems, 82(2):237–255.
Wedel, A., Meißner, A., Rabe, C., Franke, U., and Cremers,
D. (2009). Detection and segmentation of
independently moving objects from dense scene flow. In
International Workshop on Energy Minimization Methods
in Computer Vision and Pattern Recognition, pages
14–27. Springer.
Weinzaepfel, P., Revaud, J., Harchaoui, Z., and Schmid, C.
(2013). Deepflow: Large displacement optical flow
with deep matching. In Proceedings of the IEEE
International Conference on Computer Vision, pages
1385–1392.
Yu, F. and Koltun, V. (2015). Multi-scale context
aggregation by dilated convolutions. arXiv preprint
arXiv:1511.07122.
Joint Semantic and Motion Segmentation for Dynamic Scenes using Deep Convolutional Networks