REFERENCES
Bao, L., Wu, B., and Liu, W. (2018). Cnn in mrf: Video ob-
ject segmentation via inference in a cnn-based higher-
order spatio-temporal mrf. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recogni-
tion (CVPR), pages 5977–5986.
Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A.,
and Torr, P. H. (2016). Fully-convolutional siamese
networks for object tracking. In European Conference
on Computer Vision (ECCV), pages 850–865. Sprin-
ger.
Braham, M. and Van Droogenbroeck, M. (2016). Deep
background subtraction with scene-specific convolu-
tional neural networks. In IEEE International Con-
ference on Systems, Signals and Image Processing
(IWSSIP), pages 1–4. IEEE.
Caelles, S., Maninis, K.-K., Pont-Tuset, J., Leal-Taixé, L.,
Cremers, D., and Van Gool, L. (2017). One-shot video
object segmentation. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR). IEEE.
Cheng, J., Tsai, Y.-H., Hung, W.-C., Wang, S., and Yang,
M.-H. (2018). Fast and accurate online video object
segmentation via tracking parts.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 248–255. IEEE.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. In Advan-
ces in Neural Information Processing Systems (NIPS),
pages 2672–2680.
Goroshin, R., Bruna, J., Tompson, J., Eigen, D., and LeCun,
Y. (2015). Unsupervised learning of spatiotemporally
coherent metrics. In IEEE International Conference
on Computer Vision (ICCV), pages 4086–4093.
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A.,
and Brox, T. (2017). Flownet 2.0: Evolution of optical
flow estimation with deep networks. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), volume 2, page 6.
Ioffe, S. and Szegedy, C. (2015). Batch normalization:
Accelerating deep network training by reducing in-
ternal covariate shift. In International Conference on
Machine Learning (ICML), pages 448–456.
Jain, S. D. and Grauman, K. (2014). Supervoxel-consistent
foreground propagation in video. In European Con-
ference on Computer Vision (ECCV), pages 656–671.
Springer.
Khoreva, A., Benenson, R., Ilg, E., Brox, T., and Schiele,
B. (2017). Lucid data dreaming for object tracking. In
The 2017 DAVIS Challenge on Video Object Segmen-
tation - CVPR Workshops.
Kingma, D. P. and Ba, J. (2014). Adam: A method for sto-
chastic optimization. arXiv preprint arXiv:1412.6980.
Krähenbühl, P. and Koltun, V. (2011). Efficient inference in
fully connected crfs with gaussian edge potentials. In
Advances in Neural Information Processing Systems
(NIPS), pages 109–117.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).
Imagenet classification with deep convolutional neu-
ral networks. In Advances in Neural Information Pro-
cessing Systems (NIPS), pages 1097–1105.
Li, F., Kim, T., Humayun, A., Tsai, D., and Rehg, J. M.
(2013). Video segmentation by tracking many figure-
ground segments. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 2192–2199.
Li, H., Li, Y., and Porikli, F. (2016). Deeptrack: Lear-
ning discriminative feature representations online for
robust visual tracking. IEEE Transactions on Image
Processing (TIP), 25(4):1834–1848.
Luc, P., Couprie, C., Chintala, S., and Verbeek, J. (2016).
Semantic segmentation using adversarial networks. In
NIPS Workshop on Adversarial Training.
Maninis, K.-K., Pont-Tuset, J., Arbeláez, P., and Van Gool,
L. (2016). Convolutional oriented boundaries. In Eu-
ropean Conference on Computer Vision (ECCV), pa-
ges 580–596. Springer.
Märki, N., Perazzi, F., Wang, O., and Sorkine-Hornung, A.
(2016). Bilateral space video segmentation. In Pro-
ceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 743–751.
Mobahi, H., Collobert, R., and Weston, J. (2009). Deep le-
arning from temporal coherence in video. In Internati-
onal Conference on Machine Learning (ICML), pages
737–744. ACM.
Nam, H. and Han, B. (2016). Learning multi-domain con-
volutional neural networks for visual tracking. In Pro-
ceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pages 4293–4302.
Perazzi, F., Khoreva, A., Benenson, R., Schiele, B., and
Sorkine-Hornung, A. (2017). Learning video object
segmentation from static images. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), volume 2.
Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L.,
Gross, M., and Sorkine-Hornung, A. (2016). A ben-
chmark dataset and evaluation methodology for video
object segmentation. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 724–732.
Prest, A., Leistner, C., Civera, J., Schmid, C., and Ferrari, V.
(2012). Learning object class detectors from weakly
annotated video. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 3282–3289. IEEE.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time ob-
ject detection. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 779–788.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster
r-cnn: Towards real-time object detection with region
proposal networks. In Advances in Neural Informa-
tion Processing Systems (NIPS), pages 91–99.
VISAPP 2019 - 14th International Conference on Computer Vision Theory and Applications