Collaborative Contributions for Better Annotations

Priyam Bakliwal, Guruprasad M. Hegde, C. V. Jawahar


We propose an active learning based solution for efficient, scalable and accurate annotation of objects in video sequences. Most recent computer vision solutions rely on machine learning, and their effectiveness depends on the availability of large amounts of accurately annotated data. In this paper, we focus on reducing human annotation effort while simultaneously increasing tracking accuracy, so as to obtain precise, tight bounding boxes around an object of interest. We use a novel combination of two different tracking algorithms to track an object through the whole video sequence. We propose a sampling strategy that selects the most informative frame, which is then given to a human for annotation. The newly annotated frame is used to update the previous annotations. Thus, through the collaborative efforts of the human and the system, we obtain accurate annotations with minimal effort. Using the proposed method, user effort can be reduced by half without compromising annotation accuracy. We have validated the results quantitatively and qualitatively on eight different datasets.
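
As a rough illustration of the frame-sampling idea, the sketch below picks the next frame to hand to a human annotator. The abstract does not state the selection criterion, so this is an assumption: it treats the frame where two trackers disagree most (lowest intersection-over-union between their boxes) as the most informative one. The names `iou` and `most_informative_frame` and the `(x1, y1, x2, y2)` box layout are hypothetical and not taken from the paper.

```python
from typing import List, Set, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def most_informative_frame(
    track_a: List[Box], track_b: List[Box], annotated: Set[int]
) -> int:
    """Return the index of the unannotated frame where the trackers disagree most.

    Assumption (not from the paper): disagreement is measured as low IoU
    between the boxes predicted by the two trackers for the same frame.
    """
    if len(track_a) != len(track_b):
        raise ValueError("both trackers must cover the same frames")
    candidates = [i for i in range(len(track_a)) if i not in annotated]
    if not candidates:
        raise ValueError("every frame is already annotated")
    return min(candidates, key=lambda i: iou(track_a[i], track_b[i]))
```

In an annotation loop built on this sketch, the returned frame would be shown to the user, the corrected box would be added to `annotated`, and both trackers would be re-run from the new annotation before sampling again.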



Paper Citation

in Harvard Style

Bakliwal P., Hegde G. M. and Jawahar C. (2017). Collaborative Contributions for Better Annotations. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 6: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-227-1, pages 353-360. DOI: 10.5220/0006098103530360

in Bibtex Style

author={Priyam Bakliwal and Guruprasad M. Hegde and C. V. Jawahar},
title={Collaborative Contributions for Better Annotations},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 6: VISAPP, (VISIGRAPP 2017)},

in EndNote Style

JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 6: VISAPP, (VISIGRAPP 2017)
TI - Collaborative Contributions for Better Annotations
SN - 978-989-758-227-1
AU - Bakliwal P.
AU - Hegde G. M.
AU - Jawahar C.
PY - 2017
SP - 353
EP - 360
DO - 10.5220/0006098103530360