Extending Recognition in a Changing Environment

Daniel Harari, Shimon Ullman

Abstract

We consider the task of visually recognizing objects and their parts in a dynamic environment, where both the appearance of the parts and their relative positions change over time. We start with a model of an object class learned from a limited set of viewing directions (such as side views of cars or airplanes). The algorithm is then given a video in which the object moves and changes its viewing direction. Our aim is to reliably detect the object as it changes beyond its known views, and to use the dynamically changing views to extend the initial object model. To achieve this goal, we construct an object model at each time instant by combining two sources of information: consistency with the measured optical flow, and similarity to the object model at an earlier time. We introduce a simple new way of updating the object model dynamically by combining approximate nearest-neighbor search with kernel density estimation. Unlike tracking-by-detection methods, which focus on tracking a specific object over time, we demonstrate how the proposed method can be used for learning, by extending the initial generic object model to cope with novel viewing directions without further supervision. The results show that adaptively combining the initial model with even a single video sequence already provides useful generalization of the class model to novel views.
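
As a rough illustration of the idea of combining nearest-neighbor search with kernel density estimation, the sketch below scores a new descriptor by a Gaussian kernel density computed over only its k nearest stored descriptors, and adds well-supported descriptors to the model. This is not the authors' implementation: the function names, the Gaussian kernel, the parameters `k`, `bandwidth`, and `threshold`, and the update rule are all assumptions made for illustration. The neighbor search here is exact brute force, where a real system would presumably substitute an approximate index.

```python
import numpy as np


def knn_kde_score(query, model_descriptors, k=5, bandwidth=1.0):
    """Unnormalized Gaussian kernel density at `query`, restricted to its
    k nearest model descriptors (hypothetical helper, for illustration)."""
    diffs = model_descriptors - query
    sq_dists = np.einsum("ij,ij->i", diffs, diffs)  # squared Euclidean distances
    k = min(k, len(sq_dists))
    # np.partition places the k smallest distances in the first k slots.
    nearest = np.partition(sq_dists, k - 1)[:k]
    return float(np.mean(np.exp(-nearest / (2.0 * bandwidth ** 2))))


def update_model(model_descriptors, new_descriptors, threshold, k=5, bandwidth=1.0):
    """Append new descriptors whose density under the current model is at
    least `threshold` (an assumed update rule, not taken from the paper)."""
    keep = [
        d for d in new_descriptors
        if knn_kde_score(d, model_descriptors, k, bandwidth) >= threshold
    ]
    if not keep:
        return model_descriptors
    return np.vstack([model_descriptors, np.array(keep)])
```

Restricting the kernel sum to the nearest neighbors keeps the cost of each density evaluation tied to `k` rather than to the size of the growing model, which is the usual motivation for pairing the two techniques.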


Paper Citation


in Harvard Style

Harari D. and Ullman S. (2013). Extending Recognition in a Changing Environment. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 632-640. DOI: 10.5220/0004281106320640


in Bibtex Style

@conference{visapp13,
author={Daniel Harari and Shimon Ullman},
title={Extending Recognition in a Changing Environment},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},
year={2013},
pages={632-640},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004281106320640},
isbn={978-989-8565-47-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Extending Recognition in a Changing Environment
SN - 978-989-8565-47-1
AU - Harari D.
AU - Ullman S.
PY - 2013
SP - 632
EP - 640
DO - 10.5220/0004281106320640