Fast Discovery of Discriminative Mid-level Patches

Angran Lin, Xuhui Jia, Kowk Ping Chan

2015

Abstract

Learning discriminative mid-level patches has gained popularity in recent years because such patches can be applied to a variety of computer vision tasks and can improve performance on them. However, state-of-the-art learning methods require a lot of training time, especially as the problem scale grows. In this paper we propose a simple, fast and effective method, Fast Exemplar Clustering (FEC), to mine discriminative mid-level patches given only class labels. We verified our results on the task of scene classification: training the model on the MIT Indoor 67 dataset took only one day on a quad-core Core i5 computer running Matlab. Our experiments revealed that the mid-level patches discovered by our method are semantically meaningful and achieve accuracy competitive with state-of-the-art techniques. In addition, to test our model we created a new scene classification dataset, Outdoor Sight 20, which contains outdoor views of 20 famous tourist attractions.


Paper Citation


in Harvard Style

Lin A., Jia X. and Chan K. (2015). Fast Discovery of Discriminative Mid-level Patches. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-758-077-2, pages 53-61. DOI: 10.5220/0005183200530061


in Bibtex Style

@conference{icpram15,
author={Angran Lin and Xuhui Jia and Kowk Ping Chan},
title={Fast Discovery of Discriminative Mid-level Patches},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},
year={2015},
pages={53-61},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005183200530061},
isbn={978-989-758-077-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
TI - Fast Discovery of Discriminative Mid-level Patches
SN - 978-989-758-077-2
AU - Lin A.
AU - Jia X.
AU - Chan K.
PY - 2015
SP - 53
EP - 61
DO - 10.5220/0005183200530061