6 CONCLUSION
In this paper we proposed a novel approach for learning
discriminative mid-level patches from training data
annotated only with class labels. The motivation is that
current discriminative patch learning methods are too
time-consuming and are therefore hard to apply to
complicated computer vision problems with larger
datasets. We proposed the FEC algorithm to train part
classifiers. With proper validation settings and an
appropriately designed evaluation function, we obtained
classifiers whose accuracy is competitive with
state-of-the-art SVM-based classifiers. We tested our
classifiers on scene classification using MIT Indoor 67
and our own Outdoor Sight 20 dataset. On both, they
performed as well as classifiers generated by
contemporary methods. Our classifiers could be further
applied to computer vision problems such as scene
classification, video classification, object detection,
and 2D-3D matching.
ICPRAM 2015 - International Conference on Pattern Recognition Applications and Methods