Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and
Lin, C.-J. (2008). LIBLINEAR: A library for large linear
classification. J. Mach. Learn. Res., 9:1871–1874.
Farhadi, A., Endres, I., Hoiem, D., and Forsyth, D. (2009).
Describing objects by their attributes. IEEE Confer-
ence on Computer Vision and Pattern Recognition,
pages 1778–1785.
Fei-Fei, L. and Perona, P. (2005). A Bayesian hierarchical
model for learning natural scene categories. In
Proceedings of the 2005 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition
(CVPR’05), Volume 2, pages 524–531. IEEE Computer Society.
Felzenszwalb, P., McAllester, D., and Ramanan, D. (2008).
A discriminatively trained, multiscale, deformable part
model. CVPR.
Gao, S., Tsang, I., Chia, L., and Zhao, P. (2010). Local features
are not lonely: Laplacian sparse coding for image
classification. IEEE Conference on Computer Vision
and Pattern Recognition.
Goodfellow, I., Le, Q., Saxe, A., and Ng, A. (2009). Mea-
suring invariances in deep networks. In NIPS’09,
pages 646–654.
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast
learning algorithm for deep belief nets. Neural Com-
putation, 18:1527–1554.
Hofmann, T. (2001). Unsupervised learning by probabilistic
latent semantic analysis. Mach. Learn., 42:177–196.
Hoiem, D., Efros, A., and Hebert, M. (2005). Automatic
photo pop-up. SIGGRAPH, 24(3):577–584.
Hotelling, H. (1933). Analysis of a complex of statistical
variables into principal components. Journal of Edu-
cational Psychology, 24:417–441, 498–520.
Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun,
Y. (2009). What is the best multi-stage architecture
for object recognition? In Proc. International Con-
ference on Computer Vision (ICCV’09), pages 2146–
2153. IEEE.
Kavukcuoglu, K., Ranzato, M., Fergus, R., and LeCun,
Y. (2009). Learning invariant features through topo-
graphic filter maps. In Proc. CVPR’09, pages 1605–
1612. IEEE.
Larochelle, H., Bengio, Y., Louradour, J., and Lamblin, P.
(2009). Exploring strategies for training deep neural
networks. JMLR, 10:1–40.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
bags of features: Spatial pyramid matching for recog-
nizing natural scene categories. IEEE Conference on
Computer Vision and Pattern Recognition.
LeCun, Y., Haffner, P., Bottou, L., and Bengio, Y. (1999).
Object recognition with gradient-based learning. In
Shape, Contour and Grouping in Computer Vision,
pages 319–345. Springer.
Li, L.-J. and Fei-Fei, L. (2007). What, where and who?
classifying events by scene and object recognition.
ICCV.
Li, L.-J., Su, H., Xing, E. P., and Fei-Fei, L. (2010a). Object
bank: A high-level image representation for scene
classification and semantic feature sparsification. Pro-
ceedings of the Neural Information Processing Sys-
tems (NIPS).
Li, L.-J., Su, H., Lim, Y., and Fei-Fei, L. (2010b). Objects
as attributes for scene classification. In European
Conference on Computer Vision (ECCV), International
Workshop on Parts and Attributes, Crete,
Greece.
Mesnil, G., Dauphin, Y., Glorot, X., Rifai, S., Bengio, Y.,
Goodfellow, I., Lavoie, E., Muller, X., Desjardins,
G., Warde-Farley, D., Vincent, P., Courville, A., and
Bergstra, J. (2012). Unsupervised and transfer learn-
ing challenge: a deep learning approach. In Guyon,
I., Dror, G., Lemaire, V., Taylor, G., and Silver, D.,
editors, JMLR W&CP: Proceedings of the Unsupervised
and Transfer Learning challenge and workshop,
volume 27, pages 97–110.
Oliva, A. and Torralba, A. (2006). Building the gist of a
scene: The role of global image features in recogni-
tion. Visual Perception, Progress in Brain Research,
155.
Pandey, M. and Lazebnik, S. (2011). Scene recognition
and weakly supervised object localization with de-
formable part-based models. ICCV.
Pearson, K. (1901). On lines and planes of closest fit to
systems of points in space. Philosophical Magazine,
2(6):559–572.
Quattoni, A. and Torralba, A. (2009). Recognizing indoor
scenes. CVPR.
Ranzato, M., Poultney, C., Chopra, S., and LeCun, Y.
(2007). Efficient learning of sparse representations
with an energy-based model. In NIPS’06.
Rifai, S., Mesnil, G., Vincent, P., Muller, X., Bengio, Y.,
Dauphin, Y., and Glorot, X. (2011a). Higher order
contractive auto-encoder. In European Conference
on Machine Learning and Principles and Practice of
Knowledge Discovery in Databases (ECML PKDD).
Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio,
Y. (2011b). Contractive auto-encoders: Explicit invariance
during feature extraction. In Proceedings
of the Twenty-eighth International Conference on Machine
Learning (ICML’11).
Russell, B. C., Torralba, A., Murphy, K. P., and Freeman,
W. T. (2008). LabelMe: A database and web-based
tool for image annotation. Int. J. Comput. Vision,
77:157–173.
Serre, T., Wolf, L., and Poggio, T. (2005). Object recog-
nition with features inspired by visual cortex. IEEE
Conference on Computer Vision and Pattern Recogni-
tion.
Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A.,
and Jain, R. (2000). Content-based image retrieval at
the end of the early years. IEEE Trans. Pattern Anal.
Mach. Intell., 22:1349–1380.
Torralba, A. (2003). Contextual priming for object de-
tection. International Journal of Computer Vision,
53(2):169–191.
Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-
A. (2008). Extracting and composing robust features
with denoising autoencoders. In Cohen, W. W., Mc-
Callum, A., and Roweis, S. T., editors, ICML’08,
pages 1096–1103. ACM.
Vogel, J. and Schiele, B. (2004). Natural scene retrieval
based on a semantic modeling step. In Proceedings