Authors: Grégoire Mesnil 1; Salah Rifai 2; Antoine Bordes 3; Xavier Glorot 2; Yoshua Bengio 2 and Pascal Vincent 2
Affiliations: 1 Université de Montréal and Université de Rouen, Canada; 2 Université de Montréal, Canada; 3 Université de Technologie de Compiègne, France
Keyword(s): Unsupervised Learning, Transfer Learning, Deep Learning, Scene Categorization, Object Detection.
Related Ontology Subjects/Areas/Topics: Applications; Artificial Intelligence; Biomedical Engineering; Biomedical Signal Processing; Classification; Computational Intelligence; Computer Vision, Visualization and Computer Graphics; Health Engineering and Technology Applications; Human-Computer Interaction; Image Understanding; Methodologies and Methods; Neural Networks; Neurocomputing; Neurotechnology, Electronics and Informatics; Object Recognition; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Software Engineering; Theory and Methods
Abstract:
Classifying scenes (e.g., as “street”, “home”, or “leisure”) is an important but difficult task, because images exhibit high variability and ambiguity and are captured under a wide range of illumination and scale conditions. Standard approaches build an intermediate representation of the global image and train classifiers on it. Recently, it has been proposed to describe an image as an aggregation of the objects it contains: the representation on which classifiers are trained is then composed of many heterogeneous feature vectors derived from various object detectors. In this paper, we study different approaches to efficiently combine the data extracted by these detectors. We use the features provided by Object-Bank (Li-Jia Li and Fei-Fei, 2010a) (177 object detectors producing 252 attributes each), and show on several scene-categorization benchmarks that careful combinations, which take the structure of the data into account, greatly improve over the original results (from +5% to +11%) while drastically reducing the dimensionality of the representation by 97% (from 44,604 to 1,000).
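The abstract does not spell out the combination schemes studied in the paper. As a minimal sketch of the kind of structured reduction it describes (compressing a 44,604-dimensional Object-Bank representation, 177 detectors x 252 attributes, down to 1,000 dimensions), the following Python snippet flattens the per-detector blocks and projects them with PCA. The array shapes, the function name combine_objectbank, and the choice of PCA are illustrative assumptions, not the authors' actual method.

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical layout matching the abstract: 177 detectors x 252
    # attributes per image = 44,604 raw dimensions, compressed to 1,000.
    N_DETECTORS, N_ATTRS, TARGET_DIM = 177, 252, 1000

    def combine_objectbank(features, target_dim=TARGET_DIM):
        """Map (n_images, 177, 252) detector responses to at most target_dim dims."""
        n_images = features.shape[0]
        flat = features.reshape(n_images, N_DETECTORS * N_ATTRS)
        # PCA cannot yield more components than min(n_samples, n_features).
        n_components = min(target_dim, n_images, flat.shape[1])
        return PCA(n_components=n_components).fit_transform(flat)

    # Toy usage on random stand-in data (real inputs would be Object-Bank outputs).
    X = np.random.rand(1200, N_DETECTORS, N_ATTRS).astype(np.float32)
    Z = combine_objectbank(X)
    print(Z.shape)  # (1200, 1000)

A purely global projection like this ignores the per-detector block structure; the reported gains come from combinations that exploit that structure, so this sketch is only a baseline illustration of the dimensionality figures quoted above.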