ing. To allow efficient processing, each image in the dataset was scaled to fit inside a 64 × 64 pixel square. We obtained a test accuracy of 87.5% using a SOM with 1024 neurons, a 4 × 4 pixel receptive field, 2 pyramid levels, and color information. The same dataset was processed using the color PHOG feature with 10 bins and 3 levels, obtaining an overall accuracy (OA) of 74.0%.
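For illustration, the following minimal sketch shows the unsupervised learning step under this configuration: a 1024-neuron SOM, arranged as a 32 × 32 grid, trained online on flattened 4 × 4 RGB patches. This is our own simplified Python sketch, not the exact experimental implementation; the grid layout, random initialization, and learning-rate/neighborhood schedules are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    GRID = 32                # 32 x 32 grid = 1024 neurons
    PATCH = 4                # 4 x 4 pixel receptive field
    DIM = PATCH * PATCH * 3  # flattened RGB patch

    def train_som(patches, n_iter=20000, lr0=0.5, sigma0=GRID / 2):
        """Classic online SOM (Kohonen, 1990): pull the best-matching
        unit and its grid neighbours towards each input patch.
        `patches` is an (N, DIM) array of flattened 4x4 RGB patches."""
        weights = rng.random((GRID, GRID, DIM))
        ys, xs = np.mgrid[0:GRID, 0:GRID]
        for t in range(n_iter):
            x = patches[rng.integers(len(patches))]
            frac = t / n_iter
            lr = lr0 * (1.0 - frac)               # decaying learning rate
            sigma = sigma0 * np.exp(-3.0 * frac)  # shrinking neighbourhood
            # squared distance from the patch to every neuron's weights
            d = ((weights - x) ** 2).sum(axis=2)
            by, bx = np.unravel_index(d.argmin(), d.shape)
            # Gaussian neighbourhood centred on the best-matching unit
            h = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, :, None] * (x - weights)
        return weights

After training, each patch of a test image is mapped to its best-matching unit (BMU), and the resulting map of BMU coordinates is summarized by the pyramidal histogram encoding evaluated above.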
4 CONCLUSIONS
In this paper we presented a model that exploits the Self-Organizing Map (SOM) neural network to learn features from images without requiring any supervision. Our experiments, performed on the very challenging CIFAR-10 and Caltech 101 datasets, show that the features learned by the SOM and encoded using a pyramidal histogram approach significantly outperform both classification based on raw pixel values and the PHOG feature, which was designed specifically for image classification. Despite the large number of images in the datasets, the proposed feature learning process is fast and requires only a few minutes, even when using SOMs with hundreds of neurons. Moreover, the presented model makes it possible to control the size of the features used to train the supervised classifier by grouping close neurons in the histogram encoding scheme, as illustrated in the sketch at the end of this section. This property speeds up the learning process without requiring the unsupervised feature learning to be repeated. Experiments show that classification accuracy can be further improved by applying appropriate normalization and by fine-tuning the receptive field. Other normalization methods, such as whitening (Hyvärinen and Oja, 2000), and feature encoding schemes, such as hard or soft pooling (Lazebnik et al., 2006; Jarrett et al., 2009), could improve the results further and will be considered in future work. Another interesting future development is the use of multiple levels of SOM networks to learn more complex features that better characterize the visual patterns within images; this approach has been successfully applied in our previous work (Vanetti et al., 2012) for the segmentation of complex textures.
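To make the neuron-grouping scheme concrete, the following minimal sketch builds the pyramidal descriptor, assuming each patch position has already been assigned the grid coordinates of its BMU (the function and variable names are ours, for illustration only). Pooling the 32 × 32 grid into 4 × 4 blocks of neurons reduces each histogram from 1024 to 64 bins without repeating the unsupervised learning.

    import numpy as np

    def encode_pyramid(bmu_map, grid=32, group=4, levels=2):
        """bmu_map: (H, W, 2) integer array holding the BMU (row, col)
        for each patch position of one image. Grouping neurons into
        `group` x `group` blocks shrinks each cell histogram from
        grid**2 to (grid // group)**2 bins."""
        g = grid // group
        # map every BMU to the index of its coarse neuron block
        bins = (bmu_map[..., 0] // group) * g + (bmu_map[..., 1] // group)
        H, W = bins.shape
        feats = []
        for level in range(levels):
            cells = 2 ** level           # cells per side at this level
            hs, ws = H // cells, W // cells
            for i in range(cells):
                for j in range(cells):
                    cell = bins[i * hs:(i + 1) * hs, j * ws:(j + 1) * ws]
                    hist = np.bincount(cell.ravel(), minlength=g * g)
                    feats.append(hist / max(cell.size, 1))  # L1 normalise
        return np.concatenate(feats)

With levels = 2 the descriptor concatenates 1 + 4 cell histograms, i.e. a 5 × 64 = 320-dimensional feature vector for the supervised classifier; choosing a different group size changes this length without retraining the SOM.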
REFERENCES
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H.
(2007). Greedy Layer-Wise Training of Deep Net-
works. In Neural Information Processing Systems,
pages 153–160.
Bosch, A., Zisserman, A., and Muñoz, X. (2007a). Image Classification using Random Forests and Ferns. In International Conference on Computer Vision, pages 1–8.
Bosch, A., Zisserman, A., and Muñoz, X. (2007b). Representing shape with a spatial pyramid kernel. In Conference on Image and Video Retrieval, pages 401–408.
Coates, A., Lee, H., and Ng, A. Y. (2011). An analysis of
single-layer networks in unsupervised feature learn-
ing. In AISTATS.
Cortes, C. and Vapnik, V. (1995). Support-vector networks.
Machine Learning, 20:273–297.
Csurka, G., Dance, C. R., Fan, L., Willamowski, J., and
Bray, C. (2004). Visual Categorization with Bags of
Keypoints. In European Conference on Computer Vi-
sion.
Gersho, A. and Gray, R. M. (1992). Vector Quantization and Signal Compression. Kluwer Academic Publishers.
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Prentice Hall, 2nd edition.
Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A
Fast Learning Algorithm for Deep Belief Nets. Neural
Computation, 18:1527–1554.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reduc-
ing the Dimensionality of Data with Neural Networks.
Science, 313:504–507.
Hyvärinen, A. and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13:411–430.
Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y.
(2009). What is the best multi-stage architecture for
object recognition? In International Conference on
Computer Vision, pages 2146–2153.
Kohonen, T. (1990). The self-organizing map. Proceedings
of the IEEE, 78:1464–1480.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
Bags of Features: Spatial Pyramid Matching for Rec-
ognizing Natural Scene Categories. In Computer Vi-
sion and Pattern Recognition, volume 2, pages 2169–
2178.
Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Cor-
rado, G., Dean, J., and Ng, A. (2012). Building high-
level features using large scale unsupervised learning.
In International Conference on Machine Learning.
Lee, H., Battle, A., Raina, R., and Ng, A. Y. (2006). Effi-
cient sparse coding algorithms. In Neural Information
Processing Systems, pages 801–808.
Olshausen, B. A. and Field, D. J. (1996). Emergence of
simple-cell receptive field properties by learning a
sparse code for natural images. Nature, 381:607–609.
Raina, R., Madhavan, A., and Ng, A. Y. (2009). Large-scale deep unsupervised learning using graphics processors. In International Conference on Machine Learning, pages 873–880.
Vanetti, M., Gallo, I., and Nodari, A. (2012). Unsupervised self-organizing texture descriptor. In Computational Modelling of Objects Represented in Images: Fundamentals, Methods and Applications (CompIMAGE 2012).