rotational invariance/direction, scale invariance/scale,
contrast invariance/contrast. One motivation is that a
learning process applied to the invariant descriptor
parts can in general be made much more efficient, since
fewer training examples need to be presented. Such a
learning process would result in a reasonably large but
limited vocabulary from which higher level descriptors
can be formed. The non-invariant measures from the
low-level stage could then be used in higher level
descriptors, such as hyper-features (Agarwal and Triggs,
2006), to capture semi-local properties. The work
presented here should be seen as a first step towards
such a framework, introducing descriptors that show a
subset of the desired properties.
We have studied local histogram based image descriptors
for representation of image structures. Applying them on
a grid in a bag-of-words fashion, we have tested these
descriptors on classification tasks using well-known
data sets for object classification.
The best of the proposed descriptors are based on the
responses of Laplace operators applied at different
scales, which means that the descriptors capture the
distribution of the texture width and relative texture
strength in the underlying subregion.
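
As a rough illustration of the idea of histogramming Laplace operator
responses at several scales over grid cells, the following sketch may help.
It is not the descriptor definition used in this work: the function names,
the scale values, the bin count, the clipping limit and the cell size are
all assumptions chosen for the example, and the Laplacian is approximated
by a 5-point stencil on a Gaussian-smoothed image.

```python
import math

def gaussian_kernel(sigma):
    """Sampled, normalized 1-D Gaussian truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(x * x) / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def convolve_rows(img, kernel):
    """Convolve every row with a 1-D kernel, replicating border pixels."""
    r = len(kernel) // 2
    w = len(img[0])
    return [[sum(kv * row[min(max(x + i - r, 0), w - 1)]
                 for i, kv in enumerate(kernel))
             for x in range(w)]
            for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def smooth(img, sigma):
    """Separable Gaussian smoothing: rows first, then columns."""
    k = gaussian_kernel(sigma)
    return transpose(convolve_rows(transpose(convolve_rows(img, k)), k))

def laplacian(img, sigma):
    """Scale-normalized Laplacian, sigma^2 * (Lxx + Lyy), via a 5-point stencil."""
    L = smooth(img, sigma)
    h, w = len(L), len(L[0])

    def at(y, x):
        return L[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sigma * sigma * (at(y, x - 1) + at(y, x + 1) +
                              at(y - 1, x) + at(y + 1, x) - 4 * at(y, x))
             for x in range(w)]
            for y in range(h)]

def cell_descriptor(responses, y0, x0, size, bins, limit):
    """Concatenate one normalized histogram per scale for a size x size cell."""
    desc = []
    for resp in responses:
        hist = [0] * bins
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                v = min(max(resp[y][x], -limit), limit)  # clip to [-limit, limit]
                b = min(int((v + limit) / (2 * limit) * bins), bins - 1)
                hist[b] += 1
        desc.extend(c / (size * size) for c in hist)
    return desc

def grid_descriptors(img, sigmas=(1.0, 2.0), cell=8, bins=4, limit=1.0):
    """One low-dimensional descriptor per non-overlapping grid cell."""
    responses = [laplacian(img, s) for s in sigmas]
    h, w = len(img), len(img[0])
    return [cell_descriptor(responses, y, x, cell, bins, limit)
            for y in range(0, h - cell + 1, cell)
            for x in range(0, w - cell + 1, cell)]

# Hypothetical 16x16 test image: a checkerboard with 4-pixel squares.
img = [[float((x // 4 + y // 4) % 2) for x in range(16)] for y in range(16)]
descs = grid_descriptors(img)
```

With these assumed settings each cell yields a descriptor of
len(sigmas) * bins values, which could then be quantized against a
vocabulary in a bag-of-words pipeline.
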
The classification performance has been compared to
descriptors presented in earlier work, based on global
histograms of the same basic operators. For the
classification tasks, the local histogram based
descriptors show similar or, in some cases, even better
performance than the global histograms while achieving a
significantly more compact representation.
The descriptors are of low dimension and suitable for
hierarchical representations. For the given data sets,
the classification and recognition results are
comparable to the best known results from descriptors of
similar complexity. We have shown how local contrast
normalization improves the performance and is important
for limited vocabularies. When contrast variations are
applied to the data, the proposed contrast normalization
step substantially increases classification/recognition
performance.
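
To illustrate why a local contrast normalization can help under contrast
variations, here is a minimal sketch. It assumes a simple normalization
that divides the operator responses in a region by their root-mean-square
magnitude; this may differ from the normalization step proposed in this
work, and the function name and the eps parameter are hypothetical.
Because Laplace responses are linear in image intensity, a multiplicative
contrast change scales all responses in a region by the same factor, which
the division removes.

```python
import math

def normalize_contrast(responses, eps=1e-8):
    """Divide the responses in one region by their RMS magnitude (assumed scheme)."""
    values = list(responses)
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    return [v / (rms + eps) for v in values]  # eps guards against flat regions

# Hypothetical responses from one subregion, and the same region
# after a threefold multiplicative contrast change.
region = [0.2, -0.4, 0.1, 0.3]
scaled = [3.0 * v for v in region]

a = normalize_contrast(region)
b = normalize_contrast(scaled)
# a and b agree up to rounding, so histograms built from them fall into
# the same bins regardless of the contrast change.
```
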
We plan to further enhance these first-level descriptors
by introducing a direction dependent part, together with
a local scale measure and scale selection mechanism for
full scale invariance. We will then explore how the
scale, contrast and direction information can be
incorporated in higher level descriptors.
REFERENCES
Agarwal, A. and Triggs, B. (2006). Hyperfeatures:
Multilevel local coding for visual recognition. In Proc.
ECCV, pages I: 30–43.
Bosch, A., Zisserman, A., and Munoz, X. (2007). Image
classification using random forests and ferns. In Proc.
ICCV, Rio de Janeiro, Brazil.
Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library
for support vector machines. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Csurka, G., Bray, C., Dance, C., and Fan, L. (2004).
Visual categorization with bags of keypoints. In Proc.
ECCV International Workshop on Statistical Learning
in Computer Vision.
Dorkó, G. and Schmid, C. (2005). Object class
recognition using discriminative local features. Rapport
de recherche RR-5497, INRIA Rhône-Alpes.
Fei-Fei, L. and Perona, P. (2005). A Bayesian hierarchical
model for learning natural scene categories. In Proc.
IEEE Conf. CVPR, pages II: 524–531.
Fergus, R., Perona, P., and Zisserman, A. (2003).
Object class recognition by unsupervised scale-invariant
learning. In Proc. IEEE Conf. CVPR, pages II: 264–271,
Madison, Wisconsin.
Jurie, F. and Triggs, B. (2005). Creating efficient codebooks
for visual recognition. In Proc. ICCV.
Koenderink, J. J. and van Doorn, A. J. (1999). The structure
of locally orderless images. Int. J. Comput. Vision,
31(2-3):159–168.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
bags of features: Spatial pyramid matching for
recognizing natural scene categories. In Proc. IEEE
CVPR, pages 2169–2178, Washington, DC, USA.
IEEE Computer Society.
Leibe, B. and Schiele, B. (2003). Interleaved object
categorization and segmentation. In Proc. British Machine
Vision Conference, Norwich, GB.
Linde, O. and Bretzner, L. (2008). Local histogram
based descriptors for recognition. Technical report,
CVAP/CSC/KTH.
Linde, O. and Lindeberg, T. (2004). Object recognition
using composed receptive field histograms of higher
dimensionality. In ICPR, Cambridge, U.K.
Lindeberg, T. (1994). Scale-Space Theory in Computer
Vision. Kluwer Academic Publishers, Dordrecht,
Netherlands.
Lowe, D. (2004). Distinctive image features from
scale-invariant keypoints. Int. J. Comput. Vision,
60(2):91–110.
Nene, S. A., Nayar, S. K., and Murase, H. (1996). Columbia
object image library (COIL-100). Technical report
CUCS-006-96, CAVE, Columbia University.
Nilsback, M. and Caputo, B. (2004). Cue integration
through discriminative accumulation. In Proc. IEEE
Conf. CVPR, pages II:578–585.
Obdržálek, Š. and Matas, J. (2002). Object recognition
using local affine frames on distinguished regions. In
British Machine Vision Conference, pages 113–122.
Puzicha, J., Hofmann, T., and Buhmann, J. (1999).
Histogram clustering for unsupervised segmentation and
image retrieval. Pattern Recognition Letters, 20:899–909.
VISAPP 2009 - International Conference on Computer Vision Theory and Applications