image categorization and retrieval. The construction of a BoW
representation starts with the extraction of local features. Two steps
then follow: encoding and pooling. Although the model is simple, it
ignores spatial information. In this paper, we introduce methods that
improve the construction of BoW, such as LLC, Fisher vector, VLAD, and
VLAT, and methods that integrate spatial information, such as
approaches based on pairwise features (LPC, DLPC, CCC, ...), approaches
based on visual phrases (Phreselet), and approaches based on graphs.
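The encoding and pooling steps summarized above can be illustrated with a minimal sketch. It assumes hard-assignment encoding (each descriptor is mapped to its nearest visual word) and sum pooling followed by L1 normalization, which is only one of several possible choices. The function names `encode` and `bow_histogram`, the toy codebook, and the toy descriptors are hypothetical and chosen purely for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def encode(descriptor, codebook):
    """Hard-assignment encoding: return the index of the nearest visual word."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bow_histogram(descriptors, codebook):
    """Sum pooling: count assignments per visual word, then L1-normalize."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[encode(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy data (hypothetical): three 2-D "visual words" and four local descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

histogram = bow_histogram(descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

Note that the image positions of the descriptors never enter the computation: shuffling the descriptors, or moving them around in the image, leaves the histogram unchanged. This is the loss of spatial information that the pairwise-feature, visual-phrase, and graph-based approaches aim to recover.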
VISAPP 2014 - International Conference on Computer Vision Theory and Applications