occurrences. The histogram intersection kernel has been used to reduce the complexity of the algorithm. We implemented SP BoVW for comparison purposes and evaluated different feature detection methods. We observed that the geometry-based method is robust when strong geometrical deformations are present on the face, as is the case with acted expressions. However, for spontaneous expressions, where the facial deformations are more subtle, geometry-based methods alone are not sufficient to achieve good performance, because each person reacts to a given emotion in a different way. For future work, we plan to combine the proposed approach with the appearance-based facial expression recognition method we developed in (Chanti and Caplier, 2017), in order to benefit from the advantages of both approaches.
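As a concrete illustration of the kernel mentioned above, the histogram intersection kernel between two BoVW histograms is simply the sum of their element-wise minima. This is a minimal sketch, not the paper's implementation; the histogram values are hypothetical.

```python
import numpy as np

def histogram_intersection_kernel(h1, h2):
    """Histogram intersection kernel: sum of element-wise minima.

    For L1-normalized histograms the result lies in [0, 1],
    reaching 1 only when the two histograms are identical.
    """
    return float(np.minimum(h1, h2).sum())

# Two hypothetical L1-normalized BoVW histograms over a 3-word vocabulary.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
print(histogram_intersection_kernel(a, b))  # close to 0.9
```

Because the kernel involves only comparisons and additions (no products or exponentials), a precomputed Gram matrix of such values can be fed to a kernel SVM at low cost, which is the complexity advantage alluded to above.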
REFERENCES
Aldavert, D., Rusiñol, M., Toledo, R., and Lladós, J. (2015). A study of bag-of-visual-words representations for handwritten keyword spotting. International Journal on Document Analysis and Recognition (IJDAR), 18(3):223–234.
Altintakan, U. L. and Yazici, A. (2015). Towards effective image classification using class-specific codebooks and distinctive local features. IEEE Transactions on Multimedia, 17(3):323–332.
Arthur, D. and Vassilvitskii, S. (2007). k-means++: The
advantages of careful seeding. In Proceedings of the
eighteenth annual ACM-SIAM symposium on Discrete
algorithms, pages 1027–1035. Society for Industrial
and Applied Mathematics.
Chanti, D. A. and Caplier, A. (2017). Spontaneous facial expression recognition using sparse representation. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017), pages 64–74.
Furuya, T. and Ohbuchi, R. (2009). Dense sampling and fast
encoding for 3d model retrieval using bag-of-visual
features. In Proceedings of the ACM international
conference on image and video retrieval, page 26.
ACM.
Grauman, K. and Darrell, T. (2007). The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research, 8(Apr):725–760.
Hariri, W., Tabia, H., Farah, N., Declercq, D., and Benouareth, A. (2017). Geometrical and visual feature quantization for 3d face recognition. In VISAPP 2017 12th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications.
Ionescu, R. T., Popescu, M., and Grozea, C. (2013). Local learning to improve bag of visual words model for facial expression recognition. In Workshop on Challenges in Representation Learning, ICML.
Kanade, T., Cohn, J. F., and Tian, Y. (2000). Comprehensive database for facial expression analysis. In Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, pages 46–53. IEEE.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, pages 2169–2178.
Leskovec, J., Rajaraman, A., and Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press.
Lyons, M., Akamatsu, S., Kamachi, M., and Gyoba, J. (1998). Coding facial expressions with gabor wavelets. In Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on, pages 200–205. IEEE.
Pantic, M., Valstar, M., Rademaker, R., and Maat, L. (2005). Web-based database for facial expression analysis. In Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, 5 pp. IEEE.
Peng, X., Wang, L., Wang, X., and Qiao, Y. (2016). Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. Computer Vision and Image Understanding, 150:109–125.
Scovanner, P., Ali, S., and Shah, M. (2007). A 3-dimensional sift descriptor and its application to action recognition. In Proceedings of the 15th ACM international conference on Multimedia, pages 357–360. ACM.
Sivic, J. and Zisserman, A. (2003). Video google: A text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV), page 1470. IEEE.
Tcherkassof, A., Dupré, D., Meillon, B., Mandran, N., Dubois, M., and Adam, J.-M. (2013). Dynemo: A video database of natural facial expressions of emotions. The International Journal of Multimedia & Its Applications, 5(5):61–80.
Van Gemert, J. C., Veenman, C. J., Smeulders, A. W., and
Geusebroek, J.-M. (2010). Visual word ambiguity.
IEEE transactions on pattern analysis and machine
intelligence, 32(7):1271–1283.
Xie, Y., Jiang, S., and Huang, Q. (2013). Weighted visual vocabulary to balance the descriptive ability on general dataset. Neurocomputing, 119:478–488.
Zhang, S., Tian, Q., Hua, G., Huang, Q., and Gao, W. (2011). Generating descriptive visual words and visual phrases for large-scale image applications. IEEE Transactions on Image Processing, 20(9):2664–2677.
Zhu, Q., Zhong, Y., Zhao, B., Xia, G.-S., and Zhang, L. (2016). Bag-of-visual-words scene classifier with local and global features for high spatial resolution remote sensing imagery. IEEE Geoscience and Remote Sensing Letters, 13(6):747–751.
VISAPP 2018 - International Conference on Computer Vision Theory and Applications