Robust Human Detection using Bag-of-Words and Segmentation
Yuta Tani, Kazuhiro Hotta
2015
Abstract
Bag-of-Words (BoW) has been reported to be effective for detecting humans with large pose changes and occlusions in still images. BoW yields a consistent representation even when a human changes pose or is partly occluded. However, the conventional method treats all information within a bounding box as positive data. Since the bounding box is a rectangle enclosing the human, background regions are also included in the BoW representation, which distorts it and lowers detection accuracy. In this paper, we therefore propose segmenting the human region with GrabCut or Color Names, so that the influence of the background is reduced and the BoW histogram is computed from the human region only. Comparisons with the deformable part model (DPM) and the conventional BoW method demonstrate the effectiveness of our method.
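The core idea of pooling the BoW histogram over the human region only can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `masked_bow_histogram`, the toy mask, and the toy keypoints are hypothetical. It also assumes that a binary foreground mask (for example, one produced by GrabCut) and the visual-word index of each local feature are already available.

```python
from typing import List, Sequence, Tuple


def masked_bow_histogram(
    keypoints: Sequence[Tuple[int, int]],  # (row, col) of each local feature
    word_ids: Sequence[int],               # visual-word index of each feature
    mask: Sequence[Sequence[int]],         # 1 = human (foreground), 0 = background
    vocab_size: int,
) -> List[float]:
    """L1-normalized BoW histogram built only from features inside the mask."""
    hist = [0.0] * vocab_size
    for (row, col), word in zip(keypoints, word_ids):
        if mask[row][col]:  # skip features that fall on the background
            hist[word] += 1.0
    total = sum(hist)
    if total > 0:
        hist = [count / total for count in hist]
    return hist


# Toy 4x4 bounding box: the human occupies the central 2x2 block (assumed mask).
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
full_box = [[1] * 4 for _ in range(4)]  # conventional case: whole box is positive

keypoints = [(0, 0), (1, 1), (1, 2), (2, 2), (3, 3)]
word_ids = [0, 1, 1, 2, 0]  # hypothetical vocabulary of 3 visual words

segmented_hist = masked_bow_histogram(keypoints, word_ids, mask, vocab_size=3)
conventional_hist = masked_bow_histogram(keypoints, word_ids, full_box, vocab_size=3)
```

In this toy case the two background features both map to word 0, so the conventional histogram assigns weight to word 0 while the segmented histogram does not. This is the kind of background contamination the abstract describes.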
References
- Russakovsky, O., Lin, Y., Yu, K. and Fei-Fei, L., 2012. Object-centric spatial pooling for image classification, European Conference on Computer Vision.
- Felzenszwalb, P., Girshick, R., McAllester, D. and Ramanan, D., 2010. Object Detection with Discriminatively Trained Part Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, Sep.
- Csurka, G., Dance, C., Fan, L., Willamowski, J. and Bray, C., 2004. Visual Categorization with Bags of Keypoints, Proc. of ECCV Workshop on Statistical Learning in Computer Vision, pp. 59-74.
- Arandjelović, R. and Zisserman, A., 2012. Three things everyone should know to improve object retrieval, In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2911-2918.
- Discriminatively trained deformable part models. http://cs.brown.edu/pff/latent-release4/
- Lazebnik, S., Schmid, C. and Ponce, J., 2006. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2169-2178.
- Fan, R., Chang, K., Hsieh, C., Wang, X. and Lin, C., 2008. LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research 9, pp. 1871-1874.
- Yao, B., Jiang, X., Khosla, A., Lin, A.L., Guibas, L.J. and Fei-Fei, L., 2011. Human Action Recognition by Learning Bases of Action Attributes and Parts, International Conference on Computer Vision.
- Vedaldi, A. and Zisserman, A., 2010. Efficient Additive Kernels via Explicit Feature Maps, In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 34, No. 3, pp. 480-492.
- Tani, Y. and Hotta, K., 2014. Robust Human Detection to Pose and Occlusion Using Bag-of-Words, International Conference on Pattern Recognition, pp. 4376-4381.
- Rother, C., Kolmogorov, V., and Blake, A., 2004. GrabCut: Interactive foreground extraction using iterated graph cuts, The ACM Special Interest Group on Computer Graphics, Vol. 23, pp. 309-314.
- Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P. and Süsstrunk, S., 2010. SLIC superpixels, Technical report, EPFL.
- Weijer, J. and Schmid, C., 2007. Applying Color Names to Image Description, In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 3, pp. 493-496.
- Gavves, E., Fernando, B., Snoek, C.G.M., Smeulders, A.W.M. and Tuytelaars, T., 2013. Fine-Grained Categorization by Alignments, In IEEE International Conference on Computer Vision.
Paper Citation
in Harvard Style
Tani Y. and Hotta K. (2015). Robust Human Detection using Bag-of-Words and Segmentation. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-090-1, pages 504-509. DOI: 10.5220/0005354705040509
in Bibtex Style
@conference{visapp15,
author={Yuta Tani and Kazuhiro Hotta},
title={Robust Human Detection using Bag-of-Words and Segmentation},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={504-509},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005354705040509},
isbn={978-989-758-090-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)
TI - Robust Human Detection using Bag-of-Words and Segmentation
SN - 978-989-758-090-1
AU - Tani Y.
AU - Hotta K.
PY - 2015
SP - 504
EP - 509
DO - 10.5220/0005354705040509