this case. However, if there is no information about foreground/background labels, FB_Adaboost_miSVM is the most suitable method, as it automatically estimates the weights of the foreground/background information when learning the ensemble classifier.
7 CONCLUSIONS
In this paper, we have presented a novel approach to image annotation based on foreground and background decomposition from the viewpoint of multiple instance learning. This study makes use of saliency maps to reduce the ambiguity in multiple instance learning for image annotation. To this end, a simple sampling/weighting method is considered, whose main idea is that salient objects have a higher probability of being foreground.
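The sampling/weighting idea can be sketched as follows. This is an illustrative reconstruction, not the paper's exact procedure: the helper names (`weight_instances`, `sample_foreground`) and the assumption that each region's saliency is summarized as a mean value in [0, 1] are ours. Regions with higher saliency receive higher foreground weight and are sampled more often as likely-foreground instances.

```python
import random

def weight_instances(regions, saliency):
    """Assign each region (instance) a foreground weight from its saliency.

    regions:  list of region identifiers in one image (a bag)
    saliency: dict mapping region id -> mean saliency value in [0, 1]
    Returns a dict of weights that sum to 1 over the bag.
    """
    total = sum(saliency[r] for r in regions) or 1.0
    return {r: saliency[r] / total for r in regions}

def sample_foreground(regions, saliency, k, seed=0):
    """Draw k instances, biased toward salient (likely foreground) regions."""
    w = weight_instances(regions, saliency)
    rng = random.Random(seed)
    return rng.choices(regions, weights=[w[r] for r in regions], k=k)
```

Under this scheme a highly salient region dominates the sample, which is exactly the intended bias: salient objects are treated as probable foreground when resolving the instance-label ambiguity.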
The empirical results in this paper show that applying foreground and background decomposition to image annotation yields good performance in most cases. This provides a very simple and efficient solution to the weak labeling problem in image annotation.
The results additionally show that models using both foreground and background information in image annotation outperform models that use only foreground or only background information. Moreover, they show that a classifier ensemble based on AdaBoost significantly improves classification accuracy.
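For reference, the AdaBoost ensemble scheme mentioned above can be sketched generically as follows; this is a textbook-style sketch, not the paper's FB_Adaboost_miSVM implementation, and the weak-learner pool is an assumption for illustration.

```python
import math

def adaboost(X, y, weak_learners, rounds):
    """Minimal AdaBoost. y values are in {-1, +1}; weak_learners is a pool
    of candidate classifiers f(x) -> {-1, +1}. Returns (alpha, clf) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Pick the weak learner with the lowest weighted error.
        best, best_err = None, float("inf")
        for f in weak_learners:
            err = sum(wi for wi, xi, yi in zip(w, X, y) if f(xi) != yi)
            if err < best_err:
                best, best_err = f, err
        if best_err >= 0.5:      # no learner beats random guessing
            break
        eps = max(best_err, 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        # Re-weight: increase the weight of misclassified examples.
        w = [wi * math.exp(-alpha * yi * best(xi))
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the boosted weak classifiers."""
    score = sum(a * f(x) for a, f in ensemble)
    return 1 if score >= 0 else -1
```

The accuracy gain reported above comes from this re-weighting loop: each round focuses the next weak classifier on the examples the current ensemble still gets wrong.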
A Multiple Instance Learning Approach to Image Annotation with Saliency Map