For the quantitative evaluation, we randomly selected
approximately 200 images for each of the 15
labels and computed the average recall of image
annotation for each label. Note that we used the
region features and visual words that are provided
with the dataset. Figure 4(a) shows the recall of
each of the 15 labels over the iterations, and
Figure 4(b) shows the average recall over all labels.
For comparison, we also plot the average recall
obtained by random choice.
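As a minimal sketch, assuming that the recall for a label is the fraction of images that truly carry the label and were also annotated with it by the system, the per-label and average recalls plotted in Figure 4 could be computed as follows. The function names, label names, and image identifiers are hypothetical and used only for illustration.

```python
from typing import Dict, Set


def label_recall(relevant: Set[str], annotated: Set[str]) -> float:
    """Recall for one label: the fraction of images that truly carry the
    label and were also annotated with it by the system."""
    if not relevant:
        return 0.0
    return len(relevant & annotated) / len(relevant)


def average_recall(ground_truth: Dict[str, Set[str]],
                   predictions: Dict[str, Set[str]]) -> float:
    """Unweighted (macro) mean of the per-label recalls."""
    recalls = [label_recall(images, predictions.get(label, set()))
               for label, images in ground_truth.items()]
    if not recalls:
        return 0.0
    return sum(recalls) / len(recalls)


# Hypothetical labels and image IDs, for illustration only.
ground_truth = {"sky": {"img1", "img2", "img3", "img4"},
                "water": {"img5", "img6"}}
predictions = {"sky": {"img1", "img2", "img9"},
               "water": {"img5", "img6"}}
print(average_recall(ground_truth, predictions))  # 0.75 = (2/4 + 2/2) / 2
```

Evaluating these functions after each relevance-feedback iteration would yield one point per iteration on each curve; the macro average weights every label equally, regardless of how many images carry it.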
Moreover, we perform another experiment,
without relevance feedback, to show the effect of
using unlabeled images in classifier training. We
adopt the $F_1$ measure, which combines precision and
recall, as the evaluation metric, where
$F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$.
We vary the number of labeled images,
$|D_k| = 8, 16, 32, 64$, and $128$, and the number of
unlabeled images, $|D_U| = 0, 800$, and $1{,}600$. The
results, shown in Figure 5, indicate that using unlabeled
images substantially improves the performance,
especially when few labeled images are available (e.g.,
$|D_k| = 8$ or $16$). This is very helpful for relevance
feedback, because only a few labeled images are
available at the beginning of the annotation
iterations. Using unlabeled images to assist the
clustering therefore yields better performance in the
first iterations of image annotation.
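The $F_1$ measure defined above is the harmonic mean of precision and recall. A minimal sketch of computing it for a single label follows, assuming precision is the fraction of annotated images that truly carry the label and recall is defined as before; the function names and the image sets are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(relevant: Set[int], annotated: Set[int]) -> Tuple[float, float]:
    """Precision and recall of one label's annotations against ground truth."""
    tp = len(relevant & annotated)  # correctly annotated images
    precision = tp / len(annotated) if annotated else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


relevant = set(range(10))      # hypothetical: images 0-9 truly carry the label
annotated = set(range(5, 13))  # hypothetical: images 5-12 were annotated with it
p, r = precision_recall(relevant, annotated)  # p = 5/8, r = 5/10
print(round(f1_score(p, r), 4))  # 0.5556
```

Because it is a harmonic mean, $F_1$ stays low unless both precision and recall are high, so a classifier cannot score well simply by annotating every image with the label.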
7 CONCLUSIONS AND FUTURE WORK
This paper presents an interactive method for image
annotation using a semi-supervised and hierarchical
approach. We use unlabeled images to assist
classifier training, which achieves better performance
even when fewer labeled training images are available.
We construct hierarchical classifiers, each
corresponding to an individual label, which makes the
annotation system more flexible. In the future, we
will replace K-means clustering in our method with
another unsupervised clustering method. We also plan
to incorporate prior knowledge, such as ontologies,
into the annotation task. Moreover, we plan to apply
the annotation results to image retrieval.
ACKNOWLEDGEMENTS
This work was supported in part by grants
96-2752-E-002-007-PAE and 96R0062-03.
VISAPP 2008 - International Conference on Computer Vision Theory and Applications