Automatic Image Annotation Using Convex Deep Learning Models

Niharjyoti Sarangi, C. Chandra Sekhar

Abstract

Automatically assigning semantically relevant tags to an image is an important task in machine learning. Many algorithms have been proposed to annotate images based on features such as color, texture, and shape. The success of these algorithms depends on carefully handcrafted features. Deep learning models are widely used to learn abstract, high-level representations from raw data. Deep belief networks, the most commonly used deep learning models, are formed by pre-training individual Restricted Boltzmann Machines in a layer-wise fashion, stacking them together, and then training the stack using error back-propagation. In deep convolutional networks, the convolution operation is used to extract features from different sub-regions of an image to learn better representations. To reduce training time, models that use convex optimization and the kernel trick have been proposed. In this paper, we explore two such models, the Tensor Deep Stacking Network and the Kernel Deep Convex Network, for the task of automatic image annotation. We use a deep convolutional network to extract high-level features from raw images, and then use them as inputs to the convex deep learning models. The performance of the proposed approach is evaluated on benchmark image datasets.
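The stacking idea behind such convex models can be sketched as follows: each module solves a convex fitting problem mapping its input (the original features concatenated with the previous module's predictions) to the target labels, so no back-propagation through the stack is needed. Below is a minimal illustrative sketch in the spirit of the Kernel Deep Convex Network, where each module is a closed-form kernel ridge regression; the RBF kernel choice, class names, and hyperparameters are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.05):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

class KernelModule:
    """One stacking module: kernel ridge regression from input features
    to one-hot label targets, solved in closed form (a convex problem)."""
    def __init__(self, reg=1e-2, gamma=0.05):
        self.reg, self.gamma = reg, gamma

    def fit(self, X, Y):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        # alpha = (K + reg*I)^{-1} Y  -- the unique minimizer of the
        # regularized least-squares objective.
        self.alpha = np.linalg.solve(K + self.reg * np.eye(len(X)), Y)
        return self

    def predict(self, Xq):
        return rbf_kernel(Xq, self.X, self.gamma) @ self.alpha

def stacked_fit_predict(X, Y, Xq, n_modules=3):
    """Train a stack of convex modules; each later module sees the raw
    features concatenated with the previous module's label predictions."""
    Ztr, Zq = X, Xq
    for _ in range(n_modules):
        mod = KernelModule().fit(Ztr, Y)
        Ptr, Pq = mod.predict(Ztr), mod.predict(Zq)
        Ztr = np.hstack([X, Ptr])   # stack predictions onto original input
        Zq = np.hstack([Xq, Pq])
    return Pq
```

In the approach described above, the raw-image input to such a stack would be replaced by high-level features extracted by a deep convolutional network.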



Paper Citation


in Harvard Style

Sarangi N. and Chandra Sekhar C. (2015). Automatic Image Annotation Using Convex Deep Learning Models. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-758-077-2, pages 92-99. DOI: 10.5220/0005216700920099


in Bibtex Style

@conference{icpram15,
author={Niharjyoti Sarangi and C. Chandra Sekhar},
title={Automatic Image Annotation Using Convex Deep Learning Models},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},
year={2015},
pages={92-99},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005216700920099},
isbn={978-989-758-077-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
TI - Automatic Image Annotation Using Convex Deep Learning Models
SN - 978-989-758-077-2
AU - Sarangi N.
AU - Chandra Sekhar C.
PY - 2015
SP - 92
EP - 99
DO - 10.5220/0005216700920099