- we had to represent the content of the images retrieved from the Web. We extracted information, i.e. metadata, from the Web pages containing those images, and we modelled this content as a Semantic Space, as in the text case.
- we needed to compare the two Semantic Spaces. For that, we defined a similarity measure which indicates how much the two spaces relate to the same semantic meaning.
- we needed to re-rank the list of retrieved images. We proposed to extract the words from the common subspace of the two Semantic Spaces; images are then ranked on the basis of the number of tags they share in that common subspace (see the sketch after this list).
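The pipeline above can be summarised in a minimal Python sketch. It is only an illustration: the cosine-style space_similarity below is a hypothetical stand-in for our measure, and the tags attribute on images is an assumed field holding the tags extracted from each image's page; only the re-ranking criterion (count of shared tags in the common subspace) is exactly the one stated above.

from collections import Counter

def semantic_space(documents, stop_words):
    # Build a Semantic Space as a bag of weighted terms from the
    # words of the given documents (input text or image-page metadata).
    terms = Counter()
    for doc in documents:
        for word in doc.lower().split():
            word = word.strip('.,;:!?"()')
            if word and word not in stop_words:
                terms[word] += 1
    return terms

def space_similarity(space_a, space_b):
    # Stand-in overlap measure: cosine similarity over the two
    # term-frequency vectors (the actual measure is defined in the paper).
    common = set(space_a) & set(space_b)
    dot = sum(space_a[t] * space_b[t] for t in common)
    norm_a = sum(v * v for v in space_a.values()) ** 0.5
    norm_b = sum(v * v for v in space_b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank(images, text_space, image_space):
    # Re-rank images by how many of their tags fall in the common
    # subspace of the two Semantic Spaces.
    common = set(text_space) & set(image_space)
    return sorted(images, key=lambda img: len(img.tags & common), reverse=True)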
Results are very satisfactory, and impressive when compared to those obtained with SPE. The main difference between the two methods is the image dataset. SPE uses a personal collection of photos, annotated by hand, and is therefore limited in the number of images and in the concepts represented in its dataset. We use Google Image to create, dynamically at each query, a new image dataset to work within. We thus exploit the knowledge of the Web, increasing the chances of finding images relevant to the text content.
Furthermore, our method has been designed to be multi-language: it can easily be extended to other languages, provided that a suitable list of stop words is created.
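As a toy illustration of this extension point (the stop-word lists below are tiny hypothetical stubs, not the lists used in the system), adding a language amounts to supplying its list and selecting it:

# Hypothetical, deliberately tiny per-language stop-word lists;
# real lists would be far larger.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "of", "to", "in", "is"},
    "it": {"il", "la", "un", "una", "e", "di", "a", "in"},
}

def content_words(text, lang="en"):
    # Keep only content-bearing words for the Semantic Space,
    # dropping the stop words of the selected language.
    return [w for w in text.lower().split() if w not in STOP_WORDS[lang]]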
We also developed a Web implementation of the proposed method, as a new service for Internet users. We are confident that our solution will interest news and advertising agencies, newspaper websites, bloggers and, in general, all users who search for information on the Web.
Finally, the “core” of our system is general purpose, and can be used to compare texts, HTML pages, and any type of annotated document that can be retrieved from the Web. That is, we can use Google Search (YouTube, Wikipedia, etc.) instead of Google Image to query for any type of tagged content that may be useful to describe an input text.
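To make this generality concrete, the sketch below is source-agnostic under two assumptions: search is a hypothetical retrieval function (not an actual Google or YouTube API), and the returned documents expose a tags set extracted from their metadata. Any tagged-content source can then be plugged into the same re-ranking core.

from typing import Callable, Iterable, List

def illustrate(text_terms: set, search: Callable[[str], Iterable], top_k: int = 10) -> List:
    # Any tagged-content source (image, video, wiki, ...) can be
    # supplied through `search`; the core only needs the tags.
    query = " ".join(list(text_terms)[:5])  # naive query built from the text terms
    docs = list(search(query))
    # Re-rank by overlap between each document's tags and the text terms.
    return sorted(docs, key=lambda d: len(d.tags & text_terms), reverse=True)[:top_k]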
REFERENCES
Barnard, K., Duygulu, P., et al., 2003. Matching words
and pictures. JMLR, 3:1107–1135.
Barnard, K., and Forsyth, D., 2001. Learning the Semantics of Words and Pictures. In Proc. International Conference on Computer Vision, pp. II: 408-415.
Carneiro, G., Chan, A., et al., 2007. Supervised learning of
semantic classes for image annotation and retrieval.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 29(3):394–410.
Carney, R. N., and Levin, J. R., 2002. Pictorial illustrations still improve students' learning from text. Educational Psychology Review, 14(1), 5-26.
Coelho, F., and Ribeiro, C., 2011. Automatic illustration with cross-media retrieval in large-scale collections. In 9th International Workshop on Content-Based Multimedia Indexing (CBMI 2011), pp. 25-30. IEEE.
Coyne, B., and Sproat, R., 2001. WordsEye: An automatic text-to-scene conversion system. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 2001, pp. 487–496.
Deerwester, S., Dumais, S. T., et al., 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.
Delgado, D., Magalhaes, J., and Correia, N., 2010. Automated illustration of news stories. In IEEE Fourth International Conference on Semantic Computing (ICSC 2010), pp. 73-78. IEEE.
Feng, Y., and Lapata, M., 2010. Topic models for image annotation and text illustration. In Proceedings of NAACL HLT 2010, Association for Computational Linguistics, Los Angeles, California, pp. 831–839.
Feng, S., Manmatha, R., and Lavrenko, V., 2004. Multiple Bernoulli relevance models for image and video annotation. In CVPR, vol. 2, pp. 1002-1009.
Joshi, D., Wang, J. Z., and Li, J., 2006. The story picturing engine—a system for automatic text illustration. ACM Transactions on Multimedia Computing, Communications, and Applications, 2(1):68–89.
Kandola, J. S., Shawe-Taylor, J., and Cristianini, N., 2003. Learning semantic similarity. In Neural Information Processing Systems 15 (NIPS 15), pp. 657-664.
Lowe, W., 2001. Towards a theory of semantic space. In
Proceedings of the Twenty-Third Annual Conference
of the Cognitive Science Society 2001 (pp. 576-581).
Mahwah, NJ: Erlbaum.
Miller, G., 1990. WordNet: An on-line lexical database. Int. Journal of Lexicography, Special Issue, 3(4).
Monay, F., and Gatica-Perez, D., 2007. Modeling semantic aspects for cross-media image indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1802–1817.
Rasiwasia, N., Pereira, J. C., et al., 2010. A new approach to cross-modal multimedia retrieval. In Proceedings of the International Conference on Multimedia (MM '10), pp. 251-260.
Matsuo, Y., and Ishizuka, M., 2004. Keyword extraction from a single document using word co-occurrence statistical information. Int’l Journal on Artificial Intelligence Tools, 13(1):157–169.
Zhu, X., Goldberg, A. B., et al., 2007. A text-to-picture synthesis system for augmenting communication. In Proceedings of the 22nd National Conference on Artificial Intelligence, Vol. 2, pp. 1590-1595.
link1: http://en.wikinews.org/wiki/Main_Page.
link2: http://alipr.com/spe/
link3: http://en.wikinews.org/wiki/Los_Angeles_Lakers_need_to_win_game_six_to_tie_NBA_championship.