
- we had to represent the content of the images retrieved from the Web. We extracted information from the Web pages containing the retrieved images, i.e. their metadata, and we modelled this content as a Semantic Space, as in the text case.
- we needed to compare the two Semantic Spaces. To this end, we defined a similarity measure that indicates how closely the two spaces relate to the same semantic meaning.
- we needed to re-rank the list of retrieved images. We proposed to extract the words from the common subspace of the two Semantic Spaces; images are then ranked on the basis of the number of tags they share with this common subspace (see the sketch after this list).
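To make the comparison and re-ranking steps concrete, the following is a minimal Python sketch under simplifying assumptions: each Semantic Space is reduced to a plain set of terms, and the Jaccard overlap of the two sets stands in for the similarity measure actually defined in the paper. All names here (Image, text_space, image_space) are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Image:
    url: str
    tags: set = field(default_factory=set)  # terms mined from the hosting Web page

def space_similarity(text_space, image_space):
    # Jaccard overlap of the two term sets: a simple stand-in for the
    # similarity measure defined in the paper.
    union = text_space | image_space
    return len(text_space & image_space) / len(union) if union else 0.0

def rerank(images, text_space, image_space):
    # Approximate the common subspace as the intersection of the two term
    # sets, then rank images by how many of their tags fall inside it.
    common = text_space & image_space
    return sorted(images, key=lambda img: len(img.tags & common), reverse=True)

Images with many tags in the common subspace rise to the top of the list; ties keep their original retrieval order, since Python's sort is stable.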
Results are very satisfactory, and impressive when compared to those obtained with SPE (Joshi et al., 2006). The main difference between the two methods is the image dataset: SPE uses a personal collection of photos, annotated by hand, and is therefore limited in both the number of images and the concepts represented in its dataset. We use Google Image to create, dynamically at each query, a new image dataset to work with. We thus exploit the knowledge of the Web, increasing the chances of finding images relevant to the text content.
Furthermore, our method has been designed to be multi-language: it can easily be extended to other languages, provided that the proper list of stop-words is created, as in the sketch below.
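As a minimal illustration of this extension point, assuming whitespace tokenization and purely illustrative stop-word subsets (real lists would have to be curated per language; extract_terms is a hypothetical name):

import string

# Illustrative subsets only; each language needs its own curated list.
STOPWORDS = {
    "en": {"the", "a", "an", "of", "and", "to", "in", "is"},
    "it": {"il", "la", "un", "una", "di", "e", "in", "che"},
}

def extract_terms(text, lang="en"):
    # Lowercase, strip punctuation, then drop the language's stop-words.
    stop = STOPWORDS.get(lang, set())
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [t for t in cleaned.split() if t not in stop]

Supporting a new language then reduces to adding one entry to STOPWORDS.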
We also developed a Web implementation of the proposed method, as a new service for Internet users.
We are confident that our solution will interest news and advertising agencies, newspaper websites, bloggers, and in general all users who search for information on the Web.
Finally, the “core” of our system is general purpose, and can be used to compare texts, HTML pages, and any type of annotated document that can be retrieved from the Web. That is, we can use Google Search (YouTube, Wikipedia, etc.) instead of Google Image to query for any type of tagged content which can be useful to describe an input text; a sketch of this pluggable design follows.
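Assuming a hypothetical retriever interface (none of these class names come from our implementation), the sketch might read:

class TaggedDocument:
    # Any retrievable Web item carrying descriptive tags: image, video, page...
    def __init__(self, uri, tags):
        self.uri = uri
        self.tags = set(tags)

class YouTubeRetriever:
    # Placeholder backend: would query YouTube and read video tags/descriptions.
    def retrieve(self, query):
        raise NotImplementedError

def illustrate(text_space, retriever, query):
    # The core pipeline is unchanged; only the retrieval backend varies.
    docs = retriever.retrieve(query)
    image_space = set().union(*(d.tags for d in docs)) if docs else set()
    common = text_space & image_space
    return sorted(docs, key=lambda d: len(d.tags & common), reverse=True)

Swapping Google Image for Google Search, YouTube, or Wikipedia then only means writing another retriever; the semantic-space comparison stays the same.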
REFERENCES 
Barnard, K., Duygulu, P., et al., 2003. Matching words 
and pictures. JMLR, 3:1107–1135. 
Barnard, K., and Forsyth, D., 2001. Learning the Semantics of Words and Pictures. Proc. International Conference on Computer Vision, pp. II: 408-415.
Carneiro, G., Chan, A., et al., 2007. Supervised learning of 
semantic classes for image annotation and retrieval. 
IEEE Transactions on Pattern Analysis and Machine 
Intelligence, 29(3):394–410. 
Carney, R. N., and Levin, J. R., 2002. Pictorial illustrations still improve students' learning from text. Educational Psychology Review, 14(1), 5-26.
Coelho, F., and Ribeiro, C., 2011. Automatic illustration with cross-media retrieval in large-scale collections. In Content-Based Multimedia Indexing (CBMI), 2011 9th International Workshop on (pp. 25-30). IEEE.
Coyne, B., and Sproat, R., 2001. Wordseye: An automatic 
text-to-scene conversion system. In Proceedings of the 
28th Annual Conference on Computer Graphics and 
Interactive Techniques. SIGGRAPH 2001, 487–496. 
Deerwester, S., Dumais, S. T., et al., 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.
Delgado, D., Magalhaes, J., and Correia, N., 2010. Automated illustration of news stories. In Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on (pp. 73-78). IEEE.
Feng, Y., and Lapata, M., 2010. Topic models for image annotation and text illustration. In Proceedings of the NAACL HLT. Association for Computational Linguistics, Los Angeles, California, pp. 831-839.
Feng, S., Manmatha, R., and Lavrenko, V., 2004. Multiple Bernoulli relevance models for image and video annotation. In CVPR, volume 2, pp. 1002-1009.
Joshi, D., Wang, J. Z., and Li, J., 2006. The story picturing engine: a system for automatic text illustration. ACM Transactions on Multimedia Computing, Communications, and Applications, 2(1):68-89.
Kandola, J. S., Shawe-Taylor, J., and Cristianini, N., 2003. Learning semantic similarity. In Neural Information Processing Systems 15 (NIPS 15), pp. 657-664.
Lowe, W., 2001. Towards a theory of semantic space. In 
Proceedings of the Twenty-Third Annual Conference 
of the Cognitive Science Society 2001 (pp. 576-581). 
Mahwah, NJ: Erlbaum. 
Miller, G., 1990. WordNet: An on-line lexical database. Int. Journal of Lexicography, Special Issue, 3(4).
Monay, F., and Gatica-Perez, D., 2007. Modeling semantic aspects for cross-media image indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1802-1817.
Rasiwasia, N., Pereira, J. C., et al., 2010. A new approach to cross-modal multimedia retrieval. In Proceedings of the International Conference on Multimedia (MM '10), 251-260.
Matsuo, Y., and Ishizuka, M., 2004. Keyword extraction from a single document using word co-occurrence statistical information. Int'l Journal on Artificial Intelligence Tools, 13(1):157-169.
Zhu, X., Goldberg, A. B., et al., 2007. A text-to-picture synthesis system for augmenting communication. In Proceedings of the 22nd National Conference on Artificial Intelligence, Vol. 2, 1590-1595.
link1: http://en.wikinews.org/wiki/Main_Page. 
link2: http://alipr.com/spe/ 
link3: http://en.wikinews.org/wiki/Los_Angeles_Lakers_need_to_win_game_six_to_tie_NBA_championship.