automatically select the most relevant images from the uploaded set for each of the
written paragraphs of the blog. Technically, this is done in two steps. First the system
automatically adds metadata to the traveler’s photos by exploiting web repositories (shared
and tagged photo collections, other travel blogs, Wikipedia, etc.). The metadata either have
a textual form or, at least, consist of features that make them comparable to textual data. Then, it uses
these metadata to measure the similarity between the images and the given text.
In the first step, the system adds predefined keywords – related to a given set of
visual categories (people, car, animal, ...) – to the metadata using a Generic Visual
Categorizer (section 3.1). Secondly, based on the image features obtained by the cate-
gorizer, it uses a CBIR (Content Based Image Retrieval) technique to retrieve images
similar (section 3.1) to the image to which it wants to add further metadata. Processing
the aggregate of textual parts corresponding to the retrieved images allows the extrac-
tion of relevant concepts and topics (section 3.3). These emerging concepts and topics
will be the “textual” keywords enriching the image metadata.
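The enrichment step above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the three callables stand in for the Generic Visual Categorizer, the CBIR engine, and the concept/topic extraction of sections 3.1 and 3.3, all of which are hypothetical placeholders here.

```python
from typing import Callable, List, Set


def enrich_metadata(predict_categories: Callable[[object], List[str]],
                    query_similar: Callable[[object, int], List[str]],
                    extract_topics: Callable[[str], List[str]],
                    image: object, k: int = 10) -> Set[str]:
    """Two-stage metadata enrichment sketch for a single image."""
    # Stage 1: predefined visual-category keywords from the categorizer.
    keywords = set(predict_categories(image))
    # Stage 2: texts attached to the k visually most similar images,
    # pooled and mined for emerging concepts/topics.
    neighbour_texts = query_similar(image, k)
    aggregate = " ".join(neighbour_texts)
    keywords |= set(extract_topics(aggregate))
    return keywords
```

With stub components, `enrich_metadata(lambda i: ["people"], lambda i, k: ["machu picchu ruins"], lambda t: ["machu picchu"], image=None)` would return the union `{"people", "machu picchu"}` of categorizer keywords and mined topics.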
In the second step, we measure the similarity between the enriched metadata and
a particular piece of text (section 3.2), which in our case is a written paragraph of
the blog. According to this similarity measure, the images are ranked and the system
proposes the most relevant images for a given paragraph, so that the traveler can
choose the ones he finds most appropriate for the final version. Furthermore, the
metadata extraction and indexing, which is by far the most costly part, can be done
off-line; computing the image relevance scores with respect to the paragraphs (which
has almost no cost with pre-built, indexed data) is done on-line. The system can thus
propose a set of candidate images as soon as a paragraph is finished, which makes it
very user-friendly. Finally, another advantage of the system is the metadata added to
each image: the traveler can keep it and use it for later organization and retrieval in
his private repository of pictures.
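The on-line ranking step can be illustrated with a simple bag-of-words cosine similarity between a paragraph and each image's enriched metadata. This is only a sketch of the idea behind the similarity measure of section 3.2; the actual system's weighting scheme and index structure are not reproduced here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_images(paragraph: str, image_metadata: dict) -> list:
    """Rank image names by similarity of their metadata to the paragraph."""
    p = Counter(paragraph.lower().split())
    scores = {name: cosine(p, Counter(meta.lower().split()))
              for name, meta in image_metadata.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For a paragraph about hiking the Inca trail, an image whose metadata mentions "machu picchu trail" would be ranked above one tagged "beach rio carnival".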
In order to illustrate the different steps and the performance of the TBAS system, we
designed a prototype using a relatively small database (compared to the data we can get
on the Web). As multimedia data repository we used [2], as it contains travel images
with additional textual information such as title, location, and description. To obtain re-
alistic traveler data, we downloaded a large set of images from the online photo sharing
site Flickr [3]. For the travel text we collected blog paragraphs from two travel blog
sites [4,5]. In order to ensure the semantic correlation between images and blog texts,
we used city names from two different travel destinations (Peru and Brazil) as search tags
to gather the images and blog texts.
The technology developed and presented here has potential beyond travel blogs. We
used the Travel Blog Assistant System just as an illustrative example of a more general
problem, which is complementing text with images, or vice-versa. Clearly, the same
methodology can be used by professional users in the fields of multimedia document
generation and automatic illustration and captioning – e.g. graphic designers, illustrators,
journalists, and educational writers.