k-NN-set (using both the EMD and the 1 − 1 docu-
ment distance), k-NN-imgIdx (using EMD), and Sky-
set according to 4 different performance metrics, as
described in Sect. 4. It is worth noting that k-NN-imgIdx
performs the worst among the considered algorithms.
This might sound strange at first, since only k sorted
accesses to the document index are needed and no
computation is done outside of the index itself; however,
this is not enough to compensate for the very high number
of document distances that are computed within the index.⁴
Again, the library classes already
contain the code for obtaining this important result,
demonstrating that, when dealing with complex doc-
uments, a simplistic approach is not always the best
one, and several alternatives should be taken into ac-
count to find out the best combination of efficiency
and effectiveness.
[Figure 7: bar chart; vertical axis from 0% to 120%; metrics: doc. distances, elem. distances, sorted accesses, time; series: k-NN-set (1-1), k-NN-set (EMD), k-NN-imgIdx, Sky-set.]
Figure 7: Efficiency of the query processing index-based
algorithms: k-NN-set using the EMD and the 1−1 document
distances, k-NN-imgIdx using EMD, and Sky-set (graphs are
normalized to the maximum values so as to emphasize
relative performance).
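The normalization mentioned in the caption can be sketched as follows. This is a minimal illustration, not code from the library: the function name `normalize_to_max` and the numbers in the usage lines are hypothetical and are not the measurements behind Figure 7.

```python
from typing import Dict


def normalize_to_max(costs: Dict[str, float]) -> Dict[str, float]:
    """Express each algorithm's cost as a percentage of the largest cost.

    `costs` maps an algorithm name to its raw cost for ONE metric
    (e.g. number of document distances). The costliest algorithm gets 100.0.
    """
    worst = max(costs.values())
    if worst == 0:
        # All costs are zero: nothing to scale, report 0% everywhere.
        return {name: 0.0 for name in costs}
    return {name: 100.0 * value / worst for name, value in costs.items()}


# Hypothetical raw counts for a single metric (NOT the paper's data).
example = {"A": 50.0, "B": 200.0, "C": 100.0}
normalized = normalize_to_max(example)
```

Applying the function separately to each metric puts all four groups of bars on the same percentage scale, so that only the relative standing of the algorithms within a metric is visible.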
⁴ We note here that k-NN-set computes document distances
outside of the index, only for those documents that are
retrieved under sorted access. On the other hand, Sky-set
does not compute any document distance, but nonetheless
has to compare documents for domination: in Fig. 7 each
such comparison is counted as a document distance, in
order to compare algorithms on a fair basis.
7 CONCLUSIONS
We have presented the WINDSURF library for the
management of complex (hierarchical) multimedia
data, with the goal of providing tools for their efficient
retrieval. The library was designed with the aim
of generality and extensibility, so as to be applicable
to a wide range of multimedia scenarios that fit its
similarity-based retrieval model. Due to the inherent
complexity of multimedia data, we designed the
WINDSURF retrieval model to include all the different
facets introduced by the hierarchical nature of the
data (for example, how documents are characterized,
how they are split into component elements, how ele-
ments are to be compared, how similarities at the el-
ement level are to be aggregated, and so on). Such
facets can be instantiated in several alternative ways
(each choice possibly giving different results), and a
user may want to compare the performance of such
alternatives in the scenario at hand: we believe
that the use of the WINDSURF library could help in
abstracting away the details of generic query process-
ing algorithms, since the above-mentioned facets can
be realized by simply implementing abstract classes
of the library. We are currently working on extending
the library with new query processing algorithms and
on incorporating other scenarios (e.g., videos (Bartolini,
Patella, and Romani 2010)) as instances of the library
available for download. Moreover, a current limitation
of the WINDSURF retrieval model is that the elements
of a document are all of the same type: we plan
to extend the model to consider elements of different
types, so that only elements of the same type can be
compared. For example, if we consider a multimedia
document composed of textual sections and images,
it makes sense to only compare text with text and im-
ages with images. Another important application of
this concept is the use of cross-domain information to
improve the retrieval of a given type of content, for
example, exploiting surrounding text and/or existing
links to other documents (à la PageRank) to boost
image/video retrieval.
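The idea of realizing facets through abstract classes, together with the planned restriction that only elements of the same type are compared, might look like the sketch below. It is written in Python for brevity and does not reproduce the WINDSURF API: `Element`, `ElementDistance`, `AbsDifference`, and `typed_distance` are hypothetical names, and returning an infinite distance for mismatched types is just one possible convention, assumed here for illustration.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Element:
    """A component of a document, tagged with a (hypothetical) type label."""
    kind: str       # e.g. "text" or "image"
    feature: float  # stand-in for a real feature vector


class ElementDistance(ABC):
    """Facet: how two elements of the same type are compared."""

    @abstractmethod
    def distance(self, a: Element, b: Element) -> float:
        ...


class AbsDifference(ElementDistance):
    """Toy distance: absolute difference of the scalar features."""

    def distance(self, a: Element, b: Element) -> float:
        return abs(a.feature - b.feature)


def typed_distance(a: Element, b: Element,
                   facets: Dict[str, ElementDistance]) -> float:
    """Compare elements only when their types match.

    Elements of different types are treated as incomparable and get an
    infinite distance (an assumption of this sketch).
    """
    if a.kind != b.kind:
        return math.inf
    return facets[a.kind].distance(a, b)


# One distance facet per element type; both use the toy distance here.
facets = {"text": AbsDifference(), "image": AbsDifference()}
same = typed_distance(Element("image", 0.5), Element("image", 0.25), facets)
mixed = typed_distance(Element("text", 0.5), Element("image", 0.25), facets)
```

Swapping `AbsDifference` for another `ElementDistance` subclass changes how elements are compared without touching `typed_distance`, which is the kind of separation between facets and generic processing that the paragraph above describes.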
REFERENCES
Ardizzoni, S., Bartolini, I., Patella, M. Windsurf: Region-
based image retrieval using wavelets. In: IWOSS’99.
pp. 167–173. Florence, Italy (Sep 1999).
Bartolini, I., Ciaccia, P., Oria, V., Özsu, T. Flexible
integration of multimedia sub-queries with qualitative
preferences. Multimedia Tools and Applications, 33(3),
275–300 (June 2007).
Bartolini, I., Ciaccia, P., Patella, M. Query processing is-
sues in region-based image databases. Knowledge and
Information Systems, 25(2), 389–420 (Nov 2010).
Bartolini, I., Patella, M., Romani, C. SHIATSU:
Semantic-based hierarchical automatic tagging of
videos by segmentation using cuts. In: AIEMPro
2010. Florence, Italy (Sep 2010).
Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J. L.
Proximity searching in metric spaces. ACM Computing
Surveys, 33(3), 273–321 (Sep 2001).
Ciaccia, P., Patella, M., Zezula, P. M-tree: An efficient ac-
cess method for similarity search in metric spaces. In:
VLDB’97. pp. 426–435. Athens, Greece (Aug 1997).
Fei-Fei, L., Fergus, R., and Torralba, A. Recognizing and
THE WINDSURF LIBRARY FOR THE EFFICIENT RETRIEVAL OF MULTIMEDIA HIERARCHICAL DATA