Third Step:
Similarity is computed with the following functions.
The Jaccard index is a statistic used for
comparing the similarity and diversity of sample
sets.
The Jaccard coefficient measures similarity
between sample sets, and is defined as the size of the
intersection divided by the size of the union of the
sample sets:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (10)$$
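As an illustration of Eq. (10), the sketch below computes the Jaccard coefficient of two term sets. It assumes that each resource is represented by a set of keywords taken from its metadata; the keyword sets are hypothetical, and treating two empty sets as identical is our own convention, since the coefficient is undefined in that case.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|, as in Eq. (10)."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are considered identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical keyword sets extracted from the metadata of two videos.
video_a = {"obama", "speech", "white", "house"}
video_b = {"obama", "speech", "press", "conference"}
print(jaccard(video_a, video_b))  # 2 shared terms out of 6 distinct terms
```

Because only set membership matters, the coefficient ignores how often a term occurs; that is what the term frequency measure below takes into account.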
In addition, we use term frequency. The raw count of a term is usually normalized to prevent a bias towards longer documents (which may have a higher term count regardless of the actual importance of that term in the document), giving a measure of the importance of the term $t_i$ within the particular document $d_j$. The term frequency is thus defined as follows:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \qquad (11)$$
where $n_{i,j}$ is the number of occurrences of the considered term $t_i$ in document $d_j$, and the denominator is the sum of the numbers of occurrences of all terms in document $d_j$, that is, the size of the document $|d_j|$.
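A minimal sketch of Eq. (11), assuming a document has already been tokenized into a list of terms (the whitespace tokenization in the example is only for illustration). Each count $n_{i,j}$ is divided by the total number of term occurrences, which equals the document length $|d_j|$.

```python
from collections import Counter


def term_frequency(document_terms: list[str]) -> dict[str, float]:
    """tf_{i,j} = n_{i,j} / sum_k n_{k,j}, as in Eq. (11).

    document_terms is the tokenized document d_j; its length is |d_j|.
    """
    counts = Counter(document_terms)   # n_{i,j} for every term t_i in d_j
    total = sum(counts.values())       # sum_k n_{k,j}, i.e. |d_j|
    if total == 0:
        return {}                      # an empty document has no terms
    return {term: n / total for term, n in counts.items()}


doc = "obama speaking obama press conference".split()
tf = term_frequency(doc)
print(tf["obama"])  # 2 occurrences out of 5 terms: 0.4
```

Since every value is divided by the same total, the frequencies of one document sum to 1, so a long document does not automatically receive larger scores than a short one.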
We also use a threshold parameter, whose value is varied during evaluation.
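The text does not specify which scores the threshold is compared with. The following is a hypothetical sketch, assuming the threshold filters Jaccard scores between keyword sets so that only sufficiently similar pairs of resources are linked; the function `link_resources` and the sample data are our own illustrations, not the system's implementation.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def link_resources(resources: dict[str, set], threshold: float) -> list[tuple[str, str, float]]:
    """Hypothetical: return (id1, id2, score) for every pair whose
    Jaccard similarity is at least the threshold."""
    links = []
    for (id1, terms1), (id2, terms2) in combinations(sorted(resources.items()), 2):
        score = jaccard(terms1, terms2)
        if score >= threshold:
            links.append((id1, id2, score))
    return links


resources = {
    "v1": {"obama", "speech", "washington"},
    "v2": {"obama", "speech", "press"},
    "v3": {"carthage", "archaeology", "tunisia"},
}
# Vary the threshold, as one might during an evaluation run.
for t in (0.2, 0.5, 0.8):
    print(t, link_resources(resources, t))
```

Under this assumption, a lower threshold produces more links and a higher one keeps only the closest pairs, which is why the value has to be tuned during evaluation.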
In the main system, queries attempt to find semantic content such as specific people, objects and events in a broadcast news collection.
We define the following classes according to the intent of the queries, illustrated by three example queries:
1. Find videos of President Obama speaking.
2. Find shots of the archaeological sites of Carthage.
3. Find shots of the Fukushima earthquake.
Named Person: queries for finding a named person, possibly performing certain actions, e.g., “Find shots of President Obama speaking”.
Named Object: queries for a specific object with a unique name, which distinguishes this object from other objects of the same type, e.g., “Find shots of the archaeological sites of Carthage”.
General Object: queries for a certain type of object, such as “Find shots of the Fukushima earthquake”. They refer to a general category of objects instead of a specific one among them, though they may be qualified by adjectives or other words.
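The three intent classes can be represented as an enumeration. The sketch below assigns a query to a class by looking up its tokens in two small gazetteers; both the gazetteers and the lookup rule are hypothetical assumptions chosen only to reproduce the three examples above, since the text does not describe how queries are classified.

```python
from enum import Enum


class QueryClass(Enum):
    NAMED_PERSON = "Named Person"
    NAMED_OBJECT = "Named Object"
    GENERAL_OBJECT = "General Object"


# Hypothetical gazetteers; a real system would need a proper named-entity resource.
KNOWN_PERSONS = {"obama"}
KNOWN_NAMED_OBJECTS = {"carthage"}


def classify_query(query: str) -> QueryClass:
    """Illustrative lookup: person names first, then named objects,
    otherwise the query is treated as a general object query."""
    tokens = {tok.strip('.,"').lower() for tok in query.split()}
    if tokens & KNOWN_PERSONS:
        return QueryClass.NAMED_PERSON
    if tokens & KNOWN_NAMED_OBJECTS:
        return QueryClass.NAMED_OBJECT
    return QueryClass.GENERAL_OBJECT


for q in ("Find shots of President Obama speaking",
          "Find shots of the archaeological sites of Carthage",
          "Find shots of the Fukushima earthquake"):
    print(q, "->", classify_query(q).value)
```

Checking persons before named objects reflects that a Named Person query may also mention places or objects; the order is a design choice of this sketch.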
Our retrieval system goes through a series of steps to find relevant multimedia resources for content-based queries without any user feedback or manual query expansion.
4 CONCLUSIONS
Metadata provides rich semantic relationships that can be used for retrieval purposes. This paper has presented our proposal of a contextual schema for semantically interlinking multimedia resources via XML. The goal of this schema is, first, to be used in the main multimodal retrieval system and, second, to improve efficiency and recover previously hidden query results. An initial evaluation of the algorithm has shown good results.
The next step is to integrate the process of building these semantic relationships into the XQuery language, which would make it possible to add new relationships through queries. Based on the resulting resources, we could then build new relationships to be used later. Furthermore, we plan to investigate weights for the different semantic relations based on their relevance. Further investigation is still needed.
KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval