
groups on Best Practice reports and articles, intel-
lectual property and business models, digital libraries
and archives. ECLAP services and facilities include:
user groups, discussion forums, mailing lists, integra-
tion with other Social Networks, suggestions and rec-
ommendations to users. Content distribution is avail-
able toward several channels: PC/Mac, iPad and Mo-
biles. ECLAP includes smart back office solutions,
for automated ingestion and refactoring of metadata
and content; multilingual indexing and querying, con-
tent and metadata enrichment, Intellectual Property
Rights modeling and assignment tools, content aggre-
gation and annotations, e-learning support.
3 SEARCHING AND INDEXING TOOLS
The ECLAP content model deals with different types
of digital content and metadata; at the core of the
content model there is a metadata mapping schema,
used for indexing resources in the same index
instance. Resource metadata share the same set of
indexing fields, with a separate set for advanced
search purposes. The indexing schema has a flexible
and upgradeable hierarchy that describes the whole
set of heterogeneous contents. The metadata schema
is divided into the following categories (see Table 2):
Dublin Core (e.g., title, creator, subject, description),
Dublin Core Terms (e.g., alternative, conformsTo,
created, extent), Technical (e.g., type of content,
ProviderID, ProviderName, ProviderContentID),
Performing Arts (e.g., FirstPerformance Place,
PerformingArtsGroup, Cast, Professional), ECLAP
Distribution and Thematic Groups, and Taxonomical
content-related terms.
Notation used in Table 1: Y^n means yes, with n
possible languages (i.e., n metadata sets); Y means
only one metadata set; Y/N means the metadata set is
not complete; T means only the title of the metadata
set is indexed; Y^m means m different comments can
be provided, each of them in a specific language.
Comments may be nested, thus producing a
hierarchically organized discussion forum. The ECLAP
Index Model meets the metadata requirements of any
digital content, while the indexing service follows a
metadata ingestion schema. Twenty different partners
provide their digital content, each with custom
metadata that only partially fulfils the standard
DC schema. A single multilingual index has been
developed for faster access, easy management, and
optimization. Fine tuning of term boosting, which
gives more relevance to certain fields with respect
to others, is a major requirement of the system in
order to achieve optimal IR performance.
Table 1: ECLAP Indexing Model.

Media Types                   | DC (ML) | Technical | Perf. Arts | Full Text | Tax, Group (ML) | Comments, Tags (ML) | Votes
# of Index Fields*            |   468   |    10     |     23     |    13     |       26        |         13          |   1
Cross Media: html, MPEG-21,   |   Y^n   |     Y     |     Y      |     Y     |       Y^n       |        Y^m          |  Y^n
  animations, etc.            |         |           |            |           |                 |                     |
Info text: blog, web pages,   |    T    |     N     |     N      |     N     |        N        |        Y^m          |   N
  events, forum, comments     |         |           |            |           |                 |                     |
Document: pdf, doc, ePub      |   Y^n   |     Y     |     Y      |     Y     |       Y^n       |        Y^m          |   Y
Audio, video, image           |   Y^n   |     Y     |     Y      |     N     |       Y^n       |        Y^m          |  Y^n
Aggregations: play lists,     |   Y^n   |     Y     |     Y      |    Y/N    |       Y^n       |        Y^m          |  Y^n
  collections, courses, etc.  |         |           |            |           |                 |                     |

* = (# of Fields per Metadata type) x (# of Languages)
ML: Multilingual; DC: Dublin Core; Tax: Taxonomy
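The per-field term boosting discussed above can be expressed directly in Lucene query syntax, where `field:term^weight` applies a query-time boost to a clause. The sketch below composes such a query string; the field names and weight values are illustrative, not the tuned ECLAP settings.

```python
# Sketch: build a Lucene-style disjunctive query with query-time
# field boosts ("field:term^weight" is standard Lucene syntax).
# Weights here are illustrative, not the tuned ECLAP values.

def boosted_query(term: str, field_weights: dict) -> str:
    clauses = [f"{field}:{term}^{w}" for field, w in field_weights.items()]
    return " OR ".join(clauses)

weights = {"title": 4.0, "description": 2.0, "text": 1.0}
print(boosted_query("commedia", weights))
# title:commedia^4.0 OR description:commedia^2.0 OR text:commedia^1.0
```

Because the boosts are applied at query time rather than at indexing time, the weights can be re-tuned without rebuilding the index.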
4 EFFECTIVENESS AND OPTIMIZATION
The ECLAP Metadata Schema, summarized in Table 2,
consists of 541 metadata fields, divided into 8
categories; some important multilingual metadata
(i.e., text, title, body, description, contributor,
subject, taxonomy, and Performing Arts metadata) are
mapped into a set of 8 catch-all fields for searching
purposes. The scoring system implements Lucene's
combination of the Boolean Model and the Vector
Space Model, with term boosting applied at query
time. Documents matching a clause get their score
multiplied by a weight factor. A boolean clause b in
the weighted search model can be defined as
b := (title: q)^w1 ∨ (body: q)^w2 ∨ (description: q)^w3 ∨ (subject: q)^w4 ∨ (taxonomy: q)^w5 ∨ (contributor: q)^w6 ∨ (text: q)^w7
where w1, w2, ..., w7 are the boosting weights of
the query fields: title (DC resource name), body
(parsed html resource content), description (DC
account of the resource content; e.g., abstract,
table of contents, reference), subject (DC topic of
the resource content; e.g., keywords, key phrases,
classification codes), taxonomy (hierarchy term
associated with the content), contributor
(contributors to the resource content; e.g., persons,
organizations, services), and text (full text parsed
from the resource; e.g., doc, pdf, etc.); q is the
query; DC: Dublin Core. The effectiveness of the
retrieval system was evaluated with the aid of the
trec_eval tool.
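A minimal sketch of this weighted disjunctive scoring is given below: a document matches clause b if any of the listed fields matches q, and each matching field contributes its similarity score multiplied by its boost w_i. The similarity function here is a toy term-frequency measure for illustration only; it is not Lucene's actual TF-IDF/BM25 formula, and the field names and weights are placeholders.

```python
# Sketch of the weighted disjunctive scoring model described above.
# field_similarity is a toy term-frequency measure, standing in for
# Lucene's real Boolean + Vector Space Model similarity.

def field_similarity(field_text: str, q: str) -> float:
    """Fraction of the field's tokens that equal the query term."""
    words = field_text.lower().split()
    return words.count(q.lower()) / len(words) if words else 0.0

def score(doc: dict, q: str, weights: dict) -> float:
    """Sum each field's similarity to q, scaled by its boost w_i."""
    return sum(
        w * field_similarity(doc.get(field, ""), q)
        for field, w in weights.items()
    )
```

Non-matching fields contribute zero, so the disjunction degrades gracefully when a resource lacks some metadata, which matters for partners whose records only partially fill the schema.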
On the Effectiveness and Optimization of Information Retrieval for Cross Media Content