
content-based audio perspective, both from
academia and industry (see Section 2). However,
none seems to have had an impact on professional sound
effects management systems. Another source of
problems is the imprecision and ambiguity
of natural languages. Natural languages present
polysemy (“bike” can mean both “bicycle” and
“motorcycle”) and synonymy (both “elevator” and
“lift” refer to the same concept). This, together with
the difficulty of describing sounds with
words, affects the quality of the search. The user has
to guess how the librarian has labeled the sounds, and
either too many or too few results are returned.
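
The effect of synonymy and polysemy on a keyword search can be
illustrated with a minimal Python sketch. The synonym groups, sound
identifiers and labels below are hypothetical and are not taken from
the system described in this paper; a real system would draw the
groups from a lexical resource.

```python
# Hypothetical synonym groups (assumption, for illustration only).
SYNONYMS = [
    {"elevator", "lift"},
    {"bike", "bicycle"},
    {"bike", "motorcycle"},
]

# Hypothetical librarian labels for three sounds.
LIBRARY = {
    "sfx_001": "lift door closing",
    "sfx_002": "motorcycle engine idle",
    "sfx_003": "bicycle bell ring",
}


def expand(term):
    """Return the term plus every word that shares a synonym group with it."""
    expanded = {term}
    for group in SYNONYMS:
        if term in group:
            expanded |= group
    return expanded


def search(term, use_synonyms=False):
    """Return the sorted ids of sounds whose label contains any query word."""
    words = expand(term) if use_synonyms else {term}
    return sorted(
        sound_id
        for sound_id, label in LIBRARY.items()
        if words & set(label.split())
    )
```

With these labels, search("elevator") finds nothing because the
librarian wrote "lift", while search("elevator", use_synonyms=True)
finds sfx_001. Expanding the polysemous "bike" returns both the
motorcycle and the bicycle sound, which shows how expansion alone can
produce too many results.
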
In this context we present an SFX retrieval system
that incorporates content-based audio techniques and
semantic knowledge tools, implemented on top of the
database of one of the biggest sound effects providers. The
rest of the paper is organized as follows: in Section 2
we review what the existing literature proposes to
improve sound effect management. In Sections 3
to 5 we describe the implemented enhancements of
the system.
2 RELATED WORK
Work related to sound effect management falls into
three categories: content-based audio technologies,
approaches to describing sound events, and taxonomy
management.
2.1 Content-based audio classification and retrieval
Content-based functionalities aim at finding new
ways of querying and browsing audio documents, as
well as at automatically generating metadata, mainly
via classification. Query-by-example and similarity
measures that allow perceptual browsing of
an audio collection are addressed in the literature
and exist in commercial products; see for instance
www.findsounds.com and www.soundfisher.com.
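
Query-by-example can be sketched as a nearest-neighbour search over
feature vectors. The following Python fragment is only an
illustration under stated assumptions: the sound names, the three
features and their values are hypothetical, and plain Euclidean
distance stands in for whatever similarity measure a given product
actually uses.

```python
import math

# Hypothetical normalized feature vectors: (loudness, brightness, noisiness).
COLLECTION = {
    "door_slam": (0.9, 0.3, 0.6),
    "glass_break": (0.8, 0.9, 0.7),
    "car_door": (0.85, 0.35, 0.55),
    "bird_song": (0.3, 0.8, 0.1),
}


def query_by_example(example, collection, k=2):
    """Return the names of the k sounds closest to the example vector."""
    ranked = sorted(
        collection,
        key=lambda name: math.dist(example, collection[name]),
    )
    return ranked[:k]


# Features of a new, unlabeled recording (hypothetical values).
nearest = query_by_example((0.88, 0.32, 0.58), COLLECTION)
```

Here nearest contains the two door sounds, although the query carried
no textual label at all.
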
Existing classification methods normally concentrate
on small domains, such as musical instrument
classification or very simplified sound effects taxonomies.
Classification methods cannot currently offer
the detail needed in commercial sound effects
management, e.g., “female steps on wood, fast”. In
audio classification, researchers normally assume the
existence of, or define, a well-defined hierarchical classification
scheme of a few categories (fewer than a hundred
at the leaves of the tree). On-line sound effects and
music sample providers have several thousand categories.
For further discussion on the classification of general
sound, we refer to (Cano et al., 2004a).
2.2 Description of Audio
Sounds are multifaceted, multirepresentational and usually
difficult to describe in words. MPEG-7 offers
a framework for the description of multimedia documents;
see (Manjunath et al., 2002). MPEG-7 content
semantic description tools describe the actions,
objects and context of a scene. In sound effects, this
corresponds to the physical production of the sound in
the real world, e.g., “moo cow solo”, or to the context, e.g.,
“Airport atmos announcer”.
MPEG-7 content structure tools concentrate on the
spatial, temporal and media source structure of multimedia
content. Indeed, important descriptors are
those that describe the perceptual qualities of a sound
independently of its source, and how they are structured in
a mix; they refer to properties of the sound itself,
e.g., loudness and brightness. Other important searchable
metadata are post-production specific descriptions,
e.g., horror, comic or science-fiction. Creation
metadata describe how the sound was recorded. For
example, to record a car door closing one can place
the microphone in the interior or in the exterior. Some
examples of such descriptors are: interior, exterior,
close-up, live recording, programmed sound, studio
sound, treated sound. For a more complete review of
SFX description, we refer to (Cano et al., 2004b).
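
The descriptor facets discussed above (semantic, perceptual,
post-production and creation) can be pictured as fields of a single
record. The Python sketch below is a hypothetical data structure for
illustration; its field names are assumptions and are not MPEG-7
element names, and the numeric values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class SfxDescription:
    """One sound effect described along several facets.

    All field names are hypothetical; they are not MPEG-7 element names.
    """
    semantic: str                                      # source or context
    perceptual: dict = field(default_factory=dict)     # e.g. {"loudness": 0.7}
    creation: set = field(default_factory=set)         # e.g. {"interior"}
    post_production: set = field(default_factory=set)  # e.g. {"horror"}


def filter_by_creation(descriptions, tag):
    """Keep only the descriptions whose creation metadata contains the tag."""
    return [d for d in descriptions if tag in d.creation]


# Two takes of the same event, recorded from different positions.
sounds = [
    SfxDescription("car door close", {"loudness": 0.7, "brightness": 0.4},
                   {"interior", "studio sound"}),
    SfxDescription("car door close", {"loudness": 0.8, "brightness": 0.6},
                   {"exterior", "live recording"}),
]
interior_takes = filter_by_creation(sounds, "interior")
```

Both records share the same semantic description, so only the
creation metadata distinguishes the interior take from the exterior
one.
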
Figure 1: Snapshot of the vehicle taxonomy in WordNet
2.0. Only the hypernym type of relation is displayed.
2.3 Taxonomy Management
The use of taxonomies or classification schemes alleviates
some of the ambiguity problems inherent
in natural languages, yet they pose others. It is
very difficult to devise and maintain classification
schemes that account for the level of detail needed in
a production-size sound effect management system.
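
A hypernym taxonomy of the kind shown in Figure 1 can be represented
as a child-to-parent mapping. The Python sketch below uses a small
hand-written fragment for illustration only; it is an assumption and
may not match the WordNet 2.0 hierarchy exactly.

```python
# Hand-written, simplified child -> hypernym mapping (assumption, not an
# extract of WordNet 2.0).
HYPERNYM = {
    "motorcycle": "motor vehicle",
    "car": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "bicycle": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
}


def hypernym_chain(concept):
    """Return the concept followed by its successive hypernyms up to the root."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(concept, ancestor):
    """True if ancestor is the concept itself or one of its hypernyms."""
    return ancestor in hypernym_chain(concept)
```

With such a structure, a query for "motor vehicle" can match sounds
labeled "motorcycle" or "car" but not "bicycle", since
is_a("bicycle", "motor vehicle") is false.
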
ICETE 2004 - WIRELESS COMMUNICATION SYSTEMS AND NETWORKS