Pedro Cano, Markus Koppenberger, Sylvain Le Groux, Perfecto Herrera, Julien Ricard, Nicolas Wack


Sound producers create the sound that accompanies the image in cinema and video productions, as well as in spots and documentaries. Some sounds are recorded for the occasion. Many occasions, however, require the engineer to have access to massive libraries of music and sound effects. Of the three major facets of audio in post-production (music, speech and sound effects), this document focuses on sound effects (Sound FX or SFX). The main professional online sound-effects providers offer their collections using standard text-retrieval technologies. Library construction is an error-prone and labor-intensive task. Moreover, the ambiguity and informality of natural language affect the quality of the search. The use of ontologies alleviates some of the ambiguity problems inherent to natural language, yet it is very complicated to devise and maintain an ontology that accounts for the level of detail needed in a production-size sound effect management system. To address this problem we use WordNet, an ontology that organizes over 100,000 concepts of real-world knowledge; for example, it relates doors to locks, to wood and to the actions of opening, closing or knocking. However, a fundamental issue remains: sounds without a caption are invisible to users. Content-based audio tools offer perceptual ways of navigating audio collections, such as “find similar sound”, which works even for unlabeled sounds, or query-by-example, possibly restricting the search to a semantic subspace such as “vehicles”. The proposed content-based technologies also allow semi-automatic sound annotation. We describe the integration of semantically enhanced metadata management using WordNet together with content-based methods in a commercial sound effect management system.
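As an illustration of the idea of query-by-example restricted to a semantic subspace, the following is a minimal sketch and not the system's actual implementation. It assumes a hypothetical toy taxonomy (`HYPERNYMS`) standing in for WordNet's hypernym links, a hypothetical four-sound library (`LIBRARY`) with made-up two-dimensional feature vectors, and plain Euclidean distance as the similarity measure; none of these names or values come from the system described here.

```python
import math

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet hypernym links.
HYPERNYMS = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
    "door": "artifact",
    "artifact": None,
}

# Hypothetical library entries: (file name, concept label or None if unlabeled,
# made-up acoustic feature vector).
LIBRARY = [
    ("car_pass_by.wav", "car", (0.9, 0.1)),
    ("truck_idle.wav", "truck", (0.8, 0.3)),
    ("door_slam.wav", "door", (0.85, 0.15)),
    ("untitled_07.wav", None, (0.88, 0.12)),
]


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or lies below it in the taxonomy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = HYPERNYMS.get(concept)
    return False


def find_similar(query, k=2, subspace=None):
    """Return the names of the k sounds nearest to `query` in feature space.

    If `subspace` is given, only sounds labeled with a concept under it are
    considered, so unlabeled sounds are excluded from restricted searches.
    """
    candidates = [
        (name, vec)
        for name, concept, vec in LIBRARY
        if subspace is None or is_a(concept, subspace)
    ]
    candidates.sort(key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in candidates[:k]]


# Unrestricted search can surface the unlabeled sound.
print(find_similar((0.9, 0.1), k=2))
# Restricting to "vehicles" keeps only sounds labeled under that concept.
print(find_similar((0.9, 0.1), k=2, subspace="vehicle"))
```

In this sketch the unrestricted query retrieves the unlabeled file because it is acoustically close to the example, while the restricted query filters candidates by concept before ranking them. A real system would replace the toy taxonomy with WordNet lookups and the made-up vectors with extracted audio descriptors.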


