KNOWLEDGE AND CONTENT-BASED AUDIO RETRIEVAL USING WORDNET

Pedro Cano, Markus Koppenberger, Sylvain Le Groux, Perfecto Herrera, Julien Ricard, Nicolas Wack

Abstract

Sound producers create the sound that accompanies the image in cinema and video productions, as well as in spots and documentaries. Some sounds are recorded for the occasion. Many occasions, however, require the engineer to have access to massive libraries of music and sound effects. Of the three major facets of audio in post-production (music, speech and sound effects), this document focuses on sound effects (Sound FX or SFX). The main professional on-line sound effects providers offer their collections using standard text-retrieval technologies. Library construction is an error-prone and labor-intensive task. Moreover, the ambiguity and informality of natural language affect the quality of the search. The use of ontologies alleviates some of the ambiguity problems inherent to natural languages, yet it is very complicated to devise and maintain an ontology that accounts for the level of detail needed in a production-size sound effect management system. To address this problem we use WordNet, an ontology that organizes over 100,000 concepts of real-world knowledge; for example, it relates doors to locks, to wood and to the actions of opening, closing or knocking. However, a fundamental issue remains: sounds without captions are invisible to users. Content-based audio tools offer perceptual ways of navigating audio collections, such as “find similar sound”, even when the sounds are unlabeled, or query-by-example, possibly restricting the search to a semantic subspace such as “vehicles”. The proposed content-based technologies also allow semi-automatic sound annotation. We describe the integration of semantically enhanced metadata management using WordNet together with content-based methods in a commercial sound effect management system.
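
The combination described in the abstract, a similarity search that can be restricted to a semantic subspace, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `HYPERNYMS` table is a toy stand-in for WordNet's is-a links, the `SOUNDS` entries and their two-dimensional feature vectors are invented, and plain Euclidean distance replaces whatever perceptual similarity measure the actual system uses.

```python
import math

# Toy "is-a" links standing in for WordNet hypernym relations (hypothetical data).
HYPERNYMS = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
    "door": "artifact",
    "dog": "animal",
}

# Hypothetical collection: (sound id, concept label or None, feature vector).
SOUNDS = [
    ("car_pass_by", "car", [0.9, 0.1]),
    ("truck_idle", "truck", [0.7, 0.3]),
    ("door_slam", "door", [0.8, 0.2]),
    ("dog_bark", "dog", [0.1, 0.9]),
    ("untitled_07", None, [0.85, 0.15]),  # unlabeled sound
]


def is_a(concept, ancestor):
    """Walk the hypernym chain and report whether concept falls under ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = HYPERNYMS.get(concept)
    return False


def find_similar(query_vector, subspace=None, k=3):
    """Rank sounds by Euclidean distance, optionally restricted to a concept."""
    candidates = [
        (math.dist(query_vector, vec), sound_id)
        for sound_id, concept, vec in SOUNDS
        if subspace is None or is_a(concept, subspace)
    ]
    return [sound_id for _, sound_id in sorted(candidates)[:k]]
```

In this sketch, `find_similar([0.9, 0.1])` ranks the unlabeled `untitled_07` second, so a sound without a caption still surfaces through content-based search, while `find_similar([0.9, 0.1], subspace="vehicle")` returns only `car_pass_by` and `truck_idle`.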



Paper Citation


in Harvard Style

Cano P., Koppenberger M., Le Groux S., Herrera P., Ricard J. and Wack N. (2004). KNOWLEDGE AND CONTENT-BASED AUDIO RETRIEVAL USING WORDNET. In Proceedings of the First International Conference on E-Business and Telecommunication Networks - Volume 3: ICETE, ISBN 972-8865-15-5, pages 301-308. DOI: 10.5220/0001397503010308


in Bibtex Style

@conference{icete04,
author={Pedro Cano and Markus Koppenberger and Sylvain Le Groux and Perfecto Herrera and Julien Ricard and Nicolas Wack},
title={KNOWLEDGE AND CONTENT-BASED AUDIO RETRIEVAL USING WORDNET},
booktitle={Proceedings of the First International Conference on E-Business and Telecommunication Networks - Volume 3: ICETE},
year={2004},
pages={301-308},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001397503010308},
isbn={972-8865-15-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on E-Business and Telecommunication Networks - Volume 3: ICETE
TI - KNOWLEDGE AND CONTENT-BASED AUDIO RETRIEVAL USING WORDNET
SN - 972-8865-15-5
AU - Cano P.
AU - Koppenberger M.
AU - Le Groux S.
AU - Herrera P.
AU - Ricard J.
AU - Wack N.
PY - 2004
SP - 301
EP - 308
DO - 10.5220/0001397503010308