PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS

Jordi Janer; Esteban Maestre

doi:10.5220/0002141401090115

2007

Abstract

In voice-driven sound synthesis applications, phonetics convey musical information that might be related to the sound of an imitated musical instrument. Our initial hypothesis is that phonetics are user- and instrument-dependent, but remain constant for a single subject and instrument. Hence, a user-adapted system is proposed, in which mappings depend on how a subject performs musical articulations given a set of examples. The system consists of two modules: first, a segmentation module that automatically determines note-to-note transitions in the voice imitation; second, a classifier that determines the type of musical articulation for each transition from a set of phonetic features. To validate our hypothesis, we ran an experiment in which a number of subjects imitated real instrument recordings with the voice. The instrument recordings consisted of short saxophone and violin phrases performed in three grades of musical articulation, labeled staccato, normal, and legato. The results of a supervised training classifier (user-dependent) are compared to those of a classifier based on heuristic rules (user-independent). Finally, we use these results to improve the quality of a sample-concatenation synthesizer by selecting the most appropriate samples.

Paper Citation

in Harvard Style

Janer J. and Maestre E. (2007). PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS. In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007) ISBN 978-989-8111-13-5, pages 109-115. DOI: 10.5220/0002141401090115

in Bibtex Style

@conference{sigmap07,
author={Jordi Janer and Esteban Maestre},
title={PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS},
booktitle={Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)},
year={2007},
pages={109-115},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002141401090115},
isbn={978-989-8111-13-5},
}

in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)
TI - PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS
SN - 978-989-8111-13-5
AU - Janer J.
AU - Maestre E.
PY - 2007
SP - 109
EP - 115
DO - 10.5220/0002141401090115