PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS
Jordi Janer, Esteban Maestre
2007
Abstract
In voice-driven sound synthesis applications, phonetics convey musical information that might be related to the sound of an imitated musical instrument. Our initial hypothesis is that phonetics are user- and instrument-dependent, but remain constant for a single subject and instrument. Hence, we propose a user-adapted system in which the mappings depend on how a subject performs musical articulations for a given set of examples. The system consists of two parts. First, a voice imitation segmentation module automatically determines note-to-note transitions. Second, a classifier determines the type of musical articulation for each transition from a set of phonetic features. To validate our hypothesis, we ran an experiment in which a number of subjects imitated real instrument recordings with the voice. The instrument recordings consisted of short sax and violin phrases performed in three grades of musical articulation, labeled staccato, normal, and legato. The results of a supervised, user-dependent classifier are compared with those of a user-independent classifier based on heuristic rules. Finally, we use these results to improve the quality of a sample-concatenation synthesizer by selecting the most appropriate samples.
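The comparison described in the abstract, a user-independent rule-based classifier against a user-dependent trained one, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two features (`gap_ms`, `energy_dip_db`), the threshold values, the example data, and the use of a nearest-centroid classifier as a stand-in for the supervised classifier. None of these are taken from the paper.

```python
from statistics import mean

ARTICULATIONS = ("staccato", "normal", "legato")

# Hypothetical training examples for ONE subject: each item is
# ((gap_ms, energy_dip_db), articulation_label). Values are made up.
USER_EXAMPLES = [
    ((50.0, 12.0), "staccato"), ((55.0, 14.0), "staccato"),
    ((25.0, 8.0), "normal"), ((30.0, 7.0), "normal"),
    ((5.0, 2.0), "legato"), ((8.0, 3.0), "legato"),
]


def heuristic_classify(gap_ms, energy_dip_db):
    """User-independent rules with fixed, hypothetical thresholds."""
    if gap_ms > 80 or energy_dip_db > 20:
        return "staccato"
    if gap_ms < 20 and energy_dip_db < 6:
        return "legato"
    return "normal"


def train_centroids(examples):
    """User-dependent training: one mean feature vector per articulation."""
    centroids = {}
    for label in ARTICULATIONS:
        rows = [features for features, lab in examples if lab == label]
        if rows:
            centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def centroid_classify(centroids, features):
    """Assign the articulation whose centroid is closest (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    transition = (48.0, 11.0)  # a new note-to-note transition from the same subject
    print("heuristic:", heuristic_classify(*transition))
    print("user-adapted:", centroid_classify(train_centroids(USER_EXAMPLES), transition))
```

In this toy data the subject's staccato transitions are shorter than the fixed thresholds expect, so the two classifiers can disagree on the same transition. That is the kind of per-user variation the hypothesis is about.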
Paper Citation
in Harvard Style
Janer J. and Maestre E. (2007). PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS. In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007) ISBN 978-989-8111-13-5, pages 109-115. DOI: 10.5220/0002141401090115
in Bibtex Style
@conference{sigmap07,
author={Jordi Janer and Esteban Maestre},
title={PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS},
booktitle={Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)},
year={2007},
pages={109-115},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002141401090115},
isbn={978-989-8111-13-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)
TI - PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS
SN - 978-989-8111-13-5
AU - Janer J.
AU - Maestre E.
PY - 2007
SP - 109
EP - 115
DO - 10.5220/0002141401090115