SUBJECT-IN-THE-LOOP VOICE FEEDBACK CONTROL
Dan Necsulescu, William Weiss, Elisha Pruner
University of Ottawa, Department of Mechanical Engineering, Ottawa, Ontario, Canada
necsu@uottawa.ca
Jerzy Sasiadek
Carleton University, Department of Mechanical and Aerospace Engineering, Ottawa, Ontario, Canada
jsas@connect.carleton.ca
Keywords: Signal processing, Voice feedback, Subject-in-the-loop, Real-time requirements, Voice training.
Abstract: A subject-in-the-loop feedback control system is composed of a bioengineering system including a subject, whose voice is received by a microphone, a computer that carries out the required signal processing of the sound signal by temporal and/or spectral computations, and a speaker or earphones for auditory feedback to the subject undergoing voice training. Frequency domain modifications of the signal are intended for voice training of subjects already familiar with traditional voice training. The objective of this paper is to present alternative methods for the implementation of subject-in-the-loop feedback control systems developed for voice training. The proposed feedback control scheme is an extension of traditional control systems; feedback sensing and the control law are carried out by the human subject acting as a self-organizing controller. The experimental set-ups developed for this purpose contain programmable digital devices for real-time modification of the frequency content of the voice signal. The paper also presents a preliminary solution that satisfies the requirements for real-time operation, in particular that the subject does not perceive the delay between the sound generation and the auditory reception of the modified sound. The system also performs spectral calculations for the analysis of the vocal sound signals. Preliminary experimental results illustrate the operation and the features of the proposed subject-in-the-loop real-time system for voice training.
1 INTRODUCTION
The purpose of this research work is to develop a
bioengineering experimental set-up with real-time
capability for voice signal processing in a closed
loop configuration. Acoustic loops refer to systems
for signal amplitude increase or decrease of certain
sound frequencies using digital manipulation of
sound samples. The goal of this research is the design and construction of computer based modules for the voice training of actors, as well as of singers and public speakers who have already undergone traditional voice training. Such an Audio-Formant Mobility Trainer is an adjunct to voice-training. The requirement for previous training results from the fact that the subject will have to produce different voice qualities, for which it is necessary to have acquired a certain mobility of the bodily parts that produce speech. The device is intended to facilitate the production of new voice qualities by increasing the mobility of one's voice
formants. The Audio-Formant Mobility Trainer is a module that can perform acoustical analysis, called LTAS (Long Time Average Spectra), thereby determining the formants that characterize the subject's voice, and band-pass filtering for each of the formants for the purpose of auditory feedback. Any formant can hence be chosen to be manipulated in order to increase the ability of the subject to perceive it in his own voice. There are three types of formant manipulations (a frequency-domain sketch of the first type follows the list):
1. Intensity: varying the relative intensity of the formant bandwidth, ranging from filtering it out to increasing it above the spectral envelope.
2. Bandwidth: increasing and decreasing its width.
3. Pitch: increasing and decreasing its pitch.
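As a concrete illustration of the intensity manipulation, the following Python sketch (not part of the original MATLAB/Simulink set-up) scales the magnitude of a chosen formant band in the frequency domain; the band limits, gain and sampling rate are hypothetical values chosen only for the example.

import numpy as np

def scale_formant_band(x, fs, f_lo, f_hi, gain_db):
    # Intensity manipulation sketch: scale the magnitude of one frequency
    # band; gain_db < 0 attenuates it (down to filtering it out), while
    # gain_db > 0 raises it above the surrounding spectral envelope.
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    X[band] *= 10.0 ** (gain_db / 20.0)   # amplitude scaling expressed in dB
    return np.fft.irfft(X, n=len(x))

# Hypothetical example: boost a 500 Hz wide band around 3300 Hz by 12 dB
fs = 44100
x = np.random.randn(fs)                   # placeholder for one second of voice
y = scale_formant_band(x, fs, 3050.0, 3550.0, 12.0)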
Typically, the learner uses a microphone and
headphones while singing or speaking. His voice is
analyzed and processed through the computer. This
training could be very useful for actors and singers
who are called to produce different voice types. It
may also be helpful for learners of new vowels in a foreign language. Preliminary research work
(Necsulescu et al., 2005, 2006, 2008) led to the confirmation that complex acoustic phenomena can be properly simulated for the needs of designing acoustic hardware in the form of a closed loop experimental set-up for acoustics analysis. The present stage is the construction of an experimental set-up with real-time capabilities and off-line spectral analysis, and its testing with human subjects. The specificity of this closed loop control system is the presence of the human subject in the loop in lieu of the traditional physical-only system. The presence of the subject in the control loop results in interesting new issues for the feedback control design.
2 AUDITORY FEEDBACK AND
VOICE PRODUCTION
Previous research showed that changing voice
quality by altering the auditory perception of one’s
voice is, to a limited degree, possible. If a person’s
sound production possibilities are enlarged (through
voice training), then altered auditory feedback might
facilitate the generation of different voice qualities
(Necsulescu, Weiss, and Pruner, 2008). The set-up consists of a subject hearing his voice through headphones while speaking into a microphone. The process, however, allows a series of digital manipulations (temporal and spectral) designed to affect perception while examining the effects on vocal output. Whereas intensity feedback manipulations have been studied extensively (Purcell and Munhall, Vol. 119, 2006), (Purcell and Munhall, Vol. 120, 2006), the effects of spectral changes on voice quality in auditory feedback and their relationship to voice production are still relatively unknown. Original proponents of the use of servo-mechanical theory have claimed a direct effect on
the vocal output when modified voice is fed back to
the speaker. Essentially, according to this theory, if
certain bandwidths of the voice spectra are modified
in such a manner as to increase or decrease the
energy in those regions, the person emitting those
sounds will unconsciously react if the modified
voice signal is fed back to his ears. The possibility of
affecting voice output by auditory feedback remains
a topic of intense interest for those involved in
voice, speech and accent training (Weiss, 2006), (Necsulescu, Weiss and Pruner, 2008).
Therefore, it would be important to verify the
possibility of bandwidth and formant modification.
It is expected that subjects, who have undergone
voice training, could emulate different spectral
characteristics fed-back through auditory filtering,
by increasing or attenuating energy in particular
spectral zones of their voice.
This work has the long-term goal to carry out
audio-vocal filtering experiments with subjects with
or without vocal training in order to determine
whether voice training could allow for vocal
adjustments in conditions related to filtered auditory
feedback. This paper describes the construction of a computer based module for auditory feedback with no perceived temporal delay.
There are many teaching techniques in voice
training, some auditory, some movement based and
some mixed. Independently of the technique, certain
pedagogical approaches are often used. One is
bodily awareness through minimal movements
(Purcell and Munhall, 2006), an approach having as a goal an effortless speech-motor learning system. A
variable is introduced and the subject perceives it,
plays with it, explores it, adjusts to it and integrates
it in his own behaviour. This is the purpose of the
Audio-Formant Mobility Trainer, an adjunct to
voice-training when the learner has already had preliminary training with any traditional technique. The reason for the need for previous training is that the subject will have to produce different voice qualities, for which it is necessary to have acquired a certain control of the mobility of the bodily parts that produce speech. The device is expected to facilitate the production of new voice qualities by increasing the mobility of one's voice formants.
3 DESCRIPTION OF
EXPERIMENT
Our first experiment tries to ascertain whether it is possible to teach subjects to vary their fourth formant (F4) at will. Previous research (Purcell and Munhall, Vol. 119, 2006), (Weiss, 2006) has shown that subjects do it unconsciously when their auditory feedback is manipulated while uttering vowels. It is also known (Purcell and Munhall, 2006) that formant manipulation in pitch and bandwidth changes the perceived voice quality significantly.
Figure 1: Block diagram of a generic auditory feedback system (human subject, voice signal, microphone, AI, computer with FFT results, AO, headphones).
Figure 2: Block diagram of the auditory feedback system in feedback control (regulator) configuration, with the human subject acting as controller.
Figure 3: Block diagram of the PC based auditory feedback system: microphone, OPTIMUS Fast Track Pro audio box (DAC, earphone driver and power amplifier) connected by USB, PC with analog input/output [V], mean level [dB] vs. frequency [Hz] display and off-line signal analysis, and Audio-Technica ATH-M50 professional studio monitor headphones (15 to 28 000 Hz frequency response, 99 dB sensitivity).
Figure 4: Block diagram of an Altera board based auditory feedback system (microphone, AI, Altera board, AO, headphones).
Figure 5: The diagram of the real-time system for signal acquisition from AI, filtering, FFT frequency analysis, display and headphone signal generation to AO.
A new and useful approach to producing new voice qualities is important in voice training.
4 EXPERIMENTAL SET-UP
The task of the experimental set-up is to test the subjects in the following auditory feedback conditions (a filter-design sketch for conditions c-e follows the list):
a. Filtering of the spectral envelope maxima in the subject's speech.
b. Filtering of the spectral envelope minima in the subject's speech.
c. Low pass filtering (LP).
d. High pass filtering (HP).
e. Band pass filtering.
f. No filtering.
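The listing below is a minimal Python sketch, not the actual implementation (the set-up used MATLAB/Simulink and dedicated hardware), of how the low pass, high pass, band pass and band stop conditions could be realized with standard Butterworth designs; the sampling rate, filter order and the 1000 Hz cutoff are assumptions, while the 3300 Hz / 500 Hz band follows Section 5.

from scipy import signal

fs = 44100                      # assumed sampling rate [Hz]
order = 4                       # assumed filter order

lp = signal.butter(order, 1000.0, btype='lowpass', fs=fs, output='sos')
hp = signal.butter(order, 1000.0, btype='highpass', fs=fs, output='sos')
bp = signal.butter(order, [3050.0, 3550.0], btype='bandpass', fs=fs, output='sos')
bs = signal.butter(order, [3050.0, 3550.0], btype='bandstop', fs=fs, output='sos')

def apply_condition(sos, x):
    # Causal filtering, as required when the sound is fed back in real time
    return signal.sosfilt(sos, x)

The "no filtering" condition simply passes the microphone samples through unchanged.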
Until recently, the main difficulty was to achieve real-time capability in auditory feedback with programmable digital hardware. Some delay in auditory feedback cannot be avoided entirely, but it must be reduced such that it is not perceived by the subject. In the proposed experimental set-up, using recent hardware equipment and advanced software, it was possible to achieve the above requirements.
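To make the real-time requirement concrete, the short calculation below estimates the acquisition-to-playback delay from the processing block size; the sampling rate, block size and buffer count are assumptions for illustration, not measured values of the set-up.

fs = 44100            # sampling rate [Hz] (assumed)
block = 128           # samples per processing block (assumed)
n_buffers = 3         # input, processing and output buffers (assumed)

block_ms = 1000.0 * block / fs        # about 2.9 ms per block
latency_ms = n_buffers * block_ms     # about 8.7 ms end-to-end lower bound
print(f"block: {block_ms:.1f} ms, estimated latency: {latency_ms:.1f} ms")

Keeping the total delay at a few milliseconds in this way is the practical target implied by the requirement that the subject does not perceive the feedback delay.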
The block diagram of the complete auditory
feedback system is shown in Figure 1. Figure 2
shows this system in the traditional control system
block diagram form. In this case, the human subject carries out the feedback sensing, the comparator and the controller (regulator) functions.
Figure 3 shows the block diagram of the PC based
auditory feedback system, while Figure 4 shows
another implementation using a single board
computer, an Altera board.
Figure 5 shows the Simulink® diagram of the real-time system for signal acquisition from AI, filtering, FFT frequency analysis, display and headphone signal generation to AO (Necsulescu, Weiss and Pruner, 2008).
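The per-block structure of that chain can be sketched outside Simulink as well. The following Python fragment is offered only as an illustration, with an assumed block size and the 3300 Hz / 500 Hz band-stop from Section 5; it shows stateful causal filtering of each acquired block followed by an FFT for the display, with the audio interface itself left abstract.

import numpy as np
from scipy import signal

fs = 44100                                       # assumed sampling rate [Hz]
block = 256                                      # assumed block size [samples]
sos = signal.butter(4, [3050.0, 3550.0], btype='bandstop', fs=fs, output='sos')
zi = signal.sosfilt_zi(sos)                      # filter state kept across blocks

def process_block(x, zi):
    # One pass of the Figure 5 chain: filter the block, compute an FFT for
    # the mean level display, and return the samples sent to the headphones.
    y, zi = signal.sosfilt(sos, x, zi=zi)
    level_db = 20.0 * np.log10(np.abs(np.fft.rfft(y)) + 1e-12)
    return y, level_db, zi

# Driving loop: in the real set-up the audio interface (AI/AO) supplies and
# consumes the blocks; random samples stand in for the microphone here.
for _ in range(10):
    x = np.random.randn(block)
    y, level_db, zi = process_block(x, zi)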
This paper presents the results of testing the PC
based auditory feedback experimental set-up. The
results for an Altera-based system will be presented
in a subsequent paper.
5 PRELIMINARY
EXPERIMENTAL SET-UP
TESTING RESULTS
The experimental set-up was tested to verify its performance. The current subject used for the experiments has had extensive voice training. For each audio-vocal filtering condition, he sang a 60 second French song using a neutral vowel. The MATLAB representation of the amplitude versus time and the calculation of the Long Term Average Spectra (LTAS) permit the evaluation of the effects of voice signal processing (Purcell and Munhall, 2006).
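The LTAS itself is an average of windowed power spectra over the whole take. As a rough Python equivalent of the MATLAB calculation used here (the actual script is not reproduced in the paper), Welch's method can be applied to the 60 second recording; the segment length is an assumption.

import numpy as np
from scipy import signal

def ltas_db(x, fs, nperseg=4096):
    # Long Term Average Spectrum: average of windowed power spectra, in dB.
    f, pxx = signal.welch(x, fs=fs, window='hann', nperseg=nperseg)
    return f, 10.0 * np.log10(pxx + 1e-20)

# Hypothetical usage on a mono 60 s recording loaded elsewhere as x with rate fs:
# f, level_db = ltas_db(x, fs)
# A plot of level_db against f/1000 gives the magnitude [dB] versus
# frequency [kHz] curves shown in Figures 6-10.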
Figure 6 shows the frequency domain results of the sound signal for the case of no headphones. These results are post-processed in the frequency domain for the identification of formants. A visual inspection of the figure allows us to choose one of the peaks, for example 3300 Hz, for recording the voice produced with real-time auditory feedback of the formant manipulations.
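The visual peak selection can also be supported by an automatic peak search on the LTAS curve; the sketch below uses scipy's peak finder, with the prominence threshold chosen arbitrarily for illustration.

import numpy as np
from scipy.signal import find_peaks

def formant_candidates(f, level_db, prominence_db=6.0):
    # Return the frequencies of prominent LTAS peaks (candidate formants).
    idx, _ = find_peaks(level_db, prominence=prominence_db)
    return f[idx]

# Hypothetical usage with the LTAS computed above: pick the candidate closest
# to 3300 Hz as the target of the band-stop and band-pass manipulations.
# peaks_hz = formant_candidates(f, level_db)
# target = peaks_hz[np.argmin(np.abs(peaks_hz - 3300.0))]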
Figure 6: Results for LTAS of the sound signal with no headphones (60 seconds of singing); magnitude [dB] versus frequency [kHz].
Figure 7 shows formant manipulation of the voice
with LTAS for 500 Hz bandstop filtering.
This figure shows what the subject heard following
signal manipulation.
Figure 7: Formant manipulation of the voice with LTAS for 500 Hz bandstop filtering (center frequency 3300 Hz, 60 seconds of singing as heard by the subject); magnitude [dB] versus frequency [kHz].
Figure 8: Recording of the effects of the 500 Hz bandstop filtering (center frequency 3300 Hz, 60 seconds of singing); magnitude [dB] versus frequency [kHz].
Bandstop hearing while singing produces a typical three-zone spectrum: 1. the highest peaks, from the fundamental frequency to 1100 Hz; 2. the second highest peaks, from 2500 to 3500-4000 Hz; 3. a bowl, from 1100 to 2500 Hz. The 500 Hz bandstop produces a clear peak between 3500-4000 Hz. This indicates that the bigger the gap, the bigger the compensation. After hearing the signal shown in Figure 7, the resulting spectral form from Figure 8 appears similar in this case to the one produced while singing without headphones, shown in Figure 6. Figure 9 shows the results of amplifying a 500 Hz band about 3300 Hz. After hearing the signal shown in Figure 9, the subject produces the signal with the spectral content shown in Figure 10.
Figure 9: Formant manipulation of the voice with LTAS for 500 Hz frequency band amplification (bandpass center frequency 3300 Hz, 60 seconds of singing as heard by the subject); magnitude [dB] versus frequency [kHz].
Figure 10: Recording of the effects of the 500 Hz frequency band amplification (bandpass center frequency 3300 Hz, 60 seconds of singing); magnitude [dB] versus frequency [kHz].
The spectral form from Figure 10 differs significantly from the results of Figure 6 for the case of no headphones. The spectrum is flattened when compared to the three-zone spectrum of the no-headphones condition. This confirms that the real-time signal processing acted on the desired frequency band, and the audio test based on the system shown in Figure 2 confirmed subjectively the validity of this result. This preliminary confirmation of the significant effects of signal processing on the subject is an encouraging result for the prospect of using the system in voice training.
The subject produced different voice qualities
unconsciously when given different auditory
feedback conditions. The approach seems useful for
training the voice for different voice qualities.
The results for the Altera board implementation will
be presented in a subsequent paper.
6 CONCLUSIONS
This experimental set-up is useful for voice training experiments with selected subjects using:
- Long term average spectra under each condition.
- Processing and storing data in the computer.
- Comparison of spectra under filtering and no filtering conditions (see the sketch after this list).
- Comparison of experimental [trained] and non-trained groups.
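One simple way to quantify such a comparison, sketched here only as an illustration (it is not the analysis reported above), is to average the dB difference between two LTAS curves over the three spectral zones described in Section 5.

import numpy as np

# Zone limits follow the three-zone description in Section 5 [Hz]
ZONES = {"peaks": (0.0, 1100.0), "bowl": (1100.0, 2500.0), "high": (2500.0, 4000.0)}

def zone_difference_db(f, level_filtered_db, level_reference_db):
    # Mean dB difference per spectral zone between two LTAS curves that share
    # the same frequency grid f (e.g. a filtered condition vs. no headphones).
    diff = np.asarray(level_filtered_db) - np.asarray(level_reference_db)
    out = {}
    for name, (lo, hi) in ZONES.items():
        mask = (f >= lo) & (f < hi)
        out[name] = float(np.mean(diff[mask]))
    return out

# Hypothetical usage with LTAS curves computed as in Section 5:
# print(zone_difference_db(f, level_bandstop, level_no_headphones))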
The proposed auditory feedback experimental
set-up proved to satisfy the requirements of
acquiring signals with a microphone and filtering,
displaying and transmitting the modified signals to
the earphones in real time. Moreover, auditory
signals were FFT processed for the successful
identification of each particular subject's formants.
This experimental set-up was deemed appropriate
for purposes of voice training for selected subjects.
REFERENCES
Kinsler, L. E. et al., Fundamentals of Acoustics, J. Wiley, 2000.
Kuwabara, H. and Ohgushi, K., "Contributions of vocal tract resonant frequencies and bandwidths to the personal perception of speech". Acustica, 1987, 63, No. 2, pp. 120-128.
Necsulescu, D., Mechatronics, Prentice Hall, 2002.
Necsulescu, D., Zhang, W., Weiss, W., Sasiadek J.,
“Room Acoustics Measurement System Design using
Simulation”, IMTC 2006 - Instrumentation and
Measurement Technology Conference, Sorrento, Italy
24-27 Apr 2006.
Necsulescu, D., Advanced Mechatronics, World
Scientific, 2009.
Necsulescu, D., Weiss, W., Zhang, W., "Issues Regarding
Hardware in the Loop Experimental Set-up for Room
Acoustics”, Montreal, McGill University, 20th
Canadian Congress of Applied Mechanics, 2005.
Necsulescu, D., Weiss, W., Pruner, E., “Acousto-
Mechatronic System for Voice Training”, Bul. I. P.
Iasi, Tom LIV (LVIII), Fasc. X, 2008, pp. 481 - 487.
Purcell, D. W. and K. Munhall. “Adaptive control of
vowel formant frequency: Evidence from real-time
formant manipulation”. Journal of the Acoustical
Society of America 120(2), (2006), pp. 966-977.
Purcell, D. W. and Munhall, K. “Compensation following
real-time manipulation of formants in isolated
vowels”, J. Acoust. Soc. Am. Vol. 119, Apr. 2006,
2288–2297.
Weiss, W., “Towards a Mobile Voice, Minimal
Movements, Spatialization and Expression”, New
York, Ottawa, Toronto, Legas, 2006.