SUBJECT-IN-THE-LOOP VOICE FEEDBACK CONTROL
Dan Necsulescu, William Weiss, Elisha Pruner
University of Ottawa, Department of Mechanical Engineering, Ottawa, Ontario, Canada
necsu@uottawa.ca
Jerzy Sasiadek
Carleton University, Department of Mechanical and Aerospace Engineering, Ottawa, Ontario, Canada
jsas@connect.carleton.ca
Keywords: Signal processing, Voice feedback, Subject-in-the-loop, Real-time requirements, Voice training.
Abstract: A subject-in-the-loop feedback control system is composed of a bioengineering system including a subject, whose voice is received by a microphone, a computer that carries out the required signal processing of the sound signal by temporal and/or spectral computations, and a speaker or earphones for auditory feedback to the subject undergoing voice training. Frequency domain modifications of the signal are intended for voice training of subjects already familiar with traditional voice training. The objective of this paper is to present alternative methods for the implementation of subject-in-the-loop feedback control systems developed for voice training. The proposed feedback control scheme is an extension of traditional control systems; feedback sensing and the control law are carried out by the human subject acting as a self-organizing controller. The experimental set-ups developed for this purpose contain programmable digital devices for real-time modification of the frequency content of the voice signal. The paper also presents a preliminary solution that satisfies the requirements for real-time operation, in particular that the subject does not perceive the delay between the sound generation and the auditory reception of the modified sound. The system also performs spectral calculations for the analysis of the vocal sound signals. Preliminary experimental results illustrate the operation and the features of the proposed subject-in-the-loop real-time system for voice training.
1 INTRODUCTION
The purpose of this research work is to develop a
bioengineering experimental set-up with real-time
capability for voice signal processing in a closed
loop configuration. Acoustic loops refer to systems
for signal amplitude increase or decrease of certain
sound frequencies using digital manipulation of
sound samples. The goal of this research is the design and construction of computer based modules for the voice training of actors, as well as of singers and public speakers who have already undergone traditional voice training. Such an Audio-Formant Mobility Trainer is an adjunct to voice-training. The requirement for previous training results from the fact that the subject will have to produce different voice qualities, for which it is necessary to have acquired a certain mobility of the bodily parts that produce speech. The device is intended to facilitate the production of new voice qualities by increasing the mobility of one's voice
formants. The Audio-Formant Mobility Trainer is a module that can perform acoustical analysis, called LTAS (Long Time Average Spectra), thereby determining the formants that characterize the subject's voice, and band-pass filtering for each of the formants for the purpose of auditory feedback. Any formant can hence be chosen to be manipulated in order to increase the ability of the subject to perceive it in his own voice. There are three types of formant manipulations (a frequency-domain sketch of the first type follows the list):
1. Intensity: varying the relative intensity of the formant bandwidth, ranging from filtering it out to increasing it above the spectral envelope.
2. Bandwidth: increasing and decreasing its width.
3. Pitch: increasing and decreasing its pitch.
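As a concrete illustration of the intensity manipulation, the following Python sketch (not part of the original MATLAB/Simulink set-up) scales the magnitude of a chosen formant band in the frequency domain; the band limits, gain and sampling rate are hypothetical values chosen only for the example.

import numpy as np

def scale_formant_band(x, fs, f_lo, f_hi, gain_db):
    # Intensity manipulation sketch: scale the magnitude of one frequency
    # band; gain_db < 0 attenuates it (down to filtering it out), while
    # gain_db > 0 raises it above the surrounding spectral envelope.
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    X[band] *= 10.0 ** (gain_db / 20.0)   # amplitude scaling expressed in dB
    return np.fft.irfft(X, n=len(x))

# Hypothetical example: boost a 500 Hz wide band around 3300 Hz by 12 dB
fs = 44100
x = np.random.randn(fs)                   # placeholder for one second of voice
y = scale_formant_band(x, fs, 3050.0, 3550.0, 12.0)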
Typically, the learner uses a microphone and
headphones while singing or speaking. His voice is
analyzed and processed through the computer. This
training could be very useful for actors and singers
who are called to produce different voice types. It
may also be helpful for learners of new vowels in a foreign language. Preliminary research work
(Necsulescu et al., 2005, 2006, 2008) led to the confirmation that complex acoustic phenomena can be properly simulated for the needs of designing acoustic hardware in the form of a closed loop experimental set-up for acoustics analysis. The present stage is the construction of an experimental set-up with real-time capabilities and off-line spectral analysis, and its testing with human subjects. The specificity of this closed loop control system is the presence of the human subject in the loop in lieu of the traditional physical-only system. The presence of the subject in the control loop results in interesting new issues for the feedback control design.
2 AUDITORY FEEDBACK AND
VOICE PRODUCTION
Previous research showed that changing voice
quality by altering the auditory perception of one’s
voice is, to a limited degree, possible. If a person’s
sound production possibilities are enlarged (through
voice training), then altered auditory feedback might
facilitate the generation of different voice qualities
(Necsulescu, Weiss, and Pruner, 2008). The set-up consists of a subject hearing his voice through headphones while speaking into a microphone. The process, however, allows a series of digital manipulations (temporal and spectral) designed to affect perception while examining the effects on vocal output. Whereas intensity feedback manipulations have been studied extensively (Purcell and Munhall, Vol. 119, 2006), (Purcell and Munhall, Vol. 120, 2006), the effects of spectral changes on voice quality in auditory feedback and their relationship to voice production are still relatively unknown. Original proponents of the use of servo-mechanical theory have claimed a direct effect on
the vocal output when modified voice is fed back to
the speaker. Essentially, according to this theory, if
certain bandwidths of the voice spectra are modified
in such a manner as to increase or decrease the
energy in those regions, the person emitting those
sounds will unconsciously react if the modified
voice signal is fed back to his ears. The possibility of
affecting voice output by auditory feedback remains
a topic of intense interest for those involved in
voice, speech and accent training (Weiss, 2006), (Necsulescu, Weiss and Pruner, 2008).
Therefore, it would be important to verify the
possibility of bandwidth and formant modification.
It is expected that subjects, who have undergone
voice training, could emulate different spectral
characteristics fed-back through auditory filtering,
by increasing or attenuating energy in particular
spectral zones of their voice.
This work has the long-term goal to carry out
audio-vocal filtering experiments with subjects with
or without vocal training in order to determine
whether voice training could allow for vocal
adjustments in conditions related to filtered auditory
feedback. This paper describes the construction of a computer based module for auditory feedback with no perceived temporal delay.
There are many teaching techniques in voice
training, some auditory, some movement based and
some mixed. Independently of the technique, certain
pedagogical approaches are often used. One is
bodily awareness through minimal movements
(Purcell and Munhall, 2006), an approach having as a goal an effortless speech-motor learning system. A
variable is introduced and the subject perceives it,
plays with it, explores it, adjusts to it and integrates
it in his own behaviour. This is the purpose of the
Audio-Formant Mobility Trainer, an adjunct to
voice-training when the learner has already had preliminary training with any traditional technique. The reason for the need for previous training is that the subject will have to produce different voice qualities, for which it is necessary to have acquired a certain control of the mobility of the bodily parts that produce speech. The device is expected to facilitate the production of new voice qualities by increasing the mobility of one's voice formants.
3 DESCRIPTION OF
EXPERIMENT
Our first experiment tries to ascertain whether it is possible to teach subjects to vary their fourth formant (F4) at will. Previous research (Purcell and Munhall, Vol. 119, 2006), (Weiss, 2006) has shown that subjects do it unconsciously when their auditory feedback is manipulated while uttering vowels. It is also known (Purcell and Munhall, 2006) that formant manipulation in pitch and bandwidth changes the perceived voice quality significantly.
Figure 1: Block diagram of a generic auditory feedback system (human subject, voice signal, microphone, AI, computer with FFT results, AO, headphones).
Figure 2: Block diagram of the auditory feedback system in feedback control (regulator) configuration, with the human subject acting as controller.
Figure 3: Block diagram of the PC based auditory feedback system: microphone, OPTIMUS Fast Track Pro audio box (DAC, earphone driver and power amplifier) connected by USB, PC with analog input/output [V], mean level [dB] vs. frequency [Hz] display and off-line signal analysis, and Audio-Technica ATH-M50 professional studio monitor headphones (15 to 28 000 Hz frequency response, 99 dB sensitivity).
Figure 4: Block diagram of an Altera board based auditory feedback system (microphone, AI, Altera board, AO, headphones).
Figure 5: The diagram of the real-time system for signal acquisition from AI, filtering, FFT frequency analysis, display and headphone signal generation to AO.
A new and useful approach to producing new voice qualities is important in voice training.
4 EXPERIMENTAL SET-UP
The task of the experimental set-up is to test the subjects in the following auditory feedback conditions (a filter-design sketch for conditions c-e follows the list):
a. Filtering of the spectral envelope maxima in the subject's speech.
b. Filtering of the spectral envelope minima in the subject's speech.
c. Low pass filtering (LP).
d. High pass filtering (HP).
e. Band pass filtering.
f. No filtering.
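The listing below is a minimal Python sketch, not the actual implementation (the set-up used MATLAB/Simulink and dedicated hardware), of how the low pass, high pass, band pass and band stop conditions could be realized with standard Butterworth designs; the sampling rate, filter order and the 1000 Hz cutoff are assumptions, while the 3300 Hz / 500 Hz band follows Section 5.

from scipy import signal

fs = 44100                      # assumed sampling rate [Hz]
order = 4                       # assumed filter order

lp = signal.butter(order, 1000.0, btype='lowpass', fs=fs, output='sos')
hp = signal.butter(order, 1000.0, btype='highpass', fs=fs, output='sos')
bp = signal.butter(order, [3050.0, 3550.0], btype='bandpass', fs=fs, output='sos')
bs = signal.butter(order, [3050.0, 3550.0], btype='bandstop', fs=fs, output='sos')

def apply_condition(sos, x):
    # Causal filtering, as required when the sound is fed back in real time
    return signal.sosfilt(sos, x)

The "no filtering" condition simply passes the microphone samples through unchanged.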
Until recently, the main difficulty was to achieve real-time capability in auditory feedback with programmable digital hardware. Some delay in auditory feedback cannot be avoided entirely, but it must be reduced such that it is not perceived by the subject. In the proposed experimental set-up, using recent hardware equipment and advanced software, it was possible to achieve the above requirements.
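To make the real-time requirement concrete, the short calculation below estimates the acquisition-to-playback delay from the processing block size; the sampling rate, block size and buffer count are assumptions for illustration, not measured values of the set-up.

fs = 44100            # sampling rate [Hz] (assumed)
block = 128           # samples per processing block (assumed)
n_buffers = 3         # input, processing and output buffers (assumed)

block_ms = 1000.0 * block / fs        # about 2.9 ms per block
latency_ms = n_buffers * block_ms     # about 8.7 ms end-to-end lower bound
print(f"block: {block_ms:.1f} ms, estimated latency: {latency_ms:.1f} ms")

Keeping the total delay at a few milliseconds in this way is the practical target implied by the requirement that the subject does not perceive the feedback delay.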
The block diagram of the complete auditory
feedback system is shown in Figure 1. Figure 2
shows this system in the traditional control system
block diagram form. In this case, the human subject carries out the feedback sensing, the comparator and the controller (regulator) functions.
Figure 3 shows the block diagram of the PC based
auditory feedback system, while Figure 4 shows
another implementation using a single board
computer, an Altera board.
Figure 5 shows the Simulink® diagram of the real-time system for signal acquisition from AI, filtering, FFT frequency analysis, display and headphone signal generation to AO (Necsulescu, Weiss and Pruner, 2008).
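The per-block structure of that chain can be sketched outside Simulink as well. The following Python fragment is offered only as an illustration, with an assumed block size and the 3300 Hz / 500 Hz band-stop from Section 5; it shows stateful causal filtering of each acquired block followed by an FFT for the display, with the audio interface itself left abstract.

import numpy as np
from scipy import signal

fs = 44100                                       # assumed sampling rate [Hz]
block = 256                                      # assumed block size [samples]
sos = signal.butter(4, [3050.0, 3550.0], btype='bandstop', fs=fs, output='sos')
zi = signal.sosfilt_zi(sos)                      # filter state kept across blocks

def process_block(x, zi):
    # One pass of the Figure 5 chain: filter the block, compute an FFT for
    # the mean level display, and return the samples sent to the headphones.
    y, zi = signal.sosfilt(sos, x, zi=zi)
    level_db = 20.0 * np.log10(np.abs(np.fft.rfft(y)) + 1e-12)
    return y, level_db, zi

# Driving loop: in the real set-up the audio interface (AI/AO) supplies and
# consumes the blocks; random samples stand in for the microphone here.
for _ in range(10):
    x = np.random.randn(block)
    y, level_db, zi = process_block(x, zi)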
This paper presents the results of testing the PC
based auditory feedback experimental set-up. The
results for an Altera-based system will be presented
in a subsequent paper.
5 PRELIMINARY
EXPERIMENTAL SET-UP
TESTING RESULTS
The experimental set-up was tested to verify its performance. The current subject used for the experiments has had extensive voice training. For each audio-vocal filtering condition, he sang a 60 second French song using a neutral vowel. The MATLAB representation of the amplitude versus time and the calculation of the Long Term Average Spectra (LTAS) permit the evaluation of the effects of voice signal processing (Purcell and Munhall, 2006).
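The LTAS itself is an average of windowed power spectra over the whole take. As a rough Python equivalent of the MATLAB calculation used here (the actual script is not reproduced in the paper), Welch's method can be applied to the 60 second recording; the segment length is an assumption.

import numpy as np
from scipy import signal

def ltas_db(x, fs, nperseg=4096):
    # Long Term Average Spectrum: average of windowed power spectra, in dB.
    f, pxx = signal.welch(x, fs=fs, window='hann', nperseg=nperseg)
    return f, 10.0 * np.log10(pxx + 1e-20)

# Hypothetical usage on a mono 60 s recording loaded elsewhere as x with rate fs:
# f, level_db = ltas_db(x, fs)
# A plot of level_db against f/1000 gives the magnitude [dB] versus
# frequency [kHz] curves shown in Figures 6-10.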
Figure 6 shows the frequency domain results of the sound signal for the case of no headphones. These results are post-processed in the frequency domain for the identification of formants. A visual inspection of the figure allows us to choose one of the peaks, for example 3300 Hz, for recording the voice produced with real-time auditory feedback of the formant manipulations.
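The visual peak selection can also be supported by an automatic peak search on the LTAS curve; the sketch below uses scipy's peak finder, with the prominence threshold chosen arbitrarily for illustration.

import numpy as np
from scipy.signal import find_peaks

def formant_candidates(f, level_db, prominence_db=6.0):
    # Return the frequencies of prominent LTAS peaks (candidate formants).
    idx, _ = find_peaks(level_db, prominence=prominence_db)
    return f[idx]

# Hypothetical usage with the LTAS computed above: pick the candidate closest
# to 3300 Hz as the target of the band-stop and band-pass manipulations.
# peaks_hz = formant_candidates(f, level_db)
# target = peaks_hz[np.argmin(np.abs(peaks_hz - 3300.0))]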
Figure 6: Results for LTAS of the sound signal with no headphones (60 seconds of singing); magnitude [dB] versus frequency [kHz].
Figure 7 shows formant manipulation of the voice
with LTAS for 500 Hz bandstop filtering.
This figure shows what the subject heard following
signal manipulation.
Figure 7: Formant manipulation of the voice with LTAS for 500 Hz bandstop filtering (center frequency 3300 Hz, 60 seconds of singing as heard by the subject); magnitude [dB] versus frequency [kHz].
Figure 8: Recording of the effects of the 500 Hz bandstop filtering (center frequency 3300 Hz, 60 seconds of singing); magnitude [dB] versus frequency [kHz].
Bandstop hearing while singing produces a typical three-zone spectrum: 1. the highest peaks, from the fundamental frequency to 1100 Hz; 2. the second highest peaks, from 2500 to 3500-4000 Hz; 3. a bowl, from 1100 to 2500 Hz. The 500 Hz bandstop produces a clear peak between 3500-4000 Hz. This indicates that the bigger the gap, the bigger the compensation. After hearing the signal shown in Figure 7, the resulting spectral form from Figure 8 appears similar in this case to the one produced while singing without headphones, shown in Figure 6. Figure 9 shows the results of amplifying a 500 Hz band about 3300 Hz. After hearing the signal shown in Figure 9, the subject produces the signal with the spectral content shown in Figure 10.
Figure 9: Formant manipulation of the voice with LTAS for 500 Hz frequency band amplification (bandpass center frequency 3300 Hz, 60 seconds of singing as heard by the subject); magnitude [dB] versus frequency [kHz].
Figure 10: Recording of the effects of the 500 Hz frequency band amplification (bandpass center frequency 3300 Hz, 60 seconds of singing); magnitude [dB] versus frequency [kHz].
The spectral form from Figure 10 differs significantly from the results of Figure 6 for the case of no headphones. The spectrum is flattened when compared to the three-zone spectrum of the no-headphones condition. This confirms that the real-time signal processing acted on the desired frequency band, and the audio test based on the system shown in Figure 2 confirmed subjectively the validity of this result. This preliminary confirmation of the significant effects of signal processing on the subject is an encouraging result for the prospect of using the system in voice training.
The subject produced different voice qualities
unconsciously when given different auditory
feedback conditions. The approach seems useful for
training the voice for different voice qualities.
The results for the Altera board implementation will
be presented in a subsequent paper.
6 CONCLUSIONS
This experimental set-up is useful for voice training experiments with selected subjects using:
- Long term average spectra under each condition.
- Processing and storing data in the computer.
- Comparison of spectra under filtering and no filtering conditions (see the sketch after this list).
- Comparison of experimental [trained] and non-trained groups.
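One simple way to quantify such a comparison, sketched here only as an illustration (it is not the analysis reported above), is to average the dB difference between two LTAS curves over the three spectral zones described in Section 5.

import numpy as np

# Zone limits follow the three-zone description in Section 5 [Hz]
ZONES = {"peaks": (0.0, 1100.0), "bowl": (1100.0, 2500.0), "high": (2500.0, 4000.0)}

def zone_difference_db(f, level_filtered_db, level_reference_db):
    # Mean dB difference per spectral zone between two LTAS curves that share
    # the same frequency grid f (e.g. a filtered condition vs. no headphones).
    diff = np.asarray(level_filtered_db) - np.asarray(level_reference_db)
    out = {}
    for name, (lo, hi) in ZONES.items():
        mask = (f >= lo) & (f < hi)
        out[name] = float(np.mean(diff[mask]))
    return out

# Hypothetical usage with LTAS curves computed as in Section 5:
# print(zone_difference_db(f, level_bandstop, level_no_headphones))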
The proposed auditory feedback experimental
set-up proved to satisfy the requirements of
acquiring signals with a microphone and filtering,
displaying and transmitting the modified signals to
the earphones in real time. Moreover, auditory
signals were FFT processed for the successful
identification of each particular subject's formants.
This experimental set-up was deemed appropriate
for purposes of voice training for selected subjects.
REFERENCES
Kinsler, L. E. et al., Fundamentals of Acoustics, J. Wiley, 2000.
Kuwabara, H. and Ohgushi, K., "Contributions of vocal tract resonant frequencies and bandwidths to the personal perception of speech". Acustica, 1987, 63, No. 2, pp. 120-128.
Necsulescu, D., Mechatronics, Prentice Hall, 2002.
Necsulescu, D., Zhang, W., Weiss, W., Sasiadek J.,
“Room Acoustics Measurement System Design using
Simulation”, IMTC 2006 - Instrumentation and
Measurement Technology Conference, Sorrento, Italy
24-27 Apr 2006.
Necsulescu, D., Advanced Mechatronics, World
Scientific, 2009.
Necsulescu, D., Weiss, W., Zhang, W., "Issues Regarding
Hardware in the Loop Experimental Set-up for Room
Acoustics”, Montreal, McGill University, 20th
Canadian Congress of Applied Mechanics, 2005.
Necsulescu, D., Weiss, W., Pruner, E., “Acousto-
Mechatronic System for Voice Training”, Bul. I. P.
Iasi, Tom LIV (LVIII), Fasc. X, 2008, pp. 481 - 487.
Purcell, D. W. and K. Munhall. “Adaptive control of
vowel formant frequency: Evidence from real-time
formant manipulation”. Journal of the Acoustical
Society of America 120(2), (2006), pp. 966-977.
Purcell, D. W. and Munhall, K. “Compensation following
real-time manipulation of formants in isolated
vowels”, J. Acoust. Soc. Am. Vol. 119, Apr. 2006,
2288–2297.
Weiss, W., “Towards a Mobile Voice, Minimal
Movements, Spatialization and Expression”, New
York, Ottawa, Toronto, Legas, 2006.