tivation/termination signal. The time delay between
EMG signal and speech start/stop was measured and
it was confirmed that EMG signals can be used to con-
trol on/off signals for the EL device.
Pineda-Rico et al. (2008) also pick up the EMG-based
SND approach. They implemented a switched-capacitor
CMOS device. For activation and termination they used
the same method as in (Heaton et al., 2011): an
amplified, rectified and low-pass filtered (cutoff
frequency f_c = 3 Hz) envelope and a single threshold
implemented as a voltage comparator. Their focus was on
the implementation and on the advantages of
switched-capacitor circuits, which are: excellent time
constants, relative precision, simple design elements,
minimum power waste and reduced size on chip.
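The envelope-and-comparator scheme described above can be illustrated in software. The following is a minimal sketch, not the circuit of the cited works: the filter order (first-order IIR), the function names, the synthetic test signal and the threshold value are assumptions made for illustration; only the rectification, the 3 Hz cutoff and the single threshold come from the text.

```python
import math

def emg_envelope(samples, fs, fc=3.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    # Coefficient of a first-order IIR low-pass with cutoff fc (assumed order).
    alpha = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    envelope, y = [], 0.0
    for x in samples:
        y += alpha * (abs(x) - y)  # abs() rectifies, the update smooths
        envelope.append(y)
    return envelope

def single_threshold(envelope, threshold):
    """Comparator: True (device on) while the envelope exceeds the threshold."""
    return [value > threshold for value in envelope]

# Synthetic input (assumption): 1 s of silence, then 1 s of a unit-amplitude burst.
fs = 1000
signal = [0.0] * fs + [(-1.0) ** n for n in range(fs)]
active = single_threshold(emg_envelope(signal, fs), threshold=0.5)
onset_delay_ms = active.index(True) - fs  # one sample equals 1 ms at fs = 1000 Hz
```

In this toy case the comparator switches on a few tens of milliseconds after the burst begins, which shows the trade-off of such a low cutoff: a smooth envelope at the price of a delayed activation signal.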
In this work we developed EMG signal acquisition
hardware to capture EMG signals and recorded a database
consisting of EL speech and EMG signals. We employed
different strategies to smooth the EMG envelope,
developed a threshold-based method (single and double)
and a statistical method (GMMs) to detect voice
activity, and evaluated their performance.
2 METHODS
2.1 Data Acquisition Hardware
We developed our own data acquisition hardware in order
to reduce cost and size. The requirements for the
bio-signal acquisition system were to be small,
battery-operated and real-time capable. It consists of
three main parts: the sensor strap, the bio-signal
shield and an Arduino Due micro-controller board. The
board serves as a host for the connected strap and the
shield (see Figure 1).
The strap is designed to be worn around the neck
to ensure correct electrode position at the surface of
the sternohyoid muscle. This muscle is a long, thin
muscle which is located along the length of the front
of the human neck. The functions of this muscle include
depression of the hyoid bone, head and neck
movement, and speech. This position is often used
in Automatic Speech Recognition and the relation be-
tween muscle movement and fundamental frequency
was confirmed (Ooe et al., 2010). The strap holds
three silver/silver-chloride electrodes: two of them
are used to detect the EMG signal, and the third one
serves as a reference electrode to improve the
common-mode rejection ratio. The strap is connected to
the instrumentation amplifier, which is followed by an
operational amplifier. The gain of this amplifier can be
modified manually. After a low-pass filter that
suppresses high-frequency noise, the positive and
negative half-waves are split and fed to two separate
analog inputs of the micro-controller, where they are
converted from analog to digital. Using this method,
a higher resolution (13 bit) of the digitized
signal amplitude can be achieved. In the following
experiments the micro-controller board is connected
to the computer via USB, which also powers the
shield via the micro-controller board (5 V). The
sampling rate f_s of the Arduino Due ADC is set to
8 kHz. This is sufficient, as most of the frequency
content of EMG signals lies between 0 and 500 Hz.
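The resolution gain from splitting the half-waves and the adequacy of the sampling rate can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: a native resolution of 12 bit per analog input and recombination by subtraction are assumed here, as the text only states the resulting 13 bit; all names are hypothetical.

```python
import math

ADC_BITS = 12                      # assumed resolution of one analog input
ADC_MAX = (1 << ADC_BITS) - 1      # largest code of a single channel

def combine_half_waves(pos_code, neg_code):
    """Merge the two unsigned half-wave codes into one signed sample."""
    return pos_code - neg_code     # assumed recombination rule

def effective_bits(adc_bits=ADC_BITS):
    """Bits needed to represent every value the combined sample can take."""
    levels = 2 * ((1 << adc_bits) - 1) + 1   # -ADC_MAX .. +ADC_MAX
    return math.ceil(math.log2(levels))

def satisfies_nyquist(fs_hz, f_max_hz):
    """True if the sampling rate is more than twice the highest frequency."""
    return fs_hz > 2 * f_max_hz
```

Under these assumptions each polarity uses the full range of its own input, so the signed result spans roughly twice as many levels as a single channel, i.e. one additional bit. With f_s = 8 kHz and EMG content up to about 500 Hz, the sampling criterion is met with a wide margin.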
The authors are aware that in a real-world
application the algorithms need to be implemented on a
DSP, and the power supply and the hardware need to
be integrated so that the device can be worn on the
body, e.g. in a pocket.
2.2 Recorded Database
To evaluate different approaches for SND a database
was recorded and simulations were done off-line us-
ing the recordings.
We used around 100 phonetically balanced speech
utterances of a female and a male speaker. The skin
surface EMG sensors are positioned around the neck
and attached to our processing hardware. EMG
and speech signals are recorded using both the
bio-signal shield connected to an audio interface (RME
Fireface 800) and a head-mounted microphone (AKG
HC 577L) with omni-directional pickup pattern. The
audio interface ensures a high quality digital signal.
The sampling rate of the audio interface was set to
44100 Hz. While the sound card digitizes with 24 bit,
the micro-controller system converts the input signal
with a resolution of 13 bit. This is sufficient
to perform all processing steps proposed
in this work without drawbacks with respect to signal
detection. We analyzed the recordings manually and
annotated speech and non-speech sections in order to
obtain a ground truth.
In total we recorded 18 min 45 s of data. The
mean signal-to-noise ratio (SNR) was 16.7 dB for the
male EMG signals and 12.6 dB for the female ones. For
the SNR calculations we used first-order IIR smoothing.
This difference in SNR will also influence the
thresholds for SND. The main energy of the EMG signal
lies between 0 Hz and 500 Hz; in fact, over 90% of the
energy can be found in this range. The ratio of speech
to non-speech in the database was 63% to 37%.
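To make the two database statistics concrete, the sketch below computes an SNR from a first-order IIR smoothed power signal and the speech share from frame labels. The exact SNR definition is not given in the text; the ratio of mean smoothed power in annotated speech sections to that in non-speech sections is assumed here, and the smoothing coefficient, function names and test data are hypothetical.

```python
import math

def iir_smooth(values, alpha):
    """First-order IIR smoothing: y[n] = (1 - alpha) * y[n-1] + alpha * x[n]."""
    y, out = values[0], []
    for v in values:
        y = (1.0 - alpha) * y + alpha * v
        out.append(y)
    return out

def snr_db(emg, labels, alpha=0.01):
    """Assumed definition: smoothed power in speech vs. non-speech sections."""
    power = iir_smooth([x * x for x in emg], alpha)
    speech = [p for p, is_speech in zip(power, labels) if is_speech]
    noise = [p for p, is_speech in zip(power, labels) if not is_speech]
    return 10.0 * math.log10((sum(speech) / len(speech)) /
                             (sum(noise) / len(noise)))

def speech_ratio(labels):
    """Fraction of samples annotated as speech in the ground truth."""
    return sum(labels) / len(labels)

# Toy data (assumption): low-level noise, then a ten times larger "speech" burst.
emg = [0.1 * (-1.0) ** n for n in range(1000)] + [(-1.0) ** n for n in range(1000)]
labels = [False] * 1000 + [True] * 1000
toy_snr = snr_db(emg, labels)
```

Without smoothing, the amplitude ratio of ten in the toy data would correspond to 20 dB; the smoothed estimate comes out slightly lower because the filter needs time to settle at the onset of the burst.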
Speech/Non-Speech Detection for Electro-Larynx Speech Using EMG