the time-varying characteristics of speech developed
up to now, the symmetric higher order differential
energy function (SHODEO) with a symmetric
structure is known to show superior frequency
estimation performance compared to other methods
(Iem, 2010). It will serve as the basis for
developing fusion software and devices that work
with elderly voices (Seo, 2015).
2 VOICE ACTIVITY DETECTION METHODS
2.1 Auto-Correlation Function (ACF)
The ACF extracts the pitch of a speech signal by
correlating the signal at one time with the same
signal at another time, and is defined in Eq. (1)
(Seo, 2015).
$$D_{\mathrm{ACF}}(m) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\, x(n+m) \tag{1}$$
In Eq. (1), N is the data length, x(n) is the data
value at sample n, and x(n + m) is the value m
samples after n. For example, when the
autocorrelation function of a speech interval is
analyzed every 256 frames, a waveform with a
maximum peak within the 256-frame range appears,
and the point at which this peak appears is
determined as the pitch cycle (Lawrence, 1977).
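As a minimal sketch of Eq. (1), assuming NumPy and a synthetic 250 Hz sinusoid sampled at 16 kHz in place of real speech, the pitch cycle can be located as the lag with the largest autocorrelation value inside a plausible lag range. The names `acf` and `acf_pitch_lag`, the 256-sample frame, and the lag limits are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def acf(x, frame_len, max_lag):
    """Eq. (1): D_ACF(m) = (1/N) * sum_{n=0}^{N-1} x(n) x(n+m), m = 0..max_lag.

    Assumption: x holds at least frame_len + max_lag samples, so that
    x(n+m) is available for every term of the sum.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len + max_lag:
        raise ValueError("need at least frame_len + max_lag samples")
    head = x[:frame_len]
    return np.array([np.dot(head, x[m:m + frame_len]) / frame_len
                     for m in range(max_lag + 1)])

def acf_pitch_lag(x, frame_len, min_lag, max_lag):
    """Lag (in samples) of the largest ACF value within [min_lag, max_lag]."""
    d = acf(x, frame_len, max_lag)
    return min_lag + int(np.argmax(d[min_lag:]))

# Synthetic test signal (assumed values): 250 Hz tone at 16 kHz,
# i.e. a pitch cycle of 64 samples.
fs = 16000.0
t = np.arange(256 + 100)
signal = np.sin(2.0 * np.pi * 250.0 * t / fs)

acf_lag = acf_pitch_lag(signal, frame_len=256, min_lag=20, max_lag=100)
acf_f0 = fs / acf_lag  # pitch in Hz
```

Restricting the search to a lag range keeps the trivial maximum at m = 0 and later multiples of the pitch cycle out of the comparison; the limits of 20 and 100 samples are only an example.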
2.2 Average Magnitude Difference Function (AMDF)
The AMDF, like the autocorrelation function,
detects the pitch of a speech signal in the time
domain, and is defined as follows
(Abdullah-Al-Mamun, 2019; Seo, 2006).
$$D_{\mathrm{AMDF}}(m) = \frac{1}{N} \sum_{n=0}^{N-1} \left| x(n) - x(n+m) \right| \tag{2}$$
In Eq. (2), the input signal x(n) of the AMDF is
obtained by applying a window function of arbitrary
length N to the original speech signal (Seo, 2006).
In the case of the AMDF, for example, the minimum
point of the waveform within the 256-frame range of
the speech interval is determined as the pitch cycle
(Abdullah-Al-Mamun, 2019; Seo, 2006).
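The following is a minimal sketch of Eq. (2), again assuming NumPy and a synthetic 250 Hz sinusoid sampled at 16 kHz rather than real speech. In contrast to the ACF, the pitch cycle is taken as the lag at which the function is smallest. The names `amdf` and `amdf_pitch_lag`, the rectangular 256-sample window, and the lag limits are illustrative assumptions.

```python
import numpy as np

def amdf(x, frame_len, max_lag):
    """Eq. (2): D_AMDF(m) = (1/N) * sum_{n=0}^{N-1} |x(n) - x(n+m)|.

    Assumption: x holds at least frame_len + max_lag samples and a
    rectangular window of length frame_len is used.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len + max_lag:
        raise ValueError("need at least frame_len + max_lag samples")
    head = x[:frame_len]
    return np.array([np.mean(np.abs(head - x[m:m + frame_len]))
                     for m in range(max_lag + 1)])

def amdf_pitch_lag(x, frame_len, min_lag, max_lag):
    """Lag (in samples) of the smallest AMDF value within [min_lag, max_lag]."""
    d = amdf(x, frame_len, max_lag)
    return min_lag + int(np.argmin(d[min_lag:]))

# Synthetic test signal (assumed values): 250 Hz tone at 16 kHz,
# i.e. a pitch cycle of 64 samples.
fs = 16000.0
t = np.arange(256 + 100)
signal = np.sin(2.0 * np.pi * 250.0 * t / fs)

amdf_lag = amdf_pitch_lag(signal, frame_len=256, min_lag=20, max_lag=100)
amdf_f0 = fs / amdf_lag  # pitch in Hz
```

The AMDF needs only subtractions and absolute values, which is why the search looks for a valley instead of a peak; the lag m = 0 is excluded because the function is trivially zero there.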
2.3 Symmetric Higher Order Differential Energy Function (SHODEO)
The instantaneous frequency is defined as the time
derivative of the phase of the signal (Iem, 2010).
The k-th order differential energy function of the
signal is expressed by Eq. (3) (Iem, 2010).
$$\Gamma_k[x(n)] = x(n)\, x(n+k-2) - x(n-1)\, x(n+k-1) \tag{3}$$
Here k denotes an arbitrary order, n denotes the
sample index of the signal, and x(n) denotes the data
value at the discrete index n. The symmetric
higher-order differential energy function takes one
of two forms, depending on whether the order k is
odd or even, as follows (Iem, 2010).
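$$\Xi_k[x(n)] = \begin{cases} \dfrac{\Gamma_k[x(n)] + \Gamma_k[x(n-k+2)]}{2}, & k \text{ odd} \\[2ex] \Gamma_k\!\left[x\!\left(n - \dfrac{k}{2} + 1\right)\right], & k \text{ even} \end{cases} \tag{4}$$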
The instantaneous frequency is calculated from
Eq. (4) by means of Eq. (5) (Iem, 2010).
$$f(n) = \frac{1}{2\pi} \cdot \frac{1}{k-1} \cos^{-1}\!\left( \frac{\Xi_{2k-1}[x(n)]}{2\, \Xi_k[x(n)]} \right) \tag{5}$$
Here k is an arbitrary order, x(n) is the speech data
value at the current time n, and
$\Xi_{2k-1}[x(n)] / (2\,\Xi_k[x(n)])$
is the ratio of the higher-order differential energy
function of order 2k-1 to twice that of order k.
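Eqs. (3) to (5) can be sketched as follows, assuming NumPy and reading the equations exactly as written above. For a pure sinusoid with digital frequency Ω, Eq. (3) reduces to A² sin(Ω) sin((k-1)Ω), so the ratio in Eq. (5) equals cos((k-1)Ω) and the estimate returns the tone frequency. The function names, the multiplication by the sampling rate `fs` to obtain Hz, and the handling of zero denominators as NaN are illustrative assumptions and are not part of the cited method.

```python
import numpy as np

def gamma(x, k, n):
    """Eq. (3): x(n) x(n+k-2) - x(n-1) x(n+k-1), for an index array n."""
    return x[n] * x[n + k - 2] - x[n - 1] * x[n + k - 1]

def xi(x, k, n):
    """Eq. (4): two shifted copies averaged for odd k, one shifted copy for even k."""
    if k % 2 == 1:
        return 0.5 * (gamma(x, k, n) + gamma(x, k, n - k + 2))
    return gamma(x, k, n - k // 2 + 1)

def instantaneous_frequency(x, k=2, fs=1.0):
    """Eq. (5), scaled by fs to give Hz. Returns (n, f) for the valid indices n."""
    if k < 2:
        raise ValueError("k must be at least 2")
    x = np.asarray(x, dtype=float)
    top = 2 * k - 1  # order of the numerator; always odd
    # Keep only indices for which every sample used by Eq. (4) exists.
    n = np.arange(top - 1, len(x) - top + 1)
    num = xi(x, top, n)
    den = 2.0 * xi(x, k, n)
    # Assumption: samples with a zero denominator are reported as NaN.
    ratio = np.full(n.shape, np.nan)
    np.divide(num, den, out=ratio, where=den != 0)
    # Clip so that rounding error cannot push the ratio outside arccos's domain.
    ratio = np.clip(ratio, -1.0, 1.0)
    return n, fs * np.arccos(ratio) / (2.0 * np.pi * (k - 1))

# Synthetic test signal (assumed values): 250 Hz tone at 16 kHz.
fs = 16000.0
t = np.arange(512)
tone = np.cos(2.0 * np.pi * 250.0 * t / fs + 0.3)

idx2, f_k2 = instantaneous_frequency(tone, k=2, fs=fs)
idx3, f_k3 = instantaneous_frequency(tone, k=3, fs=fs)
```

Because the inverse cosine is only unambiguous on [0, π], this sketch recovers frequencies up to fs / (2(k-1)); a larger k therefore narrows the usable frequency range.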
3 DATABASE
In this paper, the author used the voices of ten men
and women in their seventies, taken from the elderly
voice database distributed by the Speech Information
Technology & Industry Promotion Center (SiTEC). As
shown in Table 2, five words and two sentences were
used as experimental data. The five words and two
sentences were spoken once for each sex. That is, 20
sentences and 20 words were used as experimental
data. The data were sampled at 16 kHz.