been well researched and commercial equipment ex-
ists that is capable of doing this (Awan and Roy, 2006;
KayPENTAX, 2008). However, the assessment of
Asthenia has been less extensively researched. It is
one of the most difficult components to score, and
there is often more discrepancy between SLTs in
Asthenia scoring than for the other dimensions. This
research is concerned with the objective assessment
of Asthenia (Hirano, 1981).
Patients with Asthenia might be referred to hospital
for treatment. The weakness can be caused by a low
intensity of the glottal source sound and is generally
associated with a lack of higher frequency harmonics
(Hirano, 1981). Figure 1 illustrates the methodology
of the approach. To assess a recorded voice signal
for Asthenia, it will be fed into a digital signal pro-
cessing system for extracting voice features such as
energy, pitch frequency variation, harmonic to noise
ratio and others. This is followed by a mapping
technique based on machine learning. The voice features
which reflect the lack of energy and higher frequency
harmonics will be extracted from the voice and used
as features by the mapping techniques.
Figure 1: Methodology of the Approach.
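Two of the features named above, frame energy and pitch, can be sketched as follows. This is a minimal illustration assuming NumPy; the frame length, hop size, and pitch search bounds are arbitrary choices for the sketch, not values taken from the paper.

```python
import numpy as np

def frame_energy(signal, frame_len=1024, hop=512):
    """Mean-square energy of each analysis frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.mean(f ** 2) for f in frames])

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Crude pitch estimate: lag of the autocorrelation peak
    within the plausible fundamental-frequency range."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

# Synthetic 150 Hz "vowel" as a quick sanity check
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 150 * t)
print(frame_energy(x).mean())        # roughly 0.5 for a unit-amplitude sine
print(autocorr_pitch(x[:2048], fs))  # roughly 150 Hz
```

Tracking `frame_energy` over time captures the low intensity associated with Asthenia, and the variation of the per-frame pitch estimates gives the pitch-frequency-variation feature.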
2 DATA COLLECTION AND
ASTHENIA SCORING
Voice data has been collected from a random selection
of 46 patients and 56 controls. Only participants who
could read English fluently were included in this study.
All participants were adults between 18 and 70 years
of age, and they were in different stages of their treat-
ment. Information about the participants was stored in
secure files. The sustained acoustic signals were cap-
tured by a high quality Shure SM48 microphone that
was held a constant distance of 20 cm from the lips
and digitized using the KayPentax 4500 CSL Com-
puterized Speech Laboratory (KayPENTAX, 2008).
Each recording consists of two sustained vowels /a/
and /i/ lasting about 10 seconds, a set of six standard
sentences as specified by CAPE-V (Consensus
Auditory-Perceptual Evaluation of Voice) (Kempster et al.,
2009) and about 15 seconds of free unscripted speech.
To assess the voice quality of each participant sub-
jectively according to the GRBAS scale, the voice
samples were scored by three experienced SLTs using
Sennheiser HD205 headphones. The samples were
played out in random order with 21 randomly cho-
sen samples repeated as a test for consistency. To
facilitate the scoring process, we developed a ‘GR-
BAS Presentation and Scoring Package’ (GPSP) for
collecting GRBAS scores. The graphical user inter-
face presented by this package is shown in Figure 2.
The software is designed to play out in random order,
with appropriate repetition, the voice samples from a
database of recordings. It enables scores to be easily
entered by the SLT and stored in the database as an
Excel spreadsheet. The SLTs are given the option of
listening to any samples again, and the software can
be paused at any point, without loss of data. The user
may therefore take breaks to prevent tiredness which
may affect the scoring. The scoring of the 102 voice
samples referred to in this paper was completed by
each SLT in two sessions.
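The randomized presentation with repeated consistency samples can be sketched as below. The function name, the fixed seed, and the sample identifiers are illustrative assumptions, not part of GPSP; only the counts (102 samples, 21 repeats) come from the text.

```python
import random

def build_playlist(sample_ids, n_repeats=21, seed=0):
    """Randomized playlist: every sample once, plus a random
    subset repeated as a consistency check for the rater."""
    rng = random.Random(seed)
    repeats = rng.sample(sample_ids, n_repeats)  # 21 distinct samples
    playlist = list(sample_ids) + repeats
    rng.shuffle(playlist)                        # randomize presentation order
    return playlist

ids = [f"sample_{i:03d}" for i in range(102)]
playlist = build_playlist(ids)
print(len(playlist))  # 123 presentations in total
```

Comparing an SLT's two scores for each repeated sample then gives a direct intra-rater consistency measure.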
Both the Pearson correlation and Cohen’s Kappa
coefficient were used to measure the level of agree-
ment in scoring Asthenia between each pair of SLTs
(Sheskin, 2003; Cohen, 1968). Equation (1) defines
the Pearson correlation (Sheskin, 2003) between the
two dimensions of a sample $\{(x_i, y_i)\}$ containing $n$
pairs of random variables $(x_i, y_i)$; $\bar{x}$ and $\bar{y}$ are the
sample means of $\{x_i\}$ and $\{y_i\}$ respectively.

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \quad (1)$$
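Equation (1) can be computed directly as a sanity check. This is a minimal sketch assuming NumPy; the scores shown are hypothetical, not the study's data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two score vectors, per Equation (1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Hypothetical Asthenia scores (0-3) from two SLTs on six samples
slt1 = [0, 1, 2, 1, 3, 0]
slt2 = [0, 1, 1, 2, 3, 0]
print(pearson_r(slt1, slt2))
```

The result agrees with `np.corrcoef(slt1, slt2)[0, 1]`, which implements the same formula.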
The Cohen Kappa coefficient is defined by Equation (2),
where $p_o$ is the proportion (between 0 and 1)
of subjects for which the two SLTs agree on the scoring,
and $p_e$ is the probability of agreement ‘by chance’
when there is assumed to be no correlation between
the scoring by each SLT (Streiner, 1995; Viera et al.,
2005).

$$k = \frac{p_o - p_e}{1 - p_e} \quad (2)$$
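A direct implementation of Equation (2) makes $p_o$ and $p_e$ concrete: $p_o$ is the observed agreement rate, and $p_e$ is obtained from each rater's marginal score distribution. The scores below are hypothetical, not the study's data.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' scores, per Equation (2)."""
    n = len(a)
    # p_o: observed proportion of exact agreements
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # p_e: chance agreement from the product of marginal proportions
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical Asthenia scores (0-3) from two SLTs on ten samples
slt1 = [0, 1, 2, 1, 3, 0, 2, 1, 0, 2]
slt2 = [0, 1, 1, 1, 3, 0, 2, 2, 0, 2]
print(cohens_kappa(slt1, slt2))
```

Here the raters agree on 8 of 10 samples ($p_o = 0.8$) while chance agreement is $p_e = 0.28$, giving $k \approx 0.72$, i.e. substantial agreement on the scale below.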
Kappa is widely used for comparing raters or
scorers, and reflects any consistent bias in the aver-
age scores for each scorer (Viera et al., 2005) which
would be disregarded by Pearson’s correlation. A
value less than zero indicates no agreement. Values
in the range 0 to 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8
and 0.8 to 1 indicate slight, fair, moderate, substantial,
and almost perfect agreement, respectively (Viera
et al., 2005).
Weighted Kappa is often more appropriate when
there are more than two possible scores with a sense
of distance between the scores (Cohen, 1968). With
possible scores 0, 1, 2, 3, Kappa only considers
agreement or disagreement between scores, whereas
Objective Assessment of Asthenia using Energy and Low-to-High Spectral Ratio