heart rate sequence, frequency spectrum of heart rate
sequence, kurtosis, cyclostationarity, power spectral
density and power spectral density of heart rate
sequence) has been used, which led to a classification
accuracy of 83.6%. However, the high dimension of
the feature vectors can degrade the performance of the
PCG signal classification system in terms of
complexity (memory space, computation time) and
probably accuracy. The principal aim of the present
work is to reduce this dimension using feature
selection algorithms, to obtain a less complex system with
possibly higher classification accuracy. Two principal
feature selection approaches are used in the
state of the art. The first, the “wrapper” approach, is applied
to low-dimensional feature sets and uses the classifier
itself to measure the relevance of features. Conversely, the second,
the “filter” approach, is independent of the classifier, is
generally applied to high-dimensional feature sets, and uses the
information provided by the features to explain the
classes. Hence, in our work, we use a “filter” approach
based on the criterion of mutual information
maximisation of the selected features. Several
heuristic feature selection strategies based on
mutual information are applied to select the
relevant local (S1, Systole, S2 and Diastole) and
global features extracted from the previous set of
multi-domain features. To validate the importance
of this feature selection step, we evaluate the
performance of the classification system using a k-NN
classifier with 5-fold cross-validation
on the same database as in (Tang, Chen, Li,
& Zhong, 2016).
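As an illustration of the “filter” idea, the following minimal sketch scores features by their mutual information with the class labels and penalises redundancy with the features already selected (an mRMR-style greedy heuristic). It is only one possible heuristic and is not claimed to be one of the strategies evaluated in this work. The function names, the equal-width discretisation into four bins, and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def discretize(values, n_bins=4):
    """Map continuous values to equal-width bin indices 0..n_bins-1 (assumed binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: everything falls in a single bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_mrmr(columns, labels, k, n_bins=4):
    """Greedily pick k feature indices: relevance minus mean redundancy."""
    binned = [discretize(col, n_bins) for col in columns]
    relevance = [mutual_information(b, labels) for b in binned]
    selected = []
    while len(selected) < min(k, len(columns)):
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(binned[j], binned[s]) for s in selected
            )
            return relevance[j] - redundancy / len(selected)

        candidates = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (hypothetical): 8 recordings, labels -1 = normal, 1 = abnormal.
labels = [-1, -1, -1, -1, 1, 1, 1, 1]
f0 = [0.1, 0.2, 0.15, 0.9, 0.8, 0.95, 0.85, 1.0]   # informative feature
f1 = [10 * v for v in f0]                           # rescaled copy of f0
f2 = [0.1, 0.9, 0.2, 1.0, 0.15, 0.8, 0.95, 0.85]   # weakly informative feature
selected = select_mrmr([f0, f1, f2], labels, k=2)
print(selected)  # [0, 2]
```

On this toy data the rescaled copy `f1` is skipped in favour of the weaker but less redundant `f2`, which is the behaviour a redundancy-aware filter is meant to provide. The selection never calls a classifier, which is what distinguishes a filter from a wrapper.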
This paper is organized as follows.
Section 2 reviews related work on PCG
signal classification, i.e.
extracting feature vectors and selecting the relevant
features for the classification task. Section 3 describes
the proposed classification system, which adds the
feature selection step. The experiments and their
findings are presented in Section 4. Section 5
concludes the paper with some ideas for further
research.
2 RELATED WORK
Several studies on heart sound classification have
used a pattern recognition approach for the task of
cardiovascular disease diagnosis (Barschdorff,
Bothe, & Rengshausen, 1989) (Ali, et al., 2019)
(Whitaker, Suresha, Liu, Clifford, & Anderson,
2017). This approach requires three steps: pre-processing
and segmentation, feature extraction with
an optional feature selection step, and classification.
First, the phonocardiogram (PCG) is pre-processed
and segmented into local regions (S1, Systole, S2, Diastole).
Then, the features of each PCG recording are
extracted. Finally, the features are fed into the
designed model to classify heart sounds as normal or
abnormal. A traditional heart sound classification system
therefore includes the steps listed above. Designing
such a system requires a training phase for
building the model of each class and a testing phase for
evaluating the performance of the classification system,
using training and testing databases, respectively.
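The training and testing phases can be sketched as a 5-fold cross-validation of a k-NN classifier, the evaluation protocol adopted in this work. The sketch below is a simplified illustration: the function names, the choice k = 3, the interleaved (non-stratified) fold assignment, and the two-dimensional toy vectors are assumptions, not details taken from the cited systems.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_validate(xs, ys, n_folds=5, k=3):
    """Mean accuracy over n_folds; fold f is tested on samples f, f + n_folds, ..."""
    accuracies = []
    for f in range(n_folds):
        test_idx = list(range(f, len(xs), n_folds))
        train_idx = [i for i in range(len(xs)) if i % n_folds != f]
        # Training phase: k-NN simply stores the training vectors and labels.
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        # Testing phase: classify each held-out vector and count the hits.
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / n_folds


# Toy data (hypothetical): two well-separated clusters, -1 = normal, 1 = abnormal.
xs, ys = [], []
for i in range(10):
    xs.append((0.1 * i, 0.1 * i))
    ys.append(-1)
    xs.append((5 + 0.1 * i, 5 + 0.1 * i))
    ys.append(1)

mean_accuracy = cross_validate(xs, ys, n_folds=5, k=3)
print(mean_accuracy)  # 1.0 on this trivially separable toy set
```

With an imbalanced database, a stratified fold assignment (keeping the class proportions equal in every fold) would usually be preferable to the plain interleaving used here.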
In (Tang, Chen, Li, & Zhong, 2016), the authors
proposed a PCG signal classification system
based on a back-propagation neural network classifier
applied to sequences of feature vectors extracted from
several domains. In this system, the segmentation step
is based on the hidden semi-Markov model (HSMM)
method that uses the ECG information to locate the
different local regions of the heartbeat sound
(Springer, Tarassenko, & Clifford, 2016). Then, the
global region and the local regions (S1, Systole, S2 and
Diastole) of the heartbeat sound are used to
extract global and local multi-domain
features (Springer, Tarassenko, & Clifford, 2016)
(Tang, Chen, Li, & Zhong, 2016). Hence, each feature
vector representing the heartbeat sound is the
concatenation of the feature vectors extracted from each
local region and from the global region. However, this
concatenation increases the vector dimension, which
increases memory usage and computing time and
probably reduces accuracy because of the curse of
dimensionality.
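The concatenation described above can be sketched as follows. The sketch assumes three hypothetical feature values per region and hypothetical names (`REGIONS`, `concatenate_features`); the actual number and type of features per region are those of the cited works and are not reproduced here. Besides the concatenated vector, the function returns the index range of each region, so that features retained by a later selection step can be traced back to the region they came from.

```python
# Fixed region order (assumed): four local regions followed by the global region.
REGIONS = ("S1", "Systole", "S2", "Diastole", "Global")


def concatenate_features(region_features):
    """Return (vector, slices): the concatenated vector and each region's index range."""
    vector, slices = [], {}
    for region in REGIONS:
        start = len(vector)
        vector.extend(region_features[region])
        slices[region] = (start, len(vector))
    return vector, slices


# Hypothetical feature values for one heartbeat recording.
example = {
    "S1": [0.12, 3.4, 0.9],
    "Systole": [0.30, 2.1, 0.4],
    "S2": [0.10, 3.0, 0.8],
    "Diastole": [0.45, 1.7, 0.3],
    "Global": [72.0, 0.05, 1.2],
}
vector, slices = concatenate_features(example)
print(len(vector))   # 15: the dimension is the sum of the per-region dimensions
print(slices["S2"])  # (6, 9)
```

The dimension of the final vector grows linearly with the number of regions and the number of features per region, which is why a selection step becomes attractive as more domains are added.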
3 PROPOSED CLASSIFICATION SYSTEM
3.1 Descriptions of the Heart Sound Databases
The dataset described in (Tang, Chen, Li, & Zhong,
2016) is used in the present work. It comprises six
databases (labeled A through F), with a total of
3153 phonocardiogram (PCG) recordings. These
recordings were gathered from diverse settings,
including clinical and non-clinical environments, and
involve subjects ranging from healthy individuals to
those with pathological conditions. Each PCG
recording has been manually labeled
as normal (-1) or abnormal
(1). The database consists of 2500 PCG
recordings of the normal class and 653 PCG
recordings of the abnormal class. In the present work, this