A SURVEY OF AUDIO PROCESSING ALGORITHMS FOR DIGITAL STETHOSCOPES

Fabio de Lima Hedayioglu, Miguel Tavares Coimbra

Instituto de Telecomunicações, Faculdade de Ciências da Universidade do Porto

Rua do Campo Alegre, 1021/1055, 4169 - 007 Porto, Portugal

Sandra da Silva Mattos

Fetal and Pediatric Cardiology Unit (UCMF) at Real Hospital Português de Beneficência em Pernambuco

Av. Portugal, 163 - Recife, PE Brazil

Keywords:

Digital stethoscope, Audio processing.

Abstract:

Digital stethoscopes have been drawing the attention of the biomedical engineering community for some

time now, as seen from patent applications and scientiﬁc publications. In the future, we expect ’intelligent

stethoscopes’ to assist the clinician in cardiac exam analysis and diagnosis, enabling functionalities such as

the teaching of auscultation, telemedicine, and personalized healthcare. In this paper we review the most recent

heart sound processing publications, discussing their adequacy for implementation in digital stethoscopes. Our

results show a body of interesting and promising work, although we identify three important limitations of

this research ﬁeld: lack of a set of universally accepted heart-sound features, badly described experimental

methodologies and absence of a clinical validation step. Correcting these ﬂaws is vital for creating convincing

next-generation ’intelligent’ digital stethoscopes that the medical community can use and trust.

1 INTRODUCTION

Auscultation is one of the oldest, cheapest and most

useful techniques for the diagnosis of heart disease.

Since their invention in 1816, stethoscopes have been

used as part of the initial evaluation of all patients

with suspected heart or lung problems. An experi-

enced physician can diagnose a large number of clin-

ical conditions just from the initial auscultation of the

patient’s chest (Tilkian and Conover, 1984). There

have been several attempts to create electronically en-

hanced stethoscopes, with better sound ampliﬁcation

and frequency response. However, according to

Durand (Durand and Pibarot, 1995), their introduction

into clinical practice has been hindered by factors

such as background noise, sounds made unfamiliar

to clinicians by filtering, fragility and poor ergonomic

design. Recent advances in electronics and

digital circuits allow us to not only overcome these

problems but also to exploit the beneﬁts of digital sig-

nal processing for signal analysis and visualization.

In this paper we will embrace this novel perspective

and analyze the state-of-the-art in audio processing of

heart sounds that might be adequate for integrating

into this next generation of stethoscopes. A deeper

explanation of digital stethoscopes is given in Section

2, a review of audio processing methods is described

in Section 3, followed by a discussion (Section 4) on

the future of digital stethoscopes and how biomedical

engineers can contribute to the success of this tech-

nology.

2 DIGITAL STETHOSCOPE

It is essential that we define the term digital stethoscope, since

there can be various interpretations from the name

alone. Traditional stethoscopes depend solely on

acoustics to amplify and transmit the heart sounds to

the clinician. The concept of the electronic stethoscope

arose when electronic components were first used to

amplify, ﬁlter and transmit the sound (Fig.1) (Durand

and Pibarot, 1995).

de Lima Hedayioglu F., Tavares Coimbra M. and da Silva Mattos S. (2009). A SURVEY OF AUDIO PROCESSING ALGORITHMS FOR DIGITAL STETHOSCOPES. In Proceedings of the International Conference on Health Informatics, pages 425-429. DOI: 10.5220/0001512104250429. SciTePress.

Figure 1: Lab prototype of an electronically enhanced stethoscope.

There are various examples in literature regarding

the development of digital and electronic stethoscopes.

Bredesen and Schmerler (Bredesen and

Schmerler, 1991) have patented an “intelligent stetho-

scope” designed for performing auscultation and for

automatically diagnosing abnormalities by comparing

digitized sounds to reference templates using a signa-

ture analysis technique. Several other electronically

enhanced and digital stethoscopes have been devel-

oped and described in literature (F. L. Hedayioglu,

2007; M.E. Tavel and Shander, 1994; Durand and

Pibarot, 1995; Brusco and Nazeran, 2005).

Figure 2: Block diagram of a digital stethoscope prototype

developed by our group and field-tested at Real Hospital

Português de Beneficência em Pernambuco in Recife,

Brazil. Over 100 auscultations were performed during the

clinical validation stage.

Fig. 2 shows a block diagram of a digital stetho-

scope prototype developed by our group. The auscul-

tation quality was considered satisfactory in clinical

trials when compared to auscultation using acoustic

stethoscopes, but clinicians still perceived differences

in audio pitch, although this did not affect their ability

to diagnose heart conditions. Our field experience

conﬁrms Durand’s (Durand and Pibarot, 1995) opin-

ion that audio enhancement alone is not enough for

the clinical community to adopt this new technology.

In order to make digital stethoscopes attractive to clin-

ical cardiologists, we clearly need to address the nu-

merous potential improvements provided by a fully

functional, robust digital stethoscope: real-time ac-

quisition, analysis, display and reproduction of heart

sounds and murmurs. Digital stethoscopes must also

open the doors for digital audio archives, simplifying

the acquisition, storage and transmission of

cardiac exams and murmurs, and enabling functionalities

such as the teaching of auscultation, telemedicine,

and personalized healthcare.

3 AUDIO PROCESSING

For the analysis of the state-of-the-art on audio pro-

cessing in cardiology, we have very loosely adopted

some concepts of clinical systematic reviews. A rigorous

systematic review of such a vast, multi-disciplinary

field is quite difficult to implement in practice due

to the large number of papers retrieved by analysis

of both engineering and medical scientiﬁc databases.

Our review methodology was as follows:

• We considered that Durand’s (Durand and Pibarot,

1995) excellent review paper fully covers this

topic up to 1995.

• We consulted the IEEE Xplore (ieeexplore.ieee.org)

database with the following query: “(((feature ex-

traction)<in>metadata) <and> ((cardiology)<in>

metadata))”, obtaining 159 results after 1995.

• By title and abstract inspection, we kept only pa-

pers dealing with phonocardiogram data analysis,

reducing this number to 19.

• We analyzed the references from all these papers,

and selected all papers published after 1995 and

with more than 10 citations, obtaining 20 results.

This enabled us to cover additional articles be-

sides the ones published in IEEE journals and

conferences, artiﬁcially expanding the scope of

our review to other scientiﬁc databases.

• The total number of papers covered by this review

is thus 39 (19+20).

Although we are certain that it is possible to miss

some papers using this methodology, we feel that we

have covered a sufﬁciently vast and interesting sam-

ple to draw some important conclusions, as described

in Section 4.

3.1 Heart Sound Analysis and Feature

Extraction

The main constituents of a cardiac cycle are the ﬁrst

heart sound (typically referred to as S1), the systolic

period, the second heart sound (S2) and the diastolic

period. Whenever a clinician is performing an aus-

cultation, he tries to identify these individual compo-

nents, and is trained to analyze related features such

as rhythm, timing instants, intensity of heart sound

HEALTHINF 2009 - International Conference on Health Informatics


components, splitting of S2, etc (H. Liang and Har-

timo, 1997b). This analysis allows him to search for

murmurs and sound abnormalities that might corre-

spond to speciﬁc cardiac pathologies. From a signal

processing perspective, Heart Sound Analysis (HSA)

is not only interesting by itself (allowing quantita-

tive measures to be displayed automatically in a dig-

ital stethoscope), but is also an essential ﬁrst step for

the subsequent task of automatic pathology classiﬁca-

tion. In this paper, we will distinguish two sub-tasks

of HSA: Heart Sound Segmentation (HSS) and Aortic

Pulmonary Signal Decomposition (APSD).

3.1.1 Heart Sound Segmentation

In HSS we expect to identify and segment the

four main constituents of a cardiac cycle. This

is typically accomplished by identifying the posi-

tion and duration of S1 and S2, using some sort of

peak-picking methodology on a pre-processed sig-

nal. Liang (H. Liang and Hartimo, 1997a) has used

discrete wavelet decomposition and reconstructed the

signal using only the most relevant frequency bands.

Peak-picking was performed by thresholding the nor-

malized average Shannon energy, and discarding ex-

tra peaks via analysis of the mean and variance of

peak intervals. Finally, they distinguish between S1

and S2 peaks (assuming that the diastolic period is

longer than the systolic one, and that the latter is more

constant), and estimate their durations. An accuracy

of 93% was obtained on 515 cardiac periods

from 37 digital phonocardiographic

recordings. The same authors further

improved the statistical signiﬁcance of their results

by obtaining the same accuracy using 1165 cardiac

periods from 77 recordings (H. Liang and Hartimo,

1997b), and later attempted murmur classiﬁcation

based on these features and neural network classi-

ﬁers, obtaining 74% accuracy (Liang and Hartimo,

1998b). Omran (Sherif Omran, 2003) has also studied

this problem using normalized Shannon entropy after

wavelet decomposition of the audio signal, but their

experimental methodology is less convincing.
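The envelope-and-peak-picking idea described above can be illustrated with a minimal sketch. It is not the implementation from the cited papers: the wavelet pre-processing step is omitted, and the frame length, hop size, threshold, minimum peak gap and the synthetic test signal (`synthetic_pcg`) are all assumptions chosen for illustration. The S1/S2 labelling assumes that no peaks are missed or duplicated.

```python
import numpy as np

def shannon_envelope(x, fs, frame_s=0.02, hop_s=0.01):
    """Return frame times and the normalized average Shannon energy."""
    x = np.asarray(x, dtype=float)
    x = x / (np.max(np.abs(x)) + 1e-12)             # scale amplitude into [-1, 1]
    frame, hop = int(frame_s * fs), int(hop_s * fs)
    n_frames = 1 + (len(x) - frame) // hop
    env = np.empty(n_frames)
    for i in range(n_frames):
        sq = x[i * hop:i * hop + frame] ** 2
        env[i] = -np.mean(sq * np.log(sq + 1e-12))   # average Shannon energy
    env = (env - env.mean()) / (env.std() + 1e-12)   # zero mean, unit variance
    times = (np.arange(n_frames) * hop + frame / 2) / fs
    return times, env

def pick_peaks(times, env, threshold=0.0, min_gap_s=0.15):
    """Keep local maxima above threshold; merge maxima closer than min_gap_s."""
    peaks = []
    for i in range(1, len(env) - 1):
        if env[i] > threshold and env[i] >= env[i - 1] and env[i] > env[i + 1]:
            if peaks and times[i] - times[peaks[-1]] < min_gap_s:
                if env[i] > env[peaks[-1]]:
                    peaks[-1] = i                    # keep the stronger of two close peaks
            else:
                peaks.append(i)
    return times[np.array(peaks, dtype=int)]

def first_peak_is_s1(peak_times):
    """Systole (S1 to S2) is assumed shorter than diastole (S2 to S1).
    Needs at least three peaks and assumes none are missing or spurious."""
    gaps = np.diff(peak_times)
    return gaps[0::2].mean() < gaps[1::2].mean()

def synthetic_pcg(fs=2000, duration_s=3.0, seed=0):
    """Hypothetical test signal: Gaussian-windowed 50 Hz bursts plus weak noise."""
    t = np.arange(int(fs * duration_s)) / fs
    x = 0.005 * np.random.default_rng(seed).standard_normal(t.size)
    s1 = np.arange(0.1, duration_s, 0.8)             # one cycle every 0.8 s
    s2 = s1 + 0.3                                    # systole lasts 0.3 s
    for centers, amp in ((s1, 1.0), (s2, 0.7)):
        for c in centers:
            window = np.exp(-((t - c) ** 2) / (2 * 0.02 ** 2))
            x += amp * window * np.cos(2 * np.pi * 50 * (t - c))
    return x, fs, np.sort(np.concatenate([s1, s2]))

if __name__ == "__main__":
    x, fs, truth = synthetic_pcg()
    times, env = shannon_envelope(x, fs)
    peaks = pick_peaks(times, env)
    print("detected peak times (s):", np.round(peaks, 2))
    print("first peak labelled S1:", bool(first_peak_is_s1(peaks)))
```

Real recordings contain murmurs, noise and missed beats, so the interval analysis would need the mean and variance checks mentioned above rather than the simple alternation test used here.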

3.1.2 Aortic Pulmonary Signal Decomposition

Besides the four main components of the cardiac

cycle, there is a clinical interest in the analysis of

some of its associated sub-components (JingPing Xu,

2000). It has been recognized that S1 may be com-

posed of up to four components produced during ven-

tricular contraction (Durand and Pibarot, 1995), al-

though the complexity of this task has been a very

difﬁcult hurdle for the signal processing community.

The S2 sound is better understood, being composed

of an aortic component (A2), which is produced ﬁrst

during the closure and vibration of the aortic valve

and surrounding tissues, followed by the pulmonary

component (P2) produced by a similar process asso-

ciated with the pulmonary valve (JingPing Xu, 2000).

Durand (JingPing Xu, 2000) demonstrated that it is

possible to model each component of S2 by a narrow-

band nonlinear chirp signal. Later (JingPing Xu,

2001) he adapted and validated this approach for the

analysis and synthesis of overlapping A2 and P2 components

of S2. To do so, the time-frequency representation

of the signal is generated, and each component (A2 and P2)

is then estimated and reconstructed from its instantaneous phase and

amplitude. In that

paper, accuracy was evaluated on simulated

A2 and P2 components with different overlapping

factors. The reported error was between 1% and

6%, proportional to the duration of the overlapping

interval. Nigam (Nigam and Priemer, 2006) also pre-

sented a method for extracting A2 and P2 components

by assuming them to be statistically independent. To do

so, four simultaneous auscultations are analyzed us-

ing blind source separation. The main advantage of

this method is the lower dependence on the A2-P2

time interval, although it needs a non-conventional 4-

sensor stethoscope. Leung (T. S. Leung, 1998) also

analyzed the splitting of S2 using time-frequency de-

composition.
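To make the chirp model concrete, the following sketch synthesizes an S2 sound as the sum of two narrow-band chirps, one for A2 and a delayed one for P2. The exponential laws used for the instantaneous frequency and the amplitude, and every numeric parameter (start and end frequencies, time constants, 30 ms split), are assumptions for illustration only; they are not the expressions or values used in the cited work.

```python
import numpy as np

def chirp_component(fs, duration_s, f_start, f_end, tau_f, tau_a, amp=1.0):
    """One decaying chirp: returns the waveform and its instantaneous frequency."""
    t = np.arange(int(fs * duration_s)) / fs
    # Assumed law: frequency decays exponentially from f_start towards f_end (Hz).
    inst_freq = f_end + (f_start - f_end) * np.exp(-t / tau_f)
    # The phase is the running integral of the instantaneous frequency.
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs
    # Assumed envelope: fast rise (2 ms) followed by exponential decay.
    envelope = amp * (1 - np.exp(-t / 0.002)) * np.exp(-t / tau_a)
    return envelope * np.sin(phase), inst_freq

def synth_s2(fs=2000, duration_s=0.1, split_s=0.03):
    """Return A2, the delayed P2, and their sum (a synthetic S2)."""
    a2, _ = chirp_component(fs, duration_s, 200, 50, 0.015, 0.02, amp=1.0)
    p2, _ = chirp_component(fs, duration_s, 150, 40, 0.015, 0.02, amp=0.6)
    delay = int(round(split_s * fs))        # A2-P2 split in samples
    p2_delayed = np.zeros_like(a2)
    p2_delayed[delay:] = p2[:len(a2) - delay]
    return a2, p2_delayed, a2 + p2_delayed

if __name__ == "__main__":
    a2, p2, s2 = synth_s2()
    print("samples:", len(s2), "peak amplitude:", round(float(np.max(np.abs(s2))), 3))
```

Varying `split_s` changes how much A2 and P2 overlap, which is the kind of simulated condition under which a decomposition method can be evaluated.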

3.2 Automatic Pathology Classiﬁcation

The vast majority of papers we have found regarding

audio processing algorithms, adequate for the integra-

tion into a digital stethoscope, concern the detection

of specific heart pathologies. This highlights the interest

of the scientific community in this topic but, as

our analysis shows, there are still some major ﬂaws

in most of them such as the absence of a clinical val-

idation step and unconvincing experimental method-

ologies. Most papers use the well-established pattern

recognition approach of feature extraction followed

by a classifier. Due to space limitations, we will describe

the most interesting papers, leaving a more detailed

discussion of this topic to Section 4. Bentley

(P. M. Bentley and Grant, 1995) uses Choi-Williams

Distribution (CWD) as features, working with 45 nor-

mal/abnormal valve subjects. Some features were de-

termined via visual inspection, others automatically

from the CWD by simple rule-based classiﬁcation.

Later (P. M. Bentley and McDonnell, 1998), the authors

show that CWD is a better method to represent

the frequencies in PCG and to get heart sound de-

scriptors, than other time-frequency (T-F) represen-

tations. According to them, a simple description of


the T-F distribution allows an analysis of the heart

valve’s condition. However, they highlight the need

for a more comprehensive evaluation using a larger

population of test patients. Wang (P. Wang and Soh,

2005) proposes a representation of heart sounds that

is robust to noise levels of 20dB, using mel-scaled

wavelet features. However, details regarding the

dataset used are not clear enough for robust conclusions.

Liang (Liang and Hartimo, 1998a) developed an inter-

esting feature vector extraction algorithm where the

systolic signal is decomposed by wavelets into sub-

bands. Then, the best basis set is selected, and the

average feature vector of each heart sound recording

is calculated. Neural Networks (NN) are used for

classifying 20 samples after being trained with 65,

obtaining an accuracy of 85%. NNs are also used

by Abdel-Alim (Onsy Abdel-Alim and El-Hanjouri,

2002) for the automatic diagnosis of heart valves using

wavelet feature vectors and stethoscope location

information. They use two NNs: one for systolic dis-

eases and the other for diastolic diseases. A total of

1200 cases were used: 970 cases for training and 300

for testing. The recognition rate was 95%. Turkoglu

(Turkoglu and Arslan, 2001), Ozgur (Ozgur Say and

Olmez, 2002) and El-Hanjouri (M. El-Hanjouri and

Alim, 2002) also used wavelets as feature vectors for

classiﬁcation, although they provide too few details

regarding the data sets used. Trimmed mean spectrograms

are used by Leung (T.S. Leung and Salmon,

2000) to extract features of phonocardiograms. To-

gether with the acoustic intensities in systole and di-

astole, the authors quantiﬁed the distinctive character-

istics of different types of murmurs using NNs. One

of the few papers that addresses the important

clinical validation step is by Kail (E Kail and

Balázs, 2004). The authors propose a novel sound

representation (2D and 3D) and feature extraction al-

gorithm using Morlet wavelet scalograms. After man-

ual classiﬁcation of the resulting graphs performed by

two cardiologists on 773 subjects, they clinically val-

idated the features as useful for sound and murmur

extraction. Sharif (Zaiton Sharif and Salleh, 2000)

also proposes other features for classiﬁcation systems

based on central ﬁnite difference and zero crossing

frequency estimation.
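The feature-extraction-plus-classifier pattern shared by most of these papers can be sketched as follows. This is only an illustration of the pipeline: a Haar wavelet and a nearest-centroid rule stand in for the wavelet families and neural networks used in the reviewed work, and the two synthetic classes produced by `make_example` (a 30 Hz tone with or without an added 300 Hz component) are hypothetical, not clinical data.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    x = x[:len(x) // 2 * 2]                    # drop a trailing odd sample
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def subband_energy_features(x, levels=4):
    """Relative energy in each detail band (fine to coarse) plus the final approximation."""
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    e = np.array(energies)
    return e / (e.sum() + 1e-12)               # normalize so features sum to 1

class NearestCentroid:
    """Minimal stand-in classifier: assign each sample to the closest class mean."""
    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[d.argmin(axis=1)]

def make_example(murmur, rng, fs=2000, n=512):
    """Hypothetical signal: 30 Hz tone, plus a 300 Hz component when murmur is True."""
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * 30 * t + rng.uniform(0, 2 * np.pi))
    if murmur:
        x = x + 0.5 * np.sin(2 * np.pi * 300 * t + rng.uniform(0, 2 * np.pi))
    return x + 0.01 * rng.standard_normal(n)

def demo(seed=0):
    """Train on 20 synthetic examples, test on 10, and return the accuracy."""
    rng = np.random.default_rng(seed)
    y_train = np.array([0, 1] * 10)
    X_train = np.array([subband_energy_features(make_example(bool(c), rng)) for c in y_train])
    y_test = np.array([0, 1] * 5)
    X_test = np.array([subband_energy_features(make_example(bool(c), rng)) for c in y_test])
    model = NearestCentroid().fit(X_train, y_train)
    return float(np.mean(model.predict(X_test) == y_test))

if __name__ == "__main__":
    print("accuracy on synthetic test set:", demo())
```

An accuracy obtained on synthetic signals like these says nothing about clinical performance; the sketch only shows where the feature vector and the classifier sit in the pipeline.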

4 DISCUSSION

By covering the most interesting papers on audio-

processing from a digital stethoscope perspective, we

can make some observations regarding the state-of-

the-art in this field. Section 3.1 has shown us that

there are already important results regarding audio

feature extraction. The S1 and S2 sounds can be ro-

bustly segmented and there is promising work regard-

ing the extraction of secondary sounds such as A2 and

P2.

The scenario is not so bright for automatic pathol-

ogy classiﬁcation (Section 3.2). Reviewing some of

the papers and simply observing the disparity in the

number of publications when compared with the other

challenges, we conclude that there is strong interest

in this topic. However, in our opinion, there is

still a long way to go before we can have robust au-

tomatic classiﬁcation systems that can be introduced

in the clinical routine of hospitals. We have identiﬁed

three major problems that afﬂict most of the papers

reviewed:

• Absence of a set of well-accepted features - We

rarely found papers that selected the same features

for pathology classiﬁcation. Most acknowledge

that the presence of S1 and S2 is important but

there is no consensus of the scientiﬁc community

on how these should be used. We have collected

more than 25 different features with minimum

overlap between papers. We clearly need more

studies on the statistical signiﬁcance and clinical

importance of heart sound features, from an auto-

matic pattern recognition perspective.

• Badly described data sets - It is not enough for au-

thors to mention that they have worked with 300

cardiac cycles. Where were these obtained? From

how many patients? In what conditions? Using

which equipment? All these factors are vital in

the analysis of a system’s performance and robust-

ness. Studies need to be much more rigorous on

this topic so their results can be reasonably con-

vincing.

• Absence of clinical validation - Almost no papers

addressed this task, which is vital for all assisted-

diagnosis systems. No medical specialist will

trust any kind of automatic system unless it has proven

to be robust and accurate in real field testing.

Field conditions are very different from those of a typical

biomedical engineering research lab, which can

drastically affect results.

As a ﬁnal conclusion, we can say that working

towards next-generation ’intelligent’ digital stethoscopes

is highly desirable, judging not only from the significant

number of scientific publications on this topic but

also from the undeniable benefits that such systems

can provide. There is already solid work regarding

audio feature extraction, although many challenges remain unsolved

in this field, such as the complex analysis of the

sub-components of S1. Automatic pathology classification

is still too undeveloped to be of any practical


usage and we hope that the valuable lessons learned

from this study can correct previous mistakes and pro-

vide a precious boost to the challenging ﬁeld of audio

processing for digital stethoscopes.

ACKNOWLEDGEMENTS

This work was supported by the Programme Al-

ban, the European Union Programme of High Level

Scholarships for Latin America, scholarship no.

E07M402298BR.

REFERENCES

Bredesen, M. and Schmerler, E. (1991). US patent no.

5,010,889: Intelligent stethoscope.

Brusco, M. and Nazeran, H. (2005). Development of an

intelligent PDA-based wearable digital phonocardiograph.

In Proceedings of the 2005 IEEE Engineering

in Medicine and Biology 27th Annual Conference.

Durand, L.-G. and Pibarot, P. (1995). Digital signal pro-

cessing of the phonocardiogram: Review of the most

recent advancements. Critical Reviews in Biomedical

Engineering.

E Kail, S Khoór, B. K. K. F. and Balázs, F. (2004). Internet

digital phonocardiography in clinical settings and in

population screening. Computers in Cardiology.

F. L. Hedayioglu, S. S. Mattos, L. M. M. E. d. L. (2007). Development

of a tele-stethoscope and its application in

pediatric cardiology. Indian Journal of Experimental

Biology, 45.

H. Liang, S. L. and Hartimo, I. (1997a). Heart sound

segmentation algorithm based on heart sound envel-

ogram. Computers in Cardiology, 24.

H. Liang, S. L. and Hartimo, I. (1997b). A heart sound

segmentation algorithm using wavelet decomposition

and reconstruction. In 19th International Conference

- IEEE/EMBS, Chicago, IL, USA.

JingPing Xu, L.G. Durand, P. P. (2000). Nonlinear transient

chirp signal modeling of the aortic and pulmonary

components of the second heart sound. IEEE Transactions

on Biomedical Engineering, 47(7).

JingPing Xu, L.G. Durand, P. P. (2001). Extraction of

the aortic and pulmonary components of the second

heart sound using nonlinear transient chirp signal

model. IEEE Transactions on Biomedical Engineering,

48(3).

Liang, H. and Hartimo, I. (1998a). A feature extraction

algorithm based on wavelet packet decomposition for

heart sound signals. In Proceedings of the IEEE-SP

International Symposium.

Liang, H. and Hartimo, I. (1998b). A heart sound feature

extraction algorithm based on wavelet decomposition

and reconstruction. In Proc. IEEE EMBS, volume 20.

M. El-Hanjouri, W. Alkhaldi, N. H. and Alim, A. (2002).

Heart diseases diagnosis using HMM. In Electrotechnical

Conference - MELECON.

M.E. Tavel, D. B. and Shander, D. (1994). Enhanced aus-

cultation with a new graphic display system. In Arch.

Intern. Med., volume 154, page 893.

Nigam, V. and Priemer, R. (2006). A procedure to ex-

tract the aortic and the pulmonary sounds from the

phonocardiogram. In Proceedings of the 28th IEEE

EMBS Annual International Conference, New York

City, USA.

Onsy Abdel-Alim, N. H. and El-Hanjouri, M. A. (2002).

Heart diseases diagnosis using heart sounds. In Radio

Science Conference.

Ozgur Say, Z. D. and Olmez, T. (2002). Classiﬁcation of

heart sounds by using wavelet transform. In Proceed-

ings of the Second Joint EMBS/BMES Conference,

volume 1.

P. M. Bentley, J. T. E. M. and Grant, P. M. (1995). Classification

of native heart valve sounds using the Choi-Williams

time-frequency distribution. In IEEE-EMBC

and CMBEC.

P. M. Bentley, P. M. G. and McDonnell, J. T. E. (1998).

Time-frequency and time-scale techniques for the

classiﬁcation of native and bioprosthetic heart valve

sounds. IEEE Transactions on Biomedical Engineer-

ing, 45(1).

P. Wang, Y. K. and Soh, C. B. (2005). Feature extraction

based on mel-scaled wavelet transform for heart sound

analysis. In Engineering in Medicine and Biology So-

ciety, 2005. IEEE-EMBS 2005. 27th Annual Interna-

tional Conference.

Sherif Omran, M. T. (2003). A heart sound segmentation

and feature extraction algorithm using wavelet. In

Proc. of IEEE MWSCAS ’03, volume 1, pages 27–30.

T. S. Leung, P. R. White, J. C. W. B. C. E. B. A. P. S. (1998).

Analysis of the second heart sound for diagnosis of

paediatric heart disease. In IEE Proceedings - Sci.

Meas. Technol., volume 145.

Tilkian, A. and Conover, M. (1984). Understanding heart

sounds and murmurs with an introduction to lung

sounds. W.B. Saunders Company.

T.S. Leung, P.R. White, W. B. C. E. B. and Salmon, A. P.

(2000). Classiﬁcation of heart sounds using time-

frequency method and artiﬁcial neural networks. In

Proceedings of the 22nd Annual International Con-

ference of the IEEE, volume 2.

Turkoglu, I. and Arslan, A. (2001). An intelligent pat-

tern recognition system based on neural network

and wavelet decomposition for interpretation of heart

sounds. In Proceedings of the 23rd Annual Interna-

tional Conference of the IEEE, volume 2, pages 25–

28.

Zaiton Sharif, Mohd Shamian Zainal, A. Z. S. and Salleh,

S. H. S. (2000). Analysis and classiﬁcation of heart

sounds and murmurs based on the instantaneous en-

ergy and frequency estimations. In Proceedings of

TENCON, volume 2.
