for the E class, 69% for the N class.
For the speech DBs, the rates of correct
classification were as follows: 79% for DB1, 86% for
DB2, and 72% for DB3.
Compared with the two single approaches, the
proposed MNN technique yields globally better
results than either the single RBF or the single LVQ
ANN.
5.3 Results for the Discrete HMM
and Hybrid HMM/MLP Approaches
Further assume that for each class in the vocabulary
we have a training set of k occurrences (instances),
where each instance constitutes an observation
sequence.
a. Discrete HMM
For the speech DBs, 10-state, strictly left-to-right
discrete HMMs were used to model each basic unit
(word). In this case, the acoustic features were
quantized into 4 independent codebooks using the
KM algorithm: 128 clusters for the J-RASTA-PLP
coefficients, 128 clusters for the Δ J-RASTA-PLP
coefficients, 32 clusters for the Δ energy, and 32
clusters for the ΔΔ energy.
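To make the quantization step concrete, here is a minimal sketch, assuming "KM" denotes standard K-means; the stream names, feature dimensions, and frame counts below are placeholders, not values from the paper:

```python
# Sketch of the 4-codebook quantization step (stream names, dimensions,
# and frame counts are hypothetical placeholders).
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature streams: (n_frames, dim) array and codebook size.
streams = {
    "jrasta_plp":       (np.random.randn(5000, 12), 128),
    "delta_jrasta_plp": (np.random.randn(5000, 12), 128),
    "delta_energy":     (np.random.randn(5000, 1),  32),
    "delta2_energy":    (np.random.randn(5000, 1),  32),
}

codebooks, symbols = {}, {}
for name, (feats, n_clusters) in streams.items():
    # One independent K-means codebook per feature stream.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(feats)
    codebooks[name] = km
    # Each frame is replaced by the index of its nearest centroid.
    symbols[name] = km.predict(feats)
```

Each frame of each stream is thus mapped to the index of its nearest centroid, yielding four parallel discrete symbol sequences for the discrete HMMs.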
For the PEA signals, 5-state, strictly left-to-right
discrete HMMs were used. Table 1 gives the results
of this experiment; a sketch of the left-to-right
topology follows the table.
Table 1: Discrete HMM results.

           BDB   SDB1   SDB2   SDB3
Rate (%)    84     87     90     76
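For illustration, a minimal sketch of the strictly left-to-right transition structure; the self-loop and forward probabilities used here are placeholder values, not the trained ones:

```python
# Strictly left-to-right transition matrix for a 5-state discrete HMM
# (self-loop / forward probabilities 0.6 / 0.4 are placeholders).
import numpy as np

n_states = 5            # 10 states for the speech DBs, 5 for the PEA signals
stay, advance = 0.6, 0.4

A = np.zeros((n_states, n_states))
for i in range(n_states - 1):
    A[i, i] = stay          # remain in the current state
    A[i, i + 1] = advance   # advance to the next state only
A[-1, -1] = 1.0             # final (absorbing) state

assert np.allclose(A.sum(axis=1), 1.0)  # each row is a valid distribution
```

The zeros below the diagonal and above the first superdiagonal enforce the strict left-to-right topology: a state can only persist or advance to its immediate successor.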
b. Discrete MLP with Inputs Provided by the FCM
Algorithm
We use a hybrid HMM/MLP model in which the
ANN input is an acoustic vector of real values
obtained by applying the FCM algorithm (Lazli and
Sellami, 2003): 2880 components corresponding to
the membership degrees of the acoustic vectors to
the classes of the codebook. The values reported are
those used for SDB2.
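As an illustration of how such membership degrees can be computed, here is a minimal sketch of the standard FCM membership formula for one vector against fixed centroids; the centroid count, feature dimension, and fuzzifier m are assumptions, and the paper's actual FCM training is described in (Lazli and Sellami, 2003):

```python
# Sketch: FCM membership degrees of one acoustic vector with respect to
# fixed codebook centroids (centroid count, dimension, and m are assumed).
import numpy as np

def fcm_memberships(x, centroids, m=2.0):
    """Membership degree of vector x to each centroid (fuzzy c-means)."""
    d = np.linalg.norm(centroids - x, axis=1)
    d = np.maximum(d, 1e-12)                 # guard against zero distance
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)           # memberships sum to 1

centroids = np.random.randn(320, 12)   # hypothetical codebook centroids
x = np.random.randn(12)                # hypothetical acoustic vector
u = fcm_memberships(x, centroids)
assert np.isclose(u.sum(), 1.0)
```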
Table 2: Hybrid HMM/MLP results.

           BDB   SDB1   SDB2   SDB3
Rate (%)    94     97     97     83
We used 10-state, strictly left-to-right word HMMs
with emission probabilities computed by an MLP fed
with 9 frames of quantized acoustic vectors (9 frames
× 320 codebook classes = 2880 inputs). Thus an MLP
with a single hidden layer was trained: 2880 input
neurons, 30 hidden neurons, and 10 output neurons.
For the PEA signals, an MLP with 64 input
neurons, 18 hidden neurons, and 5 output neurons
was trained. Table 2 gives the results; a sketch of the
MLP-based emission estimation follows.
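To illustrate the hybrid scheme, here is a minimal sketch of training such a single-hidden-layer MLP and converting its posterior outputs into scaled likelihoods, the standard way hybrid HMM/MLP systems obtain emission probabilities; the training data, labels, and class priors below are hypothetical:

```python
# Sketch: single-hidden-layer MLP as posterior estimator, with the usual
# hybrid-system conversion of posteriors to scaled likelihoods
# (training data, labels, and priors are hypothetical).
import numpy as np
from sklearn.neural_network import MLPClassifier

n_in, n_hidden, n_out = 2880, 30, 10    # speech DBs (64, 18, 5 for PEA)
X = np.random.rand(1000, n_in)          # hypothetical FCM input vectors
y = np.random.randint(0, n_out, 1000)   # hypothetical class labels

mlp = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=200)
mlp.fit(X, y)

posteriors = mlp.predict_proba(X[:5])             # P(class | input)
priors = np.bincount(y, minlength=n_out) / len(y)
scaled_likelihoods = posteriors / priors          # proportional to
# P(input | class); usable as HMM emission scores up to a constant
```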
6 CONCLUSIONS
In this paper, the association of the RBF and LVQ
neural models improves the nonlinear approximation
capability of the resulting global neural operator,
compared with each single neural structure
constituting the MNN system.
For the second, hybrid HMM/MLP model, the
recognition tasks show an increase in the estimates
of the posterior probability of the correct class after
training.
In terms of effectiveness, the hybrid HMM/MLP
model appears more powerful than the multi-
network RBF/LVQ structure.
REFERENCES
L. Lazli, M. Sellami. "Connectionist Probability
Estimators in HMM Speech Recognition using Fuzzy
Logic". MLDM 2003: the 3rd international conference
on Machine Learning & Data Mining in pattern
recognition, LNAI 2734, Springer-verlag, pp.379-388,
July 5- 7, Leipzig, Germany, 2003.
A-S. Dujardin. "Pertinence d'une approche hybride multi-
neuronale dans la résolution de problèmes liés au
diagnostic industrièle ou médical". Internal report, I2S
laboratory, IUT of "Sénart Fontainebleau", University
of Paris XII, Avenue Pierre Point, 77127 Lieusant,
France, 2006.
L. Lazli, A-N. Chebira, K. Madani. "Hidden Markov
Models for Complex Pattern Classification". Ninth
International Conference on Pattern Recognition and
Information Processing, PRIP’07 may 22-24, Minsk,
Belarus, 2007. http://uiip.basnet. by/conf/prip2007/
prip2007.php-id=200.htm