A NEURAL NETWORK FRAMEWORK FOR IMPLEMENTING
THE BAYESIAN LEARNING
Luminita State
University of Pitesti, Caderea Bastiliei #45, Bucharest #1, Romania
Catalina Cocianu, Viorica Stefanescu
Academy of Economic Studies, Calea Dorobantilor 15-17, Bucharest #1, Romania
Vlamos Panayiotis
Hellenic Open University, Greece
Keywords: Neural Networks, Competitive Learning, Hidden Markov Models, Pattern Recognition, Bayesian Learning,
Weighting Processes, Markov Chains.
Abstract: The research reported in this paper aims at the development of a suitable neural architecture for implementing the Bayesian procedure for solving pattern recognition problems. The proposed neural system is based on an inhibitory competition installed among the hidden neurons of the computation layer. The local memories of the hidden neurons are computed adaptively according to an estimation model of the parameters of the Bayesian classifier. The paper also reports a series of qualitative attempts at analyzing the behavior of a new procedure for learning the parameters of an HMM by modeling different types of stochastic dependencies on the space of states corresponding to the underlying finite automaton. The approach aims at the development of new methods for processing image and speech signals in solving pattern recognition problems. Basically, the attempts are stated in terms of weighting processes and deterministic/non-deterministic Bayesian procedures. The aims were mainly to derive asymptotic conclusions concerning the performance of the proposed estimation techniques in approximating the ideal Bayesian procedure. The proposed methodology adopts the standard assumptions on the conditional independence properties of the stochastic processes involved.
1 HMM IN BAYESIAN LEARNING
Stochastic models represent a very promising approach to temporal pattern recognition. An important class of stochastic models is based on Markovian state transitions, two typical examples being the Markov model (MM) and the Hidden Markov Model (HMM).
The latent structure of the observable phenomenon is modeled in terms of a finite automaton Q, the observable variable being thought of as the output produced by the states of Q. Both evolutions, in the space of non-observable as well as in the space of observable variables, are assumed to be governed by probabilistic laws.
In the sequel, we denote by (Λ_n)_{n≥0} the stochastic process describing the hidden evolution and by (X_n)_{n≥0} the stochastic process corresponding to the observable evolution.
Let Q be the set of states of the underlying finite automaton, |Q| = m. We denote by τ_n the probability distribution on Q at the moment n. Let (Ω, K, P) be a probability space and (X, C, σ) a measure space, where σ is a σ-finite measure. The output of each state q ∈ Q is represented by a random element X_q : Ω → X of density function f_q(·). Let ξ be the a priori probability distribution on Q. We assume that ξ(q) > 0 for every q ∈ Q.

State L., Cocianu C., Stefanescu V. and Panayiotis V. (2004). A Neural Network Framework for Implementing the Bayesian Learning. In Proceedings of the First International Conference on Informatics in Control, Automation and Robotics, pages 328-331. DOI: 10.5220/0001127503280331. Copyright © SciTePress.

The conclusions on the hidden evolution are derived using the Bayesian procedure when the a priori probability distribution ξ and the set of density functions (f_q, q ∈ Q) are known.
Let L : Q × Q → [0, ∞) be a risk function. The outputs of the automaton are represented by the sequence of random elements (X_n)_{n≥0}, where the output at the moment n, X_n, is distributed f_{q_n} if it was emitted by the state q_n. Let

R = { t : X → [0, 1]^Q }

be the set of random decision procedures, where, for any t ∈ R, q ∈ Q and x ∈ X, (t(x))(q) is the probability of deciding that the output x is produced by the state q. For any t ∈ R we denote the expected risk by

R(t, ξ, f) = Σ_{q∈Q} Σ_{q'∈Q} ξ(q) L(q, q') ∫ (t(x))(q') f_q(x) σ(dx).
The Bayesian decision procedure t̃ ∈ R assures the minimum risk, that is,

R(t̃, ξ, f) = inf_{t∈R} R(t, ξ, f) = Φ(ξ, f),

and it is given by

(1)  (t̃(x))(q*) =
     1,       if T(q*, x) < min_{q∈Q\{q*}} T(q, x);
     0,       if T(q*, x) > min_{q∈Q\{q*}} T(q, x);
     α_{q*},  if T(q*, x) = min_{q∈Q\{q*}} T(q, x),

where

(2)  T(q', x) = Σ_{q∈Q} ξ(q) L(q, q') f_q(x),

Σ_{q∈A} α_q = 1, α_q ≥ 0 for q ∈ A, and

A = { q' / q' ∈ Q, T(q', x) = min_{q∈Q\{q'}} T(q, x) }.
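For a finite observation space, the deterministic part of rule (1) with criterion (2) can be sketched as follows; the two-state prior, 0-1 loss, and densities below are hypothetical illustrations, not values from the paper.

```python
import numpy as np

def bayes_decision(x_idx, xi, L, F):
    """Deterministic version of the Bayes rule (1): pick the state
    minimizing the criterion T(q', x) of (2).

    xi : (m,) a priori distribution on the m states
    L  : (m, m) loss matrix, L[q, q'] = loss of deciding q' when q emitted
    F  : (m, K) densities over a finite observation space of K points
    """
    # T(q', x) = sum_q xi(q) * L(q, q') * f_q(x)
    T = (xi[:, None] * L * F[:, x_idx][:, None]).sum(axis=0)
    return int(np.argmin(T))

# hypothetical two-state example with three possible outputs
xi = np.array([0.6, 0.4])
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])            # 0-1 loss
F = np.array([[0.7, 0.2, 0.1],        # f_{q=0}
              [0.1, 0.3, 0.6]])       # f_{q=1}
print([bayes_decision(k, xi, L, F) for k in range(3)])  # → [0, 0, 1]
```

With 0-1 loss the criterion reduces to the maximum a posteriori rule; ties (here at the middle output) are resolved by the first minimizing index, a simplification of the randomized α_{q*} case of (1).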
The true evolution in the space Q of non-observable variables is governed by probabilistic laws (τ_n)_{n≥0}, where τ_n represents the probability distribution on Q at the moment n.
Let (u_n)_{n≥0} be a sequence of subjective utilities assigned to the states of the automaton, u_n : Q → [0, ∞), n ≥ 0. We assume that, for any n ≥ 1, Σ_{q∈Q} u_n(q) > 0. For any n ≥ 0 and q ∈ Q, u_n(q) stands for the subjective utility assigned to the state q at the moment n. Typically, u_n(q) can be taken as the relative emitting frequency of the state q during the time interval [0, n].
Let (g_n)_{n≥1} be a sequence of measurable functions, g_n : X × X → [0, ∞), n ≥ 1, a Parzen-like basis of asymptotically unbiased estimates of the system of density functions (f_q, q ∈ Q), satisfying a series of suitable regularity assumptions. Our method is a supervised technique based on the learning sequence S = ((Λ_n, X_n) / n ≥ 1), where the true probability distribution τ_n is approximated by a weighting process (ξ_n(q), q ∈ Q)_{n≥0} defined by

ξ_n(q) = ξ(q) u_n(q) / Σ_{q'∈Q} ξ(q') u_n(q'),

ξ_n(q) representing the guess that q is the emitting state at the moment n. The decision procedure t̃*_n is defined by (1) in terms of ξ_n(q) and

f_{n,q}(x) = (1 / (n ξ_n(q))) Σ_{j=1}^{n} δ(q, Λ_j) g_n(x, X_j),

where

δ(q, q') = { 1, if q = q';  0, if q ≠ q' }.

The criterion function T(q, x) given by (2) is replaced by

(3)  T_n(q', x) = Σ_{q∈Q} ξ_n(q) L(q, q') f_{n,q}(x).
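A minimal sketch of the weighting process ξ_n and the plug-in criterion (3), assuming a Gaussian Parzen kernel for g_n and synthetic Gaussian class outputs; the kernel bandwidth, sample sizes, and distributions are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def gaussian_kernel(x, y, h):
    # a common Parzen-type choice for g_n(x, y); bandwidth h is an assumption
    return np.exp(-0.5 * ((x - y) / h) ** 2) / (h * np.sqrt(2 * np.pi))

def weighting(xi, u):
    """xi_n(q) = xi(q) u_n(q) / sum_q' xi(q') u_n(q')."""
    w = xi * u
    return w / w.sum()

def parzen_density(x, q, labels, samples, xi_n, h=0.5):
    """f_{n,q}(x) = (1 / (n xi_n(q))) sum_j delta(q, Lambda_j) g_n(x, X_j)."""
    n = len(samples)
    mask = (labels == q)
    return gaussian_kernel(x, samples[mask], h).sum() / (n * xi_n[q])

def criterion(x, xi_n, L, labels, samples):
    """T_n(q', x) of (3), the plug-in version of (2)."""
    m = len(xi_n)
    f = np.array([parzen_density(x, q, labels, samples, xi_n) for q in range(m)])
    return np.array([(xi_n * L[:, qp] * f).sum() for qp in range(m)])

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)                # hidden states Lambda_j
samples = np.where(labels == 0,                      # outputs X_j
                   rng.normal(-1.0, 0.7, 200),
                   rng.normal(+1.0, 0.7, 200))
xi = np.array([0.5, 0.5])
u = np.bincount(labels, minlength=2) / len(labels)   # relative emitting frequencies
xi_n = weighting(xi, u)
L = 1.0 - np.eye(2)                                  # 0-1 loss
print(int(np.argmin(criterion(-1.0, xi_n, L, labels, samples))))  # → 0
```

With u_n taken as the relative emitting frequencies and a uniform prior, ξ_n(q) reduces to the empirical state frequency, and the minimizer of T_n at x = -1 is the class centered there.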
2 THEORETICAL RESULTS SUPPORTING THE QUALITATIVE ANALYSIS OF THE BEHAVIOR OF THE LEARNING SCHEME
Let R̄(t̃*_n, ξ) = E(R(t̃*_n, ξ, f)) be the expected risk corresponding to the random decision procedure t̃*_n when ξ is the true probability distribution on Q and f = (f_q, q ∈ Q) is the set of output density functions.
Theorem 1. (State, 2002) Let (g_n)_{n≥0} be a sequence of measurable functions such that the assumptions A1, A2, A3, A4 hold, where

(A4) for any k ≥ 1, q ∈ Q and x ∈ X,  E(g_k(x, X_q)) = f_q(x).

If S = ((Λ_n, X_n) / n ≥ 1) is a learning sequence such that the random elements (Λ_n, X_n), n ≥ 1, are independent, Λ_n is distributed ξ and X_n is distributed f_q if Λ_n = q, then, for the Parzen-like basis (g_n)_{n≥0},

lim_{n→∞} R(t̃*_n, ξ, f) = Φ(ξ, f).
Theorem 2. (State, 2002) Let S = ((Λ_n, X_n) / n ≥ 1) be a learning sequence such that the random elements (Λ_n, X_n), n ≥ 1, are independent, Λ_n is distributed τ_n and X_n is distributed f_q if Λ_n = q. If, for the sequence (g_n)_{n≥0}, the assumptions A1, A2, A3, A4 hold and, for any q ∈ Q,

lim_{n→∞} (1/n) Σ_{j=1}^{n} τ_j(q) = τ(q),

then

lim_{n→∞} E(R(t̃*_n, ξ_n, f)) = Φ(τ, f).
Theorem 3. (State, 2002) Assume that the conditions mentioned in Theorem 2 hold. If, for any q ∈ Q, lim_{n→∞} τ_n(q) = τ(q), then

lim_{n→∞} E(R(t̃*_n, τ_n, f)) = Φ(τ, f).
Theorem 4. (State, 2002) Let S = ((Λ_n, X_n) / n ≥ 1) be a learning sequence such that (Λ_n, n ≥ 1) is a Markov chain of stationary transition probabilities having a unique recurrent class Q'. If (X_n, n ≥ 1) are independent and X_n is distributed f_q if Λ_n = q, then

lim_{n→∞} E(R(t̃*_n, τ, f)) = Φ(τ, f),

where τ is the probability distribution of Λ_1.

If (Λ_n, n ≥ 1) is a Markov chain, then ((Λ_n, X_n), n ≥ 1) is a Markov chain of stationary transition probabilities having a unique recurrent class

R' = ∪_{q∈Q'} { (q, x) / x ∈ C_q },  where C_q = { x / f_q(x) ≠ 0 }.
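The flavor of these convergence results can be illustrated by a small Monte Carlo sketch in the Theorem 1 setting (i.i.d. hidden states distributed ξ), assuming a finite observation space and the counting kernel g_n(x, y) = δ(x, y), for which the unbiasedness assumption A4 holds exactly; the prior, densities, and sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, K = 2, 3                                    # |Q| states, K possible outputs
xi = np.array([0.6, 0.4])                      # true a priori distribution on Q
F = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.3, 0.5]])                # f_q over the K outputs
L = 1.0 - np.eye(m)                            # 0-1 loss

def risk(decide, xi, F, L):
    # R(t, xi, f) for a deterministic rule decide: x -> q'
    return sum(xi[q] * F[q, x] * L[q, decide(x)]
               for q in range(m) for x in range(K))

bayes = lambda x: int(np.argmin(xi @ (L * F[:, x][:, None])))
phi = risk(bayes, xi, F, L)                    # minimal Bayes risk Phi(xi, f)

for n in (50, 500, 5000):
    lam = rng.choice(m, size=n, p=xi)          # hidden states, i.i.d. xi
    obs = np.array([rng.choice(K, p=F[q]) for q in lam])
    # counting kernel: f_{n,q} becomes the empirical law of the outputs of q
    xi_n = np.bincount(lam, minlength=m) / n
    F_n = np.array([np.bincount(obs[lam == q], minlength=K) /
                    max(1, (lam == q).sum()) for q in range(m)])
    plug = lambda x: int(np.argmin(xi_n @ (L * F_n[:, x][:, None])))
    print(n, round(risk(plug, xi, F, L) - phi, 4))  # nonnegative gap
```

The printed gap between the risk of the plug-in rule t̃*_n and the Bayes risk Φ(ξ, f) is nonnegative by construction and tends to 0 as n grows, in the spirit of Theorem 1.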
3 NEURAL ARCHITECTURE FOR IMPLEMENTING THE BAYESIAN PROCEDURE t̃*_n
We assume that X = R^d. Then the neural architecture consists of the layers F_X and F_H of d and, respectively, |Q| neurons. The neurons of the input layer F_X have no local memory; they distribute the corresponding inputs toward the neurons of the hidden layer F_H. Each neuron of F_H is assigned to one of the pattern classes from Q. For simplicity's sake, we refer to each neuron of F_H by its corresponding pattern class. The local memory of each neuron q ∈ F_H consists of ξ_n(q) and the parameters needed to compute f_{n,q}. The activation function of the neuron q ∈ F_H at the moment n is h_{n,q}(x) = ξ_n(q) f_{n,q}(x). The layer F_H is fully connected, the connection from q to q' being weighted by L(q, q'). Consequently, the input x = (x_1, ..., x_d) applied to F_X induces the neural activations

net(q', 0) = Σ_{q∈Q} ξ_n(q) L(q, q') f_{n,q}(x) = T_n(q', x),  q' ∈ F_H.
The recognition task corresponds to the identification of the states q for which T_n(q, x) is minimum. This task is solved by installing a discrete-time competitive process among the neurons of F_H. Let S_q(t) = f(net(q, t)) be the output of the neuron q ∈ F_H at the moment t, where the competition process starts at the moment 0 and the activation function f is given by

f(u) = { 0, if u < 0;  u, if u ≥ 0 }.

We denote by S(t) = (S_q(t), q ∈ F_H) the state at the moment t. The initial state is S(0) = (f(net(q, 0)), q ∈ F_H).

The synaptic weights of the connections during the competition are

w_{q',q} = { 1, if q' = q;  -ε, if q' ≠ q },

where ε > 0 is a vigilance parameter. The update of the state is performed synchronously, that is, for any q ∈ F_H,

net(q, t+1) = S_q(t) - ε Σ_{q'≠q} S_{q'}(t) = (1 + ε) S_q(t) - ε Σ_{q'∈F_H} S_{q'}(t),

S_q(t+1) = f(net(q, t+1)).
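The synchronous competition can be sketched directly. As a plain winner-take-all iteration it preserves the largest initial activation, so in this illustration the initial activations are taken as max_q T(q, x) − T(q, x) (an assumed transform, not specified in the text) so that the state of minimal criterion value survives; the criterion values and ε = 0.05 are arbitrary illustrative choices.

```python
import numpy as np

def compete(net0, eps=0.05, max_iter=1000):
    """Discrete-time inhibitive competition:
    S_q(t+1) = f(S_q(t) - eps * sum_{q' != q} S_{q'}(t)),  f = ramp."""
    S = np.maximum(net0, 0.0)          # S_q(0) = f(net(q, 0))
    for _ in range(max_iter):
        new = np.maximum(S - eps * (S.sum() - S), 0.0)
        if np.array_equal(new, S):     # fixed point reached
            break
        S = new
    return S

T = np.array([0.30, 0.12, 0.45])       # hypothetical criterion values T_n(q, x)
net0 = T.max() - T                     # assumed transform: minimal T -> largest activation
S_final = compete(net0)
print(S_final)                          # a single positive component, at argmin T
```

For ε small enough relative to the number of competing neurons, the iteration drives all but the largest initial activation to zero in finitely many stages, matching the qualitative behavior described below.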
The conclusions concerning the behavior of the competition in the space of states stem from the following arguments. Note that S_q(t) ≥ 0 for any t ≥ 0 and q ∈ F_H.

1. If S_q(t) = 0, then net(q, t+1) ≤ 0, hence S_q(t+1) = 0. Moreover, S_q(t') = 0 for any t' ≥ t.

2. Assume that, for some q, q' ∈ F_H and t ≥ 0, 0 < S_{q'}(t) = S_q(t). Then, for any t' ≥ t, S_{q'}(t') = S_q(t').

3. Assume that, for some q, q' ∈ F_H and t ≥ 0, 0 ≤ S_{q'}(t) < S_q(t). Then S_{q'}(t+1) ≤ S_q(t+1) and, moreover, S_{q'}(t') ≤ S_q(t') for any t' ≥ t. Using some of the previous arguments, we get that there exists t(q') ≥ 0 such that S_{q'}(t) = 0 for any t ≥ t(q').

4. Assume that, for q, q' ∈ F_H, 0 < T_n(q, x) < T_n(q', x). Then, for any t ≥ 0, S_{q'}(t) ≤ S_q(t), hence there exists t(q') ≥ 0 such that S_{q'}(t) = 0 for any t ≥ t(q').

Therefore, the competition installed by the above-mentioned process among the neurons of F_H determines that the outputs of all neurons q' that received values T_n(q', x) > min_{q∈F_H} T_n(q, x) are inhibited in a finite number of stages, that is, there exists t_fin such that S_{q'}(t_fin) > 0 if and only if T_n(q', x) = min_{q∈F_H} T_n(q, x).

Also, for any q', q'' ∈ F_H such that T_n(q', x) = T_n(q'', x) = min_{q∈F_H} T_n(q, x), we have S_{q'}(t_fin) = S_{q''}(t_fin) ≠ 0 and S_{q'}(t) = S_{q''}(t) for any t ≥ 0.
The local memories of the hidden neurons are determined in a supervised way by adaptive learning algorithms using a learning sequence S = ((Λ_n, X_n) / n ≥ 1). The recurrent relations for f_{n,q}, ξ_n(q), n ≥ 1, q ∈ F_H, are derived in terms of the particular expressions of u_n(q) and g_n(x, y). For instance, if

u_n(q) = (1/n) Σ_{j=1}^{n} δ(q, Λ_j)

and g_n(x, y) = δ(x, y), then we get the following relations. Let q_{n+1} = Λ_{n+1}. Then

u_{n+1}(q) = { (n u_n(q) + 1) / (n + 1), if q = q_{n+1};  n u_n(q) / (n + 1), if q ≠ q_{n+1} },

therefore

u_{n+1}(q) = (n / (n + 1)) u_n(q) + (1 / (n + 1)) δ(q, q_{n+1}).

For q ≠ q_{n+1} we get

ξ_{n+1}(q) = n ξ(q) u_n(q) / ( n Σ_{q'∈F_H} ξ(q') u_n(q') + ξ(q_{n+1}) ).

If ξ(q) = 0 or u_n(q) = 0, then ξ_n(q) = ξ_{n+1}(q) = 0. If ξ(q) ≠ 0 and u_n(q) ≠ 0, then, denoting by P_n(q) = 1 / ξ_n(q), we get

P_{n+1}(q) = P_n(q) + ξ(q_{n+1}) / (n ξ(q) u_n(q)).

For q = q_{n+1} we get

ξ_{n+1}(q_{n+1}) = ξ(q_{n+1}) (n u_n(q_{n+1}) + 1) / ( n Σ_{q'∈F_H} ξ(q') u_n(q') + ξ(q_{n+1}) ),

that is,

P_{n+1}(q_{n+1}) = ( n u_n(q_{n+1}) P_n(q_{n+1}) + 1 ) / ( n u_n(q_{n+1}) + 1 ).

Since

(n + 1) ξ_{n+1}(q) f_{n+1,q}(x) = δ(Λ_{n+1}, q) δ(x, X_{n+1}) + n ξ_n(q) f_{n,q}(x),

we get, for q ≠ q_{n+1},

f_{n+1,q}(x) = (n / (n + 1)) f_{n,q}(x) ( 1 + ξ(q_{n+1}) / ( n Σ_{q'∈F_H} ξ(q') u_n(q') ) )

and, respectively,

f_{n+1,q_{n+1}}(x) = ( δ(x, X_{n+1}) + n ξ_n(q_{n+1}) f_{n,q_{n+1}}(x) ) / ( (n + 1) ξ_{n+1}(q_{n+1}) ).
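For the counting utilities u_n(q) = (1/n) Σ_j δ(q, Λ_j), the online recurrence for u_{n+1} and the resulting weighting ξ_n can be checked against their batch definitions; the prior and the label sequence below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 3
xi = np.array([0.5, 0.3, 0.2])                 # a priori distribution on Q
states = rng.integers(0, m, size=400)          # observed hidden labels Lambda_n

# online recurrences
u = np.zeros(m)                                # u_0
xi_hist = []
for n, q_new in enumerate(states, start=1):
    # u_n(q) = ((n - 1) u_{n-1}(q) + delta(q, q_n)) / n
    u = ((n - 1) * u + (np.arange(m) == q_new)) / n
    w = xi * u
    xi_hist.append(w / w.sum())                # xi_n(q)

# batch check: u_n(q) is the relative emitting frequency on [0, n]
u_batch = np.bincount(states, minlength=m) / len(states)
assert np.allclose(u, u_batch)
print(np.round(xi_hist[-1], 3))                # final weighting xi_n
```

The incremental update reproduces the batch frequencies up to floating-point error, which is what makes the fully recursive form of the learning algorithm practical for the neural implementation.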
REFERENCES
Bishop, C., 1996. Neural Networks for Pattern Recognition. Oxford University Press.
Devroye, L., Gyorfi, L., Lugosi, G., 1996. A Probabilistic Theory of Pattern Recognition. Springer Verlag.
Fukunaga, K., 1990. Introduction to Statistical Pattern Recognition. Academic Press.
Lampinen, J., Vehtari, A., 2001. Bayesian Approach for Neural Networks: Review and Case Studies. In Neural Networks, Vol. 14.
State, L., Cocianu, C., 2001. Information Based Algorithms in Signal Processing. In Proceedings of SYNASC'2001 (The 3rd International Workshop on Symbolic and Numeric Algorithms for Scientific Computation), Timişoara, 3-5 October 2001.
State, L., Cocianu, C., Vlamos, P., 2002. Nonparametric Approach to Learning the Bayesian Procedure for Hidden Markov Models. In Proceedings of SCI2002, Orlando, USA, July 14-18, 2002.
Stewart, W., 1994. Introduction to the Numerical Solutions of Markov Chains. Princeton University Press.