On the Capability of Neural Networks to Approximate the Neyman-Pearson Detector. A Theoretical Study

P. Jarabo-Amores*, R. Gil-Pita, M. Rosa-Zurera, and F. López-Ferreras

Departamento de Teoría de la Señal y Comunicaciones,
Escuela Politécnica Superior, Universidad de Alcalá,
Ctra. Madrid-Barcelona, km. 33.600, 28805, Alcalá de Henares - Madrid (SPAIN)
Abstract. In this paper, the application of neural networks to approximating the Neyman-Pearson detector is considered. We propose a strategy to identify the training parameters that can be controlled to reduce the effect of approximation errors on the performance of the neural network based detector. The function approximated by a neural network trained under the mean squared-error criterion is deduced, without imposing any restriction on the prior probabilities of the classes or on the desired outputs selected for training, proving that these parameters play an important role in controlling the sensitivity of the neural network detector performance to approximation errors. Another important parameter is the signal-to-noise ratio selected for training. The proposed strategy allows its best value to be determined when the statistical properties of the feature vectors are known. As an example, the detection of Gaussian signals in Gaussian interference is considered.
1 Introduction
The objective of this paper is to study the capability of neural networks to approximate a Neyman-Pearson detector. This detector maximizes the probability of detection ($P_D$), while maintaining the probability of false alarm ($P_{FA}$) lower than or equal to a specified value. The characteristics of such a detector are reflected in its ROC (Receiver Operating Characteristic) curve, which relates $P_D$ to $P_{FA}$ [1].
Ruck et al. [2] and Wan [3] demonstrated that a neural network can be used to approximate the optimum Bayesian classifier when trained using the mean squared-error criterion.
In previous works, neural networks have been proposed for approximating the Neyman-Pearson detector in different environments [4][5]. These works highlighted the strong dependence of the neural network-based detector performance on the signal-to-noise ratio selected for training (TSNR). They also observed that the difference between the performance of the neural detector and that of the Neyman-Pearson detector depends on the desired $P_{FA}$, and hence on the corresponding detection threshold.
Recently, some attempts to reduce the dependence of the neural detector performance on TSNR have been carried out [6], based on the use of a complex pre-processing stage that reduces this dependence at the expense of a high computational cost. Nevertheless, no effort has been made to explain the reasons for such a dependence, knowledge of which could help to easily select the best TSNR and training strategy for designing neural network-based detectors that approximate the Neyman-Pearson detector.

This paper deals with the theoretical explanation of the effects of approximation errors on the performance of a neural network based detector that approximates the Neyman-Pearson detector.

* This work has been supported by the "Consejería de Educación de la Comunidad de Madrid" (SPAIN), under Project 07T/0036/2003.

Jarabo-Amores P., Gil-Pita R., Rosa-Zurera M. and López-Ferreras F. (2004). On the Capability of Neural Networks to Approximate the Neyman-Pearson Detector - A Theoretical Study. In Proceedings of the First International Workshop on Artificial Neural Networks: Data Preparation Techniques and Application Development, pages 67-74. DOI: 10.5220/0001150100670074. Copyright © SciTePress.
2 Problem Formulation
The performance of a detector that approximates the Neyman-Pearson detector must be evaluated from the difference between its ROC curve and that of the Neyman-Pearson detector. For a given $P_{FA}$, the difference between the probabilities of detection must be as small as possible. The decrease in $P_D$ that is observed in the ROC curve for a given $P_{FA}$ is expressed in (1):

\[ \Delta P_D = \frac{\partial P_D}{\partial P_{FA}} \frac{\partial P_{FA}}{\partial \eta} \Delta\eta \qquad (1) \]
Practical $P_{FA}$ values are below $10^{-6}$, while practical $P_D$ values can be higher than 0.8, so, in practical conditions, ROC curves have high positive slopes for low values of $P_{FA}$, and low positive slopes for high values of $P_{FA}$. Besides, the function that relates $P_{FA}$ to the detection threshold, $\eta$, has a negative slope. Taking this characteristic into consideration, together with the fact that the ROC curve of the Neyman-Pearson detector is a characteristic that cannot be modified, the following conclusions can be extracted:
- In the low $P_{FA}$ region, $\partial P_D / \partial P_{FA}$ is usually very high. For the reduction of $P_D$ to be small, the magnitude of $(\partial P_{FA}/\partial\eta)\Delta\eta$ must be very low.
- In the high $P_{FA}$ region, $\partial P_D / \partial P_{FA}$ is usually very low, so a small reduction in $P_D$ can be guaranteed even if the magnitude of $(\partial P_{FA}/\partial\eta)\Delta\eta$ is large.
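These two regimes can be checked numerically. The following sketch is an illustration, not taken from the paper: it uses the classical shift-in-mean Gaussian detector, for which $P_D = Q(Q^{-1}(P_{FA}) - d)$, with $Q$ the Gaussian tail function and $d$ an assumed deflection, so the ROC slope is $\partial P_D / \partial P_{FA} = \phi(Q^{-1}(P_{FA}) - d) / \phi(Q^{-1}(P_{FA}))$.

```python
import math

def Q(x):
    """Gaussian right-tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(p, lo=-10.0, hi=10.0):
    """Invert Q by bisection (Q is decreasing in x)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def roc_slope(pfa, d=3.0):
    """dPD/dPFA for the shift-in-mean Gaussian detector (d is an assumed deflection)."""
    x = Q_inv(pfa)
    return phi(x - d) / phi(x)

print(roc_slope(1e-6))  # low-PFA region: steep ROC
print(roc_slope(0.5))   # high-PFA region: shallow ROC
```

With $d = 3$ the slope at $P_{FA} = 10^{-6}$ is several orders of magnitude larger than at $P_{FA} = 0.5$, matching the two bullet points above.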
The Neyman-Pearson detector decision rule is the result of comparing the likelihood ratio, or any other equivalent statistic, to the detection threshold, $\eta$. For a desired $P_{FA}$, this threshold depends on the expression of the selected statistic. In order to study the decrease in $P_D$ due to approximation errors, expression (1) must be calculated. Taking this study as a starting point, it is possible to identify the design parameters that minimize expression (1), independently of the mean squared-error minimization strategy selected.
3 Expression of the approximated discriminant function
D. W. Ruck et al. [2] demonstrated that a multilayer perceptron (MLP) converges to a mean squared-error approximation of the Bayes optimal discriminant function, when trained using the mean squared-error criterion. They studied the two-class and multiclass problems, and extended this result to any mean squared-error minimization technique.

For binary detection, they studied an MLP with only one neuron in the output layer. The network was trained to produce 1 when the feature vector was from class $H_1$ and $-1$ when the vector was from class $H_0$. They proved that the neural network output approximates the Bayes optimal discriminant function $g_0(z)$, given in (2), where $z$ is the feature vector, and $P(H_1|z)$ and $P(H_0|z)$ are the a posteriori probabilities of the classes.
\[ g_0(z) = P(H_1|z) - P(H_0|z) \qquad (2) \]
The mean squared-error between the network output, $F(z, W)$, for a given set of weights, $W$, and the desired outputs, is given by (3). $E_s(W)$ is the sample mean error calculated for a set of $n$ pre-classified feature vectors, and $Z_1$ and $Z_0$ are the sets of all possible feature vectors for classes $H_1$ and $H_0$, respectively ($Z_0 \cup Z_1 = Z$, $Z_0 \cap Z_1 = \emptyset$, $Z$ being the input space).
\[ E_m(W) = \lim_{n\to\infty} \frac{E_s(W)}{n} = \lim_{n\to\infty} \frac{1}{n} \Big[ \sum_{z \in Z_1} (F(z, W) - 1)^2 + \sum_{z \in Z_0} (F(z, W) + 1)^2 \Big] \qquad (3) \]
Using the Strong Law of Large Numbers, expression (3) can be rewritten as (4). Finally, applying the Bayes formula and rearranging terms, (4) becomes (5).
\[ E_m(W) = P(H_1) \int_Z (F(z, W) - 1)^2 f(z|H_1)\,dz + P(H_0) \int_Z (F(z, W) + 1)^2 f(z|H_0)\,dz \qquad (4) \]
\[ E_m(W) = \int_Z (F(z, W) - g_0(z))^2 f(z)\,dz + \Big[ 1 - \int_Z g_0^2(z) f(z)\,dz \Big] \qquad (5) \]
If the training set represents a reasonable approximation to the input space, then although the network is trained to minimize $E_s(W)$, $E_m(W)$ will be minimized. Since the term in brackets in expression (5) is independent of $W$, minimizing $E_m(W)$ is equivalent to minimizing (6). So the network output is an approximation of the Bayes optimal discriminant function in the mean squared-error sense.
\[ E(W) = \int_Z (F(z, W) - g_0(z))^2 f(z)\,dz \qquad (6) \]
In a more general problem, if the network is trained to produce $t_{H_1}$ when the feature vector is from class $H_1$ and $t_{H_0}$ when the feature vector is from class $H_0$, expression (4) becomes (7):
\[ E_m(W) = P(H_1) \int_Z (F(z, W) - t_{H_1})^2 f(z|H_1)\,dz + P(H_0) \int_Z (F(z, W) - t_{H_0})^2 f(z|H_0)\,dz \qquad (7) \]
In this case, the network output is a mean squared-error approximation of the function $f_0(z)$ defined in (8).
\[ f_0(z) = \frac{P(H_1) f(z|H_1) t_{H_1} + P(H_0) f(z|H_0) t_{H_0}}{P(H_1) f(z|H_1) + P(H_0) f(z|H_0)} \qquad (8) \]
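Expression (8) is the conditional mean $E[t\,|\,z]$ of the desired output, which is exactly what a mean squared-error regressor converges to. A small numerical sketch, using a hypothetical scalar example with assumed priors, targets and variances (not from the paper), checks this by comparing the binned sample mean of the targets with the analytic $f_0(z)$:

```python
import math, random

random.seed(0)

P1, P0 = 0.5, 0.5          # priors used to build the training set (assumed)
t1, t0 = 1.0, -1.0         # desired outputs (assumed)
var1 = 3.0                 # variance under H1 (assumed); unit variance under H0

def f0(z):
    """Analytic f0(z) from (8) for scalar zero-mean Gaussian likelihoods."""
    l1 = math.exp(-z * z / (2 * var1)) / math.sqrt(2 * math.pi * var1)
    l0 = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    return (P1 * l1 * t1 + P0 * l0 * t0) / (P1 * l1 + P0 * l0)

# Empirical check: the sample mean of the targets in a narrow bin around z
# estimates E[t | z], which f0(z) should match.
samples = []
for _ in range(200000):
    if random.random() < P1:
        samples.append((random.gauss(0.0, math.sqrt(var1)), t1))
    else:
        samples.append((random.gauss(0.0, 1.0), t0))

vals = [t for z, t in samples if 0.9 < z < 1.1]   # narrow bin around z = 1
emp = sum(vals) / len(vals)
print(emp, f0(1.0))        # the two values should be close
```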
If $\eta_{net}$ is the detection threshold for a given $P_{FA}$, the decision rule approximated by the neural network is given by (9).
\[ \frac{P(H_1) f(z|H_1) t_{H_1} + P(H_0) f(z|H_0) t_{H_0}}{P(H_1) f(z|H_1) + P(H_0) f(z|H_0)} \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \eta_{net} \qquad (9) \]
$f_0(z)$ is equal to $g_0(z)$ for $t_{H_1} = 1$ and $t_{H_0} = -1$, and to minimize the probability of misclassification, $\eta_{net}$ must be set to 0. But what Ruck et al. [2] and Wan [3] did not notice is that, since $f_0(z)$ can be expressed as a function of the likelihood ratio, the network is not only approximating the minimum probability of error classifier: it can also approximate the Neyman-Pearson detector if the detection threshold is modified according to probability of false alarm requirements.
Rule (9) shows that, for implementing the Neyman-Pearson detector, the detection threshold for a given $P_{FA}$ is not only a function of the likelihood functions, but also depends on the a priori probabilities and the desired outputs. These are parameters that can be selected by the designer when generating the training set and when determining the activation function of the output neuron.
4 Effect of approximation errors on $P_{FA}$
The neural network will converge to an approximation of (8), so the decision rule implemented can be expressed as in (10), where $\Delta f_0(z, W)$ is the approximation error.
\[ f_0(z) + \Delta f_0(z, W) \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \eta_{net} \qquad (10) \]
This decision rule can also be expressed as (11), revealing that the effect of approximation errors can be studied as the effect of erroneous detection thresholds.
\[ f_0(z) \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \eta_{net} - \Delta f_0(z, W) \qquad (11) \]
In order to evaluate the decrease in $P_D$ due to threshold errors, the partial derivative of $P_{FA}$ with respect to the detection threshold must be calculated. This calculation requires knowledge of the likelihood functions, and can be very tedious due to the complexity of rule (9).
In practice, when designing a Neyman-Pearson detector, the first step consists in determining the likelihood ratio for the problem to be solved, as indicated in expression (12). Before determining the detection threshold for a given $P_{FA}$, a simpler sufficient statistic is calculated by applying a set of simplifications and using monotonic functions.
\[ \Lambda(z) = \frac{f(z|H_1)}{f(z|H_0)} \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \eta_{cv} \qquad (12) \]
Following a similar strategy, in a first step, rule (9) can be expressed as a function of the likelihood ratio, as indicated in expression (13).
\[ \frac{P(H_1) \Lambda(z) t_{H_1} + P(H_0) t_{H_0}}{P(H_1) \Lambda(z) + P(H_0)} \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \eta_{net} \qquad (13) \]
The relation between $\eta_{cv}$ and $\eta_{net}$ is given in (14).
\[ \eta_{cv} = \frac{P(H_0)(\eta_{net} - t_{H_0})}{P(H_1)(t_{H_1} - \eta_{net})} \qquad (14) \]
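For concreteness, (14) and its inverse can be written as two small helper functions (hypothetical names; equal priors and targets $\pm 1$ are assumed defaults):

```python
def eta_cv_from_net(eta_net, p0=0.5, p1=0.5, t1=1.0, t0=-1.0):
    """Likelihood-ratio threshold equivalent to a network-output threshold, from (14)."""
    return p0 * (eta_net - t0) / (p1 * (t1 - eta_net))

def eta_net_from_cv(eta_cv, p0=0.5, p1=0.5, t1=1.0, t0=-1.0):
    """Inverse mapping: the network-output threshold for a likelihood-ratio threshold."""
    return (p1 * eta_cv * t1 + p0 * t0) / (p1 * eta_cv + p0)

# Round trip through (14) and its inverse recovers the original threshold
x = 0.3
print(eta_net_from_cv(eta_cv_from_net(x)))  # ~0.3
```

With equal priors and targets $\pm 1$, $\eta_{net} = 0$ maps to $\eta_{cv} = 1$, the minimum probability of error threshold, consistent with the discussion of rule (9).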
The partial derivative of $P_{FA}$ with respect to $\eta_{net}$ can be calculated using the chain rule (15).
\[ \frac{\partial P_{FA}}{\partial \eta_{net}} = \frac{\partial P_{FA}}{\partial \eta_{cv}} \frac{\partial \eta_{cv}}{\partial \eta_{net}} \qquad (15) \]
The second factor on the right-hand side of (15) can be calculated from (14) to obtain (16). It depends on the a priori probabilities of the classes and the desired outputs selected for training, factors that can be controlled by the designer.
\[ \frac{\partial \eta_{cv}}{\partial \eta_{net}} = \frac{P(H_0)(t_{H_1} - t_{H_0})}{P(H_1)(t_{H_1} - \eta_{net})^2} \qquad (16) \]
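Expression (16) can be sanity-checked by differentiating (14) numerically. A quick sketch, with assumed priors and targets:

```python
def eta_cv(eta_net, p0, p1, t1, t0):
    """Expression (14)."""
    return p0 * (eta_net - t0) / (p1 * (t1 - eta_net))

def d_eta_cv(eta_net, p0, p1, t1, t0):
    """Expression (16): analytic derivative of (14) with respect to eta_net."""
    return p0 * (t1 - t0) / (p1 * (t1 - eta_net) ** 2)

p0, p1, t1, t0 = 0.5, 0.5, 1.0, -1.0   # assumed design values
h = 1e-6
for en in (-0.5, 0.0, 0.9):            # 0.9 is close to t1: the derivative blows up
    num = (eta_cv(en + h, p0, p1, t1, t0) - eta_cv(en - h, p0, p1, t1, t0)) / (2 * h)
    print(en, d_eta_cv(en, p0, p1, t1, t0), num)
```

The growth of the derivative as $\eta_{net}$ approaches $t_{H_1}$ is the low-$P_{FA}$ sensitivity discussed in the conclusions below expression (16).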
From the analysis of expression (16) the following conclusions can be extracted:

- The function approximated by the neural network has been expressed as a function of the likelihood ratio. When this ratio is greater than or equal to $\eta_{cv}$, we decide that hypothesis $H_1$ is true, and if it is lower than $\eta_{cv}$, we decide in favor of hypothesis $H_0$. So $t_{H_1}$ must be greater than $t_{H_0}$.
- $\eta_{net}$ takes values between $t_{H_0}$ and $t_{H_1}$, so $\eta_{net} \leq t_{H_1}$.
- From the previous two points, we can conclude that the partial derivative of $\eta_{cv}$ with respect to $\eta_{net}$ is always positive.
- For $\eta_{net}$ values close to $t_{H_1}$, that is, for very low $P_{FA}$ values, $\partial \eta_{cv} / \partial \eta_{net}$ is very high (for $\eta_{net} = t_{H_1}$ it tends to infinity). We can try to compensate for this to some degree by increasing the difference between the desired outputs during training, or by constructing training sets where feature vectors from hypothesis $H_1$ are more likely than those from hypothesis $H_0$.
The first factor on the right-hand side of (15) depends on the likelihood functions of the problem to be solved. Its value is calculated in (17).
\[ \frac{\partial P_{FA}}{\partial \eta_{cv}} = \frac{\partial}{\partial \eta_{cv}} \Big[ 1 - \int_{-\infty}^{\eta_{cv}} f(\Lambda(z)|H_0)\,d\Lambda(z) \Big] = -f(\Lambda(z)|H_0)\big|_{\Lambda(z|H_0) = \eta_{cv}} \qquad (17) \]
To gain insight into the influence of $\partial P_{FA} / \partial \eta_{cv}$, we follow the strategy of looking for a simpler test statistic. If we denote this new statistic as $z(z)$ and the corresponding detection threshold as $\eta_s$, the decision rule can be expressed as in (18).
\[ z(z) \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \eta_s \qquad (18) \]
The relation between $\eta_s$ and $\eta_{cv}$ is determined by the relation that exists between the likelihood ratio and the selected statistic, so it is known. Expression (15) can be rewritten as a function of $\eta_s$.
\[ \frac{\partial P_{FA}}{\partial \eta_{net}} = \frac{\partial P_{FA}}{\partial \eta_s} \frac{\partial \eta_s}{\partial \eta_{cv}} \frac{\partial \eta_{cv}}{\partial \eta_{net}} \qquad (19) \]
Expression (19) shows that the partial derivative of $P_{FA}$ with respect to $\eta_{net}$ can be expressed as the product of three factors:

- The first factor, $\partial P_{FA} / \partial \eta_s$, is a characteristic of the problem to be solved.
- The second factor, $\partial \eta_s / \partial \eta_{cv}$, is also a characteristic of the problem to be solved.
- The third factor, $\partial \eta_{cv} / \partial \eta_{net}$, does not depend only on the a priori probabilities of the classes and the desired outputs selected for training, because the value of $\eta_{net}$ required for a given $P_{FA}$ depends on the problem to be solved.
The usefulness of adding a new factor in (15) can only be proved if a particular case is considered. In the next section, the problem of detecting Gaussian signals in Gaussian interference is considered.
5 A case study: Detection of Gaussian signals in Gaussian interference
The problem of detecting Gaussian signals in Gaussian interference is considered. The feature vector is composed of $n$ independent Gaussian samples with zero mean and unit variance under hypothesis $H_0$, and zero mean and variance $\sigma_s^2 + 1$ under hypothesis $H_1$. The signal-to-noise ratio is defined in (20), and the value selected for constructing the training set is denoted as $tsnr$.
\[ SNR = 10 \log(snr) = 10 \log(\sigma_s^2) \qquad (20) \]
For a given $tsnr$, the likelihood functions are expressed in (21) and (22); the likelihood ratio and the corresponding detection rule are given by (23).
\[ f(z|H_0) = \frac{1}{\sqrt{(2\pi)^n}} \exp\Big( -\frac{1}{2} z^T z \Big) \qquad (21) \]
\[ f(z|H_1) = \frac{1}{\sqrt{(2\pi)^n (tsnr + 1)^n}} \exp\Big[ -\frac{1}{2(tsnr + 1)} z^T z \Big] \qquad (22) \]
\[ \Lambda(z) = \frac{1}{(tsnr + 1)^{n/2}} \exp\Big[ \frac{tsnr}{2(tsnr + 1)} z^T z \Big] \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \eta_{cv} \qquad (23) \]
A simpler sufficient statistic can be obtained by applying logarithms and rearranging terms:
\[ z(z) = z^T z \; \underset{H_0}{\overset{H_1}{\gtrless}} \; 2\,\frac{tsnr + 1}{tsnr} \ln\big[ \eta_{cv} (tsnr + 1)^{n/2} \big] = \eta_s \qquad (24) \]
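As a sketch of the design chain, the following code (hypothetical helper names) computes $\eta_s$ for a desired $P_{FA}$ using the closed-form chi-square survival function for even $n$, and then inverts (24) to obtain the equivalent likelihood-ratio threshold $\eta_{cv}$:

```python
import math

def chi2_sf(x, n):
    """P(chi2_n > x) for even n, via the closed-form Poisson sum."""
    assert n % 2 == 0
    s, term = 0.0, 1.0
    for k in range(n // 2):
        if k > 0:
            term *= (x / 2) / k
        s += term
    return math.exp(-x / 2) * s

def eta_s_for_pfa(pfa, n, lo=0.0, hi=1e4):
    """Invert the chi-square survival function by bisection (it is decreasing in x)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, n) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eta_cv_from_s(eta_s, tsnr, n):
    """Invert (24): the likelihood-ratio threshold that matches eta_s."""
    return math.exp(eta_s * tsnr / (2 * (tsnr + 1))) / (tsnr + 1) ** (n / 2)

n, pfa, tsnr = 8, 1e-4, 2.0   # assumed example values
es = eta_s_for_pfa(pfa, n)
print(es, eta_cv_from_s(es, tsnr, n))
```

Note that $\eta_s$ comes out of the $H_0$ statistics alone, with no reference to $tsnr$, while $\eta_{cv}$ does depend on it; this is the independence property discussed next.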
As the likelihood function under hypothesis $H_0$ does not depend on $tsnr$, the probability density function of $z(z)$ under $H_0$ does not depend on it either, and for a given $P_{FA}$, $\eta_s$ is independent of $tsnr$. Because of that, the performance of the Neyman-Pearson detector is independent of the $tsnr$ value.
Although the ROC curves and $\partial P_D / \partial P_{FA}$ do not depend on $tsnr$, $\partial P_{FA} / \partial \eta_{net}$ and the sensitivity of the neural detector to approximation errors do depend on it.
The partial derivative of $\eta_s$ with respect to $\eta_{cv}$ is given in (25).
\[ \frac{\partial \eta_s}{\partial \eta_{cv}} = \frac{2(1 + tsnr)}{\eta_{cv}\, tsnr} \qquad (25) \]
$z(z|H_0)$ is a chi-square random variable with $n$ degrees of freedom. The partial derivative of $P_{FA}$ with respect to $\eta_s$ is calculated in (26).
\[ \frac{\partial P_{FA}}{\partial \eta_s} = -\frac{1}{2^{n/2} \left(\frac{n}{2} - 1\right)!}\, \eta_s^{\left(\frac{n}{2} - 1\right)} \exp\Big( -\frac{\eta_s}{2} \Big) \qquad (26) \]
Combining expressions (16), (25) and (26), the partial derivative of $P_{FA}$ with respect to $\eta_{net}$ is calculated in (27).
\[ \frac{\partial P_{FA}}{\partial \eta_{net}} = -\frac{1}{2^{n/2} \left(\frac{n}{2} - 1\right)!}\, \eta_s^{\left(\frac{n}{2} - 1\right)} \exp\Big( -\frac{\eta_s}{2} \Big)\, \frac{2(1 + tsnr)}{\eta_{cv}\, tsnr}\, \frac{P(H_0)(t_{H_1} - t_{H_0})}{P(H_1)(t_{H_1} - \eta_{net})^2} \qquad (27) \]
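To illustrate how the sensitivity in (27) varies with $tsnr$ for a fixed $P_{FA}$, the following sketch evaluates its magnitude over a range of training signal-to-noise ratios. The priors $P(H_0) = P(H_1) = 0.5$, targets $\pm 1$, and an $\eta_s$ value roughly matching $P_{FA} = 10^{-4}$ for $n = 8$ are all assumptions for the example, not values from the paper; the $tsnr$ minimizing this magnitude would be the preferred training value.

```python
import math

def sensitivity(eta_s, tsnr, n, p0=0.5, p1=0.5, t1=1.0, t0=-1.0):
    """|dPFA/d eta_net| from (27), for a fixed eta_s (i.e. a fixed PFA)."""
    # Chi-square pdf at eta_s under H0: first factor of (27), cf. (26)
    pdf = eta_s ** (n / 2 - 1) * math.exp(-eta_s / 2) / (2 ** (n / 2) * math.factorial(n // 2 - 1))
    # Invert (24): likelihood-ratio threshold matching eta_s for this tsnr
    eta_cv = math.exp(eta_s * tsnr / (2 * (tsnr + 1))) / (tsnr + 1) ** (n / 2)
    # Invert (14): network-output threshold matching eta_cv
    eta_net = (p1 * eta_cv * t1 + p0 * t0) / (p1 * eta_cv + p0)
    f2 = 2 * (1 + tsnr) / (eta_cv * tsnr)                     # expression (25)
    f3 = p0 * (t1 - t0) / (p1 * (t1 - eta_net) ** 2)          # expression (16)
    return pdf * f2 * f3

eta_s, n = 31.83, 8   # eta_s roughly matching PFA = 1e-4 for n = 8 (assumed)
for tsnr in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(tsnr, sensitivity(eta_s, tsnr, n))
```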
6 Conclusions

In this paper, the application of neural networks to approximating the Neyman-Pearson detector is considered. We propose the calculation of the partial derivative of the probability of false alarm with respect to the detection threshold as a tool to identify the training parameters that can be controlled to reduce the effect of approximation errors on the performance of the neural network based detector.

As a first step, the function approximated by a neural network trained using the mean squared-error criterion is deduced. Without imposing any restriction on the prior probabilities of the classes or on the desired outputs selected for training, we obtain a general expression that reveals that these parameters play an important role in controlling the sensitivity of the neural network detector performance to approximation errors.

In previous works, the signal-to-noise ratio selected for training appeared as a critical design parameter, but no effort had been made to explain the dependence of the neural network based detector on this parameter. In this paper, we explain this dependence and provide a strategy to determine the best $tsnr$ value when the statistical properties of the feature vectors are known.
References

1. Van Trees, H.L.: Detection, Estimation, and Modulation Theory, Vol. 1. Wiley (1968)
2. Ruck, D.W., Rogers, S.K., Kabrisky, M., Oxley, M.E., Suter, B.W.: The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks, vol. 1, no. 4, pp. 296-298, December 1990
3. Wan, E.A.: Neural network classification: a Bayesian interpretation. IEEE Transactions on Neural Networks, vol. 1, no. 4, pp. 303-305, December 1990
4. Gandhi, P.P., Ramamurti, V.: Neural networks for signal detection in non-Gaussian noise. IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2846-2851, November 1997
5. Andina, D., Sanz-González, J.L.: Comparison of a neural network detector vs. Neyman-Pearson optimal detector. Proceedings of the 1996 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 6, pp. 3573-3576, 1996
6. Jarabo-Amores, P., Rosa-Zurera, M., López-Ferreras, F.: Design of a Pre-processing Stage for Avoiding the Dependence on TSNR of a Neural Radar Detector. Lecture Notes in Computer Science, Vol. 2085. Springer-Verlag, Berlin Heidelberg New York (2001) 652-659