3 SEX CLASSIFICATION
NETWORK
In this section, we will describe artificial neural
networks (ANNs) that we have trained to classify
images of faces as either male or female. All of the
networks created were three-layer, fully
interconnected feed-forward networks trained by
supervised learning using the generalized delta rule.
To conduct our experiments, images were first
converted into vectors suitable for analysis by an
ANN. Each converted vector had 5824 dimensions,
one for each pixel of the image, since each image
was 64 x 91 pixels.
Each unit (i.e., each pixel) varied from 0 to 255,
which corresponds to the 256 shades of grey in the
images. All networks discussed contained 1 output
unit, 60 hidden units, and 5824 input units. We
experimented using both sigmoid and radial basis
activation functions. Although the results were
comparable, we opted to carry out most of our trials
using sigmoid activation functions. The results in
this paper reflect this preference.
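The paper does not give implementation details, so purely as an illustration, a forward pass through a network of the stated shape (5824 inputs, 60 hidden units, 1 output, sigmoid activations) might be set up as follows; the weight-initialization range and all names here are placeholder assumptions, not the authors' code:

```python
import numpy as np

# Illustrative sketch only: a 5824-60-1 fully interconnected
# feed-forward network with sigmoid activations, as described above.
# Weight ranges are placeholders; the paper's values are not given.

INPUT_UNITS, HIDDEN_UNITS, OUTPUT_UNITS = 64 * 91, 60, 1  # 5824 inputs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.1, 0.1, (HIDDEN_UNITS, INPUT_UNITS))   # input -> hidden
W2 = rng.uniform(-0.1, 0.1, (OUTPUT_UNITS, HIDDEN_UNITS))  # hidden -> output

def forward(pixels):
    """pixels: flattened 64x91 grey-scale image, values 0-255.
    Returns an array of shape (1,) with a value in (0, 1)."""
    x = np.asarray(pixels, dtype=float) / 255.0  # scale pixels to [0, 1]
    h = sigmoid(W1 @ x)
    return sigmoid(W2 @ h)
```

Training by the generalized delta rule would then adjust `W1` and `W2` by backpropagating the output error, which this sketch omits.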
Initially there were 101 images. However, some
images were either corrupted outright or became
corrupted during the vector conversion process,
leaving a total of 89 images for training and testing
purposes.
For training purposes, the desired output for all
female images was set to 0; the desired output for all
male images was set to 1. For testing, any output
result over 0.5 was interpreted as a male
classification, and any result below 0.5 was
interpreted as a female classification.
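The labelling and decision rule just described is simple to state in code; the following is a minimal sketch (the function name is ours, and an output of exactly 0.5 is left to the male branch for concreteness, a case the text does not specify):

```python
# Sketch of the scheme above: desired output 0 for female images and
# 1 for male images during training; at test time an output above 0.5
# is read as "male" and an output below 0.5 as "female".

FEMALE_TARGET, MALE_TARGET = 0.0, 1.0

def classify(output):
    """Map a network output in (0, 1) to a sex label."""
    return "male" if output > 0.5 else "female"
```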
The training and testing runs we will discuss
herein are of two types. First, we trained on partial
sets: we randomly selected 44 images, used them for
training, and tested on the remaining 45; we also
trained on the 45-image set and tested on the
remaining 44. Second, we trained the network on the
entire 89-image corpus.
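The partial-set protocol amounts to a random split of the corpus; a hedged sketch, with the 89 images represented by placeholder indices:

```python
import random

# Sketch of the partial-set protocol above: shuffle the 89-image
# corpus and split it into a 44-image training set and a 45-image
# test set (or vice versa, by swapping the roles of the two halves).

def split_corpus(images, train_size=44, seed=None):
    pool = list(images)
    random.Random(seed).shuffle(pool)
    return pool[:train_size], pool[train_size:]
```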
4 A DISCUSSION OF
CHURCHLAND’S POSITION
The first point we want to make is that, given the
way Churchland defines theories and
representational success, it is possible for two
different theories to have equal levels of
representational success. Consider, for example,
networks N1 and N2. N1 was trained on 44 images
and tested on 45; N2 was trained on 45 images and
tested on 44. Each had its weights randomly
selected; each was trained using the same parameter
values, and each achieved essentially the same level
of success in classifying previously unseen images
(approximately 89%), but there are
important differences in the cluster plots. These
plots pair each face with its closest neighbour in
state space; then averages for each pair are
computed, and each average is paired with its closest
neighbour, and so on. Figure 1 is the cluster plot for
N1, and Figure 2 is the cluster plot for N2. While
there is some overlap between the plots, there are
important differences as well. An examination of the
lower portions of the cluster plots immediately
reveals some significant differences. We have two
networks with equal levels of classificatory success,
but each implements a different theory. This does
not change if we train the network on the entire set
of images. If we randomly select weights for
networks N3 and N4, and train each on the entire
training corpus with perfect classificatory success,
we can still generate different cluster plots (or
theories) for the networks. If this means we can say
that N1 and N2 have equal levels of
representational success, and likewise N3 and N4,
then there is a
difference between classical truth as correspondence
and Churchland’s substitute, representational
success. On classical conceptions of theories and
truth, two inconsistent theories cannot both be true.
However, it may well be that two conflicting
theories (in Churchland’s sense of “theory”) can
both be equally representationally successful. There
may well be different ways of measuring similarities
and differences between faces, and different
networks may home in on different features or
relations, or perhaps on the same features and
relations but weigh them differently, leading to
different similarity spaces (or different theories) that
achieve equally good or even perfect performance.
We offer this as a point of clarification since it might
be something Churchland (2007, pp. 132-134) is
happy to concede.
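The cluster-plot construction described above (pair each point with its closest neighbour in state space, replace each pair with its average, and repeat until one point remains) can be sketched as follows. Here the points would be hidden-unit activation vectors; the greedy pairing order and all names are our assumptions, since the text does not specify how ties or odd counts are handled:

```python
import numpy as np

def cluster_pairs(points):
    """One level of the pairing procedure: greedily pair each point
    with its closest remaining neighbour (Euclidean distance) and
    return the pair averages. With an odd count, the last point is
    carried forward unpaired. Illustrative sketch only."""
    pts = [np.asarray(p, dtype=float) for p in points]
    averages = []
    while len(pts) > 1:
        p = pts.pop(0)
        dists = [np.linalg.norm(p - q) for q in pts]
        j = int(np.argmin(dists))
        averages.append((p + pts.pop(j)) / 2.0)
    averages.extend(pts)  # leftover point, if any
    return averages

def cluster_plot_levels(points):
    """Repeat the pairing until one point remains, yielding the
    hierarchy that a cluster plot displays."""
    levels = [list(points)]
    while len(levels[-1]) > 1:
        levels.append(cluster_pairs(levels[-1]))
    return levels
```

Two networks with identical classificatory success can still produce different hierarchies from this procedure, which is the point at issue.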
The second point we want to make is that
representational success and reliability appear to
come apart. Remember, Churchland defines
reliability in terms of representational success. With
networks N1 and N2, we achieved 89%
classificatory success on new cases, and with N3 and
N4, we achieved 100% classificatory success on the
total set of images. Notice, we said “success,” not
“reliability.” To talk of reliability in Churchland’s
sense, we would have to be assured that the distance
relations in the state spaces map on to distance
relations in the world, since reliability is defined in
terms of representational success. However, it seems
REFLECTIONS ON NEUROCOMPUTATIONAL RELIABILISM