3 SEX CLASSIFICATION
NETWORK
In this section, we will describe artificial neural
networks (ANNs) that we have trained to classify
images of faces as either male or female. All of the
networks created were three-layer, fully
interconnected feed-forward networks trained by
supervised learning using the generalized delta rule.
To conduct our experiments, images were first
converted into vectors suitable for analysis by an
ANN. Each converted vector had 5824 dimensions,
one for each pixel of the image, since each image
was 64 x 91 pixels.
Each unit (i.e., each pixel) varied from 0 to 255,
which corresponds to the 256 shades of grey in the
images. All networks discussed contained 1 output
unit, 60 hidden units, and 5824 input units. We
experimented using both sigmoid and radial basis
activation functions. Although the results were
comparable, we opted to carry out most of our trials
using sigmoid activation functions. The results in
this paper reflect this preference.
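The paper does not give implementation details, so purely as an illustration, a forward pass through a network of the stated shape (5824 inputs, 60 hidden units, 1 output, sigmoid activations) might be set up as follows; the weight-initialization range and all names here are placeholder assumptions, not the authors' code:

```python
import numpy as np

# Illustrative sketch only: a 5824-60-1 fully interconnected
# feed-forward network with sigmoid activations, as described above.
# Weight ranges are placeholders; the paper's values are not given.

INPUT_UNITS, HIDDEN_UNITS, OUTPUT_UNITS = 64 * 91, 60, 1  # 5824 inputs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.1, 0.1, (HIDDEN_UNITS, INPUT_UNITS))   # input -> hidden
W2 = rng.uniform(-0.1, 0.1, (OUTPUT_UNITS, HIDDEN_UNITS))  # hidden -> output

def forward(pixels):
    """pixels: flattened 64x91 grey-scale image, values 0-255.
    Returns an array of shape (1,) with a value in (0, 1)."""
    x = np.asarray(pixels, dtype=float) / 255.0  # scale pixels to [0, 1]
    h = sigmoid(W1 @ x)
    return sigmoid(W2 @ h)
```

Training by the generalized delta rule would then adjust `W1` and `W2` by backpropagating the output error, which this sketch omits.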
Initially there were 101 images. However, some
images were either corrupted outright or became
corrupted during the vector conversion process,
leaving a total of 89 images for training and testing
purposes.
For training purposes, the desired output for all
female images was set to 0; the desired output for all
male images was set to 1. For testing, any output
result over 0.5 was interpreted as a male
classification, and any result below 0.5 was
interpreted as a female classification.
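The labelling and decision rule just described is simple to state in code; the following is a minimal sketch (the function name is ours, and an output of exactly 0.5 is left to the male branch for concreteness, a case the text does not specify):

```python
# Sketch of the scheme above: desired output 0 for female images and
# 1 for male images during training; at test time an output above 0.5
# is read as "male" and an output below 0.5 as "female".

FEMALE_TARGET, MALE_TARGET = 0.0, 1.0

def classify(output):
    """Map a network output in (0, 1) to a sex label."""
    return "male" if output > 0.5 else "female"
```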
The training and testing runs we will discuss
herein are of two types. First, we trained on partial
sets: we randomly selected 44 images, used them for
training, and tested on the remaining 45; we also
trained on the 45-image set and tested on the
remaining 44. Second, we trained the network on the
entire 89-image corpus.
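The partial-set protocol amounts to a random split of the corpus; a hedged sketch, with the 89 images represented by placeholder indices:

```python
import random

# Sketch of the partial-set protocol above: shuffle the 89-image
# corpus and split it into a 44-image training set and a 45-image
# test set (or vice versa, by swapping the roles of the two halves).

def split_corpus(images, train_size=44, seed=None):
    pool = list(images)
    random.Random(seed).shuffle(pool)
    return pool[:train_size], pool[train_size:]
```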
4 A DISCUSSION OF
CHURCHLAND’S POSITION
The first point we want to make is that, given the
way Churchland defines theories and
representational success, it is possible for two
different theories to have equal levels of
representational success. Consider, for example,
networks N1 and N2. N1 was trained on 44 images
and tested on 45; N2 was trained on 45 images and
tested on 44. Each had its weights randomly
selected; each was trained using the same parameter
values, and each achieved essentially the same level
of success in classifying previously unseen images
(approximately 89%), but there are
important differences in the cluster plots. These
plots pair each face with its closest neighbour in
state space; then averages for each pair are
computed, and each average is paired with its closest
neighbour, and so on. Figure 1 is the cluster plot for
N1, and Figure 2 is the cluster plot for N2. While
there is some overlap between the plots, there are
important differences as well. An examination of the
lower portions of the cluster plots immediately
reveals some significant differences. We have two
networks with equal levels of classificatory success,
but each implements a different theory. This does
not change if we train the network on the entire set
of images. If we randomly select weights for
networks N3 and N4, and train each on the entire
training corpus with perfect classificatory success,
we can still generate different cluster plots (or
theories) for the networks. If this means we can say
that N1 and N2 have equal levels of
representational success, and likewise N3 and N4,
then there is a
difference between classical truth as correspondence
and Churchland’s substitute, representational
success. On classical conceptions of theories and
truth, two inconsistent theories cannot both be true.
However, it may well be that two conflicting
theories (in Churchland’s sense of “theory”) can
both be equally representationally successful. There
may well be different ways of measuring similarities
and differences between faces, and different
networks may home in on different features or
relations, or perhaps on the same features and
relations but weigh them differently, leading to
different similarity spaces (or different theories) that
achieve equally good or even perfect performance.
We offer this as a point of clarification since it might
be something Churchland (2007, pp. 132-134) is
happy to concede.
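The cluster-plot construction described above (pair each point with its closest neighbour in state space, replace each pair with its average, and repeat until one point remains) can be sketched as follows. Here the points would be hidden-unit activation vectors; the greedy pairing order and all names are our assumptions, since the text does not specify how ties or odd counts are handled:

```python
import numpy as np

def cluster_pairs(points):
    """One level of the pairing procedure: greedily pair each point
    with its closest remaining neighbour (Euclidean distance) and
    return the pair averages. With an odd count, the last point is
    carried forward unpaired. Illustrative sketch only."""
    pts = [np.asarray(p, dtype=float) for p in points]
    averages = []
    while len(pts) > 1:
        p = pts.pop(0)
        dists = [np.linalg.norm(p - q) for q in pts]
        j = int(np.argmin(dists))
        averages.append((p + pts.pop(j)) / 2.0)
    averages.extend(pts)  # leftover point, if any
    return averages

def cluster_plot_levels(points):
    """Repeat the pairing until one point remains, yielding the
    hierarchy that a cluster plot displays."""
    levels = [list(points)]
    while len(levels[-1]) > 1:
        levels.append(cluster_pairs(levels[-1]))
    return levels
```

Two networks with identical classificatory success can still produce different hierarchies from this procedure, which is the point at issue.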
The second point we want to make is that
representational success and reliability appear to
come apart. Remember, Churchland defines
reliability in terms of representational success. With
networks N1 and N2, we achieved 89%
classificatory success on new cases, and with N3 and
N4, we achieved 100% classificatory success on the
total set of images. Notice, we said “success,” not
“reliability.” To talk of reliability in Churchland’s
sense, we would have to be assured that the distance
relations in the state spaces map on to distance
relations in the world, since reliability is defined in
terms of representational success. However, it seems
REFLECTIONS ON NEUROCOMPUTATIONAL RELIABILISM