Neurons, the constitutive units of an ANN, are mathematical functions conceived as a rudimentary model, or abstraction, of biological neurons. Mathematically, let there be n + 1 inputs with signals x_0 to x_n and weights w_0 to w_n, respectively. Usually, the input x_0 is assigned the value +1, which makes it a bias input with w_0 = b. This leaves only n actual inputs to the neuron: from x_1 to x_n. The output of such a neuron is (where ϕ is the activation function):
y = ϕ( ∑_{j=0}^{n} w_j x_j )    (1)
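The following minimal sketch (not part of the original paper) illustrates how the output in Equation (1) can be computed for a single neuron; the sigmoid activation function and the particular weight values are assumptions chosen for the example:

```python
import math

def neuron_output(weights, inputs, activation=lambda s: 1.0 / (1.0 + math.exp(-s))):
    """Compute y = phi(sum_j w_j * x_j) for a single neuron (Equation 1).

    inputs[0] is fixed to +1, so weights[0] plays the role of the bias b.
    """
    s = sum(w * x for w, x in zip(weights, inputs))
    return activation(s)

# Hypothetical example with bias b = 0.5 and two actual inputs x_1, x_2.
weights = [0.5, -1.2, 0.8]   # w_0 = b, w_1, w_2
inputs = [1.0, 0.3, 0.7]     # x_0 = +1, x_1, x_2
print(neuron_output(weights, inputs))
```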
In the “learning” phase of a neural network, we try to find the best approximations of the different weights w_0, w_1, ..., w_n. This is done by minimizing a cost function which gives a measure of the distance between a particular solution and the optimal solution that we try to achieve. Numerous algorithms are available for training neural network models (Bishop, 2005); most of them can be viewed as a straightforward application of optimization theory and statistical estimation. We have implemented one of the more popular learning algorithms, the Backpropagation algorithm (see (Rojas, 1996) and (Bishop, 2005) for details). It performs supervised learning in an iterative way: the error produced in each iteration is used to improve the weights corresponding to each input variable, thus forcing the output value to converge to the known value.
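As a rough, minimal sketch of this idea (a single sigmoid neuron rather than the full multi-layer Backpropagation described in (Rojas, 1996) and (Bishop, 2005)), the iterative error-driven weight update can be written as follows; the squared-error cost, the learning rate and the training pair are assumed for illustration:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(weights, inputs, target, learning_rate=0.1):
    """One supervised update: nudge each w_j so the output moves toward target.

    Uses the cost E = 0.5 * (y - target)^2, whose gradient for a sigmoid neuron is
    dE/dw_j = (y - target) * y * (1 - y) * x_j.
    """
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    delta = (y - target) * y * (1.0 - y)
    return [w - learning_rate * delta * x for w, x in zip(weights, inputs)]

# Assumed toy data: x_0 = +1 is the bias input, and the known output is 1.0.
weights = [0.5, -1.2, 0.8]
inputs = [1.0, 0.3, 0.7]
for _ in range(1000):   # iterate; the output converges toward the known value
    weights = train_step(weights, inputs, target=1.0)
```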
In order to use neural networks in our approach, we first require that the sentences be expressed in some mathematical model so that we can use them as input to our network. For that purpose, we introduce Word Space Modelling, which is a spatial representation of word meaning, through Random Indexing (RI) (Chatterjee and Mohan, 2007). RI transforms every sentence into a vector location in word space, and the NN then uses that vector as input for computational purposes.
3 WORD SPACE MODEL
The Word-Space Model (Sahlgren, 2006) is a spatial
representation of word meaning. It associates a vec-
tor with each word defining its meaning. However,
the Word Space Model is based entirely on the language data available. When meanings change, disappear or
appear in the data at hand, the model changes ac-
cordingly. The primary problem with this represen-
tation is that we have no control over the dimension
of the vectors. Consequently, the use of such a representation scheme in an NN-based model lacks appropriateness. We use a Random Indexing based representation scheme to deal with this problem.
3.1 Random Indexing Technique
Random Indexing was developed to tackle the problem of high dimensionality in the Word Space model.
It removes the need for the huge co-occurrence ma-
trix by incrementally accumulating context vectors,
which can then, if needed, be assembled into a co-
occurrence matrix (Kanerva, 1988).
In Random Indexing each word in the text is as-
signed a unique and randomly generated vector called
the index vector. All the index vectors are of the
same predefined dimension R, where R is typically
a large number, but much smaller than n, the number
of words in the document. The index vectors are gen-
erally sparse and ternary i.e. they are made of three
values chosen from {0, 1, −1}, and most of the values
are 0. When the entire data has been processed, the
R-dimensional context vectors are effectively the sum
of the words’ contexts. For illustration we can take
the example of the sentence
A beautiful saying, a person is beautiful when
he thinks beautiful.
Let, for illustration, the dimension R of the index
vector be 10. The context is defined as one preceding
and one succeeding word. Let ‘person’ be assigned
a random index vector: [0,0,0,1,0,0,0,0,−1,0]
and ‘beautiful’ be assigned a random index vector:
[0,1,0,0,−1,0,0,0,0,0]. Then, to compute the context vector of ‘is’, we sum up the index vectors of its context words, which gives [0, 1, 0, 1, −1, 0, 0, 0, −1, 0].
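This computation can be reproduced with a short sketch; the tokenisation, the window size and the two index vectors are taken from the illustration above, while the accumulation function itself is an assumed, simplified implementation of Random Indexing:

```python
# Minimal Random Indexing sketch for the example sentence
# (R = 10, context = one preceding and one succeeding word).
R = 10
index_vectors = {
    'person':    [0, 0, 0, 1, 0, 0, 0, 0, -1, 0],
    'beautiful': [0, 1, 0, 0, -1, 0, 0, 0, 0, 0],
}

def context_vector(tokens, position, window=1):
    """Sum the index vectors of the words surrounding tokens[position]."""
    vec = [0] * R
    for i in range(max(0, position - window), min(len(tokens), position + window + 1)):
        if i == position:
            continue
        for k, v in enumerate(index_vectors.get(tokens[i], [0] * R)):
            vec[k] += v
    return vec

tokens = "a beautiful saying a person is beautiful when he thinks beautiful".split()
print(context_vector(tokens, tokens.index('is')))
# -> [0, 1, 0, 1, -1, 0, 0, 0, -1, 0]
```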
The space spanned by the context vectors can be represented by a matrix of order W × R, where the i-th row is the context vector of the i-th distinct word.
If a co-occurrence matrix has to be constructed,
R-dimensional context vectors can be collected into
a matrix of order W ×R, where W is the number of
unique word types, and R is the chosen dimensionality
for each word. Note that this is similar to construct-
ing an n-dimensional unary context vector, which has a single 1 in different positions for different words, where n is the number of distinct words. Mathemati-
cally, these n-dimensional unary vectors are orthog-
onal, whereas the R-dimensional random index vec-
tors are nearly orthogonal. However, most often this
does not stand in the way of effective computation. On the contrary, this small compromise gives us a huge computational advantage, as explained below. There are many more nearly orthogonal than truly orthogonal directions in a high-dimensional space (Sahlgren, 2005); the sketch below gives a rough illustration. Choosing Random Indexing is an advantageous trade-off between the number of dimensions
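As a rough illustration of the near-orthogonality mentioned above (a sketch with assumed parameters, not an experiment from the paper), one can generate sparse ternary index vectors and observe that their pairwise dot products stay small relative to the dimension R:

```python
import random

def random_index_vector(R=1000, nonzero=10):
    """Sparse ternary index vector: mostly 0s, with a few randomly placed +1/-1 entries."""
    vec = [0] * R
    for pos in random.sample(range(R), nonzero):
        vec[pos] = random.choice((1, -1))
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

vectors = [random_index_vector() for _ in range(100)]
overlaps = [abs(dot(u, v)) for i, u in enumerate(vectors) for v in vectors[i + 1:]]
print(max(overlaps))   # typically very small compared to R = 1000, i.e. nearly orthogonal
```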