Neurons, the constitutive units of an ANN, are mathematical functions conceived as a rudimentary model, or abstraction, of biological neurons. Mathematically, let there be n + 1 inputs with signals x_0 to x_n and weights w_0 to w_n, respectively. Usually, the input x_0 is assigned the value +1, which makes it a bias input with w_0 = b. This leaves only n actual inputs to the neuron: from x_1 to x_n. The output of such a neuron is (where ϕ is the activation function):
y = ϕ( ∑_{j=0}^{n} w_j x_j )    (1)
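The following minimal sketch (not part of the original paper) illustrates how the output in Equation (1) can be computed for a single neuron; the sigmoid activation function and the particular weight values are assumptions chosen for the example:

```python
import math

def neuron_output(weights, inputs, activation=lambda s: 1.0 / (1.0 + math.exp(-s))):
    """Compute y = phi(sum_j w_j * x_j) for a single neuron (Equation 1).

    inputs[0] is fixed to +1, so weights[0] plays the role of the bias b.
    """
    s = sum(w * x for w, x in zip(weights, inputs))
    return activation(s)

# Hypothetical example with bias b = 0.5 and two actual inputs x_1, x_2.
weights = [0.5, -1.2, 0.8]   # w_0 = b, w_1, w_2
inputs = [1.0, 0.3, 0.7]     # x_0 = +1, x_1, x_2
print(neuron_output(weights, inputs))
```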
In the “learning” phase of a neural network, we try to find the best approximations of the different weights w_0, w_1, ..., w_n. This is done by minimizing a cost function which gives a measure of the distance between a particular solution and the optimal solution that we try to achieve. Numerous algorithms are available for training neural network models (Bishop, 2005); most of them can be viewed as a straightforward application of optimization theory and statistical estimation. We have implemented one of the more popular learning algorithms, the Backpropagation algorithm (see (Rojas, 1996) and (Bishop, 2005) for details). It performs supervised learning in an iterative way: the error produced in each iteration is used to improve the weights corresponding to each input variable, thus forcing the output value to converge to the known value.
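As a rough, minimal sketch of this idea (a single sigmoid neuron rather than the full multi-layer Backpropagation described in (Rojas, 1996) and (Bishop, 2005)), the iterative error-driven weight update can be written as follows; the squared-error cost, the learning rate and the training pair are assumed for illustration:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(weights, inputs, target, learning_rate=0.1):
    """One supervised update: nudge each w_j so the output moves toward target.

    Uses the cost E = 0.5 * (y - target)^2, whose gradient for a sigmoid neuron is
    dE/dw_j = (y - target) * y * (1 - y) * x_j.
    """
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    delta = (y - target) * y * (1.0 - y)
    return [w - learning_rate * delta * x for w, x in zip(weights, inputs)]

# Assumed toy data: x_0 = +1 is the bias input, and the known output is 1.0.
weights = [0.5, -1.2, 0.8]
inputs = [1.0, 0.3, 0.7]
for _ in range(1000):   # iterate; the output converges toward the known value
    weights = train_step(weights, inputs, target=1.0)
```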
In order to use neural networks in our approach, we first require that the sentences be expressed in some mathematical model so that we can use them as input to our network. For that purpose, we introduce Word Space Modelling, which is a spatial representation of word meaning, through Random Indexing (RI) (Chatterjee and Mohan, 2007). RI transforms every sentence into a vector location in word space, and the NN then uses that vector as input for computational purposes.
3 WORD SPACE MODEL
The Word-Space Model (Sahlgren, 2006) is a spatial
representation of word meaning. It associates a vec-
tor with each word defining its meaning. However,
the Word Space Model is based entirely on the language data available. When meanings change, disappear or
appear in the data at hand, the model changes ac-
cordingly. The primary problem with this represen-
tation is that we have no control over the dimension
of the vectors. Consequently, the use of such a representation scheme in an NN-based model lacks appropriateness. We use a Random Indexing based representation scheme to deal with this problem.
3.1 Random Indexing Technique
Random Indexing was developed to tackle the problem of high dimensionality in the Word Space model.
It removes the need for the huge co-occurrence ma-
trix by incrementally accumulating context vectors,
which can then, if needed, be assembled into a co-
occurrence matrix (Kanerva, 1988).
In Random Indexing each word in the text is as-
signed a unique and randomly generated vector called
the index vector. All the index vectors are of the
same predefined dimension R, where R is typically
a large number, but much smaller than n, the number
of words in the document. The index vectors are gen-
erally sparse and ternary i.e. they are made of three
values chosen from {0, 1, −1}, and most of the values
are 0. When the entire data has been processed, the
R-dimensional context vectors are effectively the sum
of the words’ contexts. For illustration we can take
the example of the sentence
A beautiful saying, a person is beautiful when
he thinks beautiful.
Let, for illustration, the dimension R of the index
vector be 10. The context is defined as one preceding
and one succeeding word. Let ‘person’ be assigned
a random index vector: [0,0,0,1,0,0,0,0,−1,0]
and ‘beautiful’ be assigned a random index vector:
[0,1,0,0,−1,0,0,0,0,0]. Then, to compute the context vector of ‘is’, we sum up the index vectors of its context words, which gives [0, 1, 0, 1, −1, 0, 0, 0, −1, 0].
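This computation can be reproduced with a short sketch; the tokenisation, the window size and the two index vectors are taken from the illustration above, while the accumulation function itself is an assumed, simplified implementation of Random Indexing:

```python
# Minimal Random Indexing sketch for the example sentence
# (R = 10, context = one preceding and one succeeding word).
R = 10
index_vectors = {
    'person':    [0, 0, 0, 1, 0, 0, 0, 0, -1, 0],
    'beautiful': [0, 1, 0, 0, -1, 0, 0, 0, 0, 0],
}

def context_vector(tokens, position, window=1):
    """Sum the index vectors of the words surrounding tokens[position]."""
    vec = [0] * R
    for i in range(max(0, position - window), min(len(tokens), position + window + 1)):
        if i == position:
            continue
        for k, v in enumerate(index_vectors.get(tokens[i], [0] * R)):
            vec[k] += v
    return vec

tokens = "a beautiful saying a person is beautiful when he thinks beautiful".split()
print(context_vector(tokens, tokens.index('is')))
# -> [0, 1, 0, 1, -1, 0, 0, 0, -1, 0]
```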
The space spanned by the context vectors can be represented by a matrix of order W × R, where the i-th row is the context vector of the i-th distinct word.
If a co-occurrence matrix has to be constructed,
R-dimensional context vectors can be collected into
a matrix of order W ×R, where W is the number of
unique word types, and R is the chosen dimensionality
for each word. Note that this is similar to construct-
ing an n-dimensional unary context vector, which has a single 1 in different positions for different words, where n is the number of distinct words. Mathemati-
cally, these n-dimensional unary vectors are orthog-
onal, whereas the R-dimensional random index vec-
tors are nearly orthogonal. However, most often this
does not stand in the way of effective computation. On the contrary, this small compromise gives us a huge computational advantage, as explained below. There are many more nearly orthogonal than truly orthogonal directions in a high-dimensional space (Sahlgren, 2005); the sketch below gives a rough illustration. Choosing Random Indexing is an advantageous trade-off between the number of dimensions
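As a rough illustration of the near-orthogonality mentioned above (a sketch with assumed parameters, not an experiment from the paper), one can generate sparse ternary index vectors and observe that their pairwise dot products stay small relative to the dimension R:

```python
import random

def random_index_vector(R=1000, nonzero=10):
    """Sparse ternary index vector: mostly 0s, with a few randomly placed +1/-1 entries."""
    vec = [0] * R
    for pos in random.sample(range(R), nonzero):
        vec[pos] = random.choice((1, -1))
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

vectors = [random_index_vector() for _ in range(100)]
overlaps = [abs(dot(u, v)) for i, u in enumerate(vectors) for v in vectors[i + 1:]]
print(max(overlaps))   # typically very small compared to R = 1000, i.e. nearly orthogonal
```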