(Wormuth and Becker, 2004). Thus, it helps users to
structure a domain of interest (Ganter et al., 1997;
Wille, 2009). It models the world of data through the
use of objects and attributes (Cole and Eklund,
1999). Ganter et al. (1999) applied the concept lattice
from formal concept analysis. This approach has the
advantage that users can refine their queries by
searching well-structured graphs. These graphs,
known as formal concept lattices, are composed of a
set of documents and a set of terms. Effectively, it
reduces the need to set bound restrictions for
managing the number of documents to be retrieved
(Tam, 2004).
2.2 Related Work on Similarity
Measures between Two Words
Traditionally, a number of approaches to finding
synonyms have been published. Methodologies to
automatically discover synonyms from large corpora
have been a popular topic in a variety of language
processing tasks (Sánchez and Moreno, 2005;
Senellart and Blondel, 2008; Blondel and Senellart,
2011; Van der Plas and Tiedemann, 2006). There are
two kinds of approaches to identifying synonyms.
The first kind of approach uses a general
dictionary (Wu and Zhou, 2003). In the area of
synonym extraction, it is common to use the lexical
information in a dictionary (Veronis and Ide, 1990).
In the dictionary-based case, similarity is determined
from the definition of each word in the dictionary.
This kind of approach is conducted through learning
algorithms based on the information in the dictionary
(Lu et al., 2010; Vickrey et al., 2010). Wu and Zhou
(2003) proposed a method of synonym identification
using a bilingual dictionary and corpus. The bilingual
approach works as follows: first, the bilingual
dictionary is used to translate the target word;
second, the authors used two bilingual corpora that
mean precisely the same; then, they calculated the
probability of the coincidence degree. The results of
the bilingual method are remarkable in comparison
with the monolingual cases. Another study builds a
graph of lexical information from a dictionary;
similarity is computed for each word but limited to
its nearby words in the graph. This similarity
measure was evaluated on a set of related terms
(Ho and Fairon, 2004).
The second kind of approach to identifying
synonyms considers the context of the target word
and computes the similarity of lexical distributions
in a corpus (Lin, 1998). In the case of distributional
approaches, similarity is determined by context.
Thus, it is important to compute how similar words
are in a corpus. Distributional similarity for synonym
identification is used in order to find related words
(Curran and Moens, 2002).
There have been many works that measure the
similarity of words, such as distributional similarity
(Lin et al., 2003). Landauer and Dumais (1997)
proposed a similarity measure that solves TOEFL
synonym tests by using latent semantic analysis.
Lin (1998) proposed several methodologies to
identify the most probable candidate among similar
words by using a few distance measures. Turney
(2001) presented the PMI-IR method, which is
calculated from web data. He evaluated this measure
on the TOEFL test, in which the system has to select
the most probable synonym candidate among four
words. Lin et al. (2003) proposed two ways of
finding synonyms among distributionally related
words. The first is to examine the overlap of the
translations of semantically similar words in multiple
bilingual dictionaries. The second is to look through
designed patterns so as to filter out antonyms.
There has been much research on measuring
similarity to identify synonyms. However, the
dictionary-based approaches have been applied only
to a specific task or domain (Turney, 2001). Hence,
these existing approaches are hard to apply to the
changeable web. Moreover, the context-based
similarity methods deal with unstructured web
documents and take much time to analyze, since they
need pre-processing such as morphological analysis.
Therefore, this paper proposes a methodology to
automatically measure the semantic similarity
between two words by using keyword-based
structured data from the web.
3 METHOD TO MEASURE
SIMILARITY
In this section, we demonstrate the method to
measure the semantic similarity between two distinct
words. This paper defines a ‘query’ as a target word
for which we would like to compute semantic
similarity. A pair of queries is defined as
Q = (q1, q2), which is the set of two different words
q1 and q2.
The overall procedure to estimate the semantic
similarity between the two queries of Q is composed
of three phases, as shown in Figure 1: the
preprocessing, analysis, and calculation phases. In
the preprocessing phase, base data for the analysis
are collected and refined for each query. Let us
assume that the query pair is Q = (contamination,
pollution). The set of web
WEBIST 2014 - International Conference on Web Information Systems and Technologies