Computing Semantic Textual Similarity based on Partial Textual
Entailment
Martin Víta
NLP Centre, Faculty of Informatics, Botanická 68a, 602 00 Brno, Czech Republic
1 INTRODUCTION
Nowadays, textual entailment is a well-founded no-
tion. There are several definitions of textual entail-
ment: for our purposes, we will use the following one
(Androutsopoulos and Malakasiotis, 2010):
“By textual entailment is understood a relationship between a coherent text T and a language expression H, which is considered as a hypothesis. T entails H if the meaning of H, as interpreted in the context of T, can be deduced from the meaning of T.”
Recognizing textual entailment (often abbreviated as RTE) is a decision problem whether T entails H. During the last ten years, textual entailment has attracted intensive attention of the NLP community. RTE is currently a deeply studied problem with consequences for many different applications of NLP, including multi-document summarization, machine translation evaluation, student response analysis, etc.
RTE is closely related to the problem of paraphrase recognition. A paraphrase s′ of a sentence s is a sentence that has the same or almost the same meaning as s in a given context. The relationship between paraphrasing and textual entailment is straightforward: a paraphrase can be considered as a mutual textual entailment (s entails s′ and s′ entails s simultaneously). Due to this fact, methods for RTE and paraphrasing are often treated together, although some differences are taken into account.
RTE is a binary decision problem. When obtaining a negative result, it is not possible to say whether T almost entails H, i.e. how close H is to a sentence that is entailed by T. Roughly speaking, textual entailment is crisp and rigid. The notion of partial textual entailment is an attempt to incorporate situations in which T “partially” entails H. According to (Levy et al., 2013), we say that an ordered pair (T, H) forms a partial textual entailment if a fragment of the hypothesis H is entailed by T.
To obtain a better idea of the problem, we provide an illustration with four example sentences:
1. Wonderworks Ltd. constructed the new bridge.
2. The new bridge was constructed by Wonderworks
Ltd.
3. Wonderworks Ltd. constructed the new bridge
over the river Thames.
4. A new bridge over the river Thames was con-
structed.
The first two sentences are mutual paraphrases.
The first (or the second one) and the third one form
a partial textual entailment. The last one is entailed
by the third one.
The main aim of the doctoral project is to investigate new methods for recognizing partial textual entailment in both mono- and cross-lingual settings, describe methods for computing semantic textual similarity based on a partial textual entailment score, develop a system for recognizing partial textual entailment and implement the functionality of computing semantic textual similarity within a real-world application framework.
2 MOTIVATION
Although the topic is probably interesting from the
theoretical point of view, the motivation for the pro-
posed project arises mainly from practical issues cur-
rently being solved in the author’s practice (regarding
R&D policy and management).
Recommendation of similar documents in the R&D domain (i. e. papers, patents, . . . ) related to a given project (containing similar ideas) for a reviewer, in order to provide information support.
Discovery of (potentially) duplicate projects – projects with similar content, i. e. with proposals containing a large number of sentences that are mutual (partial) paraphrases. The goal is straightforward – to avoid financing the same or a closely related thing twice from public sources.
Identification of groups of R&D results of a single author based on a repeatedly used idea: for example, papers describing an application of a certain method on slightly different objects of research, reusing ideas presented in conference papers later in journal papers, publishing slightly changed versions of papers in different languages, etc. This phenomenon is typical for systems where “funding depends on quantity” and it leads to distortions in measuring the real performance of an individual or an institution. We will refer to this issue as the “multiple reporting task”.
Víta, M.
Computing Semantic Textual Similarity based on Partial Textual Entailment.
In Doctoral Consortium (DC3K 2015), pages 3-12
Another particular application of our system will be in the field of in-depth exploration of (medical) curricula, in order to improve the author’s results concerning creating balanced content of medical studies. Parts of the study courses, disciplines, learning units, etc. are represented as textual information in plaintext
files. (As we will mention later in the text, the language of these representations has some important features.) When exploring a given curriculum or comparing courses of different faculties/universities, it is important to answer questions like “is the content of a given course (partially) covered by another course?” and “what is the similarity of two courses?”, and to identify
parts of overlapping contents. Current approaches are based on a simple bag-of-words representation and cosine similarity (in some cases improved by an LSA application). These approaches are not able to capture valuable aspects such as paraphrasing of parts of course descriptions.
With respect to the intended practical utilization, there are some limitations and requirements on the final application:
maximally reduced usage of external tools, especially NLP tools (except for lemmatization and/or stemming),
language independence whenever possible.
The proposed approach should also enable further extension towards a cross-lingual setting (allowing us to compute a semantic textual similarity of documents in different languages).
As mentioned in (Nevěřilová, 2014b), “the typical attribute of the current state-of-the-art in this area [textual entailment] is that a number of articles describe methods (with possible applications) whereas few articles describe applications of the proposed methods in large systems.” If successful, the proposed thesis can fill this gap and turn methods into a real-world application.
The hypothesis is that the approach proposed in this paper, based on (partial) textual entailment, is significantly better than LSA-based methods (both mono- and multilingual).
3 STATE-OF-THE-ART
This section provides a brief overview of the state-of-the-art: it describes the keystones of our approach, namely recognizing textual entailment, recognizing partial textual entailment and the word2vec model. It also covers semantic textual similarity and plagiarism detection.
3.1 Recognizing Textual Entailment
One of the definitions of the notion of textual entailment was provided in the previous section. Textual entailment differs from other kinds of entailment (such as logical or analytical entailment); the description of distinctions among these types of entailment is out of the scope of this work since it is application-oriented. The relevant discussion concerning this topic is summarized in (Nevěřilová, 2014b).
For the reader’s convenience, we introduce a commonly used notation: T → H will be an abbreviation of “T entails H” or, equivalently, “H is entailed by T”. In the other case, when T → H does not hold, we will write T ↛ H. As mentioned above, the recognizing textual entailment task is a binary decision task whether T → H or T ↛ H. In the further text we will also write that an ordered pair (T, H) is a textual entailment whenever T → H.
As mentioned in the previous section, the definition of textual entailment is strict (in the sense of an arbitrary, but widely accepted definition): if T ↛ H, there is no way to measure how close H is to some H′ such that T → H′. In other words, from the RTE viewpoint, a hypothesis H completely unrelated to the text T is treated in the same way as a hypothesis H′ that is “almost entailed”.
Classification of RTE Approaches
An up-to-date comprehensive classification of RTE approaches is provided in (Nevěřilová, 2014b). This classification arises from the classification introduced in an older survey (Androutsopoulos and Malakasiotis, 2010), but enriches it by adding a higher level of classification: methods are divided into basic and advanced ones.
The basic approaches are characterized by dealing with sequences of words. To this class belong methods based on:
B-o-W (bag-of-words) approaches: these methods are based on surface string similarity, in some cases after certain preprocessing is applied. The main idea is to match words in H with the “most suitable” words in T. Several string similarity measures are used (e. g. edit distance). These approaches are usually straightforward in the case of paraphrase detection since the sentences involved have approximately the same length.
Vector space approaches: these methods deal with vector representations of T and H and compute their similarity (often by the cosine distance). This approach was successfully used on the paraphrasing task by (Erk and Padó, 2009).
As mentioned in (Nevěřilová, 2014b), the main advantage of these approaches is their relative language independence. Their main disadvantage is the inability to handle expressions that do not preserve the truth value, e. g. negations if expressed as separate words.¹
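To illustrate the basic vector space idea, the following sketch (a toy illustration of our own, not part of any cited system) computes the cosine similarity of two sentences over a plain bag-of-words representation:

```python
from collections import Counter
import math

def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts over bag-of-words term counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

t = "Wonderworks Ltd. constructed the new bridge"
h = "The new bridge was constructed by Wonderworks Ltd."
print(bow_cosine(t, h))  # high similarity: the sentences are mutual paraphrases
```

Note that this measure sees only surface tokens, which is exactly the weakness discussed above: a single inserted negation word changes the meaning while leaving the score almost untouched.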
In contrast, advanced approaches deal with the
structure of the text.
To this class belong the following methods:
Logic-based approaches: the core of these approaches is a mapping of T and H to logical expressions Φ_T and Φ_H (for each possible reading of T and H, respectively) and checking the logical entailment, usually in the form (Φ_T ∧ B) |= Φ_H, where B stands for the corresponding logical representation of common knowledge. This part of
the task is done using a theorem prover. Logical formalisms taken into account comprise mainly first-order logic, capturing even temporal aspects (Tatu and Moldovan, 2006), and description logics (de Salvo Braz et al., 2006). The
crucial question of these approaches is obtain-
ing the common knowledge. Typical knowledge
bases used as starting point are WordNet (Fell-
baum, 1998), Extended WordNet (Moldovan and
Rus, 2001), FrameNet and VerbNet.
Syntactic similarity approaches: these methods deal with (dependency) tree representations and use more or less sophisticated computations, ranging from a simple common edge count (Malakasiotis and Androutsopoulos, 2007) to tree edit distance (Kouylekov and Magnini, 2005), sometimes combined with lexical sources as in (Kouylekov and Magnini, 2006). Other approaches compare the parse tree of H with subtrees of T (Zanzotto et al., 2009).
¹From our point of view, there arises a natural question whether these language constructions occur in “scientific papers” or texts describing curricula, such as syllabi (both important for our purposes), more or less often than in corpora where experiments with RTE were performed. It is expected that several collections of texts involved in our work will not contain this kind of expressions – but this is a hypothesis that should be verified.
Approaches based on similarity measures over symbolic meaning representations: in this case, a semantic representation of H is compared with a semantic representation of T. Again, the FrameNet or WordNet knowledge bases are used.
Approaches based on decoding: the idea of these approaches is the application of transformation rules such as replacing synonyms, hyponym/hypernym replacements, paraphrase patterns, etc. Such transformations can be associated with confidence scores learned from a corpus, and (T, H) is decided to be a textual entailment if the score of the maximum-score sequence is greater than or equal to a given threshold. This approach – combined with probabilistic methods – was used in (Harmeling, 2009).
Approaches based on machine learning methods have a separate category in (Androutsopoulos and Malakasiotis, 2010) that cannot be simply transferred to this hierarchy, since their background varies from using simple surface string features to advanced features derived from semantic representations.
For completeness, let us mention that RTE is only one of several tasks connected with textual entailment; others are namely textual entailment generation and textual entailment extraction. However, these tasks are not relevant from our perspective.
Selection of Existing Systems, Test Suites and
Corpora
Although textual entailment tasks have been widely studied for at least the last five years, the number of RTE systems is relatively low.
A respectable source of existing functional RTE systems is the ACL Web Wiki page. At the time of writing this proposal, six functional systems were presented there, namely: VENSES (based on two subsystems: a reduced version of GETARUN, which produces the semantics from complete linguistic representations, and a partial robust constituency-based parser), Nutcracker (a system using first-order logic, a theorem prover and a finite model builder), EDITS (Edit Distance Textual Entailment Suite), BIUTEE (Bar-Ilan University Textual Entailment Engine; formerly a separate application, now a part of the EOP, based on dealing with dependency trees and performing knowledge-based transformations (Stern and Dagan, 2012)), EXCITEMENT Open Platform (EOP; a generic architecture and a comprehensive implementation for textual inference in multiple languages – the platform includes state-of-the-art algorithms, a large number of knowledge resources, and facilities for experimenting and testing innovative approaches) and TIFMO, a system based on Dependency-based Compositional Semantics (DCS) and logical inference (Tian et al., 2014).
The BIUTEE system will be recalled in the next
section.
Since RTE became a popular NLP task, several test suites and corpora have appeared in order to compare the results of different systems. Well-known collections of RTE test suites were prepared for RTE workshops. During 2004-2013, eight of these workshops took place: first as Pascal RTE Challenges, then as tracks of the Text Analysis Conference, and the last as a track of the SemEval challenge. Links to these datasets are provided within the ACL Wiki; some of them are available for direct download, some are freely available upon request. We should also point out two
other corpora: the Microsoft Research Paraphrase Corpus (MSR) and Boeing-Princeton-ISI (BPI). The first one is – according to (Nevěřilová, 2014a) – the most widely used benchmark for paraphrase recognition. It contains more than 5000 sentence pairs, of which more than half are annotated as paraphrases. The second one is focused on textual entailment. Compared with the Pascal RTE suites, according to (Clark, 2006), BPI is simpler in terms of syntax but more challenging from the semantic viewpoint, with the intention of focusing more on knowledge rather than just linguistic requirements. Other corpora are again listed in the ACL Wiki in the Textual Entailment Resource Pool section.
For our purposes, the most interesting collection of test suites comes from The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge at SemEval-2013 (Task 7). It was inspired by developments of tutorial dialogue systems (Dzikovska et al., 2012). It contains the SciEntsBank corpus (Dzikovska et al., 2013), which was originally developed to assess student answers at a very fine-grained level. Moreover, SciEntsBank Extra (Nielsen et al., 2008) contains additional annotations that break down answers into “facets” – low-level concepts and the relationships connecting them.
3.2 Partial Textual Entailment
The core of our work is the development of a system for recognizing partial textual entailment. In this section we introduce the notion of partial textual entailment and the related notion of faceted textual entailment. Then, we describe one existing system for recognizing partial textual entailment that will serve as a starting point for our research.
The Notion of Partial Textual Entailment and Its
Motivation
The fragments of the idea of partial textual entailment were introduced in (Nielsen et al., 2009), although the notion was not explicitly mentioned in the paper. Partial textual entailment began to be elaborated in recent years by Omer Levy (Levy et al., 2013). It is a “response” to the above-mentioned rigidity of textual entailment.
Following up the paper (Levy et al., 2013), let us consider a pair of sentences:
T := Muscles generate movement in the body.
H := The main job of muscles is to move bones.
Obviously, T does not entail H. Nevertheless, we “feel there is some relationship between T and H”, such that H is almost entailed by T. Thus, it is reasonable to ask how close T and H are to entailment, so there arises a need for a graded approach.
Recalling the previously mentioned definition, we say that an ordered pair (T, H) forms a partial textual entailment if a fragment of the hypothesis H is entailed by T.
To distinguish these two forms of textual entailment, we will use the expression complete textual entailment for the previously defined notion. Trivially, if (T, H) forms a complete textual entailment, then (T, H) forms a partial textual entailment; the converse generally does not hold, i.e. the condition that each fragment of H is entailed by T does not necessarily ensure the complete textual entailment when the fragments have bounded length (for example: expressions of sequences of actions where ordering is important).
In (Nielsen et al., 2009), Nielsen et al. defined the notion of a facet in this setting. Given a hypothesis H, a facet is an ordered pair of words (w1, w2) that are both contained in H, accompanied with the direct semantic relation between w1 and w2. In (Levy et al., 2013), a simplified model is used where a facet is considered as the pair of words w1, w2 without an explicit expression of the semantic relation. This simplified model will also be suitable for our purposes due to certain characteristics of the word2vec model. We will recall this remark in the following section.
Now we are able to state a definition of recognizing faceted entailment. Recognizing faceted entailment is a binary classification task whether the facet (w1, w2) contained in the hypothesis H is expressed or unaddressed by the text T (Levy et al., 2013).
Returning to our example, the facet (muscles, move) refers to the agent role in H and is expressed in the text T, whereas (move, bones) is not.
Obviously, faceted entailment – in this simplified version omitting the semantic relation between the parts of the facet – is a partial textual entailment.
System for Recognizing Faceted Entailment and
Its Modules
A system for recognizing faceted entailment is also proposed in (Levy et al., 2013). This system will also be a starting point for our improvements.
It consists of three independent modules such that each one, for given inputs – a text T and a facet (w1, w2) – returns the result of recognizing faceted entailment:
Exact Match – T is represented as a B-o-W containing all lemmas and tokens from T. If the lemmas of both w1 and w2 are contained in this B-o-W, then the decision is positive, otherwise not. Such exact match was used as a baseline in several textual entailment challenges (Bentivogli et al., 2011).
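The Exact Match module can be sketched in a few lines (a minimal illustration under our own naming; a real system would produce the lemmas with a lemmatizer, here we assume the B-o-W of T is already given):

```python
def exact_match(t_bow: set, w1_lemma: str, w2_lemma: str) -> bool:
    """Positive decision iff both facet lemmas occur in the B-o-W of T."""
    return w1_lemma in t_bow and w2_lemma in t_bow

# B-o-W of T = "Muscles generate movement in the body" (lemmas and tokens)
bow_t = {"muscle", "muscles", "generate", "movement", "in", "the", "body"}
print(exact_match(bow_t, "muscle", "movement"))  # → True
print(exact_match(bow_t, "muscle", "move"))      # → False: "move" absent
```

The second call shows why exact matching alone is a weak baseline: "move" and "movement" are clearly related, but the surface forms differ.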
Lexical Inference – This module checks whether both words w1 and w2, or semantically related words, appear in T. The similarity score of given words is computed using the Resnik similarity measure (Resnik, 1995) over WordNet (Fellbaum, 1998). If the value of the similarity score between w_i and a word t_j from T is greater than or equal to a given threshold (the authors empirically set this threshold to 0.9), then the match of w_i and t_j is accepted. If both words from the facet have their matches in T, then the decision is positive, otherwise not.
Syntactic Inference – This module is based on the previously mentioned BIUTEE system. It operates on dependency trees and applies a sequence of knowledge-based transformations converting T to H. The entailment is determined depending on the “cost” of generating the hypothesis from the text. BIUTEE deals with dependency trees, thus both T and the given facet must be parsed. To obtain the dependency tree of the facet, the following steps are performed: parsing H (obtaining the dependency tree of H), locating the nodes referring to w1 and w2, finding their lowest common ancestor a within H’s dependency tree and selecting the path from w1 to w2 via a. This path is a dependency tree of the facet and it is passed along with the dependency tree of T to BIUTEE as inputs. The BIUTEE result is taken as the decision of the faceted entailment recognition involved.
This system was examined in different configurations (employing different combinations of its three modules) over the SciEntsBank corpus where the facet decomposition was already made, i. e. the corresponding facets were provided – they were automatically extracted from the corpus and manually selected/checked.
These configurations were:
1. Baseline: ExactM
2. BaseLex: ExactM ∨ LexicalI
3. BaseSyn: ExactM ∨ SyntacticI
4. Disjunction: ExactM ∨ LexicalI ∨ SyntacticI
5. Majority: ExactM ∨ (LexicalI ∧ SyntacticI)
In different scenarios (i. e. different subsets of the corpus), the Majority configuration outperforms all other configurations: using the F1 measure, it achieved results from 0.765 to 0.816 (depending on the scenario), while the Baseline result varies from 0.670 to 0.713 and BaseLex from 0.710 to 0.760. The complete table of results is provided in (Levy et al., 2013).
As shown by the authors (Levy et al., 2013), a system for recognizing partial textual entailment can be used even for RTE. The process consists of three consecutive steps:
1. Decompose the hypothesis into facets.
2. Determine whether each facet is entailed.
3. Aggregate the individual facet results and decide
on complete textual entailment accordingly.
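The three steps above can be sketched as follows (the per-facet recognizer is stubbed out, and the decomposition step is skipped since in (Levy et al., 2013) the facets came with the data; all function names and the aggregation rule are our own illustration):

```python
def rte_via_facets(t: str, facets, facet_entailed, min_ratio: float = 1.0) -> bool:
    """Decide complete entailment by aggregating per-facet decisions.

    facet_entailed(t, facet) -> bool is any recognizer of faceted entailment;
    min_ratio = 1.0 requires every facet of H to be entailed by T.
    """
    if not facets:
        return False
    entailed = sum(1 for f in facets if facet_entailed(t, f))
    return entailed / len(facets) >= min_ratio

# Stub recognizer: a facet is "entailed" if both its words occur in T verbatim.
naive = lambda t, f: f[0] in t.split() and f[1] in t.split()

t = "muscles generate movement in the body"
facets_h = [("muscles", "move"), ("move", "bones")]
print(rte_via_facets(t, facets_h, naive))  # → False: no facet is fully matched
```

Lowering min_ratio below 1.0 turns the same aggregation into a graded, partial-entailment-style score, which is the direction this thesis pursues.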
Since the authors used already prepared facets, the first step was obtained “free of charge” (facets were prepared as a part of the training/testing data). When building a system for RTE based on recognizing partial textual entailment, an auxiliary application for facet decomposition (implementing the first step) has a big influence on the overall result, i. e. a wrong decomposition may lead to inferior results.
Current Systems based on Partial Textual
Entailment
The idea of partial textual entailment and/or faceted entailment has so far been used in a few systems focused mainly on student response analysis or grading (Burrows et al., 2015). There are also intentions to use this concept in text summarization (Gupta et al., 2014) or attempts to use it when processing tweets (Rudrapal and Bhattacharya, 2014).
3.3 Related Issues: Semantic Text
Similarity and Plagiarism Detection
Both above-mentioned notions of textual entailment and partial textual entailment are related to the problem of semantic textual similarity. We will follow the meaning of this notion presented in (Agirre et al., 2015):
“Given two snippets of text, semantic textual similarity (STS) captures the notion that some texts are more similar than others, measuring their degree of semantic equivalence. Textual similarity can range from complete unrelatedness to exact semantic equivalence, and a graded similarity score intuitively captures the notion of intermediate shades of similarity, as pairs of text may differ from some minor nuanced aspects of meaning to relatively important semantic differences, to sharing only some details, or to simply unrelated in meaning.”
STS and textual entailment differ in several properties: STS is a bidirectional graded equivalence of text snippets (Agirre et al., 2015), whereas textual entailment deals with “direction” and the notion is not graded. Nevertheless, partial textual entailment can serve as a starting point for establishing the STS relation between text snippets, as we propose. Similarly to textual entailment, STS attracts the attention of the NLP community due to a wide range of potential applications, containing among others plagiarism detection, dialogue systems, etc. These topics are probably the inspiration of the forthcoming SemEval-2016 Task 1 challenge.
The plagiarism detection task is often considered as a possible application of textual entailment. In the Merriam-Webster Online Dictionary, the meaning of plagiarism is:
to steal and pass off (the ideas or words of another) as one’s own,
to use (another’s production) without crediting the source,
to commit literary theft,
to present as new and original an idea or product derived from an existing source.
Obviously, discovering plagiarism of certain types is in principle equivalent to solving our issues mentioned in the Introduction, namely the “duplicity task” and “multiple reporting”. Hence, the methods we are going to investigate are potentially useful in plagiarism detection.
Let us note that the first and/or last bullet also covers translating existing works and the last one also covers “self-plagiarism”. As mentioned in (Rehurek, 2008), plagiarism is an act of crime; hence, it is a conscious act. In R&D funding, the assumption of consciousness is not so important – it is also necessary to detect similar projects independently proposed by different institutions. Indeed, our way of solving this issue relies only on the semantic content, not on the background.
The borderline between fair treatment and plagiarism is not crisp; plagiarism is “a fuzzy notion”: as previously mentioned, turning conference contributions into regular journal papers is a more-or-less acceptable practice.
Employing scoring based on partial textual entailment when computing STS, and a consequent application in a real-world system for duplicity/plagiarism detection, has not been investigated yet.
3.4 Keystone of a Forthcoming Approach: word2vec Model
Word embeddings are low-dimensional vector representations of words. Nowadays, the word2vec model belongs among the most popular word embedding models according to the number of practical applications; it is used in various semantic tasks including machine translation or sentiment analysis.
The word2vec model arises from the idea of predicting the neighbours of a word using a neural network. The (vector) representations of words are learned using the distributed Skip-gram or Continuous Bag-of-Words (CBOW) models (Mikolov et al., 2013a). The CBOW idea is to predict the word “in the middle” from the surrounding words, whereas in the Skip-gram model the training objective is to learn vector representations that are good at predicting the word’s context in the same sentence. Because of their simplicity, the Skip-gram and CBOW models can be trained on a large amount of text data: a parallelized implementation (code.google.com/p/word2vec) can learn a model from billions of words in hours (Mikolov et al., 2013b).
The word2vec model belongs to the class of distributed representations of words. The main attribute of distributed representations (proposed relatively long ago, in the second half of the 1980s, in (Williams and Hinton, 1986)) is that the representations of (semantically) similar words are close in the vector space. Word2vec representations capture many linguistic regularities and many types of similarities that can be expressed as linear translations (Mikolov et al., 2013c). As an illustration we provide a well-known example: representation(king) − representation(man) + representation(woman) is close to representation(queen).
The vectors represent relationships between concepts via linear operations. For example, the vector representation(France) − representation(Paris) is close to the vector representation(Italy) − representation(Rome) (Mikolov et al., 2013b).
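The analogy arithmetic can be illustrated mechanically on hand-constructed toy vectors (the vectors below are fabricated so that the example works; the real regularities, of course, only emerge from embeddings trained on large corpora):

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(vec: dict, a: str, b: str, c: str) -> str:
    """Return the word whose vector is closest to vec[b] - vec[a] + vec[c]."""
    target = vec[b] - vec[a] + vec[c]
    candidates = (w for w in vec if w not in {a, b, c})  # exclude query words
    return max(candidates, key=lambda w: cos(vec[w], target))

# Toy 3-d "embeddings", constructed so that king - man + woman = queen.
vec = {"man":    np.array([1.0, 0.0, 0.0]),
       "woman":  np.array([1.0, 1.0, 0.0]),
       "king":   np.array([1.0, 0.0, 1.0]),
       "queen":  np.array([1.0, 1.0, 1.0]),
       "bridge": np.array([0.0, 0.0, 5.0])}
print(analogy(vec, "man", "king", "woman"))  # → queen
```

This is exactly the mechanism behind word2vec's analogy tool mentioned below: vector subtraction and addition followed by a cosine nearest-neighbour search.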
This model has a solid mathematical/computer science background; we are going to use some of the characteristics of this model “from the user’s view”, omitting the formalization of the optimization tasks being solved when learning the neural network.
Word2vec provides two basic tools to use with these vector representations: distance and analogy. The distance tool returns a list of the closest neighbours of a given word w.r.t. cosine similarity over the vector representations. The analogy tool allows us to query for regularities captured in the vector model through simple vector subtraction and addition (Miñarro-Giménez et al., 2015).
The word2vec model has already been employed in solving semantic similarity tasks (but in a different manner than proposed here), for example in the WHUHJP system for estimating the similarity of tweets (Xu et al., 2015); word2vec representations were also used as features for ML approaches to recognizing textual entailment (Bjerva et al., 2014).
We expect that in the final version of the Ph.D. thesis, a comparison with systems for semantic similarity presented at SemEval will be provided.
Employing word2vec Model in a Cross-lingual
Environment
The word2vec model can be successfully used in a bilingual environment for generating and extending dictionaries and phrase tables (Mikolov et al., 2013b). The basic idea is relatively simple and makes very few assumptions about the languages involved: missing (unknown) translations are obtained by learning language structures over large monolingual data and a mapping between the languages on a small domain (in terms of the mapping). In other words, given a set of concepts/notions, their word representations have similar geometric arrangements in both vector spaces (corresponding to the source and target language). The authors achieve almost 90% precision@5 for translation between English and Spanish; more in (Mikolov et al., 2013b), where also several visual, self-explanatory representations are provided.
More formally, let us have n word pairs and their vector representations (x_i, z_i), i = 1, . . . , n, where x_i ∈ R^{d1} is a vector representation of the i-th word in the source language and z_i ∈ R^{d2} a vector representation of its translation. The goal is to find a matrix W such that Wx_i approximates z_i. The matrix W is obtained as a solution of the optimization problem:

    min_W ∑_{i=1}^{n} ‖Wx_i − z_i‖².
In (Mikolov et al., 2013b) it is solved with stochastic gradient descent. When a translation of a new word is needed, we take its vector representation x in the source language space and compute z = Wx. The last step is to find the word whose representation (in the target vector space) is closest to z in the sense of cosine similarity.
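For small data, the same optimization problem has a closed-form least-squares solution, which can be sketched with numpy (this replaces the stochastic gradient descent used by the authors; the toy data below is random, not real embeddings):

```python
import numpy as np

def fit_translation_matrix(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Find W minimizing sum_i ||W x_i - z_i||^2.

    X: (n, d1) source-language vectors; Z: (n, d2) their translations.
    lstsq solves X M ≈ Z for M of shape (d1, d2); W = M.T so W @ x_i ≈ z_i.
    """
    M, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return M.T

rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 5))      # hidden (d2, d1) mapping to recover
X = rng.normal(size=(50, 5))          # 50 source-language vectors, d1 = 5
Z = X @ W_true.T                      # their exact "translations", d2 = 3
W = fit_translation_matrix(X, Z)
print(np.allclose(W, W_true))         # exact data -> exact recovery
```

With real embeddings the relation Wx_i ≈ z_i holds only approximately, so the quality of the recovered mapping depends on the size and quality of the seed dictionary.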
4 CURRENT WORK – BASELINE
APPROACH
For recommending similar documents and duplicity
task as well as exploring medical curricula, a solution
based on latent semantic analysis (LSA) is being cur-
rently tested.
The overall principle is simple: documents that
are taken into the account are transformed into a
plaintext form, then a document-term-matrix (DTM)
with tf-idf weighting is created. Dimensionality of
this DTM is consequently reduced by LSA. For simi-
larity computations cosine distance is used.
Pairs of documents whose cosine similarity is
greater than or equal to a given threshold are returned
as potentially duplicate. In the recommendation task,
the top n most similar documents are retrieved for a
given document. The results of these “traditional”
approaches will serve as a baseline for our further
experiments with duplicate detection and with
recommending similar documents.
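A minimal numpy-only sketch of this baseline on a toy corpus; the corpus, the tf-idf weighting details, the number of LSA components, the threshold, and n are all illustrative assumptions, not the production configuration:

```python
import numpy as np

# Toy corpus standing in for the real document collection.
docs = [
    "textual entailment between a text and a hypothesis",
    "recognizing textual entailment between a text and a hypothesis",
    "latent semantic analysis reduces dimensionality",
    "word2vec learns vector representations of words",
]

# Document-term matrix with (a simple variant of) tf-idf weighting.
vocab = sorted({w for d in docs for w in d.split()})
tf = np.array([[d.split().count(w) for w in vocab] for d in docs], float)
idf = np.log(len(docs) / (tf > 0).sum(axis=0))
dtm = tf * idf

# LSA: keep only the k largest singular components.
k = 3
U, S, Vt = np.linalg.svd(dtm, full_matrices=False)
reduced = U[:, :k] * S[:k]

# Pairwise cosine similarity in the reduced space.
unit = reduced / np.linalg.norm(reduced, axis=1, keepdims=True)
sims = unit @ unit.T

# Duplicate task: pairs at or above a similarity threshold.
threshold = 0.9
duplicates = [(i, j) for i in range(len(docs))
              for j in range(i + 1, len(docs)) if sims[i, j] >= threshold]

# Recommendation task: top-n most similar documents for docs[0].
n = 2
recommended = [j for j in np.argsort(sims[0])[::-1] if j != 0][:n]
print(duplicates, recommended)
```

Note how the two near-identical documents collapse onto the same latent direction after truncation, which is exactly the effect the duplicate task relies on.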
5 AIM OF THE DOCTORAL
PROJECT
In this section we break the proposed project into a
set of constituent tasks and describe their main ideas.
5.1 Main Issues – Plan of the Work
The first issue is an investigation of new methods for
recognizing partial textual entailment. The starting
point will be the architecture of the previously
mentioned system (Levy et al., 2013). The idea is to
modify the Lexical Inference module by replacing the
former WordNet-based calculation of word similarity
with similarities obtained from a word2vec model. No
knowledge technologies (like WordNet) will be used.
This task also requires training and testing data sets
to be prepared. This issue will be concluded by a
comparison of Levy’s original system and the new one.
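The intended replacement inside the Lexical Inference module can be sketched as follows; the embedding table is a made-up stand-in for a trained word2vec model, and the helper name `word_similarity` is our own:

```python
import numpy as np

# Hypothetical embedding table standing in for a trained word2vec model;
# the words and vector values are made up for illustration.
embeddings = {
    "cat":    np.array([0.90, 0.10, 0.05]),
    "feline": np.array([0.85, 0.15, 0.05]),
    "car":    np.array([0.05, 0.20, 0.95]),
}

def word_similarity(w1, w2):
    """Cosine similarity between two word vectors -- the intended drop-in
    replacement for the WordNet-based similarity computation."""
    a, b = embeddings[w1], embeddings[w2]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(word_similarity("cat", "feline"))  # high
print(word_similarity("cat", "car"))     # much lower
```

In practice the table would be a model loaded from disk (e.g. via gensim), but the similarity computation itself stays this simple, which is the point of dropping WordNet.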
The second issue is extending the architecture
towards a cross-lingual setting. The task is to decide
whether a text T in the source language entails a facet
$(w_1, w_2)$ in the target language. The proposed
method builds on the Lexical Inference module, again
within the word2vec framework: given a word $w_1$ in
the target language, and assuming we have already
computed the linear mapping $\varphi$ (represented by
the matrix $W$, see the previous chapter), we calculate
$\varphi^{-1}(w_1)$ and compute its cosine similarity
to the representation $t_j$ of each word of the text T.
We consider the $t_j$ for which this cosine similarity
is the highest. If this similarity exceeds a given
threshold, and likewise for $w_2$, then T and
$(w_1, w_2)$ constitute a (cross-lingual) partial
textual entailment; otherwise they do not. Analogously
to the previous one, this issue will be completed by a
comparison: in this case, we will compare the results
of the monolingual and the bilingual variants.
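Under these assumptions, the decision procedure might look like the following sketch; the function name, the threshold value, and the use of a precomputed inverse mapping `W_inv` are illustrative, not the final design:

```python
import numpy as np

def entails_facet(text_vecs, facet_vecs, W_inv, threshold=0.7):
    """Cross-lingual partial entailment check (sketch).  `facet_vecs` holds
    the target-space vectors of the facet words (w1, w2), `W_inv` is an
    assumed precomputed mapping from the target space back to the source
    space, and `text_vecs` are the source-space vectors of the words of T."""
    def best_similarity(w):
        v = W_inv @ w                               # phi^{-1}(w)
        sims = text_vecs @ v / (np.linalg.norm(text_vecs, axis=1)
                                * np.linalg.norm(v))
        return sims.max()                           # best-matching word of T
    return all(best_similarity(w) >= threshold for w in facet_vecs)

# Tiny demonstration with identity mapping and matching vectors.
text_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
facet = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(entails_facet(text_vecs, facet, np.eye(2)))   # True
```

Requiring both facet words to clear the threshold mirrors the monolingual design, with the linear mapping as the only cross-lingual ingredient.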
The third issue is the automatic extraction of facets.
In both issues above, it was assumed (similarly as
in (Levy et al., 2013)) that facets are obtained
externally/manually from the hypothesis. Thus, a
natural question arises: whether (and how) this process
can be automated by machine learning methods. At first,
we would like to check whether it is possible to use
only features derived from the word2vec representation,
or whether it is necessary to employ features coming
from syntactic parsing of the hypothesis. This issue
requires proposing suitable evaluation measures for the
process of generating facets from a given hypothesis.
The fourth issue is the development of a scoring
method for the STS task. Having two sentences S
and T, we can compute the percentage of facets from
T that are entailed by S. Given two text snippets A
and B and a sentence S from A, we are able to find
a sentence T from B such that this percentage is
the highest among all sentences in B. We obtain an
entailment score with respect to (A, B) by averaging
these percentages over all sentences in A. In a similar
way we can define a paraphrase score with respect to
(A, B). We therefore obtain a quadruple of values:
1. an entailment score with respect to (A, B)
2. an entailment score with respect to (B, A)
3. a paraphrase score with respect to (A, B)
4. a paraphrase score with respect to (B, A)
This quadruple of values models the “entailment”
relationship between documents. For instance, if A is
a summarization of B, then the first and third values
will be greater than or equal to the second and fourth
values. The particular goal is to investigate how to
turn these scores (probably using also other features
of A and B) into a single value that can be used for
estimating the semantic textual similarity of a pair of
documents, and to implement a web service providing
these computations.
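The scoring scheme can be sketched as follows; `facets` and `facet_entailed` are crude placeholders for the real facet extractor and partial-entailment recognizer, and modeling the paraphrase score as mutual entailment is our assumption:

```python
def facets(sentence):
    """Placeholder facet extractor: every pair of adjacent words.
    The real extractor is the subject of the third issue."""
    words = sentence.split()
    return list(zip(words, words[1:])) or [(sentence, sentence)]

def facet_entailed(facet, sentence):
    """Placeholder recognizer: a facet counts as entailed if both of its
    words occur in the sentence."""
    return all(w in sentence.split() for w in facet)

def entailed_fraction(s, t):
    """Fraction of facets of t that are entailed by s."""
    fs = facets(t)
    return sum(facet_entailed(f, s) for f in fs) / len(fs)

def entailment_score(A, B):
    """Average, over sentences s of A, of the best entailed fraction
    achieved by any sentence t of B."""
    return sum(max(entailed_fraction(s, t) for t in B) for s in A) / len(A)

def paraphrase_score(A, B):
    """Paraphrase modeled here as mutual entailment (an assumption)."""
    return sum(max(min(entailed_fraction(s, t), entailed_fraction(t, s))
                   for t in B) for s in A) / len(A)

A = ["the cat sat"]                  # A behaves like a short summary of B
B = ["the cat sat on the mat"]
quadruple = (entailment_score(A, B), entailment_score(B, A),
             paraphrase_score(A, B), paraphrase_score(B, A))
print(quadruple)                     # (0.4, 1.0, 0.4, 0.4)
```

Even with these toy components the asymmetry shows up as expected: every facet of the “summary” A is entailed by B, while only some facets of B are covered by A.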
The fifth issue is the development of a decoding
module. As shown in (Miñarro-Giménez et al., 2015),
word2vec can be used for capturing different semantic
relations, e.g. hypernym/hyponym, membership, etc.
The main idea of this issue is to replace the Syntactic
Inference module of the system described in the
previous chapter by an analogous module that deals
not with dependency trees but with sequences of
vectors. According to the basic classification, this
approach belongs to the decoding methods. Operations
such as replacing a word by its hypernym will be
processed as replacing the corresponding vector
representation, in order to transform the
representation of the initial text T into an ordered
pair of vectors close to the vector representation of
the considered facet.
The sixth issue is the development and evaluation of
applications for the “duplicate task” and the
“recommendation task”. These applications will take
the form of web services. The evaluation will be
performed using the same metrics traditionally
employed in the evaluation of STS systems (particularly
SemEval-2016 Task 1), i.e., the mean Pearson
correlation between the system output and the gold
standard annotations. The goal is to present a system
that provides better results in recommending
semantically similar documents than the currently
developed system based on LSA.
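The evaluation metric itself is straightforward; a sketch with made-up scores and annotations:

```python
import numpy as np

# Pearson correlation between system similarity scores and gold-standard
# annotations, as in SemEval STS evaluations; both arrays are made up.
system = np.array([4.1, 2.0, 3.3, 0.5, 4.8])
gold   = np.array([4.5, 1.8, 3.0, 1.0, 5.0])

r = np.corrcoef(system, gold)[0, 1]
print(round(r, 3))
```

In a multi-dataset evaluation such as SemEval, this correlation is computed per dataset and then averaged (the “mean Pearson correlation” above).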
It should be emphasized that the primary goal of
this work is to create real-world applications and
components solving the mentioned issues, rather than
achieving good results in standardized benchmarks.
Less-than-excellent results on standardized tasks can
be balanced by the simplicity of the entire system
and the consequent ease of maintenance. Nevertheless,
we expect that the results of the proposed R(P)TE
system in standard evaluations will be comparable
with other up-to-date systems for recognizing
(partial) textual entailment.
6 CONCLUSION
In this work, we recall and discuss the notion of
(partial and faceted) textual entailment and propose
a system for recognizing partial textual entailment
based on the word2vec model. We present an idea of
how to extend this proposed system to a multilingual
environment. The aim of the doctoral project is to use
this system to calculate STS based on scores derived
from partial textual entailment features (also among
multilingual documents). We want to achieve better
results than standard methods based on LSA.
DC3K 2015 - Doctoral Consortium on Knowledge Discovery, Knowledge Engineering and Knowledge Management
10
REFERENCES
Agirre, E., Banea, C., et al. (2015). SemEval-2015 task 2:
Semantic textual similarity, English, Spanish and
pilot on interpretability. In Proceedings of the 9th
International Workshop on Semantic Evaluation (SemEval
2015), June.
Androutsopoulos, I. and Malakasiotis, P. (2010). A survey
of paraphrasing and textual entailment methods. Jour-
nal of Artificial Intelligence Research, pages 135–187.
Bentivogli, L., Clark, P., Dagan, I., Dang, H., and Giampic-
colo, D. (2011). The seventh pascal recognizing tex-
tual entailment challenge. Proceedings of TAC, 2011.
Bjerva, J., Bos, J., van der Goot, R., and Nissim, M. (2014).
The meaning factory: Formal semantics for recogniz-
ing textual entailment and determining semantic sim-
ilarity. SemEval 2014, page 642.
Burrows, S., Gurevych, I., and Stein, B. (2015). The eras
and trends of automatic short answer grading. Interna-
tional Journal of Artificial Intelligence in Education,
25(1):60–117.
Clark, P. and Fellbaum, C. (2006). The Boeing-Princeton-
ISI (BPI) textual entailment test suite.
http://www.cs.utexas.edu/~pclark/bpi-test-suite/.
de Salvo Braz, R., Girju, R., Punyakanok, V., Roth, D.,
and Sammons, M. (2006). An inference model for
semantic entailment in natural language. In Machine
Learning Challenges. Evaluating Predictive Uncer-
tainty, Visual Object Classification, and Recognising
Tectual Entailment, pages 261–286. Springer.
Dzikovska, M. O., Nielsen, R. D., and Brew, C. (2012). To-
wards effective tutorial feedback for explanation ques-
tions: A dataset and baselines. In Proceedings of the
2012 Conference of the North American Chapter of
the Association for Computational Linguistics: Hu-
man Language Technologies, pages 200–210. Associ-
ation for Computational Linguistics.
Dzikovska, M. O., Nielsen, R. D., Brew, C., Leacock, C.,
Giampiccolo, D., Bentivogli, L., Clark, P., Dagan, I.,
and Dang, H. T. (2013). Semeval-2013 task 7: The
joint student response analysis and 8th recognizing
textual entailment challenge. Technical report, DTIC
Document.
Erk, K. and Padó, S. (2009). Paraphrase assessment in
structured vector space: Exploring parameters and
datasets. In Proceedings of the Workshop on Geomet-
rical Models of Natural Language Semantics, pages
57–65. Association for Computational Linguistics.
Fellbaum, C. (1998). WordNet. Wiley Online Library.
Gupta, A., Kaur, M., Singh, A., Goel, A., and Mirkin, S.
(2014). Text summarization through entailment-based
minimum vertex cover. Lexical and Computational
Semantics (* SEM 2014), page 75.
Harmeling, S. (2009). Inferring textual entailment with a
probabilistically sound calculus. Natural Language
Engineering, 15(04):459–477.
Kouylekov, M. and Magnini, B. (2005). Recognizing tex-
tual entailment with tree edit distance. In Proceedings
of the PASCAL RTE Challenge, pages 17–20.
Kouylekov, M. and Magnini, B. (2006). Combining lex-
ical resources with tree edit distance for recogniz-
ing textual entailment. In Machine Learning Chal-
lenges. Evaluating Predictive Uncertainty, Visual Ob-
ject Classification, and Recognising Tectual Entail-
ment, pages 217–230. Springer.
Levy, O., Zesch, T., Dagan, I., and Gurevych, I. (2013).
Recognizing partial textual entailment. In ACL (2),
pages 451–455.
Malakasiotis, P. and Androutsopoulos, I. (2007). Learn-
ing textual entailment using svms and string similarity
measures. In Proceedings of the ACL-PASCAL Work-
shop on Textual Entailment and Paraphrasing, pages
42–47. Association for Computational Linguistics.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a).
Efficient estimation of word representations in vector
space. arXiv preprint arXiv:1301.3781.
Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting
similarities among languages for machine translation.
arXiv preprint arXiv:1309.4168.
Mikolov, T., Yih, W.-t., and Zweig, G. (2013c). Linguistic
regularities in continuous space word representations.
In HLT-NAACL, pages 746–751.
Miñarro-Giménez, J. A., Marín-Alonso, O., and Samwald,
M. (2015). Applying deep learning techniques on
medical corpora from the world wide web: a pro-
totypical system and evaluation. arXiv preprint
arXiv:1502.03682.
Moldovan, D. I. and Rus, V. (2001). Logic form transfor-
mation of wordnet and its applicability to question an-
swering. In Proceedings of the 39th Annual Meeting
on Association for Computational Linguistics, pages
402–409. Association for Computational Linguistics.
Nevěřilová, Z. (2014a). Paraphrase and textual entailment
generation. In Text, Speech and Dialogue, pages 293–
300. Springer.
Nevěřilová, Z. (2014b). Paraphrase and Textual Entailment
Generation in Czech [online]. PhD thesis, Faculty of
Informatics, Masaryk University Brno.
Nielsen, R. D., Ward, W., and Martin, J. H. (2009). Recog-
nizing entailment in intelligent tutoring systems. Nat-
ural Language Engineering, 15(04):479–501.
Nielsen, R. D., Ward, W., Martin, J. H., and Palmer, M.
(2008). Annotating students’ understanding of science
concepts. In Proc. LREC.
Rehurek, R. (2008). Semantic-based plagiarism detection
[online]. Ph.d. thesis proposal, Faculty of Informatics,
Masaryk University Brno.
Resnik, P. (1995). Using information content to evaluate se-
mantic similarity in a taxonomy. arXiv preprint cmp-
lg/9511007.
Rudrapal, D. and Bhattacharya, B. (2014). Recognition of
partial textual entailment for bengali tweets. Social-
India 2014, 2014:29.
Stern, A. and Dagan, I. (2012). Biutee: A modular open-
source system for recognizing textual entailment. In
Proceedings of the ACL 2012 System Demonstrations,
pages 73–78. Association for Computational Linguis-
tics.
Tatu, M. and Moldovan, D. (2006). A logic-based seman-
tic approach to recognizing textual entailment. In
Proceedings of the COLING/ACL on Main conference
poster sessions, pages 819–826. Association for Com-
putational Linguistics.
Tian, R., Miyao, Y., and Matsuzaki, T. (2014). Logical
inference on dependency-based compositional seman-
tics. In Proceedings of ACL, pages 79–89.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986).
Learning representations by back-propagating errors.
Nature, 323:533–536.
Xu, W., Callison-Burch, C., and Dolan, W. B. (2015).
Semeval-2015 task 1: Paraphrase and semantic sim-
ilarity in twitter (pit). In Proceedings of the 9th In-
ternational Workshop on Semantic Evaluation (Se-
mEval).
Zanzotto, F., Pennacchiotti, M., and Moschitti, A. (2009).
A machine learning approach to textual entail-
ment recognition. Natural Language Engineering,
15(04):551–582.