Computing Semantic Textual Similarity based on Partial Textual
Entailment
Martin Víta
NLP Centre, Faculty of Informatics, Botanická 68a, 602 00 Brno, Czech Republic
1 INTRODUCTION
Nowadays, textual entailment is a well-founded no-
tion. There are several definitions of textual entail-
ment: for our purposes, we will use the following one
(Androutsopoulos and Malakasiotis, 2010):
“By textual entailment is understood a relationship between a coherent text T and a language expression H, which is considered as a hypothesis. T entails H if the meaning of H, as interpreted in the context of T, can be deduced from the meaning of T.”
Recognizing textual entailment (often abbreviated as RTE) is a decision problem whether T entails H. During the last ten years, textual entailment has attracted intensive attention of the NLP community. RTE is currently a deeply studied problem with consequences for many different applications of NLP, including multi-document summarization, machine translation evaluation, student response analysis, etc.
RTE is closely related to the problem of paraphrase recognition. A paraphrase s′ of a sentence s is a sentence that has the same or almost the same meaning as s in a given context. The relationship between paraphrasing and textual entailment is straightforward: a paraphrase can be considered as a mutual textual entailment (s entails s′ and s′ entails s simultaneously). Due to this fact, methods for RTE and paraphrasing are often treated together, although some differences are taken into account.
RTE is a binary decision problem. When obtaining a negative result, it is not possible to say whether T almost entails H, i.e. how close H is to a sentence that is entailed by T. Roughly speaking, textual entailment is crisp and rigid. The notion of partial textual entailment is an attempt to incorporate situations in which T “partially” entails H. According to (Levy et al., 2013), we say that an ordered pair (T, H) forms a partial textual entailment if a fragment of the hypothesis H is entailed by T.
To obtain a better idea of the problem, we provide an illustration with four example sentences:
1. Wonderworks Ltd. constructed the new bridge.
2. The new bridge was constructed by Wonderworks
Ltd.
3. Wonderworks Ltd. constructed the new bridge
over the river Thames.
4. A new bridge over the river Thames was con-
structed.
The first two sentences are mutual paraphrases.
The first (or the second one) and the third one form
a partial textual entailment. The last one is entailed
by the third one.
The main aim of the doctoral project is to investigate new methods for recognizing partial textual entailment in both mono- and cross-lingual settings, describe methods for computing semantic textual similarity based on a partial textual entailment score, develop a system for recognizing partial textual entailment and implement the functionality of computing semantic textual similarity within a real-world application framework.
2 MOTIVATION
Although the topic is probably interesting from the
theoretical point of view, the motivation for the pro-
posed project arises mainly from practical issues cur-
rently being solved in the author’s practice (regarding
R&D policy and management).
Recommendation of similar documents in the R&D domain (i. e. papers, patents, . . . ) related to a given project (containing similar ideas) for a reviewer, in order to provide information support.
Discovery of (potentially) duplicate projects – projects with similar content, i. e. with proposals containing a large number of sentences that are mutual (partial) paraphrases. The goal is straightforward – to avoid financing the same or a closely related thing twice from public sources.
Identification of groups of R&D results of a single author based on a repeatedly used idea: for example, papers describing an application of a certain method on slightly different objects of research, reusing ideas presented in conference papers later in journal papers, publishing slightly changed versions of papers in different languages, etc. This phenomenon is typical for systems where “funding depends on quantity” and it leads to distortions in measuring the real performance of an individual or an institution. We will refer to this issue as the “multiple reporting task”.
Víta, M.
Computing Semantic Textual Similarity based on Partial Textual Entailment.
In Doctoral Consortium (DC3K 2015), pages 3-12
Another particular application of our system will be in the field of in-depth exploration of (medical) curricula, in order to improve the author’s results concerning creating balanced content of medical studies. Parts of the study courses, disciplines, learning units, etc. are represented as textual information in plaintext
files. (As we will mention later in the text, the language of these representations has some important features.) When exploring a given curriculum or comparing courses of different faculties/universities, it is important to answer questions like “is the content of a given course (partially) covered by another course?” and “what is the similarity of two courses?”, and to identify
parts of overlapping contents. Current approaches are based on a simple bag-of-words representation and cosine similarity (in some cases improved by an LSA application). These approaches are not able to capture valuable aspects such as paraphrasing of parts of course descriptions.
With respect to the intended practical utilization, there are some limitations and requirements on the final application:
maximally reduced usage of external tools, especially NLP tools (except for lemmatization and/or stemming),
language independence whenever possible.
The proposed approach should also enable further extension towards a cross-lingual setting (allowing us to compute a semantic textual similarity of documents in different languages).
As mentioned in (Nevěřilová, 2014b), “the typical attribute of the current state-of-the-art in this area [textual entailment] is that a number of articles describe methods (with possible applications) whereas few articles describe applications of the proposed methods in large systems.” If successful, the proposed thesis can fill this gap and turn methods into a real-world application.
The hypothesis is that the approach proposed in this paper, based on (partial) textual entailment, is significantly better than LSA-based methods (both mono- and multilingual).
3 STATE-OF-THE-ART
This section provides a brief overview of the state-of-the-art: it describes the keystones of our approach, namely recognizing textual entailment, recognizing partial textual entailment and the word2vec model. It also covers semantic textual similarity and plagiarism detection.
3.1 Recognizing Textual Entailment
One of the definitions of the notion of textual entailment was provided in the previous section. Textual entailment differs from other kinds of entailment (such as logical or analytical entailment); the description of distinctions among these types of entailment is out of the scope of this work since it is application-oriented. The relevant discussion concerning this topic is summarized in (Nevěřilová, 2014b).
For the reader’s convenience, we introduce a commonly used notation: T → H will be an abbreviation of “T entails H” or, equivalently, “H is entailed by T”. In the other case, when T → H does not hold, we will write T ↛ H. As mentioned above, the recognizing textual entailment task is a binary decision task whether T → H or T ↛ H. In the further text we will also write that an ordered pair (T, H) is a textual entailment whenever T → H.
As mentioned in the previous section, the definition of textual entailment is strict (in the sense of an arbitrary, but widely accepted definition): if T ↛ H, there is no way to measure how close H is to some H′ such that T → H′. In other words, from the RTE viewpoint, a hypothesis H completely unrelated to the text T is treated in the same way as a hypothesis H′ that is “almost entailed”.
Classification of RTE Approaches
An up-to-date comprehensive classification of RTE approaches is provided in (Nevěřilová, 2014b). This classification arises from the classification introduced in an older survey (Androutsopoulos and Malakasiotis, 2010), but enriches it by adding a higher level of classification: methods are divided into basic and advanced ones.
The basic approaches are characterized by dealing with sequences of words. To this class belong methods based on:
B-o-W (bag-of-words) approaches: these methods are based on surface string similarity, in some cases after certain preprocessing is applied. The main idea is to match words in H with the “most suitable” words in T. Several string similarity measures are used (e. g. edit distance). These approaches are usually straightforward in the case of paraphrase detection since the sentences involved have approximately the same length.
Vector space approaches: these methods deal with vector representations of T and H and compute their similarity (often by the cosine distance). This approach was successfully used on the paraphrasing task by (Erk and Padó, 2009).
As mentioned in (Nevěřilová, 2014b), the main advantage of these approaches is their relative language independence. Their main disadvantage is the inability to handle expressions that do not preserve the truth value, e. g. negations if expressed as separate words.¹
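To illustrate the basic vector space idea, the following sketch (a toy illustration of our own, not part of any cited system) computes the cosine similarity of two sentences over a plain bag-of-words representation:

```python
from collections import Counter
import math

def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts over bag-of-words term counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

t = "Wonderworks Ltd. constructed the new bridge"
h = "The new bridge was constructed by Wonderworks Ltd."
print(bow_cosine(t, h))  # high similarity: the sentences are mutual paraphrases
```

Note that this measure sees only surface tokens, which is exactly the weakness discussed above: a single inserted negation word changes the meaning while leaving the score almost untouched.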
In contrast, advanced approaches deal with the
structure of the text.
To this class belong the following methods:
Logic-based approaches: the core of these approaches is a mapping of T and H to logical expressions Φ_T and Φ_H (for each possible reading of T and H, respectively) and checking the logical entailment, usually in the form (Φ_T ∧ B) |= Φ_H, where B stands for the corresponding logical representation of common knowledge. This part of
the task is done using a theorem prover. Logical formalisms taken into account comprise mainly first-order logic, capturing even temporal aspects (Tatu and Moldovan, 2006), and description logics (de Salvo Braz et al., 2006). The
crucial question of these approaches is obtain-
ing the common knowledge. Typical knowledge
bases used as starting point are WordNet (Fell-
baum, 1998), Extended WordNet (Moldovan and
Rus, 2001), FrameNet and VerbNet.
Syntactic similarity approaches: these methods deal with (dependency) tree representations and use more or less sophisticated computations, ranging from a simple common edge count (Malakasiotis and Androutsopoulos, 2007) to tree edit distance (Kouylekov and Magnini, 2005), sometimes combined with lexical sources as in (Kouylekov and Magnini, 2006). Other approaches compare the parse tree of H with subtrees of T (Zanzotto et al., 2009).
¹From our point of view, there arises a natural question whether these language constructions occur in “scientific papers” or texts describing curricula, such as syllabi (both important for our purposes), more or less often than in corpora where experiments with RTE were performed. It is expected that several collections of texts involved in our work will not contain this kind of expressions – but this is a hypothesis that should be verified.
Approaches based on similarity measures over symbolic meaning representations: in this case, a semantic representation of H is compared with a semantic representation of T. Again, the FrameNet or WordNet knowledge bases are used.
Approaches based on decoding: the idea of these approaches is the application of transformation rules such as replacing synonyms, hyponym/hypernym replacements, paraphrase patterns, etc. Such transformations can be associated with confidence scores learned from a corpus, and (T, H) is decided to be a textual entailment if the score of the maximum-score sequence is greater than or equal to a given threshold. This approach – combined with probabilistic methods – was used in (Harmeling, 2009).
Approaches based on machine learning methods have a separate category in (Androutsopoulos and Malakasiotis, 2010) that cannot be simply transferred to this hierarchy, since their background varies from using simple surface string features to advanced features derived from semantic representations.
For completeness, let us mention that RTE is only one of several tasks connected with textual entailment; others are namely textual entailment generation and textual entailment extraction. However, these tasks are not relevant from our perspective.
Selection of Existing Systems, Test Suites and
Corpora
Although textual entailment tasks have been widely studied for at least the last five years, the number of RTE systems is relatively low.
A respectable source of existing functional RTE systems is the ACL Web Wiki page. At the time of writing this proposal, six functional systems were presented there, namely: VENSES (based on two subsystems: a reduced version of GETARUN, which produces the semantics from complete linguistic representations, and a partial robust constituency-based parser), Nutcracker (a system using first-order logic, a theorem prover and a finite model builder), EDITS (Edit Distance Textual Entailment Suite), BIUTEE (Bar-Ilan University Textual Entailment Engine; formerly a separate application, now a part of the EOP, based on dealing with dependency trees and performing knowledge-based transformations (Stern and Dagan, 2012)), EXCITEMENT Open Platform (EOP; a generic architecture and a comprehensive implementation for textual inference in multiple languages – the platform includes state-of-the-art algorithms, a large number of knowledge resources, and facilities for experimenting and testing innovative approaches) and TIFMO, a system based on Dependency-based Compositional Semantics (DCS) and logical inference (Tian et al., 2014).
The BIUTEE system will be recalled in the next
section.
Since RTE became a popular NLP task, several test suites and corpora have appeared in order to compare the results of different systems. Well-known collections of RTE test suites were prepared for RTE workshops. During 2004-2013, eight of these workshops took place: first as Pascal RTE Challenges, then as tracks of the Text Analysis Conference, and the last as a track of the SemEval challenge. Links to these datasets are provided within the ACL Wiki; some of them are available for direct download, some are freely available upon request. We should also point out two
other corpora: the Microsoft Research Paraphrase Corpus (MSR) and Boeing-Princeton-ISI (BPI). The first one is – according to (Nevěřilová, 2014a) – the most widely used benchmark for paraphrase recognition. It contains more than 5000 sentence pairs, of which more than half are annotated as paraphrases. The second one is focused on textual entailment. Compared with the Pascal RTE suites, according to (Clark, 2006), BPI is simpler in terms of syntax but more challenging from the semantic viewpoint, with the intention of focusing more on knowledge rather than just linguistic requirements. Other corpora are again listed in the ACL Wiki in the Textual Entailment Resource Pool section.
For our purposes, the most interesting collection of test suites comes from The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge at SemEval-2013 (Task 7). It was inspired by developments of tutorial dialogue systems (Dzikovska et al., 2012). It contains the SciEntsBank corpus (Dzikovska et al., 2013), which was originally developed to assess student answers at a very fine-grained level. Moreover, SciEntsBank Extra (Nielsen et al., 2008) contains additional annotations that break down answers into “facets” – low-level concepts and the relationships connecting them.
3.2 Partial Textual Entailment
The core of our work is the development of a system for recognizing partial textual entailment. In this section we introduce the notion of partial textual entailment and the related notion of faceted textual entailment. Then, we describe one existing system for recognizing partial textual entailment that will serve as a starting point for our research.
The Notion of Partial Textual Entailment and Its
Motivation
The fragments of the idea of partial textual entailment were introduced in (Nielsen et al., 2009), although the notion was not explicitly mentioned in the paper. Partial textual entailment began to be elaborated in recent years by Omer Levy (Levy et al., 2013). It is a “response” to the above-mentioned rigidity of textual entailment.
Following up the paper (Levy et al., 2013), let us consider a pair of sentences:
T := Muscles generate movement in the body.
H := The main job of muscles is to move bones.
Obviously, T does not entail H. Nevertheless, we “feel there is some relationship between T and H”, such that H is almost entailed by T. Thus, it is reasonable to ask how close T and H are to entailment, so there arises a need for a graded approach.
Recalling the previously mentioned definition, we say that an ordered pair (T, H) forms a partial textual entailment if a fragment of the hypothesis H is entailed by T.
To distinguish these two forms of textual entailment, we will use the expression complete textual entailment for the previously defined notion. Trivially, if (T, H) forms a complete textual entailment, then (T, H) forms a partial textual entailment; the converse generally does not hold, i.e. the condition that each fragment of H is entailed by T does not necessarily ensure the complete textual entailment when the fragments have bounded length (for example: expressions of sequences of actions where ordering is important).
In (Nielsen et al., 2009), Nielsen et al. defined the notion of a facet in this setting. Given a hypothesis H, a facet is an ordered pair of words (w1, w2) that are both contained in H, accompanied with the direct semantic relation between w1 and w2. In (Levy et al., 2013), a simplified model is used where a facet is considered as the pair of words w1, w2 without an explicit expression of the semantic relation. This simplified model will also be suitable for our purposes due to certain characteristics of the word2vec model. We will recall this remark in the following section.
Now we are able to state a definition of recognizing faceted entailment. Recognizing faceted entailment is a binary classification task whether the facet (w1, w2) contained in the hypothesis H is expressed or unaddressed by the text T (Levy et al., 2013).
Returning to our example, the facet (muscles, move) refers to the agent role in H and is expressed in the text T, whereas (move, bones) is not.
Obviously, faceted entailment – in this simplified version omitting the semantic relation between the parts of the facet – is a partial textual entailment.
System for Recognizing Faceted Entailment and
Its Modules
A system for recognizing faceted entailment is also proposed in (Levy et al., 2013). This system will also be a starting point for our improvements.
It consists of three independent modules such that each one, for given inputs – a text T and a facet (w1, w2) – returns the result of recognizing faceted entailment:
Exact Match – T is represented as a B-o-W containing all lemmas and tokens from T. If the lemmas of both w1 and w2 are contained in this B-o-W, then the decision is positive, otherwise not. Such exact match was used as a baseline in several textual entailment challenges (Bentivogli et al., 2011).
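The Exact Match module can be sketched in a few lines (a minimal illustration under our own naming; a real system would produce the lemmas with a lemmatizer, here we assume the B-o-W of T is already given):

```python
def exact_match(t_bow: set, w1_lemma: str, w2_lemma: str) -> bool:
    """Positive decision iff both facet lemmas occur in the B-o-W of T."""
    return w1_lemma in t_bow and w2_lemma in t_bow

# B-o-W of T = "Muscles generate movement in the body" (lemmas and tokens)
bow_t = {"muscle", "muscles", "generate", "movement", "in", "the", "body"}
print(exact_match(bow_t, "muscle", "movement"))  # → True
print(exact_match(bow_t, "muscle", "move"))      # → False: "move" absent
```

The second call shows why exact matching alone is a weak baseline: "move" and "movement" are clearly related, but the surface forms differ.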
Lexical Inference – This module checks whether both words w1 and w2, or semantically related words, appear in T. The similarity score of given words is computed using the Resnik similarity measure (Resnik, 1995) over WordNet (Fellbaum, 1998). If the value of the similarity score between w_i and a word t_j from T is greater than or equal to a given threshold (the authors empirically set this threshold to 0.9), then the match of w_i and t_j is accepted. If both words from the facet have their matches in T, then the decision is positive, otherwise not.
Syntactic Inference – This module is based on the previously mentioned BIUTEE system. It operates on dependency trees and applies a sequence of knowledge-based transformations converting T to H. The entailment is determined depending on the “cost” of generating the hypothesis from the text. BIUTEE deals with dependency trees, thus both T and the given facet must be parsed. To obtain the dependency tree of the facet, the following steps are performed: parsing H (obtaining the dependency tree of H), locating the nodes referring to w1 and w2, finding their lowest common ancestor a within H’s dependency tree and selecting the path from w1 to w2 via a. This path is a dependency tree of the facet and it is passed along with the dependency tree of T to BIUTEE as inputs. The BIUTEE result is taken as the decision of the faceted entailment recognition involved.
This system was examined in different configurations (employing different combinations of its three modules) over the SciEntsBank corpus where the facet decomposition was already made, i. e. the corresponding facets were provided – they were automatically extracted from the corpus and manually selected/checked.
These configurations were:
1. Baseline: ExactM
2. BaseLex: ExactM ∨ LexicalI
3. BaseSyn: ExactM ∨ SyntacticI
4. Disjunction: ExactM ∨ LexicalI ∨ SyntacticI
5. Majority: ExactM ∨ (LexicalI ∧ SyntacticI)
In different scenarios (i. e. different subsets of the corpus), the Majority configuration outperforms all other configurations: using the F1 measure, it achieved results from 0.765 to 0.816 (depending on the scenario), while the Baseline result varies from 0.670 to 0.713 and BaseLex from 0.710 to 0.760. The complete table of results is provided in (Levy et al., 2013).
As shown by the authors (Levy et al., 2013), a system for recognizing partial textual entailment can be used even for RTE. The process consists of three consecutive steps:
1. Decompose the hypothesis into facets.
2. Determine whether each facet is entailed.
3. Aggregate the individual facet results and decide
on complete textual entailment accordingly.
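The three steps above can be sketched as follows (the per-facet recognizer is stubbed out, and the decomposition step is skipped since in (Levy et al., 2013) the facets came with the data; all function names and the aggregation rule are our own illustration):

```python
def rte_via_facets(t: str, facets, facet_entailed, min_ratio: float = 1.0) -> bool:
    """Decide complete entailment by aggregating per-facet decisions.

    facet_entailed(t, facet) -> bool is any recognizer of faceted entailment;
    min_ratio = 1.0 requires every facet of H to be entailed by T.
    """
    if not facets:
        return False
    entailed = sum(1 for f in facets if facet_entailed(t, f))
    return entailed / len(facets) >= min_ratio

# Stub recognizer: a facet is "entailed" if both its words occur in T verbatim.
naive = lambda t, f: f[0] in t.split() and f[1] in t.split()

t = "muscles generate movement in the body"
facets_h = [("muscles", "move"), ("move", "bones")]
print(rte_via_facets(t, facets_h, naive))  # → False: no facet is fully matched
```

Lowering min_ratio below 1.0 turns the same aggregation into a graded, partial-entailment-style score, which is the direction this thesis pursues.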
Since the authors used already prepared facets, the first step was obtained “free of charge” (facets were prepared as a part of the training/testing data). When building a system for RTE based on recognizing partial textual entailment, an auxiliary application for facet decomposition (implementing the first step) has a big influence on the overall result, i. e. a wrong decomposition may lead to inferior results.
Current Systems based on Partial Textual
Entailment
The idea of partial textual entailment and/or faceted entailment has so far been used in a few systems focused mainly on student response analysis or grading (Burrows et al., 2015). There are also intentions to use this concept in text summarization (Gupta et al., 2014) or attempts to use it when processing tweets (Rudrapal and Bhattacharya, 2014).
3.3 Related Issues: Semantic Text
Similarity and Plagiarism Detection
Both above-mentioned notions of textual entailment and partial textual entailment are related to the problem of semantic textual similarity. We will follow the meaning of this notion presented in (Agirre et al., 2015):
“Given two snippets of text, semantic textual similarity (STS) captures the notion that some texts are more similar than others, measuring their degree of semantic equivalence. Textual similarity can range from complete unrelatedness to exact semantic equivalence, and a graded similarity score intuitively captures the notion of intermediate shades of similarity, as pairs of text may differ from some minor nuanced aspects of meaning to relatively important semantic differences, to sharing only some details, or to simply unrelated in meaning.”
STS and textual entailment differ in several properties: STS is a bidirectional graded equivalence of text snippets (Agirre et al., 2015), whereas textual entailment deals with “direction” and the notion is not graded. Nevertheless, partial textual entailment can serve as a starting point for establishing the STS relation between text snippets, as we propose. Similarly to textual entailment, STS attracts the attention of the NLP community due to a wide range of potential applications, containing among others plagiarism detection, dialogue systems, etc. These topics are probably the inspiration of the forthcoming SemEval-2016 Task 1 challenge.
The plagiarism detection task is often considered as a possible application of textual entailment. In the Merriam-Webster Online Dictionary, the meaning of plagiarism is:
to steal and pass off (the ideas or words of another) as one’s own,
to use (another’s production) without crediting the source,
to commit literary theft,
to present as new and original an idea or product derived from an existing source.
Obviously, discovering plagiarism of certain types is in principle equivalent to solving our issues mentioned in the Introduction, namely the “duplicity task” and “multiple reporting”. Hence, the methods we are going to investigate are potentially useful in plagiarism detection.
Let us note that the first and/or last bullet also covers translating existing works and the last one also covers “self-plagiarism”. As mentioned in (Rehurek, 2008), plagiarism is an act of crime; hence, it is a conscious act. In R&D funding, the assumption of consciousness is not so important – it is also necessary to detect similar projects independently proposed by different institutions. Indeed, our way of solving this issue relies only on the semantic content, not on the background.
The borderline between fair treatment and plagiarism is not crisp; plagiarism is “a fuzzy notion”: as previously mentioned, turning conference contributions into regular journal papers is a more-or-less acceptable practice.
Employing scoring based on partial textual entailment when computing STS, and a consequent application in a real-world system for duplicity/plagiarism detection, has not been investigated yet.
3.4 Keystone of a Forthcoming Approach: word2vec Model
Word embeddings are low-dimensional vector representations of words. Nowadays, the word2vec model belongs among the most popular word embedding models according to the number of practical applications; it is used in various semantic tasks including machine translation or sentiment analysis.
The word2vec model arises from the idea of predicting the neighbours of a word using a neural network. The (vector) representations of words are learned using the distributed Skip-gram or Continuous Bag-of-Words (CBOW) models (Mikolov et al., 2013a). The CBOW idea is to predict the word “in the middle” from the surrounding words, whereas in the Skip-gram model the training objective is to learn vector representations that are good at predicting the word’s context in the same sentence. Because of their simplicity, the Skip-gram and CBOW models can be trained on a large amount of text data: a parallelized implementation (code.google.com/p/word2vec) can learn a model from billions of words in hours (Mikolov et al., 2013b).
The word2vec model belongs to the class of distributed representations of words. The main attribute of distributed representations (proposed relatively long ago, in the second half of the 1980s, in (Williams and Hinton, 1986)) is that the representations of (semantically) similar words are close in the vector space. Word2vec representations capture many linguistic regularities and many types of similarities that can be expressed as linear translations (Mikolov et al., 2013c). As an illustration we provide a well-known example: representation(king) − representation(man) + representation(woman) is close to representation(queen).
The vectors represent relationships between concepts via linear operations. For example, the vector representation(France) − representation(Paris) is close to the vector representation(Italy) − representation(Rome) (Mikolov et al., 2013b).
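The analogy arithmetic can be illustrated mechanically on hand-constructed toy vectors (the vectors below are fabricated so that the example works; the real regularities, of course, only emerge from embeddings trained on large corpora):

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(vec: dict, a: str, b: str, c: str) -> str:
    """Return the word whose vector is closest to vec[b] - vec[a] + vec[c]."""
    target = vec[b] - vec[a] + vec[c]
    candidates = (w for w in vec if w not in {a, b, c})  # exclude query words
    return max(candidates, key=lambda w: cos(vec[w], target))

# Toy 3-d "embeddings", constructed so that king - man + woman = queen.
vec = {"man":    np.array([1.0, 0.0, 0.0]),
       "woman":  np.array([1.0, 1.0, 0.0]),
       "king":   np.array([1.0, 0.0, 1.0]),
       "queen":  np.array([1.0, 1.0, 1.0]),
       "bridge": np.array([0.0, 0.0, 5.0])}
print(analogy(vec, "man", "king", "woman"))  # → queen
```

This is exactly the mechanism behind word2vec's analogy tool mentioned below: vector subtraction and addition followed by a cosine nearest-neighbour search.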
This model has a solid mathematical/computer science background; we are going to use some of the characteristics of this model “from the user’s view”, omitting the formalization of the optimization tasks being solved when learning the neural network.
Word2vec provides two basic tools to use with these vector representations: distance and analogy. The distance tool returns a list of the closest neighbours of a given word w.r.t. cosine similarity over the vector representations. The analogy tool allows us to query for regularities captured in the vector model through simple vector subtraction and addition (Miñarro-Giménez et al., 2015).
The word2vec model has already been employed in solving semantic similarity tasks (but in a different manner than proposed here), for example in the WHUHJP system for estimating the similarity of tweets (Xu et al., 2015); word2vec representations were also used as features for ML approaches to recognizing textual entailment (Bjerva et al., 2014).
We expect that in the final version of the Ph.D. thesis, a comparison with systems for semantic similarity presented at SemEval will be provided.
Employing word2vec Model in a Cross-lingual
Environment
The word2vec model can be successfully used in a bilingual environment for generating and extending dictionaries and phrase tables (Mikolov et al., 2013b). The basic idea is relatively simple and makes very few assumptions about the languages involved: missing (unknown) translations are obtained by learning language structures over large monolingual data and a mapping between the languages on a small domain (in terms of the mapping). In other words, given a set of concepts/notions, their word representations have similar geometric arrangements in both vector spaces (corresponding to the source and target language). The authors achieve almost 90% precision@5 for translation between English and Spanish; more in (Mikolov et al., 2013b), where also several visual, self-explanatory representations are provided.
More formally, let us have n word pairs and their vector representations (x_i, z_i), i = 1, . . . , n, where x_i ∈ R^{d1} is a vector representation of the i-th word in the source language and z_i ∈ R^{d2} a vector representation of its translation. The goal is to find a matrix W such that Wx_i approximates z_i. The matrix W is obtained as a solution of the optimization problem:

    min_W ∑_{i=1}^{n} ‖Wx_i − z_i‖².
In (Mikolov et al., 2013b) it is solved with stochastic gradient descent. When a translation of a new word is needed, we take its vector representation x in the source language space and compute z = Wx. The last step is to find the word whose representation (in the target vector space) is closest to z in the sense of cosine similarity.
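For small data, the same optimization problem has a closed-form least-squares solution, which can be sketched with numpy (this replaces the stochastic gradient descent used by the authors; the toy data below is random, not real embeddings):

```python
import numpy as np

def fit_translation_matrix(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Find W minimizing sum_i ||W x_i - z_i||^2.

    X: (n, d1) source-language vectors; Z: (n, d2) their translations.
    lstsq solves X M ≈ Z for M of shape (d1, d2); W = M.T so W @ x_i ≈ z_i.
    """
    M, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return M.T

rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 5))      # hidden (d2, d1) mapping to recover
X = rng.normal(size=(50, 5))          # 50 source-language vectors, d1 = 5
Z = X @ W_true.T                      # their exact "translations", d2 = 3
W = fit_translation_matrix(X, Z)
print(np.allclose(W, W_true))         # exact data -> exact recovery
```

With real embeddings the relation Wx_i ≈ z_i holds only approximately, so the quality of the recovered mapping depends on the size and quality of the seed dictionary.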
4 CURRENT WORK – BASELINE
APPROACH
For recommending similar documents and duplicity
task as well as exploring medical curricula, a solution
based on latent semantic analysis (LSA) is being cur-
rently tested.
The overall principle is simple: documents that
are taken into the account are transformed into a
plaintext form, then a document-term-matrix (DTM)
with tf-idf weighting is created. Dimensionality of
this DTM is consequently reduced by LSA. For simi-
larity computations cosine distance is used.
Pairs of documents whose cosine similarity is
greater than or equal to a given threshold are returned
as potentially duplicate. In the recommendation task,
the top n most similar documents are retrieved for a
given document. The results of these “traditional”
approaches will serve as a baseline for our further
experiments with duplicate detection and with
recommending similar documents.
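A minimal numpy-only sketch of this baseline on a toy corpus; the corpus, the tf-idf weighting details, the number of LSA components, the threshold, and n are all illustrative assumptions, not the production configuration:

```python
import numpy as np

# Toy corpus standing in for the real document collection.
docs = [
    "textual entailment between a text and a hypothesis",
    "recognizing textual entailment between a text and a hypothesis",
    "latent semantic analysis reduces dimensionality",
    "word2vec learns vector representations of words",
]

# Document-term matrix with (a simple variant of) tf-idf weighting.
vocab = sorted({w for d in docs for w in d.split()})
tf = np.array([[d.split().count(w) for w in vocab] for d in docs], float)
idf = np.log(len(docs) / (tf > 0).sum(axis=0))
dtm = tf * idf

# LSA: keep only the k largest singular components.
k = 3
U, S, Vt = np.linalg.svd(dtm, full_matrices=False)
reduced = U[:, :k] * S[:k]

# Pairwise cosine similarity in the reduced space.
unit = reduced / np.linalg.norm(reduced, axis=1, keepdims=True)
sims = unit @ unit.T

# Duplicate task: pairs at or above a similarity threshold.
threshold = 0.9
duplicates = [(i, j) for i in range(len(docs))
              for j in range(i + 1, len(docs)) if sims[i, j] >= threshold]

# Recommendation task: top-n most similar documents for docs[0].
n = 2
recommended = [j for j in np.argsort(sims[0])[::-1] if j != 0][:n]
print(duplicates, recommended)
```

Note how the two near-identical documents collapse onto the same latent direction after truncation, which is exactly the effect the duplicate task relies on.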
5 AIM OF THE DOCTORAL
PROJECT
In this section we break the proposed project into a
set of constituent tasks and describe their main ideas.
5.1 Main Issues – Plan of the Work
The first issue is an investigation of new methods for
recognizing partial textual entailment. The starting
point will be the architecture of the previously
mentioned system (Levy et al., 2013). The idea is to
modify the Lexical Inference module by replacing the
former WordNet-based calculation of word similarity
with similarities obtained from a word2vec model. No
knowledge technologies (like WordNet) will be used.
This task also requires training and testing data sets
to be prepared. This issue will be concluded by a
comparison of Levy’s original system and the new one.
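The intended replacement inside the Lexical Inference module can be sketched as follows; the embedding table is a made-up stand-in for a trained word2vec model, and the helper name `word_similarity` is our own:

```python
import numpy as np

# Hypothetical embedding table standing in for a trained word2vec model;
# the words and vector values are made up for illustration.
embeddings = {
    "cat":    np.array([0.90, 0.10, 0.05]),
    "feline": np.array([0.85, 0.15, 0.05]),
    "car":    np.array([0.05, 0.20, 0.95]),
}

def word_similarity(w1, w2):
    """Cosine similarity between two word vectors -- the intended drop-in
    replacement for the WordNet-based similarity computation."""
    a, b = embeddings[w1], embeddings[w2]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(word_similarity("cat", "feline"))  # high
print(word_similarity("cat", "car"))     # much lower
```

In practice the table would be a model loaded from disk (e.g. via gensim), but the similarity computation itself stays this simple, which is the point of dropping WordNet.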
The second issue is extending the architecture
towards a cross-lingual setting. The task is to decide
whether a text T in the source language entails a facet
$(w_1, w_2)$ in the target language. The proposed
method builds on the Lexical Inference module, again
within the word2vec framework: given a word $w_1$ in
the target language, and assuming we have already
computed the linear mapping $\varphi$ (represented by
the matrix $W$, see the previous chapter), we calculate
$\varphi^{-1}(w_1)$ and compute its cosine similarity
to the representation $t_j$ of each word of the text T.
We consider the $t_j$ for which this cosine similarity
is the highest. If this similarity exceeds a given
threshold, and likewise for $w_2$, then T and
$(w_1, w_2)$ constitute a (cross-lingual) partial
textual entailment; otherwise they do not. Analogously
to the previous one, this issue will be completed by a
comparison: in this case, we will compare the results
of the monolingual and the bilingual variants.
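Under these assumptions, the decision procedure might look like the following sketch; the function name, the threshold value, and the use of a precomputed inverse mapping `W_inv` are illustrative, not the final design:

```python
import numpy as np

def entails_facet(text_vecs, facet_vecs, W_inv, threshold=0.7):
    """Cross-lingual partial entailment check (sketch).  `facet_vecs` holds
    the target-space vectors of the facet words (w1, w2), `W_inv` is an
    assumed precomputed mapping from the target space back to the source
    space, and `text_vecs` are the source-space vectors of the words of T."""
    def best_similarity(w):
        v = W_inv @ w                               # phi^{-1}(w)
        sims = text_vecs @ v / (np.linalg.norm(text_vecs, axis=1)
                                * np.linalg.norm(v))
        return sims.max()                           # best-matching word of T
    return all(best_similarity(w) >= threshold for w in facet_vecs)

# Tiny demonstration with identity mapping and matching vectors.
text_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
facet = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(entails_facet(text_vecs, facet, np.eye(2)))   # True
```

Requiring both facet words to clear the threshold mirrors the monolingual design, with the linear mapping as the only cross-lingual ingredient.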
The third issue is the automatic extraction of facets.
In both issues above, it was assumed (similarly as
in (Levy et al., 2013)) that facets are obtained
externally/manually from the hypothesis. Thus, a
natural question arises: whether (and how) this process
can be automated by machine learning methods. At first,
we would like to check whether it is possible to use
only features derived from the word2vec representation,
or whether it is necessary to employ features coming
from syntactic parsing of the hypothesis. This issue
requires proposing suitable evaluation measures for the
process of generating facets from a given hypothesis.
The fourth issue is the development of a scoring
method for the STS task. Having two sentences S
and T, we can compute the percentage of facets from
T that are entailed by S. Given two text snippets A
and B and a sentence S from A, we are able to find
a sentence T from B such that this percentage is
the highest among all sentences in B. We obtain an
entailment score with respect to (A, B) by averaging
these percentages over all sentences in A. In a similar
way we can define a paraphrase score with respect to
(A, B). We therefore obtain a quadruple of values:
1. an entailment score with respect to (A, B)
2. an entailment score with respect to (B, A)
3. a paraphrase score with respect to (A, B)
4. a paraphrase score with respect to (B, A)
This quadruple of values models the “entailment”
relationship between documents. For instance, if A is
a summarization of B, then the first and third values
will be greater than or equal to the second and fourth
values. The particular goal is to investigate how to
turn these scores (probably using also other features
of A and B) into a single value that can be used for
estimating the semantic textual similarity of a pair of
documents, and to implement a web service providing
these computations.
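The scoring scheme can be sketched as follows; `facets` and `facet_entailed` are crude placeholders for the real facet extractor and partial-entailment recognizer, and modeling the paraphrase score as mutual entailment is our assumption:

```python
def facets(sentence):
    """Placeholder facet extractor: every pair of adjacent words.
    The real extractor is the subject of the third issue."""
    words = sentence.split()
    return list(zip(words, words[1:])) or [(sentence, sentence)]

def facet_entailed(facet, sentence):
    """Placeholder recognizer: a facet counts as entailed if both of its
    words occur in the sentence."""
    return all(w in sentence.split() for w in facet)

def entailed_fraction(s, t):
    """Fraction of facets of t that are entailed by s."""
    fs = facets(t)
    return sum(facet_entailed(f, s) for f in fs) / len(fs)

def entailment_score(A, B):
    """Average, over sentences s of A, of the best entailed fraction
    achieved by any sentence t of B."""
    return sum(max(entailed_fraction(s, t) for t in B) for s in A) / len(A)

def paraphrase_score(A, B):
    """Paraphrase modeled here as mutual entailment (an assumption)."""
    return sum(max(min(entailed_fraction(s, t), entailed_fraction(t, s))
                   for t in B) for s in A) / len(A)

A = ["the cat sat"]                  # A behaves like a short summary of B
B = ["the cat sat on the mat"]
quadruple = (entailment_score(A, B), entailment_score(B, A),
             paraphrase_score(A, B), paraphrase_score(B, A))
print(quadruple)                     # (0.4, 1.0, 0.4, 0.4)
```

Even with these toy components the asymmetry shows up as expected: every facet of the “summary” A is entailed by B, while only some facets of B are covered by A.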
The fifth issue is the development of a decoding
module. As shown in (Miñarro-Giménez et al., 2015),
word2vec can be used for capturing different semantic
relations, e.g. hypernym/hyponym, membership, etc.
The main idea of this issue is to replace the Syntactic
Inference module of the system described in the
previous chapter by an analogous module that deals
not with dependency trees but with sequences of
vectors. According to the basic classification, this
approach belongs to the decoding methods. Operations
such as replacing a word by its hypernym will be
processed as replacing the corresponding vector
representation, in order to transform the
representation of the initial text T into an ordered
pair of vectors close to the vector representation of
the considered facet.
The sixth issue is the development and evaluation of
applications for the “duplicate task” and the
“recommendation task”. These applications will take
the form of web services. The evaluation will be
performed using the same metrics traditionally
employed in the evaluation of STS systems (particularly
SemEval-2016 Task 1), i.e., the mean Pearson
correlation between the system output and the gold
standard annotations. The goal is to present a system
that provides better results in recommending
semantically similar documents than the currently
developed system based on LSA.
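The evaluation metric itself is straightforward; a sketch with made-up scores and annotations:

```python
import numpy as np

# Pearson correlation between system similarity scores and gold-standard
# annotations, as in SemEval STS evaluations; both arrays are made up.
system = np.array([4.1, 2.0, 3.3, 0.5, 4.8])
gold   = np.array([4.5, 1.8, 3.0, 1.0, 5.0])

r = np.corrcoef(system, gold)[0, 1]
print(round(r, 3))
```

In a multi-dataset evaluation such as SemEval, this correlation is computed per dataset and then averaged (the “mean Pearson correlation” above).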
It should be emphasized that the primary goal of
this work is to create real-world applications and
components solving the mentioned issues, rather than
achieving good results in standardized benchmarks.
Less-than-excellent results on standardized tasks can
be balanced by the simplicity of the entire system
and the consequent ease of maintenance. Nevertheless,
we expect that the results of the proposed R(P)TE
system in standard evaluations will be comparable
with other up-to-date systems for recognizing
(partial) textual entailment.
6 CONCLUSION
In this work, we recall and discuss the notion of
(partial and faceted) textual entailment and propose
a system for recognizing partial textual entailment
based on the word2vec model. We present an idea of
how to extend this proposed system to a multilingual
environment. The aim of the doctoral project is to use
this system to calculate STS based on scores derived
from partial textual entailment features (also among
multilingual documents). We want to achieve better
results than standard methods based on LSA.
DC3K 2015 - Doctoral Consortium on Knowledge Discovery, Knowledge Engineering and Knowledge Management
10
REFERENCES
Agirre, E., Banea, C., et al. (2015). SemEval-2015 task 2:
Semantic textual similarity, English, Spanish and
pilot on interpretability. In Proceedings of the 9th
International Workshop on Semantic Evaluation (SemEval
2015), June.
Androutsopoulos, I. and Malakasiotis, P. (2010). A survey
of paraphrasing and textual entailment methods. Jour-
nal of Artificial Intelligence Research, pages 135–187.
Bentivogli, L., Clark, P., Dagan, I., Dang, H., and Giampic-
colo, D. (2011). The seventh pascal recognizing tex-
tual entailment challenge. Proceedings of TAC, 2011.
Bjerva, J., Bos, J., van der Goot, R., and Nissim, M. (2014).
The meaning factory: Formal semantics for recogniz-
ing textual entailment and determining semantic sim-
ilarity. SemEval 2014, page 642.
Burrows, S., Gurevych, I., and Stein, B. (2015). The eras
and trends of automatic short answer grading. Interna-
tional Journal of Artificial Intelligence in Education,
25(1):60–117.
Clark, P. and Fellbaum, C. (2006). The Boeing-Princeton-
ISI (BPI) textual entailment test suite.
http://www.cs.utexas.edu/~pclark/bpi-test-suite/.
de Salvo Braz, R., Girju, R., Punyakanok, V., Roth, D.,
and Sammons, M. (2006). An inference model for
semantic entailment in natural language. In Machine
Learning Challenges. Evaluating Predictive Uncer-
tainty, Visual Object Classification, and Recognising
Tectual Entailment, pages 261–286. Springer.
Dzikovska, M. O., Nielsen, R. D., and Brew, C. (2012). To-
wards effective tutorial feedback for explanation ques-
tions: A dataset and baselines. In Proceedings of the
2012 Conference of the North American Chapter of
the Association for Computational Linguistics: Hu-
man Language Technologies, pages 200–210. Associ-
ation for Computational Linguistics.
Dzikovska, M. O., Nielsen, R. D., Brew, C., Leacock, C.,
Giampiccolo, D., Bentivogli, L., Clark, P., Dagan, I.,
and Dang, H. T. (2013). Semeval-2013 task 7: The
joint student response analysis and 8th recognizing
textual entailment challenge. Technical report, DTIC
Document.
Erk, K. and Padó, S. (2009). Paraphrase assessment in
structured vector space: Exploring parameters and
datasets. In Proceedings of the Workshop on Geomet-
rical Models of Natural Language Semantics, pages
57–65. Association for Computational Linguistics.
Fellbaum, C. (1998). WordNet. Wiley Online Library.
Gupta, A., Kaur, M., Singh, A., Goel, A., and Mirkin, S.
(2014). Text summarization through entailment-based
minimum vertex cover. Lexical and Computational
Semantics (* SEM 2014), page 75.
Harmeling, S. (2009). Inferring textual entailment with a
probabilistically sound calculus. Natural Language
Engineering, 15(04):459–477.
Kouylekov, M. and Magnini, B. (2005). Recognizing tex-
tual entailment with tree edit distance. In Proceedings
of the PASCAL RTE Challenge, pages 17–20.
Kouylekov, M. and Magnini, B. (2006). Combining lex-
ical resources with tree edit distance for recogniz-
ing textual entailment. In Machine Learning Chal-
lenges. Evaluating Predictive Uncertainty, Visual Ob-
ject Classification, and Recognising Tectual Entail-
ment, pages 217–230. Springer.
Levy, O., Zesch, T., Dagan, I., and Gurevych, I. (2013).
Recognizing partial textual entailment. In ACL (2),
pages 451–455.
Malakasiotis, P. and Androutsopoulos, I. (2007). Learn-
ing textual entailment using svms and string similarity
measures. In Proceedings of the ACL-PASCAL Work-
shop on Textual Entailment and Paraphrasing, pages
42–47. Association for Computational Linguistics.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a).
Efficient estimation of word representations in vector
space. arXiv preprint arXiv:1301.3781.
Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting
similarities among languages for machine translation.
arXiv preprint arXiv:1309.4168.
Mikolov, T., Yih, W.-t., and Zweig, G. (2013c). Linguistic
regularities in continuous space word representations.
In HLT-NAACL, pages 746–751.
Miñarro-Giménez, J. A., Marín-Alonso, O., and Samwald,
M. (2015). Applying deep learning techniques on
medical corpora from the world wide web: a pro-
totypical system and evaluation. arXiv preprint
arXiv:1502.03682.
Moldovan, D. I. and Rus, V. (2001). Logic form transfor-
mation of wordnet and its applicability to question an-
swering. In Proceedings of the 39th Annual Meeting
on Association for Computational Linguistics, pages
402–409. Association for Computational Linguistics.
Nevěřilová, Z. (2014a). Paraphrase and textual entailment
generation. In Text, Speech and Dialogue, pages 293–
300. Springer.
Nevěřilová, Z. (2014b). Paraphrase and Textual Entailment
Generation in Czech [online]. PhD thesis, Faculty of
Informatics, Masaryk University Brno.
Nielsen, R. D., Ward, W., and Martin, J. H. (2009). Recog-
nizing entailment in intelligent tutoring systems. Nat-
ural Language Engineering, 15(04):479–501.
Nielsen, R. D., Ward, W., Martin, J. H., and Palmer, M.
(2008). Annotating students’ understanding of science
concepts. In Proc. LREC.
Rehurek, R. (2008). Semantic-based plagiarism detection
[online]. Ph.d. thesis proposal, Faculty of Informatics,
Masaryk University Brno.
Resnik, P. (1995). Using information content to evaluate se-
mantic similarity in a taxonomy. arXiv preprint cmp-
lg/9511007.
Rudrapal, D. and Bhattacharya, B. (2014). Recognition of
partial textual entailment for bengali tweets. Social-
India 2014, 2014:29.
Stern, A. and Dagan, I. (2012). Biutee: A modular open-
source system for recognizing textual entailment. In
Proceedings of the ACL 2012 System Demonstrations,
pages 73–78. Association for Computational Linguis-
tics.
Tatu, M. and Moldovan, D. (2006). A logic-based seman-
tic approach to recognizing textual entailment. In
Proceedings of the COLING/ACL on Main conference
poster sessions, pages 819–826. Association for Com-
putational Linguistics.
Tian, R., Miyao, Y., and Matsuzaki, T. (2014). Logical
inference on dependency-based compositional seman-
tics. In Proceedings of ACL, pages 79–89.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986).
Learning representations by back-propagating errors.
Nature, 323:533–536.
Xu, W., Callison-Burch, C., and Dolan, W. B. (2015).
Semeval-2015 task 1: Paraphrase and semantic sim-
ilarity in twitter (pit). In Proceedings of the 9th In-
ternational Workshop on Semantic Evaluation (Se-
mEval).
Zanzotto, F., Pennacchiotti, M., and Moschitti, A. (2009).
A machine learning approach to textual entail-
ment recognition. Natural Language Engineering,
15(04):551–582.