purpose). This will be used to evaluate the overall
accuracy of the approach. The overall classification
results on the MRPC test set are reported in Table 2.
Table 1: Average similarity score for paraphrase and
non-paraphrase cases in the MRPC dataset

Method               Paraphrase   Non-paraphrase
Our method           88%          46%
Quantification (2)
Tf-Idf Cosine Sim
Method [5]
Table 2: Overall classification accuracy on the MRPC
test set

Method               Accuracy rate
Our method           84%
Quantification (2)
Tf-Idf Cosine Sim
Method [5]
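
As an illustration of how figures of the kind reported in Table 1 and Table 2 can be derived from per-pair similarity scores, the following minimal sketch assumes that each sentence pair has a similarity score in [0, 1] and a binary gold label, and that a pair is classified as a paraphrase when its score reaches a fixed threshold. The function name summarize, the threshold of 0.5, and the toy data are illustrative assumptions only; the decision rule actually used in this paper is not reproduced here.

```python
from statistics import mean


def summarize(scores, labels, threshold=0.5):
    """Summarize similarity scores against gold labels.

    scores: similarity values in [0, 1], one per sentence pair.
    labels: gold labels, 1 for paraphrase and 0 for non-paraphrase.
    threshold: hypothetical cut-off; pairs scoring at or above it
               are predicted to be paraphrases.
    """
    # Average similarity per gold class (the kind of figure in Table 1).
    para = [s for s, y in zip(scores, labels) if y == 1]
    non_para = [s for s, y in zip(scores, labels) if y == 0]

    # Thresholded predictions and accuracy (the kind of figure in Table 2).
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))

    return {
        "avg_paraphrase": mean(para),
        "avg_non_paraphrase": mean(non_para),
        "accuracy": correct / len(labels),
    }


# Toy data, not taken from MRPC.
scores = [0.9, 0.8, 0.4, 0.6, 0.3]
labels = [1, 1, 0, 0, 1]
result = summarize(scores, labels)
print(result)
```

In this sketch a large gap between the two class averages suggests that a single threshold can separate the classes reasonably well, while the accuracy shows how well the chosen threshold actually does so.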
The results in Table 1 and Table 2 attest to the
usefulness of the proposed approach, which combines a
Wikipedia-based measure, WordNet-based semantic
similarity, and a double-checking model on the top
extracted snippets of the queries in order to infer an
enhanced similarity measure. Future work involves
studying the algebraic and asymptotic properties of the
proposed measure, as well as testing on alternative
corpora. In particular, expression (8) will require
further refinement in the case where false negatives
are dominant in the dataset.
This paper contributes to the ongoing research on
developing efficient tools for paraphrase detection.
It advocates a web-based approach in which the search
snippets are analyzed using a WordNet-based semantic
measure and a Wikipedia-based normalized distance
measure. The proposal has been designed to accommodate
a prudent-attitude style of reasoning. Tests on the
Microsoft Research Paraphrase Corpus have shown good
results relative to some state-of-the-art approaches.
Although the complexity of web search outcomes is well
documented, the proposal opens new ways to exploit the
timely availability of search results by exploring the
similarity of the search outcomes regardless of the
accuracy of single search
