Paraphrase Detection based on Vector Space Model: A Study of 
Utilization of Semantic Network for Improving Information 
Nurwati, Yudi Santoso, Krisna Adiyarta
Universitas Budiluhur
Keywords: Paraphrasing, Vector Space Model, Precision and Recall
Abstract: A paraphrase is often not apparent at first glance, so a technique or model is needed that can measure
the degree of similarity between the documents being compared. The Vector Space Model is a standard
approach to measuring similarity between documents. This study aims to find a system model for paraphrase
detection applications that uses semantic information as a tool integrated with the Vector Space Model. The
study uses a prototyping research strategy: the performance of a system prototype developed according to
the research hypothesis is compared with that of a standard prototype built according to a standard
framework. The investigation uses the confusion matrix, a widely used tool for evaluating system
performance, with Precision and Recall as the accuracy criteria. In this way, the study expects to identify
a semantic network model, data structure model, and algorithm that can be integrated with the vector space
model to produce a paraphrase detection system with perfect performance.
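To make the baseline concrete, the following Python sketch shows a plain Vector Space Model comparison (bag-of-words term-frequency vectors compared with cosine similarity) and the computation of Precision and Recall from a binary confusion matrix. It is an illustration under stated assumptions, not the authors' prototype: the whitespace tokenizer, the decision threshold of 0.6, and the labelled sentence pairs are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Represent a text as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_paraphrase(text_a: str, text_b: str, threshold: float = 0.6) -> bool:
    # The threshold is a hypothetical value, not one taken from the study.
    return cosine_similarity(tf_vector(text_a), tf_vector(text_b)) >= threshold


def precision_recall(predicted, actual):
    """Precision and recall from the cells of a binary confusion matrix."""
    tp = sum(p and a for p, a in zip(predicted, actual))      # true positives
    fp = sum(p and not a for p, a in zip(predicted, actual))  # false positives
    fn = sum(not p and a for p, a in zip(predicted, actual))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labelled pairs: (text A, text B, is it really a paraphrase?)
pairs = [
    ("the cat sat on the mat", "the cat sat on the mat", True),
    ("the cat sat on the mat", "a dog ran in the park", False),
    ("the cat sat on the mat", "on the mat the cat sat", True),
    ("the car is fast", "the automobile is quick", True),
    ("the cat chased the dog", "the dog chased the cat", False),
]
predicted = [is_paraphrase(a, b) for a, b, _ in pairs]
actual = [label for _, _, label in pairs]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.67 recall=0.67
```

With these made-up pairs, the word-order swap in the last pair is wrongly accepted (a false positive) and the synonym pair is rejected (a false negative), which is the kind of limitation of a purely lexical Vector Space Model that motivates adding semantic information.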
 
 
1 INTRODUCTION 
Paraphrasing is a linguistic term for re-expressing
a concept in a different way in the same language
without changing its meaning. Paraphrasing also
allows a writer to place a somewhat different
emphasis than the original author did.
 
 
The rapid development of technology makes it
easy for information users to search for and obtain
the information they need. The internet is one of the
technologies most widely used by information
seekers; it provides information published by the
original authors as well as copies of that
information, with or without attribution to the
original authors. The ease of obtaining information
and documents through the internet creates a new
problem: documents are still found that do not
credit their original authors, whether intentionally
or accidentally. According to Isa et al. (2014), text
taken from other people's writing, intentionally or
unintentionally, can be categorized as plagiarism if
it is not correctly and adequately referenced.
 
 
Paraphrase identification has become a mainstream
task in natural language processing research. The
success of applications that rely on semantic
similarity depends heavily on the ability of the
system (algorithm) to determine whether two words
or terms are semantically related. In this study, the
problem of paraphrasing between two texts, say A
and B, is viewed as the semantic quantification of the
relationship between them: for example, the extent
to which text A has the same meaning as text B
(paraphrase relationship), or the extent to which
text A is part of the meaning of text B (entailment
relationship). Given this, the problem addressed by
this study is that it is difficult to build a perfect
paraphrase detection system that detects paraphrases
based on the physical (surface) form of the two texts.
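The distinction between a paraphrase relationship (meaning shared in both directions) and an entailment relationship (the meaning of one text contained in the other) can be illustrated with a deliberately simple directional measure. The sketch below uses word overlap as a stand-in for semantic coverage; the `coverage` and `relation` functions, the 0.8 threshold, and the relation labels are assumptions for illustration and are not the method proposed in this study.

```python
def coverage(source: set, target: set) -> float:
    """Fraction of the words of `source` that also appear in `target`."""
    if not source:
        return 0.0
    return len(source & target) / len(source)


def relation(text_a: str, text_b: str, threshold: float = 0.8) -> str:
    """Classify the relation between two texts from directional word overlap.

    Word overlap is only a crude proxy for semantic content; the threshold
    is a hypothetical value chosen for illustration.
    """
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    a_in_b = coverage(a, b)  # how much of A's content is found in B
    b_in_a = coverage(b, a)  # how much of B's content is found in A
    if a_in_b >= threshold and b_in_a >= threshold:
        return "paraphrase"   # meaning shared in both directions
    if a_in_b >= threshold:
        return "B entails A"  # A is (roughly) part of the meaning of B
    if b_in_a >= threshold:
        return "A entails B"  # B is (roughly) part of the meaning of A
    return "unrelated"


print(relation("the cat sat on the mat", "on the mat the cat sat"))  # paraphrase
print(relation("the cat sat", "the black cat sat on the mat"))     # B entails A
print(relation("a dog ran", "the cat sat"))                        # unrelated
```

Measuring overlap in both directions is what separates the two relationships: a symmetric score such as cosine similarity alone cannot tell which text contains the other.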
 
 
This study focuses on the use of semantic
networks as a tool for representing paraphrase
knowledge in order to improve the performance of
paraphrase detection systems. Combining the vector
space model with a semantic network is expected to
produce a paraphrase detection system that performs
better than existing ones.
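One possible way to integrate a semantic network with the vector space model is to map every term to the concept it belongs to in the network before building the vectors, so that synonyms fall on the same dimension. The Python sketch below assumes a small hand-built synonym network; `SEMANTIC_NETWORK`, `concept_of` and `concept_vector` are hypothetical names, a real system might draw on a lexical resource such as WordNet, and the study itself does not specify this particular integration.

```python
import math
from collections import Counter, deque

# Hypothetical toy semantic network: each term maps to its synonyms.
# Edges are listed in both directions so the graph is undirected.
SEMANTIC_NETWORK = {
    "car": {"automobile"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"automobile"},
    "fast": {"quick"},
    "quick": {"fast", "rapid"},
    "rapid": {"quick"},
}


def concept_of(term: str, network: dict) -> str:
    """Return a canonical concept for a term: the alphabetically smallest
    term in its connected component of the network (found by BFS)."""
    seen = {term}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return min(seen)


def term_vector(text: str) -> Counter:
    """Plain VSM: one dimension per surface word."""
    return Counter(text.lower().split())


def concept_vector(text: str, network: dict) -> Counter:
    """Semantic-network-aware VSM: one dimension per concept."""
    return Counter(concept_of(t, network) for t in text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


a, b = "the car is fast", "the automobile is quick"
plain = cosine_similarity(term_vector(a), term_vector(b))
semantic = cosine_similarity(concept_vector(a, SEMANTIC_NETWORK),
                             concept_vector(b, SEMANTIC_NETWORK))
print(f"plain VSM: {plain:.2f}, with semantic network: {semantic:.2f}")
# plain VSM: 0.50, with semantic network: 1.00
```

Collapsing each connected component to a single concept is the simplest choice; it treats all linked terms as exact synonyms, so a weighted scheme (for example, discounting similarity by path length in the network) may be preferable when the network also contains looser relations.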