Paraphrase Detection based on Vector Space Model: A Study of
Utilization of Semantic Network for Improving Information
Nurwati, Yudi Santoso, Krisna Adiyarta
Universitas Budiluhur
Keywords: Paraphrasing, Vector Space Model, Precision and Recall
Abstract: A paraphrase is often not recognizable as such at first glance, so a technique or model is needed
that can measure the level of similarity between the documents being compared. The Vector Space Model is a
standard approach for finding similarities between documents. This study aims to find a system
model that can be applied to paraphrase detection applications by using semantic information as a tool
integrated with the Vector Space Model. The study uses a prototyping research strategy. The
approach is to compare the performance of a system prototype developed according to the research
hypothesis with that of a standard prototype built according to a standard framework. The evaluation
uses the confusion matrix, a widely used tool for evaluating system performance, with Precision and
Recall as the accuracy criteria. In this way, the study is expected to yield a semantic network model,
data structure model, and algorithm that can be integrated with the Vector Space Model to produce a
paraphrase detection system that has a perfect performance.
1 INTRODUCTION
Paraphrasing is a linguistic term for re-expressing
a concept in another way in the same language
without changing its meaning. Paraphrasing allows
an author to place emphasis somewhat differently
from the original author.
The rapid development of technology makes it
easy for information users to search for and find
the information they need. The internet is one of
the technology products most widely used by
information seekers; it provides information
published by the original authors as well as copies
made with or without crediting them. The ease of
obtaining information and documents through the
internet creates new problems, because documents
are still found that do not cite the author of their
source, and it is not known whether this was
intentional or accidental. Passages taken from
other people's writing, intentionally or
unintentionally, can be categorized as plagiarism
if they are not correctly and adequately
referenced (Isa et al. 2014).
The task of identifying paraphrases has become
a mainstream research area in natural language
processing. The success of applications that
exploit semantic similarity depends heavily on the
ability of the system (algorithm) to determine
whether there is a semantic relationship between
two words or terms. In this study, the paraphrasing
problem for two texts, say A and B, is viewed as
quantifying the semantic relationship between
them: for example, to what extent text A has the
same meaning as text B (paraphrase relationship),
or to what extent the meaning of text A is part of
the meaning of text B (entailment relationship).
Considering this, the problem formulation of this
study is that it is difficult to build a perfect
paraphrase detection system that detects
paraphrases by considering the physical (surface)
form of the two texts.
This study focuses on the use of semantic
networks as a tool to represent paraphrase
knowledge and thereby improve the performance of
paraphrase detection systems. Combining the
Vector Space Model with the semantic network is
expected to produce a paraphrase detection system
that is better than existing ones.
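To make the idea concrete, the following is a minimal sketch, not the
system developed in this study. It assumes raw term-frequency vectors,
cosine similarity as the Vector Space Model measure, and a toy semantic
network reduced to a synonym-to-concept dictionary. The names
SEMANTIC_NETWORK, to_vector, and cosine_similarity are hypothetical and
introduced only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy "semantic network": each entry maps a term to a
# canonical concept. A real semantic network would be far richer.
SEMANTIC_NETWORK = {
    "automobile": "car",
    "purchased": "bought",
}


def to_vector(text, semantic_network=None):
    """Build a term-frequency vector; optionally map terms to concepts."""
    tokens = text.lower().split()
    if semantic_network:
        tokens = [semantic_network.get(tok, tok) for tok in tokens]
    return Counter(tokens)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


text_a = "she bought a car"
text_b = "she purchased an automobile"

# Plain Vector Space Model: only the literal term "she" overlaps.
plain = cosine_similarity(to_vector(text_a), to_vector(text_b))

# Vector Space Model with the toy semantic network applied first.
semantic = cosine_similarity(
    to_vector(text_a, SEMANTIC_NETWORK),
    to_vector(text_b, SEMANTIC_NETWORK),
)

print(f"plain VSM similarity:    {plain:.2f}")
print(f"semantic VSM similarity: {semantic:.2f}")
```

In this toy example the two sentences share only one literal term, so
the plain similarity is low. Mapping synonyms to a common concept before
vectorization raises the score, which is the kind of improvement the
semantic network is expected to provide.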