Lastly, we compare our method with a citation-based one calculated by counting the same cited documents:

cbm(q, d) = |citing(q) ∩ citing(d)| (24)
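As a rough illustration (the helper below and its inputs are hypothetical, not the paper's code), this baseline reduces to a set intersection over the documents associated with q and d:

def cbm(citing_q, citing_d):
    """Citation-based baseline: number of cited documents shared by q and d.

    citing_q, citing_d: sets of document identifiers for the query
    document q and the candidate document d (illustrative inputs).
    """
    return len(set(citing_q) & set(citing_d))

# Example: q and d share two entries, so cbm(q, d) = 2.
print(cbm({"doc1", "doc2", "doc3"}, {"doc2", "doc3", "doc5"}))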
4.5 Evaluation Results
In this section, we discuss the results from the ranking and the classification scenarios. We also discuss the error and success cases that occurred in the classification scenario.
Table 2 shows the results from the ranking scenario for R, P, and F1. According to the R scores, the baseline methods bowm, cbm, and cnc achieved significantly high scores. This happened because, during the dataset creation, (Pertile et al., 2016) pooled the document pairs that have significant content similarity, i.e., the top-30 document pairs, to be annotated. Hence, improving on their performance in this scenario is quite difficult.
Among the baselines, cnc achieved the best R performance at cut-off=30, while our method achieved the same performance as cnc in R, P, and F1 at every cut-off. At cut-off=30, our method and cnc improved the R scores by retrieving one and three more plagiarized documents than bowm and cbm, respectively.
According to the MAP scores shown in Table 3 at cut-off=30, cnc is the best among the baseline methods, while ours achieves the best performance of all the methods. Our method improved the ranking position of one source document compared with the baseline methods bowm and cnc: the rank of this source document in bowm, cnc, and our method is 32, 30, and 24, respectively.
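For reference, the MAP scores in Table 3 follow the usual mean-average-precision formulation. A minimal sketch of the per-query average precision at a cut-off (a textbook definition, not necessarily the paper's exact implementation) is:

def average_precision(ranked_ids, relevant_ids, cutoff=30):
    """Average precision of one query document at a given cut-off.

    ranked_ids: candidate source documents ordered by similarity score.
    relevant_ids: the annotated (plagiarized-from) source documents.
    MAP is the mean of this value over all query documents.
    """
    hits, score = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
        if doc_id in relevant_ids:
            hits += 1
            score += hits / rank
    return score / len(relevant_ids) if relevant_ids else 0.0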
In this evaluation, we found that the best λ for cnc was .2. We also found that the optimal α and β for our method were between .1 and .3, and between .2 and .4, respectively. These results suggest that the content similarity of non-citing sentences should be prioritized, but the similarity of citation behavior should not be ignored. Additionally, using citing sentences as extra content for the cited document is useful.
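Purely as an illustration of how such weights could enter a linear combination (the actual combination is defined earlier in the paper; which component each weight multiplies here is an assumption made for the example), one might write:

def combined_similarity(sim_cb, sim_add, sim_nc, alpha=0.2, beta=0.3):
    # Hypothetical linear mix: alpha weights the citation-behavior
    # similarity (optimal range ~.1-.3 in this evaluation), beta weights
    # the citing-sentence similarity (~.2-.4), and the remaining weight
    # goes to the non-citing-sentence similarity, which receives the
    # largest share, in line with the observation above.
    return alpha * sim_cb + beta * sim_add + (1.0 - alpha - beta) * sim_nc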
Table 4 shows the MAP scores for each document similarity method that we proposed. Among them, sim_nc achieves the best MAP score, while sim_add achieves the lowest. The reason sim_add has the lowest MAP score is that some source documents (20 of 40) do not have any sentences citing them, or their citing sentences were not extracted or identified.
In the classification scenario, we used the SVM (Support Vector Machine) algorithm (scikit-learn, http://scikit-learn.org/) to perform this task. Thus, α, β, and λ are decided automatically by the SVM. We performed stratified 10-fold cross-validation and also searched for the optimal SVM parameters, i.e., the type of kernel, C, and γ.
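A minimal sketch of this setup with scikit-learn, assuming the similarity scores are precomputed features for each document pair and the plagiarized/non-plagiarized label is binary (the feature layout and placeholder data are assumptions, not the paper's code):

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# X: one row per document pair, e.g. [sim_cb, sim_nc, sim_add];
# y: 1 for plagiarized pairs, 0 otherwise (placeholders below).
X = np.random.rand(93, 3)
y = np.random.randint(0, 2, 93)

# Grid search over kernel type, C, and gamma with stratified 10-fold CV.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", 0.01, 0.1, 1],
}
search = GridSearchCV(
    SVC(), param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="f1",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)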
Since the method of (Lopez, 2009) failed to extract some citing sentences and/or identify the cited documents in the ranking scenario, we manually performed these tasks on both positive and negative pairs for the classification scenario. Thus, we could provide the ideal situation for all the document similarity methods except sim_add, since it was not possible to do these tasks manually on all documents in the collection.
Table 5 shows the evaluation results in the classification scenario. Our similarity of citation behavior (sim_cb) scored .3466, .17, .4228, and .338 higher than cbm in P, R, F1, and A, respectively. These results indicate that citing sentences should be considered when comparing citations or reference lists.
Since we could not provide the ideal situation for sim_add, i.e., only 31 of the 93 document pairs have candidate source documents with sentences citing them, its performance is the lowest among our three document similarities in P, R, F1, and A. Despite this, sim_add is still useful in the situation where the content of candidate source documents is not available in the document collection.
The combination of sim_cb and sim_nc also scored .0433, .045, .0446, and .0424 higher than cnc in terms of P, R, F1, and A, respectively. These results suggest that citation anchors should also be considered when comparing citing sentences. Additionally, this combination performed .0655, .0501, and .0433 higher than bowm in P, F1, and A, respectively. This indicates that citing and non-citing sentences should be distinguished when comparing documents.
According to the F1 and A scores, our method (sim_cb, sim_nc, sim_add) is the best one. It performed .0501, .4228, and .0585 higher than the baseline methods bowm, cbm, and cnc in F1, respectively. This indicates that our document similarity methods complement each other. In addition, our method also produced the fewest FN and FP compared with the baseline methods, according to Table 6.
Our method produced three FN and six FP according to Table 6. The three FN occurred because the similar text fragments between these pairs are short: they share fewer than three citing sentences and only a few non-citing ones. Thus, detecting plagiarism with such small textual overlap remains difficult when documents are long.
The FP, on the other hand, occurred because a few shared citing sentences contain multiple citation anchors. Typically, these citing sentences only list