ing tasks using linguistic knowledge that so far has only qualitative descriptions. In addition to CSD-1 and CSD-2, we believe that CSDs of other kinds may also be possible and await exploration.
We have demonstrated how to use CSD-1 to assess article organization with high accuracy, and we believe that other applications are also possible. For instance, CSD-2 may be used to identify the type of a given article, which could help obtain a more accurate ranking of sentences. Incorporating sentence ranking into a large language model such as GPT-3.5-turbo (Brown et al., 2020), LLaMA (Touvron et al., 2023), or PaLM (Chowdhery et al., 2022) is expected to help generate a better summary of a given article.
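As a rough illustration of this idea, the following sketch selects the top-ranked sentences and folds them into a summarization prompt for an LLM. The function names, the prompt wording, and the score-based selection rule are all hypothetical; the actual LLM call is omitted.

```python
def top_ranked(sentences, scores, k=3):
    # Keep the k sentences with the highest ranking scores (e.g., scores
    # derived from a CSD-based centrality measure), preserving their
    # original order in the article.
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]

def build_summary_prompt(sentences, scores, k=3):
    # Prepend the selected sentences as a hint for a model such as
    # GPT-3.5-turbo; how the model weighs the hint is up to the model.
    hint = " ".join(top_ranked(sentences, scores, k))
    return f"Summarize the article, giving weight to these key sentences: {hint}"
```

The point of the sketch is only the interface: sentence ranking produces a signal that can be passed to the summarizer alongside the article itself.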
Our approach to computing CSDs relies on metrics that compare the semantic similarity of a sub-text block (a sentence is a special case of a sub-text block) to the article containing it. While MoverScore is arguably the best choice at this time, computing MoverScores incurs cubic time complexity (Zhao et al., 2019). Fortunately, this task is highly parallelizable, and we have implemented a parallel program that carries it out on a GPU, providing much more efficient computation of CSD-1. Nevertheless, a more effective and efficient measure of content similarity remains highly desirable for our tasks, particularly for long articles.
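The parallel structure can be sketched as follows. Each block's score depends only on the pair (block, article), so the scores can be computed concurrently; our implementation maps the same structure onto a GPU. The similarity function below is a hypothetical stand-in (simple Jaccard token overlap), not MoverScore itself, which is far more expensive to compute.

```python
from concurrent.futures import ThreadPoolExecutor

def similarity(block, article):
    # Hypothetical stand-in for MoverScore: Jaccard overlap of token sets
    # between a sub-text block and the full article.  MoverScore is cubic
    # in the input length, which is why parallelism pays off there.
    b, a = set(block.lower().split()), set(article.lower().split())
    return len(b & a) / len(b | a)

def csd1_scores(sentences, article, workers=4):
    # Every score is independent of the others, so the work distributes
    # trivially across workers; results come back in sentence order.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(similarity, s, article) for s in sentences]
        return [f.result() for f in futures]
```

Because no score depends on any other, swapping the executor for GPU kernels (or a process pool for a CPU-bound metric) changes only the dispatch layer, not the algorithm.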
We would also like to seek intuitions and mathematical explanations for why the functions LC_x(a, b | α, β) resemble CSD-1 curves.
Finally, we would like to explore whether CSDs can be used to assess the overall quality of an article with a single score, with better accuracy than an earlier attempt (Wang et al., 2022) that uses a jointly learned multi-scale essay representation with multiple losses and transfer learning from out-of-domain essays.
ACKNOWLEDGMENT
We would like to thank Jay Belanger for a valuable
suggestion on function transformation.
REFERENCES
Feedback prize - predicting effective arguments. https://www.kaggle.com/competitions/feedback-prize-effectiveness/data. Accessed: 2022.
Wikimedia downloads.
Attali, Y. and Burstein, J. (2006). Automated essay scoring
with e-rater® v. 2. The Journal of Technology, Learn-
ing and Assessment, 4(3).
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D.,
Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
Askell, A., et al. (2020). Language models are few-
shot learners. Advances in Neural Information Pro-
cessing Systems, 33:1877–1901.
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra,
G., Roberts, A., Barham, P., Chung, H. W., Sut-
ton, C., Gehrmann, S., et al. (2022). Palm: Scal-
ing language modeling with pathways. arXiv preprint
arXiv:2204.02311.
Cummins, R., Zhang, M., and Briscoe, T. (2016). Con-
strained multi-task learning for automated essay scor-
ing. In Proceedings of the 54th Annual Meeting of the
Association for Computational Linguistics (Volume 1:
Long Papers), pages 789–799.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2018). Bert: Pre-training of deep bidirectional trans-
formers for language understanding. arXiv preprint
arXiv:1810.04805.
Frey, B. J. and Dueck, D. (2007). Clustering by passing messages between data points. Science, 315(5814):972–976.
Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11:37–50.
Kryściński, W., Rajani, N., Agarwal, D., Xiong, C., and Radev, D. (2021). Booksum: A collection of datasets for long-form narrative summarization.
Levina, E. and Bickel, P. (2001). The earth mover’s distance
is the mallows distance: Some insights from statis-
tics. In Proceedings Eighth IEEE International Con-
ference on Computer Vision. ICCV 2001, volume 2,
pages 251–256. IEEE.
Liao, D., Xu, J., Li, G., and Wang, Y. (2021). Hierarchical
coherence modeling for document quality assessment.
In Proceedings of the AAAI Conference on Artificial
Intelligence, volume 35, pages 13353–13361.
Lin, C.-Y. (2004). Rouge: A package for automatic evalu-
ation of summaries. In Text summarization branches
out, pages 74–81.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002).
Bleu: a method for automatic evaluation of machine
translation. In Proceedings of the 40th annual meet-
ing of the Association for Computational Linguistics,
pages 311–318.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
Radev, D., et al. (2003). SummBank 1.0 LDC2003T16. Web download. Philadelphia: Linguistic Data Consortium.
Reimers, N. and Gurevych, I. (2019). Sentence-bert: Sen-
tence embeddings using siamese bert-networks. arXiv
preprint arXiv:1908.10084.
Saleh, N. (2014). The Complete Guide to Article Writing: How to Write Successful Articles for Online and Print Markets. Writer's Digest Books.
KDIR 2023 - 15th International Conference on Knowledge Discovery and Information Retrieval