Authors:
Miloš Košprdić 1; Adela Ljajić 1; Darija Medvecki 1; Bojana Bašaragin 1 and Nikola Milošević 2
Affiliations:
1 The Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, Republic of Serbia
2 Bayer A.G. Research and Development, Mullerstrasse 173, Berlin, Germany
Keyword(s):
Claim Verification, Deep Learning Models, Natural Language Inference, PubMed, SciFact Dataset.
Abstract:
This paper introduces the foundation for the third component of a pioneering open-source scientific question-answering system. The system is designed to provide referenced, automatically vetted, and verifiable answers in the scientific domain, where hallucinations and misinformation are intolerable. This Verification Engine is based on models fine-tuned for the Natural Language Inference task using an additionally processed SciFact dataset. Our experiments, involving eight fine-tuned models based on RoBERTa Large, XLM RoBERTa Large, DeBERTa, and DeBERTa SQuAD, show promising results. Notably, the DeBERTa model fine-tuned on our dataset achieved the highest F1 score of 88%. Furthermore, evaluating our best model on the HealthVer dataset resulted in an F1 score of 48%, outperforming other models by more than 12%. Additionally, our model demonstrated superior performance with a 7% absolute increase in F1 score compared to the best-performing GPT-4 model on the same test set in a zero-shot regime. These findings suggest that our system can significantly enhance scientists’ productivity while fostering trust in the use of generative language models in scientific environments.
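The abstract describes recasting a claim-verification dataset for the Natural Language Inference task. As a minimal sketch of what such a recasting can look like, the snippet below pairs each evidence sentence (premise) with a claim (hypothesis) and maps verdict labels to the standard NLI label set. The record layout, field names, and label strings here are illustrative assumptions, not the authors' actual preprocessing of SciFact.

```python
# Hedged sketch: one plausible way to turn a SciFact-style claim record
# into NLI (premise, hypothesis, label) pairs. Field names and labels
# below are assumptions for illustration, not the paper's exact scheme.

# Assumed mapping from verification verdicts to NLI labels.
NLI_LABELS = {
    "SUPPORT": "entailment",
    "CONTRADICT": "contradiction",
    "NOT_ENOUGH_INFO": "neutral",
}

def to_nli_pairs(record):
    """Pair each evidence sentence (premise) with the claim (hypothesis)."""
    pairs = []
    for sentence, verdict in record["evidence"]:
        pairs.append({
            "premise": sentence,
            "hypothesis": record["claim"],
            "label": NLI_LABELS[verdict],
        })
    return pairs

# Hypothetical example record.
example = {
    "claim": "Drug X reduces blood pressure.",
    "evidence": [
        ("Trial participants on drug X showed lower blood pressure.",
         "SUPPORT"),
        ("The study did not report cholesterol outcomes.",
         "NOT_ENOUGH_INFO"),
    ],
}

pairs = to_nli_pairs(example)
```

Pairs in this form can then be fed to any sentence-pair NLI classifier (such as the RoBERTa and DeBERTa variants the abstract mentions) for fine-tuning or inference.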