Automated Detection of Fake Biomedical Papers: A Machine Learning Perspective

Ahmar K. Hussain, Bernhard A. Sabel, Marcus Thiel, Andreas Nürnberger

2025

Abstract

To address the problem of fake papers in the scientific literature, we study the classification of fake papers using machine learning classifiers trained on extracted features. We collected a new dataset in which the fake papers were acquired from the Retraction Watch database and the non-fake papers were obtained from PubMed. The features extracted for classification included metadata, journal-related features, and textual features from the abstracts, titles, and full texts of the papers. We used several models to generate features and word embeddings from the abstracts and texts of the papers, including TF-IDF and variants of BERT trained on medical data. Comparing the models and feature sets showed that the combination of metadata, journal data, and BioBERT embeddings performed best, reaching an accuracy of 86% and a recall of 83% with a gradient boosting classifier. Finally, we present the most important features identified by the best-performing classifier.
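The two reported metrics answer different questions: accuracy is the share of all papers labelled correctly, while recall is the share of actual fake papers that the classifier catches. The following is a minimal sketch of how both are computed, assuming that "fake" is treated as the positive class (the abstract does not state this). The function name `accuracy_and_recall` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for binary labels.

    `positive` is the label of the class whose recall is measured;
    here it is assumed to mark fake papers.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no samples given")

    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)

    accuracy = correct / len(y_true)
    # Recall is undefined without positive samples; report 0.0 in that case.
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Toy labels invented for illustration only (1 = fake, 0 = non-fake).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

In this toy case one fake paper is missed and one genuine paper is flagged, so the two metrics diverge. This is why a fake-paper detector is usually judged on recall as well as accuracy.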


Paper Citation


in Harvard Style

Hussain A., Sabel B., Thiel M. and Nürnberger A. (2025). Automated Detection of Fake Biomedical Papers: A Machine Learning Perspective. In Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-749-8, SciTePress, pages 662-670. DOI: 10.5220/0013482800003929


in Bibtex Style

@conference{iceis25,
author={Ahmar Hussain and Bernhard Sabel and Marcus Thiel and Andreas Nürnberger},
title={Automated Detection of Fake Biomedical Papers: A Machine Learning Perspective},
booktitle={Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2025},
pages={662-670},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013482800003929},
isbn={978-989-758-749-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Automated Detection of Fake Biomedical Papers: A Machine Learning Perspective
SN - 978-989-758-749-8
AU - Hussain A.
AU - Sabel B.
AU - Thiel M.
AU - Nürnberger A.
PY - 2025
SP - 662
EP - 670
DO - 10.5220/0013482800003929
PB - SciTePress