Prediction of Public Procurement Corruption Indices using Machine Learning Methods

Kornelije Rabuzin, Nikola Modrušan


Protecting citizens' public financial resources through advanced corruption detection models in public procurement has become an almost unavoidable topic and the subject of numerous studies. Because such research almost always focuses on predicting corrupt competitions and on calculating various indices and indicators of corruption, the data itself is very difficult to come by. These data sets usually contain very few observations, and even fewer that are accurately labelled. Preventing or detecting compromised public procurement processes is a crucial step, and it relates to the initial phase of public procurement, i.e., the publication of the notice. The aim of this paper is to compare prediction models that use text-mining techniques and machine-learning methods to detect suspicious tenders, and to develop a model for detecting suspicious one-bid tenders. To this end, we analyzed the tender documentation for particular tenders, extracted the content of interest about the levels of all bids, and grouped it by procurement lot using machine-learning methods. The resulting model uses the most common text classification algorithms for prediction: naive Bayes, logistic regression and support vector machines. The results of the research show that the knowledge contained in tender documentation can be used to detect suspicious tenders.
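
As a rough illustration of the text classification step, the following minimal sketch implements a multinomial naive Bayes classifier with Laplace smoothing using only the Python standard library. It is not the authors' model: the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, the labels and the four training sentences are all assumptions invented for this example, and the abstract does not describe the actual features, preprocessing or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic whitespace tokenizer (an assumption; real tender
    # documentation would need proper text-mining preprocessing).
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            # Log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                # Smoothed log likelihood of the token given the class.
                score += math.log(
                    (count + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, invented purely for illustration.
train_docs = [
    "bidder must hold certificate issued by specific manufacturer",
    "delivery within two days exclusive brand model required",
    "open competition any equivalent product accepted",
    "standard technical specification equivalent brands accepted",
]
train_labels = ["suspicious", "suspicious", "regular", "regular"]

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
prediction = model.predict("exclusive manufacturer certificate required")
print(prediction)
```

Logistic regression and support vector machines would take the place of the scoring rule in `log_scores` while working on the same kind of token-based representation. With so few labelled observations, as the abstract notes, any such model would need careful validation before its predictions are trusted.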

