Authors:
Egor Ershov¹; Artyom Panshin¹; Ivan Ermakov¹; Nikola Banić²; Alex Savchik³ and Simone Bianco⁴
Affiliations:
¹ Institute for Information Transmission Problems, Russian Academy of Sciences, 119991 Moscow, Russia;
² Gideon Brothers, 10000 Zagreb, Croatia;
³ ACMetric, Netherlands;
⁴ University of Milano-Bicocca, 20126 Milan, Italy
Keyword(s):
Image Quality, Pairwise Comparison, Statistics, Stability, Aesthetics, Computational Aesthetics, Crowdsourcing.
Abstract:
Image quality assessment (IQA) is widely used to evaluate the results of image processing methods. Although objective IQA metrics have advanced considerably in recent years, subjective IQA is still strongly preferred for many tasks. Subjective IQA has become even more attractive since crowdsourcing platforms such as Amazon Mechanical Turk and Toloka became available. However, for some specific image processing tasks, questions about subjective IQA remain that have not yet been satisfactorily answered. One such task is the evaluation of image rendering styles, where, unlike in the case of distortions, none of the evaluated styles can be regarded a priori as objectively better or worse. The open questions are whether the scores obtained for such a task through crowdsourced subjective IQA are reliable, and whether they remain stable, i.e., similar when the evaluation is repeated over time. To answer these questions, this paper first selects and defines several images and styles, then evaluates them using crowdsourced subjective IQA on the Toloka platform, and finally analyzes the obtained scores numerically. The experimental results confirm the reliability and stability of crowdsourced subjective IQA for this problem. The experimental data is available at https://zenodo.org/records/10458531.
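The stability question can be made concrete with a small sketch. It assumes that the crowdsourced judgements are pairwise preferences between rendering styles, aggregated into per-style scores with the Bradley-Terry model (fitted by the standard minorization-maximization iteration), and that stability is quantified as the Spearman rank correlation between the scores of two repetitions of the experiment. Both choices are illustrative assumptions, not necessarily the procedure used in this paper. The function names `bradley_terry`, `average_ranks` and `spearman`, as well as the toy win counts, are hypothetical.

```python
# Illustrative sketch (assumption): Bradley-Terry scores from pairwise
# preference counts, and Spearman rank correlation between two repeated runs
# as a stability measure. Not necessarily the method used in the paper.


def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths; wins[i][j] = times item i beat item j.

    Assumes every item wins at least once and the comparison graph is
    connected, otherwise the MM iteration can produce zeros or diverge.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        s = sum(new_p)
        p = [x * n / s for x in new_p]  # normalise so the mean strength is 1
    return p


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical win counts for three styles, 10 comparisons per pair.
    run1 = [[0, 8, 9], [2, 0, 6], [1, 4, 0]]
    run2 = [[0, 7, 9], [3, 0, 7], [1, 3, 0]]
    s1, s2 = bradley_terry(run1), bradley_terry(run2)
    print("run 1 scores:", s1)
    print("run 2 scores:", s2)
    print("stability (Spearman):", spearman(s1, s2))
```

A correlation close to 1 means the repeated run orders the styles in the same way. Spearman is used here because it only compares orderings, so it ignores any rescaling of the scores between runs.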