quent analysis and also the time aspect (whether as-
sessment is immediately possible). This applies to (i)
the initial ML model quality assessment (e.g., accu-
racy as described above), (ii) the mapping of model
quality to data quality through probabilistic models
as suggested, and (iii) root cause identification of data
quality deficiencies, also using probabilistic models.
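The probabilistic mapping in (ii) and (iii) could be realised in many ways; as a minimal sketch, a Bayesian ranking of candidate data quality root causes given an observed model quality symptom might look as follows. All cause names and probability values here are illustrative assumptions, not part of the framework itself.

```python
# Hypothetical sketch: ranking candidate data quality root causes for an
# observed ML model deficiency (e.g., low accuracy) via Bayes' rule.
# All names and probabilities are illustrative placeholders.

# P(cause): assumed prior probabilities of data quality deficiencies
priors = {
    "incomplete_data": 0.2,
    "noisy_labels": 0.3,
    "skewed_distribution": 0.5,
}

# P(symptom | cause): assumed likelihood of observing low accuracy
# given each underlying data quality problem
likelihoods = {
    "incomplete_data": 0.6,
    "noisy_labels": 0.8,
    "skewed_distribution": 0.4,
}

def rank_root_causes(priors, likelihoods):
    """Return causes ranked by posterior probability P(cause | symptom)."""
    unnormalised = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(unnormalised.values())
    posteriors = {c: p / evidence for c, p in unnormalised.items()}
    return sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_root_causes(priors, likelihoods)
```

With these example numbers, noisy labels come out as the most probable root cause; in practice, the priors and likelihoods would have to be estimated from the assessment process described above.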
Note that some aspects, such as the qualities of predictors and adaptors, refer to future events (an external event will have happened for predictors, or a future system adaptation will have become effective for adaptors). Quality assessments remain possible in these cases, just not immediately. A detailed coverage of this aspect is beyond the scope of this paper and shall be addressed at a later stage.
6 CONCLUSIONS
Without additional processing, raw data is of little value. Increasingly, machine learning can help with this processing to create meaningful information. We developed here a quality framework that combines quality aspects of the raw source data with the quality of the machine-learned information models derived from that data. We provided a fine-granular model covering a range of quality concerns organised around common machine learning function types.
The central contribution is the mapping of observable ML information model deficiencies to underlying, possibly hidden data quality problems, aiming at a root cause analysis for observed symptoms. Recommending remedial actions for identified problems and causes is a further part of the framework.
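This mapping from deficiencies to causes and remedies could be sketched as a simple lookup structure. The deficiency names, causes and actions below are hypothetical examples chosen for illustration, not an exhaustive catalogue from the framework.

```python
# Hypothetical sketch of the framework's mapping idea: an observed model
# deficiency is linked to possible hidden data quality causes, each paired
# with a recommended remedial action. All entries are illustrative only.

REMEDIATION = {
    "low_accuracy": [
        ("noisy_labels", "re-label a sample of the training data"),
        ("incomplete_data", "collect or impute missing records"),
    ],
    "biased_predictions": [
        ("skewed_distribution", "re-balance or re-weight the training set"),
    ],
}

def recommend(deficiency):
    """Return (cause, action) pairs for an observed model deficiency."""
    return REMEDIATION.get(deficiency, [])
```

A fuller realisation would rank these pairs by the root cause probabilities discussed above rather than returning them in a fixed order.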
Some open problems for future work emerge from our discussion. The assessment of the information model requires further exploration: we provide informal definitions for all concepts, but all aspects beyond accuracy still need to be fully formalised. The automation of assessment and analysis is a further concern, as this paper covers the framework only from a conceptual perspective. Another direction for future work is to move the framework towards digital twins, i.e., digital replicas of physical assets such as processes, locations, systems and devices. These are often based on IoT-generated data, with enhanced models and functions provided through machine learning. We plan to investigate in more depth the complexity of these digital twins and the respective quality concerns that would apply.
ICEIS 2020 - 22nd International Conference on Enterprise Information Systems