the authors in which models are described by their
inputs, the models themselves, and their outputs. The
framework can be used to study model performance,
explainability, fairness, and other factors that may
ultimately lead to end users’ trust and model
adoption. It emphasizes the concept of machine
learning that makes sense, in which the application of
machine learning results in correctly constructed and
evaluated models for which inputs mimic measurable
real-world characteristics of modeled objects, and
outputs directly correspond to outcomes of interest.
REFERENCES
Baldi, P., Brunak, S., Chauvin, Y., Andersen, C. A., & Nielsen,
H. (2000). Assessing the accuracy of prediction algorithms
for classification: an overview. Bioinformatics, 16(5), 412-
424.
Batra, M., & Agrawal, R. (2018). Comparative analysis of
decision tree algorithms. In Nature inspired computing (pp.
31-36). Springer, Singapore.
Beaulieu-Jones, B. K., Yuan, W., Brat, G. A., Beam, A. L.,
Weber, G., Ruffin, M., & Kohane, I. S. (2021). Machine
learning for patient risk stratification: standing on, or looking
over, the shoulders of clinicians? NPJ digital medicine, 4(1),
1-6.
Boyd, K., Eng, K. H., & Page, C. D. (2013, September). Area
under the precision-recall curve: point estimates and
confidence intervals. In Joint European conference on
machine learning and knowledge discovery in databases (pp.
451-466). Springer, Berlin, Heidelberg.
Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-
32.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (2017).
Classification and regression trees. Routledge.
Castellazzi, G., Cuzzoni, M. G., Cotta Ramusino, M., Martinelli,
D., Denaro, F., Ricciardi, A., ... & Gandini Wheeler-
Kingshott, C. A. (2020). A machine learning approach for the
differential diagnosis of Alzheimer and Vascular Dementia
Fed by MRI selected features. Frontiers in
neuroinformatics, 14, 25.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern
recognition letters, 27(8), 861-874.
Flach, P. (2019, July). Performance evaluation in machine
learning: the good, the bad, the ugly, and the way forward. In
Proceedings of the AAAI Conference on Artificial
Intelligence (Vol. 33, No. 01, pp. 9808-9814).
Friedman, J. H. (2001). Greedy function approximation: a
gradient boosting machine. Annals of statistics, 1189-1232.
Friedman, J. H. (2002). Stochastic gradient boosting.
Computational statistics & data analysis, 38(4), 367-378.
Goutte, C., & Gaussier, E. (2005, March). A probabilistic
interpretation of precision, recall and F-score, with
implication for evaluation. In European conference on
information retrieval (pp. 345-359). Springer, Berlin,
Heidelberg.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of
the area under a receiver operating characteristic (ROC)
curve. Radiology, 143(1), 29-36.
Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013).
Applied logistic regression (Vol. 398). John Wiley & Sons.
Lee, E. A., & Sangiovanni-Vincentelli, A. (1998). A framework
for comparing models of computation. IEEE Transactions on
computer-aided design of integrated circuits and systems,
17(12), 1217-1229.
McDermott, M. B., Wang, S., Marinsek, N., Ranganath, R.,
Foschini, L., & Ghassemi, M. (2021). Reproducibility in
machine learning for health research: Still a ways to go.
Science Translational Medicine, 13(586).
McHugh, M. L. (2012). Interrater reliability: the kappa statistic.
Biochemia medica, 22(3), 276-282.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan,
A. (2021). A survey on bias and fairness in machine learning.
ACM Computing Surveys (CSUR), 54(6), 1-35.
Powers, D. M. (2020). Evaluation: from precision, recall and F-
measure to ROC, informedness, markedness and correlation.
arXiv preprint arXiv:2010.16061.
Rahane, W., Dalvi, H., Magar, Y., Kalane, A., & Jondhale, S.
(2018, March). Lung cancer detection using image
processing and machine learning healthcare. In 2018
International Conference on Current Trends towards
Converging Technologies (ICCTCT) (pp. 1-5). IEEE.
Saha, P., Sadi, M. S., & Islam, M. M. (2021). EMCNet:
Automated COVID-19 diagnosis from X-ray images using
convolutional neural network and ensemble of machine
learning classifiers. Informatics in medicine unlocked, 22,
100505.
Tseng, Y. J., Wang, H. Y., Lin, T. W., Lu, J. J., Hsieh, C. H., &
Liao, C. T. (2020). Development of a machine learning
model for survival risk stratification of patients with advanced
oral cancer. JAMA network open, 3(8), e2011768-e2011768.
Vaccaro, M. G., Sarica, A., Quattrone, A., Chiriaco, C., Salsone,
M., Morelli, M., & Quattrone, A. (2021).
Neuropsychological assessment could distinguish among
different clinical phenotypes of progressive supranuclear
palsy: A Machine Learning approach. Journal of
Neuropsychology, 15(3), 301-318.
Wang, Q., Ma, Y., Zhao, K., & Tian, Y. (2020). A comprehensive
survey of loss functions in machine learning. Annals of Data
Science, 1-26.
Wojtusiak, J., Elashkar, E., & Nia, R. M. (2017, February). C-
Lace: Computational Model to Predict 30-Day Post-
Hospitalization Mortality. HEALTHINF 2017 (pp. 169-177).
Wojtusiak, J. (2021). Reproducibility, Transparency and
Evaluation of Machine Learning in Health Applications.
HEALTHINF 2021 (pp. 685-692).
Wojtusiak, J., Asadzadehzanjani, N., Levy, C., Alemi, F., &
Williams, A. E. (2021). Online Decision Support Tool that
Explains Temporal Prediction of Activities of Daily Living
(ADL). HEALTHINF 2021 (pp. 629-636).
Wojtusiak, J., Asadzadehzanjani, N., Levy, C., Alemi, F., &
Williams, A. E. (2021). Computational Barthel Index: an
automated tool for assessing and predicting activities of daily
living among nursing home patients. BMC medical
informatics and decision making, 21(1), 1-15.