Local vs. global interpretability of machine learning models in type 2 diabetes mellitus screening


Abstract

Machine learning-based predictive models have been used in different areas of everyday life for decades. However, with the recent availability of big data, new ways are emerging to interpret the decisions of machine learning models. In addition to global interpretation, which focuses on the general decisions of a prediction model, this paper emphasizes the importance of local interpretation of predictions. Local interpretation focuses on the specifics of each individual and provides explanations that can lead to a better understanding of feature contributions in smaller groups of individuals that are often overlooked by global interpretation techniques. In this paper, three machine learning-based prediction models were compared: Gradient Boosting Machine (GBM), Random Forest (RF) and Generalized Linear Model with regularization (GLM). No significant differences in prediction performance, measured by mean average error, were detected: GLM: 0.573 (0.569 − 0.577); GBM: 0.579 (0.575 − 0.583); RF: 0.579 (0.575 − 0.583). Similar to other studies that used prediction models for screening in type 2 diabetes mellitus, we found a strong contribution of features such as age, gender and BMI at the global interpretation level. On the other hand, the local interpretation technique revealed features such as depression, smoking status or physical activity that can be influential in specific groups of patients. This study outlines the prospects of using local interpretation techniques to improve the interpretability of prediction models in the era of personalized healthcare. At the same time, we caution users and developers of prediction models that prediction performance should not be the only criterion for model selection.
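The distinction between global and local interpretation can be illustrated with a minimal sketch. It is not the method used in the paper: it assumes a plain least-squares linear model, fully synthetic data, and hypothetical feature names (`age`, `bmi`, `smoker`), and it measures a feature's contribution as its weight times the individual's deviation from the population mean. The global view averages the absolute contributions over everyone; the local view looks at a single individual.

```python
import numpy as np

# Synthetic, purely illustrative data (NOT the study's dataset).
rng = np.random.default_rng(0)
n = 1000
feature_names = ["age", "bmi", "smoker"]  # hypothetical features

age = rng.normal(50, 10, n)
bmi = rng.normal(27, 4, n)
smoker = (rng.random(n) < 0.05).astype(float)  # rare trait, about 5% of people
X = np.column_stack([age, bmi, smoker])

# Hypothetical outcome: an assumed linear relation plus noise.
y = 0.03 * age + 0.05 * bmi + 0.8 * smoker + rng.normal(0, 0.3, n)

# Fit ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
weights = coef[1:]

# Per-individual contribution: weight * deviation from the population mean.
contrib = weights * (X - X.mean(axis=0))

# Global view: mean absolute contribution of each feature across everyone.
global_importance = np.abs(contrib).mean(axis=0)

# Local view: contributions for one individual who is a smoker.
idx = int(np.argmax(smoker))  # index of the first smoker in the sample
local = contrib[idx]

for name, g, l in zip(feature_names, global_importance, local):
    print(f"{name:>7}: global={g:.3f}  local={l:+.3f}")
```

In this toy setup the rare `smoker` feature ranks lowest globally because few individuals have it, yet its contribution for an individual smoker is many times larger than its global average. This is the kind of effect that a global summary alone can hide.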

Publication
In Artificial Intelligence in Medicine: Knowledge Representation and Transparent and Explainable Systems. KR4HC 2019, TEAAM 2019. Lecture Notes in Computer Science, 11979, pp. 108-119
Leon Kopitar
PhD student

Leon Kopitar is a senior researcher at the Faculty of Health Sciences, University of Maribor, Maribor, Slovenia, and a PhD student at the Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia. His research interests include the applicability of machine learning methods in the healthcare domain.

Leona Cilar Budler
PhD

My research interests include mental health, nursing research, and health informatics. Specific areas of interest include adolescent mental health, psychometric testing of questionnaires, questionnaire localization, and quantitative data analysis.

Primož Kocbek
PhD Student

My research interests include statistical models and machine learning techniques with applications in healthcare. My specific areas of interest include temporal data analysis, interpretability of prediction models, stability of algorithms, and advanced machine learning methods for massive datasets, such as deep neural networks.

Gregor Štiglic
Associate Professor and head of Research Institute

My research interests include predictive models in healthcare and the interpretability of complex models.