Local vs. global interpretability of machine learning models in type 2 diabetes mellitus screening


Abstract

Machine learning based predictive models have been used in different areas of everyday life for decades. However, with the recent availability of big data, new ways of interpreting the decisions of machine learning models are emerging. In addition to global interpretation, which focuses on the general decisions of a prediction model, this paper emphasizes the importance of local interpretation of predictions. Local interpretation focuses on the specifics of each individual and provides explanations that can lead to a better understanding of feature contributions in smaller groups of individuals, which are often overlooked by global interpretation techniques. In this paper, three machine learning based prediction models were compared: Gradient Boosting Machine (GBM), Random Forest (RF) and Generalized Linear Model with regularization (GLM). No significant differences in prediction performance, measured by mean average error, were detected: GLM: 0.573 (0.569 − 0.577); GBM: 0.579 (0.575 − 0.583); RF: 0.579 (0.575 − 0.583). Similar to other studies that used prediction models for screening in type 2 diabetes mellitus, we found a strong contribution of features such as age, gender and BMI at the global interpretation level. In contrast, the local interpretation technique revealed features such as depression, smoking status or physical activity that can be influential in specific groups of patients. This study outlines the prospects of using local interpretation techniques to improve the interpretability of prediction models in the era of personalized healthcare. At the same time, we caution users and developers of prediction models that prediction performance should not be the only criterion for model selection.
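The contrast between global and local interpretation can be illustrated with a minimal sketch. It is not the method used in the paper: it assumes a hypothetical linear risk model with made-up coefficients (`WEIGHTS`) and a made-up five-person cohort (`COHORT`), and it defines a feature's local contribution as its coefficient times the individual's deviation from the cohort mean. Global importance is then taken as the mean absolute local contribution over the cohort.

```python
from statistics import mean

# Hypothetical coefficients of a linear risk model (illustrative only,
# not estimates from the study).
WEIGHTS = {"age": 0.04, "bmi": 0.08, "smoker": 0.50}

# Hypothetical toy cohort: age in years, BMI in kg/m^2, smoker as 0/1.
COHORT = [
    {"age": 35, "bmi": 24.0, "smoker": 0},
    {"age": 50, "bmi": 29.0, "smoker": 0},
    {"age": 62, "bmi": 31.0, "smoker": 0},
    {"age": 45, "bmi": 26.0, "smoker": 1},
    {"age": 70, "bmi": 27.0, "smoker": 0},
]


def feature_means(rows):
    """Average value of each feature over the cohort."""
    return {f: mean(r[f] for r in rows) for f in WEIGHTS}


def local_contributions(row, means):
    """Per-feature contribution to one individual's prediction,
    relative to an average member of the cohort."""
    return {f: WEIGHTS[f] * (row[f] - means[f]) for f in WEIGHTS}


def global_importance(rows, means):
    """Mean absolute local contribution of each feature over the cohort."""
    return {
        f: mean(abs(WEIGHTS[f] * (r[f] - means[f])) for r in rows)
        for f in WEIGHTS
    }


def top_feature(scores):
    """Feature with the largest absolute score."""
    return max(scores, key=lambda f: abs(scores[f]))


MEANS = feature_means(COHORT)
GLOBAL = global_importance(COHORT, MEANS)
LOCAL_SMOKER = local_contributions(COHORT[3], MEANS)

print("global ranking driver:", top_feature(GLOBAL))
print("driver for the fourth individual:", top_feature(LOCAL_SMOKER))
```

In this toy cohort, age has the largest global importance and smoking status the smallest, yet smoking status is the largest contribution for the one individual who smokes. This is the kind of subgroup effect that a purely global summary can hide.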

Publication type
Publication
In Artificial Intelligence in Medicine: Knowledge Representation and Transparent and Explainable Systems. KR4HC 2019, TEAAM 2019. Lecture Notes in Computer Science, 11979, pp. 108-119
Leon Kopitar
Doctoral student

Leon Kopitar is employed as a senior researcher at the Faculty of Health Sciences, a member of the University of Maribor. He is pursuing doctoral studies in Computer Science and Informatics at the Faculty of Electrical Engineering and Computer Science, University of Maribor. His research interests include the applicability of machine learning methods in healthcare.

Leona Cilar Budler
Doctoral candidate

My research interests include mental health, nursing research, and health informatics. Specific areas of interest include adolescent mental health, psychometric testing of questionnaires, questionnaire localization, and quantitative data analysis.

Primož Kocbek
Doctoral student

My research interests include statistical models and machine learning methods with applications in healthcare. Specific areas of interest include temporal data analysis, interpretation of prediction models, algorithm stability, and advanced machine learning methods on massive datasets, e.g. deep neural networks.

Gregor Štiglic
Associate Professor and Head of the Research Institute

My research interests include machine learning techniques with applications in healthcare. Specific areas of interest include comprehensibility of prediction models, classification based on human interaction, stability of feature selection algorithms, meta-learning, and longitudinal rule discovery.