Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts

Naeimeh Atabaki-Pasdar, Mattias Ohlsson, Ana Viñuela, Francesca Frau, Hugo Pomares-Millan, Mark Haid, Angus G. Jones, E. Louise Thomas, Robert W. Koivula, Azra Kurbasic, Pascal M. Mutie, Hugo Fitipaldi, Juan Fernandez, Adem Y. Dawed, Giuseppe N. Giordano, Ian M. Forgie, Timothy J. McDonald, Femke Rutters, Henna Cederberg, Elizaveta Chabanova, Matilda Dale, Federico De Masi, Cecilia Engel Thomas, Kristine H. Allin, Tue H. Hansen, Alison Heggie, Mun-Gwan Hong, Petra J. M. Elders, Gwen Kennedy, Tarja Kokkola, Helle Krogh Pedersen, Anubha Mahajan, Donna McEvoy, Francois Pattou, Violeta Raverdy, Ragna S. Häussler, Sapna Sharma, Henrik S. Thomsen, Jagadish Vangipurapu, Henrik Vestergaard, Leen M 't Hart, Jerzy Adamski, Petra B. Musholt, Soren Brage, Søren Brunak, Emmanouil Dermitzakis, Gary Frost, Torben Hansen, Markku Laakso, Oluf Pedersen, Martin Ridderstråle, Hartmut Ruetten, Andrew T. Hattersley, Mark Walker, Joline W. J. Beulens, Andrea Mari, Jochen M. Schwenk, Ramneek Gupta, Mark I. McCarthy, Ewan R. Pearson, Jimmy D. Bell, Imre Pavo, Paul W. Franks

Research output: Contribution to journal, Article


Background: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in type 2 diabetes (T2D) and beyond. Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and ultimately hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning.

Methods and Findings: We utilized the baseline data from the IMI DIRECT, a multicenter prospective cohort study of 3029 European ancestry adults recently diagnosed with T2D (n=795) or at high risk of developing the disease (n=2234). Multi-omic (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI image-derived liver fat content (<5% or ≥5%) available for 1514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and Random Forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operator characteristic area under the curve (ROCAUC) of 0.84 (95% confidence interval (CI)=0.82, 0.86, p-value<0.001), which compared with a ROCAUC of 0.82 (95% CI=0.81, 0.83, p-value<0.001) for a model including nine clinically-accessible variables. The IMI DIRECT prediction models out-performed existing non-invasive NAFLD prediction tools.
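To make the reported metric concrete, the following is a minimal sketch of how a liver fat measurement can be binarized at the 5% cut-off and how a ROCAUC can be computed from model scores. It is not the study's code: the function names (`binarize_liver_fat`, `roc_auc`) and all numbers are hypothetical, and the scores stand in for the output of any classifier, such as a Random Forest.

```python
from typing import List, Sequence

# Binary cut-off stated in the abstract: <5% versus >=5% liver fat.
FAT_THRESHOLD_PERCENT = 5.0


def binarize_liver_fat(fat_percent: Sequence[float]) -> List[int]:
    """Label each participant 1 (>=5% liver fat) or 0 (<5%)."""
    return [1 if f >= FAT_THRESHOLD_PERCENT else 0 for f in fat_percent]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute ROC AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up illustration values, not data from the study.
liver_fat = [2.1, 7.4, 4.9, 12.0, 5.0, 3.3]          # MRI liver fat, percent
model_scores = [0.20, 0.80, 0.55, 0.90, 0.40, 0.30]  # hypothetical classifier output

labels = binarize_liver_fat(liver_fat)
auc = roc_auc(labels, model_scores)
print(labels, round(auc, 3))
```

The pairwise definition used here is quadratic in the number of participants, which is fine for illustration; a rank-based computation would be the usual choice for larger cohorts.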

These analyses have been performed in adults of European ancestry residing in northern Europe and it is unknown how well these findings will translate to people of other ancestries and exposed to environmental risk factors that contrast those of the present cohort. Another key limitation of this study is that the prediction was done on a binary outcome (<5% or ≥5%) and not on the liver fat quantity.

Conclusions: In this study, we have developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface and made it available to the community.

Original language: English
Article number: e1003149
Pages (from-to): 1-27
Number of pages: 27
Journal: PLoS Medicine
Issue number: 6
Publication status: Published - 19 Jun 2020

  • Cite this

    Atabaki-Pasdar, N., Ohlsson, M., Viñuela, A., Frau, F., Pomares-Millan, H., Haid, M., Jones, A. G., Thomas, E. L., Koivula, R. W., Kurbasic, A., Mutie, P. M., Fitipaldi, H., Fernandez, J., Dawed, A. Y., Giordano, G. N., Forgie, I. M., McDonald, T. J., Rutters, F., Cederberg, H., ... Franks, P. W. (2020). Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts. PLoS Medicine, 17(6), 1-27. [e1003149].