TY - JOUR
T1 - Predicting and elucidating the etiology of fatty liver disease:
T2 - A machine learning modeling and validation study in the IMI DIRECT cohorts
AU - Atabaki-Pasdar, Naeimeh
AU - Ohlsson, Mattias
AU - Viñuela, Ana
AU - Frau, Francesca
AU - Pomares-Millan, Hugo
AU - Haid, Mark
AU - Jones, Angus G.
AU - Thomas, E. Louise
AU - Koivula, Robert W.
AU - Kurbasic, Azra
AU - Mutie, Pascal M.
AU - Fitipaldi, Hugo
AU - Fernandez, Juan
AU - Dawed, Adem Y.
AU - Giordano, Giuseppe N.
AU - Forgie, Ian M.
AU - McDonald, Timothy J.
AU - Rutters, Femke
AU - Cederberg, Henna
AU - Chabanova, Elizaveta
AU - Dale, Matilda
AU - De Masi, Federico
AU - Thomas, Cecilia Engel
AU - Allin, Kristine H.
AU - Hansen, Tue H.
AU - Heggie, Alison
AU - Hong, Mun-Gwan
AU - Elders, Petra J. M.
AU - Kennedy, Gwen
AU - Kokkola, Tarja
AU - Pedersen, Helle Krogh
AU - Mahajan, Anubha
AU - McEvoy, Donna
AU - Pattou, Francois
AU - Raverdy, Violeta
AU - Häussler, Ragna S.
AU - Sharma, Sapna
AU - Thomsen, Henrik S.
AU - Vangipurapu, Jagadish
AU - Vestergaard, Henrik
AU - 't Hart, Leen M.
AU - Adamski, Jerzy
AU - Musholt, Petra B.
AU - Brage, Soren
AU - Brunak, Søren
AU - Dermitzakis, Emmanouil
AU - Frost, Gary
AU - Hansen, Torben
AU - Laakso, Markku
AU - Pedersen, Oluf
AU - Ridderstråle, Martin
AU - Ruetten, Hartmut
AU - Hattersley, Andrew T.
AU - Walker, Mark
AU - Beulens, Joline W. J.
AU - Mari, Andrea
AU - Schwenk, Jochen M.
AU - Gupta, Ramneek
AU - McCarthy, Mark I.
AU - Pearson, Ewan R.
AU - Bell, Jimmy D.
AU - Pavo, Imre
AU - Franks, Paul W.
N1 - Funding Information:
The work leading to this publication has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115317 (DIRECT), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in kind contribution. NAP is supported in part by Henning och Johan Throne-Holsts Foundation, Hans Werthén Foundation, an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. HPM is supported by an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. AGJ is supported by an NIHR Clinician Scientist award (17/0005624). RK is funded by the Novo Nordisk Foundation (NNF18OC0031650) as part of a postdoctoral fellowship, an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. AK, PM, HF, JF and GNG are supported by an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. TJM is funded by an NIHR clinical senior lecturer fellowship. S.Bru acknowledges support from the Novo Nordisk Foundation (grants NNF17OC0027594 and NNF14CC0001). ATH is a Wellcome Trust Senior Investigator and is also supported by the NIHR Exeter Clinical Research Facility. JMS acknowledges support from Science for Life Laboratory (Plasma Profiling Facility), Knut and Alice Wallenberg Foundation (Human Protein Atlas) and Erling-Persson Foundation (KTH Centre for Precision Medicine). MIM is supported by the following grants: Wellcome (090532, 098381, 106130, 203141, 212259); NIH (U01-DK105535). PWF is supported by an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
PY - 2020/6/19
Y1 - 2020/6/19
N2 - Background: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning. Methods and findings: We utilized the baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%) available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), which compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries and exposed to environmental risk factors that differ from those of the present cohort. Another key limitation of this study is that the prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one. Conclusions: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community. Trial registration: ClinicalTrials.gov NCT03814915.
AB - Background: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning. Methods and findings: We utilized the baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-image-derived liver fat content (<5% or ≥5%) available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), which compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe, and it is unknown how well these findings will translate to people of other ancestries and exposed to environmental risk factors that differ from those of the present cohort. Another key limitation of this study is that the prediction was done on a binary outcome of liver fat quantity (<5% or ≥5%) rather than a continuous one. Conclusions: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better prediction ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community. Trial registration: ClinicalTrials.gov NCT03814915.
UR - http://www.scopus.com/inward/record.url?scp=85086754493&partnerID=8YFLogxK
U2 - 10.1371/journal.pmed.1003149
DO - 10.1371/journal.pmed.1003149
M3 - Article
C2 - 32559194
SN - 1549-1277
VL - 17
JO - PLoS Medicine
JF - PLoS Medicine
IS - 6
M1 - e1003149
ER -