TY - JOUR
T1 - Seek COVER
T2 - using a disease proxy to rapidly develop and validate a personalized risk calculator for COVID-19 outcomes in an international network
AU - Williams, Ross D.
AU - Markus, Aniek F.
AU - Yang, Cynthia
AU - Duarte-Salles, Talita
AU - DuVall, Scott L.
AU - Falconer, Thomas
AU - Jonnagaddala, Jitendra
AU - Kim, Chungsoo
AU - Rho, Yeunsook
AU - Williams, Andrew E.
AU - Machado, Amanda Alberga
AU - An, Min Ho
AU - Aragón, María
AU - Areia, Carlos
AU - Burn, Edward
AU - Choi, Young Hwa
AU - Drakos, Iannis
AU - Abrahão, Maria Tereza Fernandes
AU - Fernández-Bertolín, Sergio
AU - Hripcsak, George
AU - Kaas-Hansen, Benjamin Skov
AU - Kandukuri, Prasanna L.
AU - Kors, Jan A.
AU - Kostka, Kristin
AU - Liaw, Siaw-Teng
AU - Lynch, Kristine E.
AU - Machnicki, Gerardo
AU - Matheny, Michael E.
AU - Morales, Daniel
AU - Nyberg, Fredrik
AU - Park, Rae Woong
AU - Prats-Uribe, Albert
AU - Pratt, Nicole
AU - Rao, Gowtham
AU - Reich, Christian G.
AU - Rivera, Marcela
AU - Seinen, Tom
AU - Shoaibi, Azza
AU - Spotnitz, Matthew E.
AU - Steyerberg, Ewout W.
AU - Suchard, Marc A.
AU - You, Seng Chan
AU - Zhang, Lin
AU - Zhou, Lili
AU - Ryan, Patrick B.
AU - Prieto-Alhambra, Daniel
AU - Reps, Jenna M.
AU - Rijnbeek, Peter R.
N1 - This project has received support from the European Health Data and Evidence Network (EHDEN) project. EHDEN received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No 806968. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA.
This project is funded by the Health Department from the Generalitat de Catalunya with a grant for research projects on SARS-CoV-2 and COVID-19 disease organized by the Direcció General de Recerca i Innovació en Salut.
The University of Oxford received a grant related to this work from the Bill & Melinda Gates Foundation (Investment ID INV-016201), and partial support from the UK National Institute for Health Research (NIHR) Oxford Biomedical Research Centre.
DPA is funded through a NIHR Senior Research Fellowship (Grant number SRF-2018-11-ST2–004). The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health.
AP-U is supported by Fundacion Alfonso Martin Escudero and the Medical Research Council (grant numbers MR/K501256/1, MR/N013468/1).
BSKH is funded through Innovation Fund Denmark (5153-00002B) and the Novo Nordisk Foundation (NNF14CC0001).
This work was also supported by the Bio Industrial Strategic Technology Development Program (20001234) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea) and a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea [grant number: HI16C0992].
This project is part funded by the UNSW RIS grant.
This research received funding support from the US Department of Veterans Affairs and the VA Informatics and Computing Infrastructure (VA HSR RES 13–457). The views and opinions expressed are those of the authors and do not necessarily reflect those of the Department of Veterans Affairs or the United States Government.
© 2022. The Author(s).
PY - 2022/1/30
Y1 - 2022/1/30
N2 - Background: We investigated whether we could use influenza data to develop prediction models for COVID-19 to increase the speed at which prediction models can reliably be developed and validated early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient's risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis, using historical data from patients with influenza or flu-like symptoms, and tested them in COVID-19 patients. Methods: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries containing data collected on or before 4/27/2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms any time prior to 2020. The first step was to create a data-driven model using LASSO-regularized logistic regression; its covariates were used to develop aggregate covariates for the second step, in which the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after index date. Results: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which, combined with age and sex, discriminated which patients would experience any of our three outcomes. The models achieved good performance in influenza and COVID-19 cohorts. For COVID-19, the AUC ranges were COVER-H: 0.69-0.81, COVER-I: 0.73-0.91, and COVER-F: 0.72-0.90. Calibration varied across the validations, with some of the COVID-19 validations being less well calibrated than the influenza validations. Conclusions: This research demonstrated the utility of using a proxy disease to develop a prediction model. The 3 COVER models with 9 predictors that were developed using influenza data perform well for COVID-19 patients for predicting hospitalization, intensive services, and fatality. The scores showed good discriminatory performance, which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, which is potentially due to the difference in symptom severity between the two diseases. A possible solution is to recalibrate the models in each location before use.
AB - Background: We investigated whether we could use influenza data to develop prediction models for COVID-19 to increase the speed at which prediction models can reliably be developed and validated early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient's risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis, using historical data from patients with influenza or flu-like symptoms, and tested them in COVID-19 patients. Methods: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries containing data collected on or before 4/27/2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms any time prior to 2020. The first step was to create a data-driven model using LASSO-regularized logistic regression; its covariates were used to develop aggregate covariates for the second step, in which the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after index date. Results: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which, combined with age and sex, discriminated which patients would experience any of our three outcomes. The models achieved good performance in influenza and COVID-19 cohorts. For COVID-19, the AUC ranges were COVER-H: 0.69-0.81, COVER-I: 0.73-0.91, and COVER-F: 0.72-0.90. Calibration varied across the validations, with some of the COVID-19 validations being less well calibrated than the influenza validations. Conclusions: This research demonstrated the utility of using a proxy disease to develop a prediction model. The 3 COVER models with 9 predictors that were developed using influenza data perform well for COVID-19 patients for predicting hospitalization, intensive services, and fatality. The scores showed good discriminatory performance, which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, which is potentially due to the difference in symptom severity between the two diseases. A possible solution is to recalibrate the models in each location before use.
KW - COVID-19
KW - COVID-19 Testing
KW - Humans
KW - Influenza, Human/epidemiology
KW - Pneumonia
KW - SARS-CoV-2
KW - United States
KW - Patient-level prediction modelling
KW - Risk score
UR - http://www.scopus.com/inward/record.url?scp=85123905957&partnerID=8YFLogxK
U2 - 10.1186/s12874-022-01505-z
DO - 10.1186/s12874-022-01505-z
M3 - Article
C2 - 35094685
SN - 1471-2288
VL - 22
SP - 1
EP - 13
JO - BMC Medical Research Methodology
JF - BMC Medical Research Methodology
M1 - 35
ER -