An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis

Dominic Cushnan (Lead / Corresponding author), Oscar Bennett, Rosalind Berka, Ottavia Bertolli, Ashwin Chopra, Samie Dorgham, Alberto Favaro, Tara Ganepola, Mark Halling-Brown, Gergely Imreh, Joseph Jacob, Emily Jefferson, François Lemarchand, Daniel Schofield, Jeremy C. Wyatt

Research output: Contribution to journal › Review article › peer-review



Background: The National COVID-19 Chest Imaging Database (NCCID) is a centralized database containing mainly chest X-rays and computed tomography scans from patients across the UK. The objective of the initiative is to support a better understanding of the coronavirus SARS-CoV-2 disease (COVID-19) and the development of machine learning technologies that will improve care for patients hospitalized with a severe COVID-19 infection. This article introduces the training dataset, including a snapshot analysis covering the completeness of clinical data, and availability of image data for the various use-cases (diagnosis, prognosis, longitudinal risk). An additional cohort analysis measures how well the NCCID represents the wider COVID-19-affected UK population in terms of geographic, demographic, and temporal coverage.

Findings: The NCCID offers high-quality DICOM images acquired across a variety of imaging machinery; multiple time points, including historical images, are available for a subset of patients. This volume and variety make the database well suited to the development of diagnostic/prognostic models for COVID-associated respiratory conditions. Historical images and clinical data may aid long-term risk stratification, particularly as availability of comorbidity data increases through linkage to other resources. The cohort analysis revealed good alignment with general UK COVID-19 statistics for some categories, e.g., sex, whilst identifying areas for improvement in data collection methods, particularly geographic coverage.

Conclusion: The NCCID is a growing resource that provides researchers with a large, high-quality database that can be leveraged both to support the response to the COVID-19 pandemic and as a test bed for building clinically viable medical imaging models.

Original language: English
Article number: giab076
Number of pages: 20
Issue number: 11
Early online date: 25 Nov 2021
Publication status: Published - Nov 2021


Keywords

  • COVID-19
  • SARS-CoV2
  • machine learning
  • medical imaging
  • thoracic imaging

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications

