Supporting data for "An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis"

  • Dominic Cushnan (Creator)
  • Oscar Bennett (Creator)
  • Rosalind Berka (Creator)
  • Ottavia Bertolli (Creator)
  • Ashwin Chopra (Creator)
  • Samie Dorgham (Creator)
  • Alberto Favaro (Creator)
  • Tara Ganepola (Creator)
  • Mark Halling-Brown (Creator)
  • Gergely Imreh (Creator)
  • Joseph Jacob (Creator)
  • Emily Jefferson (Creator)
  • François Lemarchand (Creator)
  • Daniel Schofield (Creator)
  • Jeremy C. Wyatt (Creator)



The National COVID-19 Chest Imaging Database (NCCID) is a centralised database containing chest X-rays, Computed Tomography (CT) scans and cardiac Magnetic Resonance Images (MRI) from patients across the UK. The objective of the initiative is to support a better understanding of the coronavirus SARS-CoV-2 disease (COVID-19) and the development of machine learning technologies that will improve care for patients hospitalised with a severe COVID-19 infection. The NCCID is now accumulating data from 20 NHS sites across England and Wales, with a total contribution of approximately 25,000 imaging studies in the training set (at time of writing), and is actively being used as a research tool by several organisations.

This paper introduces the training dataset, including a snapshot analysis covering the completeness of clinical data and the availability of image data for the various use-cases (diagnosis, prognosis, longitudinal risk). Findings suggest the NCCID is well suited for developing clinical models, but developers should take care to mitigate the common model confounders that are highlighted, e.g., equipment type. In addition, a cohort analysis was performed to measure how representative the NCCID is of the wider COVID-19 affected population. Three major aspects were included: geographic, demographic and temporal coverage. The analysis revealed good alignment in some categories, e.g., sex, whilst also identifying areas for improvement in data collection methods, particularly with respect to geographic coverage.

The NCCID is a growing resource that provides researchers with a large, high-quality database that can be leveraged to support the response to the COVID-19 pandemic.
Date made available: 8 Sept 2021
Publisher: GigaScience Database
