Prepare to Succeed: Real-World Image Datasets for Artificial Intelligence in Skin Cancer Triage

Gillian X. M. Chin (Lead / Corresponding author), Tamas Suveges, Jacob Carse, Sanaa Butt, Shareen Muthiah, Colin Morton, Emanuele Trucco, Charlotte Proby, Stephen McKenna, Colin Fleming

Research output: Contribution to journal › Meeting abstract › peer-review

Most skin image-based artificial intelligence (AI) systems are trained on publicly available datasets that are high resolution, well centred and low in noise. These do not represent most real-world community-captured photographs. We are developing AI to triage community-captured images; early results demonstrate that this may be useful in simple triage and may have diagnostic utility. We have accumulated a repository of > 85 000 skin images referred to secondary care from primary care over 10 years. We report our experience preparing real-world data to train AI for skin cancer triage. Images previously referred from primary to secondary care were extracted, encrypted, anonymized and stored with ethics approval. Each was manually reviewed to remove patient identifiers and annotated with ground-truth diagnoses. In 2021, we processed 6533 images. Fifty-three per cent (n = 3549) were suitable, while 47% (n = 3150) were excluded for clinical or technical reasons. Clinical reasons included (i) privacy (patient or biometric identifiers, facial features, tattoos); (ii) sensitive images (genitals); (iii) diagnostic uncertainty (nonattendance, inconclusive biopsy, clinician uncertainty); and (iv) nonskin images. Technical reasons included (i) quality (blurry, poor lighting); (ii) duplicates; and (iii) obstructed view (hair, tattoos). To date, 53·4% (n = 1897) of suitable images have been transferred for AI testing, with a focus on 11 classes of skin lesion. Data were not evenly distributed across classes. Diagnoses such as melanoma in situ (0·8%) were under-represented, with higher proportions of benign diagnoses such as seborrhoeic keratosis (23%). Ground-truth diagnoses were obtained from consultant-level image diagnoses (12%), consultant-level consultation (31%) and pathology (34%). The remaining 46·5% (n = 1652) consisted of controls and skin conditions outside these 11 classes.
Current AI classifiers often generalize poorly across healthcare systems, acquisition protocols and populations. To illustrate, we trained Bayesian neural networks on images sourced from the public datasets SD-260 and ISIC 2019 (which contains dermoscopy) to distinguish seven skin lesion classes, and then tested them on our dataset. The test misclassification rate was 28% lower with SD-260 training than with ISIC, consistent with a closer match between our images and SD-260 (no dermoscopy). Considering the costs of different types of misclassifications, this translated into a 42% cost reduction. Our experience reveals the challenges of preparing these datasets, including high exclusion rates, significant clinician time and uneven class distribution. To merge AI diagnostic tools into real-world clinical workflows, it is important that algorithms are developed and validated on datasets representative of the everyday dermatology workload. We are working to develop practical interoperable standards to prepare anonymized data for a national skin AI database.
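The comparison of misclassification rate and misclassification cost can be illustrated with a minimal sketch. The abstract does not give the confusion matrices, the cost matrix or the class names, so every number and label below is hypothetical and chosen only to show how the two metrics are computed; the function names are illustrative as well. Rows are true classes and columns are predicted classes.

```python
def misclassification_rate(confusion):
    """Fraction of test images whose predicted class differs from the true class."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


def mean_misclassification_cost(confusion, cost):
    """Average cost per test image, weighting each error type by cost[true][predicted]."""
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total


def relative_reduction(baseline, candidate):
    """Relative decrease of candidate with respect to baseline (0.25 means 25% lower)."""
    return (baseline - candidate) / baseline


# Hypothetical three-class example (e.g. melanoma, other malignant, benign).
# These counts are invented for illustration and are NOT results from the study.
confusion_a = [[6, 2, 2], [1, 7, 2], [3, 2, 75]]  # model trained on dataset A
confusion_b = [[8, 1, 1], [1, 8, 1], [2, 2, 76]]  # model trained on dataset B

# Hypothetical asymmetric costs: calling a melanoma benign is the most expensive error.
cost = [[0, 2, 10], [2, 0, 5], [1, 1, 0]]

rate_drop = relative_reduction(
    misclassification_rate(confusion_a), misclassification_rate(confusion_b)
)
cost_drop = relative_reduction(
    mean_misclassification_cost(confusion_a, cost),
    mean_misclassification_cost(confusion_b, cost),
)
print(f"relative reduction in misclassification rate: {rate_drop:.1%}")
print(f"relative reduction in misclassification cost: {cost_drop:.1%}")
```

With asymmetric costs, the relative reduction in cost can differ from the relative reduction in error rate, because avoiding a rare but expensive error counts for more than avoiding a common, cheap one.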

Original language: English
Article number: BT13
Pages (from-to): 125
Number of pages: 1
Journal: British Journal of Dermatology
Issue number: S1
Publication status: Published - 5 Jul 2022
Event: British Association of Dermatologists 102nd Annual Meeting - Glasgow, United Kingdom
Duration: 5 Jul 2022 to 7 Jul 2022

