TY - JOUR
T1 - Prepare to Succeed
T2 - British Association of Dermatologists 102nd Annual Meeting
AU - Chin, Gillian X. M.
AU - Suveges, Tamas
AU - Carse, Jacob
AU - Butt, Sanaa
AU - Muthiah, Shareen
AU - Morton, Colin
AU - Trucco, Emanuele
AU - Proby, Charlotte
AU - McKenna, Stephen
AU - Fleming, Colin
N1 - Copyright:
© 2022 British Association of Dermatologists.
PY - 2022/7/5
Y1 - 2022/7/5
N2 - Most skin image-based artificial intelligence (AI) systems are trained on publicly available datasets that are of high resolution, well centred and low noise. These do not represent most real-world community-captured photographs. We are developing AI to triage community-captured images; early results suggest it may be useful for simple triage and have diagnostic utility. We have accumulated a repository of > 85 000 skin images referred from primary to secondary care over 10 years. We report our experience preparing real-world data to train AI for skin cancer triage. Primary care images previously referred from primary to secondary care were extracted, encrypted, anonymized and stored with ethics approval. Each was manually reviewed to remove patient identifiers and annotated with a ground-truth diagnosis. In 2021, we processed 6533 images. Fifty-three per cent (n = 3549) were suitable, while 47% (n = 3150) were excluded for clinical or technical reasons. Clinical reasons included (i) privacy (patient or biometric identifiers, facial features, tattoos); (ii) sensitive images (genitals); (iii) diagnostic uncertainty (nonattendance, inconclusive biopsy, clinician uncertainty); and (iv) nonskin images. Technical reasons included (i) quality (blurry, poor lighting); (ii) duplicates; and (iii) obstructed view (hair, tattoos). To date, 53·4% (n = 1897) of suitable images have been transferred for AI testing, with a focus on 11 classes of skin lesion. Data were not evenly distributed across classes. Diagnoses such as melanoma in situ (0·8%) were under-represented, with higher proportions of benign diagnoses such as seborrhoeic keratosis (23%). Ground-truth diagnoses were obtained from consultant-level image diagnoses (12%), consultant-level consultation (31%) and pathology (34%). The remaining 46·5% (n = 1652) consisted of controls and skin conditions outside these 11 classes.
Current AI classifiers often generalize poorly across healthcare systems, acquisition protocols and populations. To illustrate, we trained Bayesian neural networks on images sourced from the public datasets SD-260 and ISIC 2019 (which contains dermoscopy) to distinguish seven skin lesion classes, and then tested on our dataset. The test misclassification rate was 28% lower with SD-260 training than with ISIC training, consistent with a closer match between our images and SD-260 (no dermoscopy). When the costs of different types of misclassification were considered, this translated into a 42% cost reduction. Our experience reveals the challenges of preparing these datasets, including high exclusion rates, significant clinician time and uneven class distribution. To integrate AI diagnostic tools into real-world clinical workflows, it is important that algorithms are developed and validated on datasets representative of the everyday dermatology workload. We are working to develop practical, interoperable standards to prepare anonymized data for a national skin AI database.
AB - Most skin image-based artificial intelligence (AI) systems are trained on publicly available datasets that are of high resolution, well centred and low noise. These do not represent most real-world community-captured photographs. We are developing AI to triage community-captured images; early results suggest it may be useful for simple triage and have diagnostic utility. We have accumulated a repository of > 85 000 skin images referred from primary to secondary care over 10 years. We report our experience preparing real-world data to train AI for skin cancer triage. Primary care images previously referred from primary to secondary care were extracted, encrypted, anonymized and stored with ethics approval. Each was manually reviewed to remove patient identifiers and annotated with a ground-truth diagnosis. In 2021, we processed 6533 images. Fifty-three per cent (n = 3549) were suitable, while 47% (n = 3150) were excluded for clinical or technical reasons. Clinical reasons included (i) privacy (patient or biometric identifiers, facial features, tattoos); (ii) sensitive images (genitals); (iii) diagnostic uncertainty (nonattendance, inconclusive biopsy, clinician uncertainty); and (iv) nonskin images. Technical reasons included (i) quality (blurry, poor lighting); (ii) duplicates; and (iii) obstructed view (hair, tattoos). To date, 53·4% (n = 1897) of suitable images have been transferred for AI testing, with a focus on 11 classes of skin lesion. Data were not evenly distributed across classes. Diagnoses such as melanoma in situ (0·8%) were under-represented, with higher proportions of benign diagnoses such as seborrhoeic keratosis (23%). Ground-truth diagnoses were obtained from consultant-level image diagnoses (12%), consultant-level consultation (31%) and pathology (34%). The remaining 46·5% (n = 1652) consisted of controls and skin conditions outside these 11 classes.
Current AI classifiers often generalize poorly across healthcare systems, acquisition protocols and populations. To illustrate, we trained Bayesian neural networks on images sourced from the public datasets SD-260 and ISIC 2019 (which contains dermoscopy) to distinguish seven skin lesion classes, and then tested on our dataset. The test misclassification rate was 28% lower with SD-260 training than with ISIC training, consistent with a closer match between our images and SD-260 (no dermoscopy). When the costs of different types of misclassification were considered, this translated into a 42% cost reduction. Our experience reveals the challenges of preparing these datasets, including high exclusion rates, significant clinician time and uneven class distribution. To integrate AI diagnostic tools into real-world clinical workflows, it is important that algorithms are developed and validated on datasets representative of the everyday dermatology workload. We are working to develop practical, interoperable standards to prepare anonymized data for a national skin AI database.
U2 - 10.1111/bjd.21386
DO - 10.1111/bjd.21386
M3 - Meeting abstract
SN - 0007-0963
VL - 187
SP - 125
JO - British Journal of Dermatology
JF - British Journal of Dermatology
IS - S1
M1 - BT13
Y2 - 5 July 2022 through 7 July 2022
ER -