From Data to Diagnosis: Skin Cancer Image Datasets for Artificial Intelligence

David Wen (Lead / Corresponding author), Andrew Soltan, Emanuele Trucco, Rubeta N Martin

Research output: Contribution to journal › Review article › peer-review


Artificial Intelligence (AI) solutions for skin cancer diagnosis continue to gain momentum, edging closer to broad clinical use. These AI models, particularly deep learning architectures, require large digital image datasets for development. This review provides an overview of the datasets used to develop AI algorithms and highlights the importance of dataset transparency for evaluating algorithm generalisability across varying populations and settings. Current challenges in curating clinically valuable datasets are detailed, including dataset shifts arising from demographic variation and differences in data collection methodologies, along with inconsistencies in labelling. These shifts can lead to differential algorithm performance, compromised clinical utility, and the propagation of discriminatory biases when developed algorithms are implemented in mismatched populations. The limited representation of rare skin cancers and minoritised groups in existing datasets, which can further skew algorithm performance, is also highlighted. Strategies to address these challenges are presented, including improving transparency, representation and interoperability. Federated learning and generative methods, which may improve dataset size and diversity without compromising privacy, are also examined. Lastly, we discuss model-level techniques that may address biases embedded through the use of datasets derived from routine clinical care. As the role of AI in skin cancer diagnosis becomes more prominent, ensuring the robustness of underlying datasets is increasingly important.
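The abstract notes that dataset shift can produce differential algorithm performance across populations. As a minimal sketch (an illustration, not a method taken from the review), one way to make such differences visible is to report sensitivity separately for each subgroup of a held-out test set and look at the gap between subgroups. The `Prediction` record, its fields, the skin-type groupings and the toy data below are all hypothetical assumptions.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Prediction(NamedTuple):
    # Hypothetical record: a subgroup label (e.g. a band of Fitzpatrick skin types),
    # the ground-truth label and the model's predicted label (1 = malignant, 0 = benign).
    subgroup: str
    truth: int
    predicted: int


def sensitivity_by_subgroup(preds: Iterable[Prediction]) -> dict[str, float]:
    """Sensitivity (true-positive rate) per subgroup that has at least one malignant case."""
    true_pos: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for p in preds:
        if p.truth == 1:
            positives[p.subgroup] += 1
            if p.predicted == 1:
                true_pos[p.subgroup] += 1
    return {group: true_pos[group] / n for group, n in positives.items()}


def sensitivity_gap(by_group: dict[str, float]) -> float:
    """Largest difference in sensitivity between any two subgroups."""
    return max(by_group.values()) - min(by_group.values())


if __name__ == "__main__":
    # Made-up predictions for illustration only; these are not data from the review.
    toy = [
        Prediction("I-III", 1, 1),
        Prediction("I-III", 1, 1),
        Prediction("I-III", 1, 0),
        Prediction("I-III", 0, 0),
        Prediction("IV-VI", 1, 1),
        Prediction("IV-VI", 1, 0),
        Prediction("IV-VI", 0, 1),
    ]
    per_group = sensitivity_by_subgroup(toy)
    print(per_group)
    print(round(sensitivity_gap(per_group), 3))
```

Sensitivity is used here because missed malignancies are usually the costliest error in skin cancer triage; the same per-subgroup breakdown could be applied to specificity or any other metric. A subgroup that is barely represented in the test set will give a noisy estimate, which is one reason the review's emphasis on representation matters.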
Original language: English
Article number: llae112
Journal: Clinical and Experimental Dermatology
Early online date: 29 Mar 2024
Publication status: E-pub ahead of print, 29 Mar 2024

