Pseudonymization for artificial intelligence skin lesion datasets: a real-world feasibility study

Trisha Chin, Gillian Chin, James Sutherland, Andrew Coon, Colin Morton, Colin Fleming

Research output: Contribution to journal › Meeting abstract › peer-review


Abstract

The use of patient data for artificial intelligence (AI) research should be transparent, rigorous and accountable. In the UK, the General Data Protection Regulation, the Data Protection Act 2018 and the General Medical Council govern data handling and patients’ rights to privacy. We report on our multistep pseudonymization protocol for real-world skin lesion datasets, in preparation for research within a trusted research environment (TRE). First, patients referred from primary care are triaged for community locality and imaging centre (CLIC) suitability. At the CLIC, trained healthcare professionals capture lesion images (dermoscopic, macroscopic and regional) and patient information using a mobile application on trust-certified devices. Training is standardized across all CLIC sites, with specific anonymization training on removing in-frame clothing and jewellery, device positioning, and magnification to minimize identifiable features such as eyes, nose and ears. Lesion datasets (paired images and clinical information) are subsequently transferred to an image management system (IMS) hosted on our trust-secured network. Within the IMS, images are manually inspected, and those with identifiable tattoos and piercings are excluded. All regional images are also excluded from transfer to the TRE. Before transfer to the TRE, images undergo a further round of review. Data fields are manually checked for identifiable patient information, patient names are removed, and dates of birth are rounded to 3-month granularity. The job ID, patient’s hospital number, date of clinical episode and responsible photographer are replaced with randomly generated project-specific identifiers. In an initial study period, 658 of 963 (68%) captured lesion datasets underwent IMS manual inspection. Of these, 24 lesion datasets were excluded for identifiable features: 10 (41%) for more than one-third of the face being visible, 9 (38%) for full iris visibility and 5 (21%) for tattoos. By anatomical location, these images were of the face (19, 80%), torso (2, 8%), limbs (2, 8%) and neck (1, 4%). The remaining 634 datasets (96%) were securely transferred to the TRE, where a further 5% were excluded due to potential identifiability. Although full anonymization is desirable, it is usually achieved by aggregating patient data. Pseudonymization, which allows for future reidentification in a secured fashion, strikes a balance between patient data privacy and clinical governance, while retaining a level of granularity sufficient for meaningful analysis. Currently, this protocol is manually intensive, with room for partial automation. Use of common standardized protocols will strengthen public trust in clinical AI.
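The field-level steps described in the abstract (removing names, coarsening dates of birth to 3-month granularity, and replacing identifiers with random project-specific values) could be partly automated along the following lines. This is only a minimal sketch and not the authors' implementation: the record field names, the `Pseudonymizer` class, the project prefix, and the choice to round dates of birth down to the start of the calendar quarter are all assumptions made for illustration. The lookup table that permits later reidentification is likewise assumed, and in practice it would need to be held securely outside the TRE.

```python
import secrets
from datetime import date


def round_dob_to_quarter(dob: date) -> date:
    """Coarsen a date of birth to the first day of its calendar quarter.

    Assumption: "3-month granularity" is modelled as rounding down to the
    quarter start; the abstract does not specify the rounding rule.
    """
    quarter_start_month = 3 * ((dob.month - 1) // 3) + 1
    return date(dob.year, quarter_start_month, 1)


class Pseudonymizer:
    """Hypothetical helper that issues random project-specific identifiers.

    It keeps a two-way lookup so that the same original value always maps
    to the same pseudonym and authorised staff can reverse the mapping.
    """

    def __init__(self, project_prefix: str):
        self.project_prefix = project_prefix
        self._forward = {}  # (field, original value) -> pseudonym
        self._reverse = {}  # pseudonym -> (field, original value)

    def pseudonym_for(self, field: str, value) -> str:
        key = (field, value)
        if key not in self._forward:
            token = f"{self.project_prefix}-{secrets.token_hex(8)}"
            while token in self._reverse:  # regenerate on the rare collision
                token = f"{self.project_prefix}-{secrets.token_hex(8)}"
            self._forward[key] = token
            self._reverse[token] = key
        return self._forward[key]

    def reidentify(self, token: str):
        """Return the (field, original value) pair behind a pseudonym."""
        return self._reverse[token]


# Hypothetical field names for the identifiers listed in the abstract.
IDENTIFIER_FIELDS = ("job_id", "hospital_number", "episode_date", "photographer")


def pseudonymize_record(record: dict, pseudonymizer: Pseudonymizer) -> dict:
    """Return a copy of the record with direct identifiers removed or replaced."""
    out = dict(record)
    out.pop("patient_name", None)  # names are removed outright
    out["date_of_birth"] = round_dob_to_quarter(record["date_of_birth"])
    for field in IDENTIFIER_FIELDS:
        out[field] = pseudonymizer.pseudonym_for(field, record[field])
    return out
```

A sketch like this covers only the structured data fields. The manual image review for faces, irises, tattoos and piercings described in the abstract is a separate step that it does not attempt to replace.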
Original language: English
Pages (from-to): i199
Number of pages: 1
Journal: British Journal of Dermatology
Volume: 191
Issue number: 1
Publication status: Published - 28 Jun 2024
Event: British Association of Dermatologists Annual Meeting: Teledermatology & Digital Dermatology Symposium - Manchester Central, Manchester, United Kingdom
Duration: 2 Jul 2024 to 4 Jul 2024
Conference number: 104
