TY - JOUR
T1 - Pseudonymization for artificial intelligence skin lesion datasets
T2 - British Association of Dermatologists Annual Meeting
AU - Chin, Trisha
AU - Chin, Gillian
AU - Sutherland, James
AU - Coon, Andrew
AU - Morton, Colin
AU - Fleming, Colin
N1 - Conference code: 104
PY - 2024/6/28
Y1 - 2024/6/28
N2 - The use of patient data for artificial intelligence (AI) research should be transparent, rigorous and accountable. In the UK, the General Data Protection Regulation, the Data Protection Act 2018 and the General Medical Council govern data handling and patients’ rights to privacy. We report on our multistep pseudonymization protocol for real-world skin lesion datasets, in preparation for research within a trusted research environment (TRE). Firstly, patients referred from primary care are triaged for community locality and imaging centre (CLIC) suitability. There, trained healthcare professionals capture lesion images (dermoscopic, macroscopic and regional) and patient information using a mobile application on trust-certified devices. Training is standardized across all CLIC sites and includes specific anonymization training on removing in-frame clothing and jewellery, on device positioning, and on magnification to minimize identifiable features such as the eyes, nose and ears. Lesion datasets (paired images and clinical information) are subsequently transferred to an image management system (IMS) hosted on our trust-secured network. Within the IMS, images are manually inspected, and those with identifiable tattoos or piercings are excluded. All regional images are also excluded from transfer to the TRE. Before transfer, the remaining images undergo a further round of review: data fields are manually checked for identifiable patient information, patient names are removed, and dates of birth are rounded to 3-month granularity. The job ID, patient’s hospital number, date of clinical episode and responsible photographer are replaced with randomly generated project-specific identifiers. In an initial study period, 658 of 963 (68%) captured lesion datasets underwent manual inspection in the IMS. Of these, 24 lesion datasets were excluded for identifiable features: 10 (41%) for more than one-third of the face being visible, 9 (38%) for full iris visibility, and 5 (21%) for tattoos. By anatomical location, these excluded images were of the face (19; 80%), torso (2; 8%), limbs (2; 8%) and neck (1; 4%). The remaining 634 datasets (96%) were securely transferred to the TRE, where a further 5% were excluded owing to potential identifiability. Although full anonymization is desirable, it is usually achieved by aggregating patient data. Pseudonymization, which allows for future reidentification in a secured fashion, strikes a balance between patient data privacy and clinical governance while retaining a level of granularity sufficient for meaningful analysis. Currently, this protocol is manually intensive, with scope for partial automation. Use of common standardized protocols will strengthen public trust in clinical AI.
AB - The use of patient data for artificial intelligence (AI) research should be transparent, rigorous and accountable. In the UK, the General Data Protection Regulation, the Data Protection Act 2018 and the General Medical Council govern data handling and patients’ rights to privacy. We report on our multistep pseudonymization protocol for real-world skin lesion datasets, in preparation for research within a trusted research environment (TRE). Firstly, patients referred from primary care are triaged for community locality and imaging centre (CLIC) suitability. There, trained healthcare professionals capture lesion images (dermoscopic, macroscopic and regional) and patient information using a mobile application on trust-certified devices. Training is standardized across all CLIC sites and includes specific anonymization training on removing in-frame clothing and jewellery, on device positioning, and on magnification to minimize identifiable features such as the eyes, nose and ears. Lesion datasets (paired images and clinical information) are subsequently transferred to an image management system (IMS) hosted on our trust-secured network. Within the IMS, images are manually inspected, and those with identifiable tattoos or piercings are excluded. All regional images are also excluded from transfer to the TRE. Before transfer, the remaining images undergo a further round of review: data fields are manually checked for identifiable patient information, patient names are removed, and dates of birth are rounded to 3-month granularity. The job ID, patient’s hospital number, date of clinical episode and responsible photographer are replaced with randomly generated project-specific identifiers. In an initial study period, 658 of 963 (68%) captured lesion datasets underwent manual inspection in the IMS. Of these, 24 lesion datasets were excluded for identifiable features: 10 (41%) for more than one-third of the face being visible, 9 (38%) for full iris visibility, and 5 (21%) for tattoos. By anatomical location, these excluded images were of the face (19; 80%), torso (2; 8%), limbs (2; 8%) and neck (1; 4%). The remaining 634 datasets (96%) were securely transferred to the TRE, where a further 5% were excluded owing to potential identifiability. Although full anonymization is desirable, it is usually achieved by aggregating patient data. Pseudonymization, which allows for future reidentification in a secured fashion, strikes a balance between patient data privacy and clinical governance while retaining a level of granularity sufficient for meaningful analysis. Currently, this protocol is manually intensive, with scope for partial automation. Use of common standardized protocols will strengthen public trust in clinical AI.
U2 - 10.1093/bjd/ljae090.421
DO - 10.1093/bjd/ljae090.421
M3 - Meeting abstract
SN - 0007-0963
VL - 191
SP - i199
JO - British Journal of Dermatology
JF - British Journal of Dermatology
IS - 1
Y2 - 2 July 2024 through 4 July 2024
ER -