The accuracy and reliability of crowdsource annotations of digital retinal images

Danny Mitry (Lead / Corresponding author), Kris Zutis (Lead / Corresponding author), Baljean Dhillon, Tunde Peto, Shabina Hayat, Kay-Tee Khaw, James E. Morgan, Wendy Moncur, Emanuele Trucco, Paul J. Foster, UK Biobank Eye and Vision Consortium

Research output: Contribution to journal › Article

Abstract

PURPOSE: Crowdsourcing is based on outsourcing computationally intensive tasks to numerous individuals in the online community who have no formal training. Our aim was to develop a novel online tool designed to facilitate large-scale annotation of digital retinal images, and to assess the accuracy of crowdsource grading using this tool, comparing it to expert classification.

METHODS: We used 100 retinal fundus photographs with predetermined disease criteria, selected by two experts from a large cohort study. The Amazon Mechanical Turk Web platform was used to drive traffic to our site so that anonymous workers could perform a classification and annotation task on the fundus photographs in our dataset after a short training exercise. Three groups were assessed: masters only, nonmasters only, and nonmasters with compulsory training. We calculated the sensitivity, specificity, and area under the curve (AUC) of receiver operating characteristic (ROC) plots for all classifications compared to expert grading, and used the Dice coefficient and consensus threshold to assess annotation accuracy.
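For readers unfamiliar with these metrics, the classification evaluation amounts to pooling the crowd responses for each image and scoring the pooled call against the expert label. The following is a minimal illustrative sketch only, not the study's actual pipeline: the example data, the majority-vote aggregation, and the use of scikit-learn are assumptions made for illustration.

import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical data: expert_labels[i] is the expert call for image i
# (1 = abnormal, 0 = normal); crowd_votes[i] holds that image's worker calls.
expert_labels = np.array([1, 0, 1, 1, 0])
crowd_votes = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 1, 1], [0, 0, 0]]

# Aggregate each image as the fraction of workers voting "abnormal",
# then apply a simple majority vote for the binary crowd call.
crowd_scores = np.array([np.mean(votes) for votes in crowd_votes])
crowd_calls = (crowd_scores >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(expert_labels, crowd_calls).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auc = roc_auc_score(expert_labels, crowd_scores)  # ROC built over the vote fraction
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, AUC={auc:.2f}")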

RESULTS: In total, we received 5389 annotations for 84 images (excluding 16 training images) in 2 weeks. A specificity of 71% (95% confidence interval [CI], 69%-74%) and a sensitivity of 87% (95% CI, 86%-88%) were achieved across all classifications. The AUC for all classifications combined was 0.93 (95% CI, 0.91-0.96). For image annotation, a maximal Dice coefficient (∼0.6) was achieved at a consensus threshold of 0.25.
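For context on the annotation result: the Dice coefficient measures spatial overlap between the crowd's consensus annotation and the expert annotation, and the consensus threshold sets what fraction of workers must mark a pixel before it counts toward the crowd annotation. Below is a minimal illustrative sketch of that calculation with randomly generated masks standing in for real annotations; it is not the authors' code, and the array names are invented.

import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2.0 * inter / total if total else 1.0

# Hypothetical data: one expert mask plus ten noisy worker masks per image.
rng = np.random.default_rng(0)
expert_mask = rng.random((64, 64)) > 0.8
worker_masks = [expert_mask ^ (rng.random((64, 64)) > 0.95) for _ in range(10)]

# consensus[p] is the fraction of workers who marked pixel p as abnormal.
consensus = np.mean(worker_masks, axis=0)
for threshold in (0.1, 0.25, 0.5, 0.75):
    crowd_mask = consensus >= threshold  # keep pixels above the consensus threshold
    print(threshold, round(dice(crowd_mask, expert_mask), 3))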

CONCLUSIONS: This study supports the hypothesis that annotation of abnormalities in retinal images by ophthalmologically naive individuals is comparable to expert annotation. The highest AUC and agreement with expert annotation were achieved in the nonmasters with compulsory training group.

TRANSLATIONAL RELEVANCE: Crowdsourcing as a technique for retinal image analysis may perform comparably to expert grading and has the potential to deliver timely, accurate, and cost-effective image analysis.

Original language: English
Article number: 6
Number of pages: 9
Journal: Translational Vision Science and Technology
ISSN: 2164-2591
Publisher: Association for Research in Vision and Ophthalmology
Volume: 5
Issue number: 5
Early online date: 21 Sep 2016
DOI: 10.1167/tvst.5.5.6
PubMed ID: 27668130
Link: http://tvst.arvojournals.org/article.aspx?articleid=2589774
Publication status: Published - 21 Sep 2016

Funding note: Supported by grants from Fight for Sight (London), Special Trustees of Moorfields Eye Hospital and NIHR Biomedical Research Centre at Moorfields Eye Hospital, and UCL Institute of Ophthalmology. EPIC-Norfolk infrastructure and core functions are supported by grants from the Medical Research Council and Cancer Research UK. The clinic for the third health examination was funded by Research into Ageing. No author has any financial or proprietary interest in any product mentioned.

Cite this

Mitry, D., Zutis, K., Dhillon, B., Peto, T., Hayat, S., Khaw, K-T., ... UK Biobank Eye and Vision Consortium (2016). The accuracy and reliability of crowdsource annotations of digital retinal images. Translational Vision Science and Technology, 5(5), [6]. https://doi.org/10.1167/tvst.5.5.6