The quantitative data was collected in an online study administered through the Qualtrics survey platform between July and August 2021. Thirty participants assessed 30 clinical cases of suspected retinal conditions with and without AI support. For each case they selected a suggested diagnosis from a list of nine options and rated their confidence in their assessment on a 5-point Likert scale; they also reported their overall trust in the AI outputs, with and without segmentations, on a 5-point Likert scale.
The data contains a summary of participant responses: the number of 'correct' diagnoses they gave, the number of cases in which they agreed with the AI suggestions, and their trust in the AI support. This data was used for the analysis in the paper linked to this dataset. Diagnostic confidence ratings for each case from all 30 participants are also included, together with a key explaining the values shown.
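For readers who wish to work with the files, the sketch below shows one way the summary and per-case confidence data could be loaded and tabulated in Python with pandas. The file names and column names (e.g. participant_summary.csv, experience_group, correct_no_ai) are hypothetical placeholders, not the dataset's actual field names; the included key documents the real values.

```python
# Minimal sketch of loading and summarising the participant-level data.
# File and column names are illustrative assumptions only; the dataset's
# own key documents the actual field names and value codes.
import pandas as pd

summary = pd.read_csv("participant_summary.csv")  # hypothetical file name

# Assumed columns: participant_id, experience_group ('more'/'less'),
# correct_no_ai, correct_ai, correct_ai_seg (counts out of 10 cases each),
# agree_ai, agree_ai_seg, trust_ai, trust_ai_seg (5-point Likert scores).
print(summary.groupby("experience_group")[
    ["correct_no_ai", "correct_ai", "correct_ai_seg"]
].mean())

# Per-case confidence ratings (5-point Likert), again with assumed names.
confidence = pd.read_csv("case_confidence.csv")
print(confidence.groupby("presentation")["confidence"].describe())
```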
Artificial intelligence (AI) has great potential in ophthalmology; however, there has been limited clinical integration. Our study investigated how ambiguous outputs from an AI diagnostic support system (AI-DSS) affected diagnostic responses from optometrists when assessing cases of suspected retinal disease.
Thirty optometrists at Moorfields Eye Hospital (15 more experienced, 15 less experienced) assessed 30 clinical cases in counterbalanced order. For ten cases, participants saw an optical coherence tomography (OCT) scan, basic clinical information and a retinal photograph (‘no AI’). For another ten, they were also given the AI-generated, OCT-based probabilistic diagnosis (‘AI diagnosis’); and for the final ten, both the AI diagnosis and an AI-generated OCT segmentation were provided (‘AI diagnosis + segmentation’). Cases were matched across the three presentation types and were purposely selected to include 40% ambiguous and 20% incorrect AI outputs.
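To make the design concrete, the sketch below shows one generic way a three-condition, counterbalanced allocation could be expressed in code: the mapping between case sets and presentation types is rotated across participants. This is an illustration of the counterbalancing concept only; the study's actual randomisation and case-matching procedure is described in the linked paper and is not reproduced here.

```python
# Generic illustration of counterbalancing three presentation conditions
# across participants (not the study's actual allocation procedure).
conditions = ["no AI", "AI diagnosis", "AI diagnosis + segmentation"]
case_sets = [list(range(1, 11)), list(range(11, 21)), list(range(21, 31))]

def allocation(participant_index: int) -> dict:
    """Rotate which case set is shown under which condition (Latin-square style)."""
    offset = participant_index % 3
    rotated = conditions[offset:] + conditions[:offset]
    return {cond: cases for cond, cases in zip(rotated, case_sets)}

# The first three participants together cover every rotation.
for p in range(3):
    plan = allocation(p)
    print(f"Participant {p + 1}: "
          + ", ".join(f"{c}: cases {s[0]}-{s[-1]}" for c, s in plan.items()))
```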
Optometrist diagnostic agreement with the predefined reference standard was lowest for the ‘AI diagnosis + segmentation’ presentation (204/300, 68%) compared with both ‘AI diagnosis’ (224/300, 75%, p = 0·010) and ‘no AI’ (242/300, 81%, p < 0·001). Agreement in the ‘AI diagnosis’ presentation was also lower than in the ‘no AI’ presentation (p = 0·049). When segmentations were displayed, agreement with AI diagnoses consistent with the reference standard decreased (174/210 vs 199/210, p = 0·003), yet participants trusted the AI more (p = 0·029). There was no significant effect of practitioner experience on diagnostic responses (p = 0·24); however, more experienced participants were more confident (p = 0·012) and trusted the AI less (p = 0·038). Our findings also highlighted issues around the definition of the reference standard.
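As a purely illustrative check of the reported proportions, the sketch below compares diagnostic agreement in the ‘no AI’ and ‘AI diagnosis + segmentation’ presentations (242/300 vs 204/300) with a simple two-proportion chi-square test. This is an assumption for illustration only: it ignores the within-participant pairing of cases, and the paper's own analysis may use different tests, so the resulting p-value will not necessarily match those reported above.

```python
# Illustrative two-proportion comparison of agreement rates taken from the
# abstract (242/300 'no AI' vs 204/300 'AI diagnosis + segmentation').
# NOTE: this ignores the repeated-measures structure of the study, so it is
# a rough sketch, not a reproduction of the paper's statistical analysis.
import numpy as np
from scipy.stats import chi2_contingency

agree = np.array([242, 204])                 # cases agreeing with the reference standard
disagree = np.array([300, 300]) - agree      # remaining cases per presentation

table = np.column_stack([agree, disagree])   # 2x2 contingency table
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```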