Performance of artificial intelligence in 7533 consecutive prevalent screening mammograms from the BreastScreen Australia program

John Waugh, Jill Evans, Miranda Miocevic, Darren Lockie, Parisa Aminzadeh, Anne Lynch, Robin J. Bell

Research output: Contribution to journal › Article › Research › peer-review

2 Citations (Scopus)

Abstract

Objectives: To assess the performance of an artificial intelligence (AI) algorithm in the Australian mammography screening program, which routinely uses two independent readers with arbitration of discordant results.
Methods: A total of 7533 prevalent round mammograms from 2017 were available for analysis. The AI program classified mammograms into deciles on the basis of breast cancer (BC) risk. BC diagnoses, including invasive BC (IBC) and ductal carcinoma in situ (DCIS), comprised those from the prevalent round, interval cancers, and cancers identified in the subsequent screening round two years later. Performance was assessed by sensitivity, specificity, positive and negative predictive values, and the proportion of women recalled by the radiologists and identified as higher risk by AI.
Results: Radiologists identified 54 women with IBC and 13 with DCIS, with a recall rate of 9.7%. By comparison, 51 of the 54 IBCs and 12 of the 13 cases of DCIS were within the higher AI score group (score 10), a recall equivalent of 10.6% (a difference of 0.9%; CI −0.03 to 1.89%, p = 0.06). When IBCs identified in the 2017 round, interval cancers classified as false negatives or with minimal signs in 2017, and cancers from the 2019 round were combined, the radiologists identified 54/67, whereas 59/67 were in the highest risk AI category (sensitivity 80.6% and 88.06%, respectively; the difference was not statistically significant).
Conclusions: As the performance of AI was comparable to that of expert radiologists, future AI roles in screening could include replacing one reader and supporting arbitration, reducing workload and false positive results.
Clinical relevance statement: AI analysis of consecutive prevalent screening mammograms from the Australian BreastScreen program demonstrated the algorithm’s ability to match the cancer detection of experienced radiologists, while additionally identifying five interval cancers (false negatives) and the majority of the false positive recalls.
Key Points:
  • The AI program was almost as sensitive as the radiologists in terms of identifying prevalent lesions (51/54 for invasive breast cancer, 63/67 when including ductal carcinoma in situ).
  • If selected interval cancers and cancers identified in the subsequent screening round were included, the AI program identified more cancers than the radiologists (59/67 compared with 54/67, sensitivity 88.06% and 80.6%, respectively; p = 0.24).
  • The high negative predictive value of a score of 1–9 would indicate a role for AI as a triage tool to reduce the recall rate (specifically false positives).
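
The sensitivity figures quoted above follow directly from the reported counts (cancers flagged divided by all known cancers in the set). The sketch below only reproduces that arithmetic as an illustration; the function name `sensitivity` and the dictionary labels are assumptions made here, not the authors' code. It does not attempt to reproduce the reported p-values or confidence interval, since the abstract does not state which test was used.

```python
def sensitivity(detected: int, total_cancers: int) -> float:
    """Return the fraction of known cancers that were flagged."""
    if total_cancers <= 0:
        raise ValueError("total_cancers must be positive")
    if not 0 <= detected <= total_cancers:
        raise ValueError("detected must lie between 0 and total_cancers")
    return detected / total_cancers


# (flagged, all known cancers) pairs taken from the abstract and key points.
# Note: the two 67-case denominators are different sets. One is prevalent
# IBC plus DCIS (54 + 13); the other is the combined IBC set (2017 round,
# selected interval cancers, and 2019 round).
counts = {
    "AI score 10, prevalent IBC": (51, 54),
    "AI score 10, prevalent IBC + DCIS": (63, 67),
    "Radiologists, combined IBC set": (54, 67),
    "AI score 10, combined IBC set": (59, 67),
}

results = {label: sensitivity(d, n) for label, (d, n) in counts.items()}

for label, value in results.items():
    print(f"{label}: {value:.2%}")
```

On the combined set, the difference between the AI program and the radiologists amounts to five additional cancers out of 67.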

Original language: English
Pages (from-to): 3947–3957
Number of pages: 11
Journal: European Radiology
Volume: 34
Publication status: Published - 2024

Keywords

  • Artificial intelligence
  • Breast cancer
  • Cancer screening
  • Mammography
