Artificial intelligence (AI) has the potential to accurately classify mammograms according to the presence or absence of radiological signs of breast cancer, replacing or supplementing human readers (radiologists). The UK National Screening Committee's assessments of the use of AI systems to examine screening mammograms continue to focus on maximising benefits and minimising harms to women screened when deciding whether to recommend the implementation of AI in the Breast Screening Programme in the UK. Maintaining or improving programme specificity is important to minimise anxiety from false positive results. When considering cancer detection, AI test sensitivity alone is not sufficiently informative, and additional information on the spectrum of disease detected and on interval cancers is crucial to better understand the benefits and harms of screening. Although large retrospective studies might provide useful evidence by directly comparing test accuracy and spectrum of disease detected between different AI systems and by population subgroup, most retrospective studies are biased due to differential verification (ie, the use of different reference standards to verify the target condition among study participants). Enriched, multiple-reader, multiple-case, test set laboratory studies are also biased due to the laboratory effect (ie, radiologists' performance in retrospective, laboratory, observer studies is substantially different to their performance in a clinical environment). Therefore, prospective studies assessing the effect of incorporating any AI system into the breast screening pathway are required, as they will provide key evidence on the effect of the interaction of medical staff with AI and on the impact on women's outcomes.