Artificial intelligence-derived software to analyse chest X-rays for suspected lung cancer in primary care referrals: early value assessment
3 Committee discussion
The diagnostics advisory committee considered evidence on software with AI‑derived algorithms for detecting and measuring lung nodules and other abnormalities in chest X‑ray images from several sources, including an external assessment report and an overview of that report. Full details are in the project documents for this guidance.
Unmet need
There is an unmet need for faster chest X-ray reports
3.1 In primary care chest X‑ray referrals, there is an unmet need for quicker reporting of chest X‑rays. Chest X‑ray reports can take a long time to be returned to the referring GP, which can delay the CT scan, diagnosis and treatment of lung cancer. Factors contributing to this delay include a backlog of chest X‑rays awaiting review and insufficient radiologist and reporting radiographer capacity. A clinical expert explained that in some areas lung cancer specialists will not accept a referral for a CT scan unless the chest X‑ray has been done and reported. Software that triages images could prioritise those with abnormal features suggesting lung cancer for urgent review, which could lead to faster referral to CT scan, diagnosis and treatment if needed.
Anxiety while waiting for final diagnosis
3.2 A patient expert said that people who are told they have a lung nodule or other abnormality may experience anxiety, especially if the chest X‑ray was done without any expectation of finding nodules or abnormalities suggesting cancer. Having more information as soon as possible is important and reduces anxiety for people with suspected lung cancer and their families. The patient expert explained that most people are happy to have all tests done quickly, because immediate results would turn a passive period of waiting into something more proactive and less uncertain. According to the patient expert, people would be happy to have AI‑derived software used as part of the diagnostic pathway as long as there was evidence to show that it is accurate.
Software capability
Different software have different capabilities
3.3 AI-derived software have different capabilities. Some technologies provide only CADe/CADx, which can detect or diagnose an abnormality on the X‑ray, whereas others provide both CADe/CADx and CAST (see section 2.4). The committee considered that software that can prioritise the review of images with abnormalities suggesting lung cancer might have the greatest benefit, because this could reduce time to CT scan, diagnosis and treatment (see section 3.1). It heard that AI-derived software for chest X‑rays cannot compare a previous chest X‑ray with a new one. Comparison with previous chest X‑rays could help determine whether a lung abnormality is of clinical concern.
Clinical effectiveness
Technologies with no published evidence
3.4 The committee considered the summarised evidence. The external assessment group's (EAG's) review found no relevant published evidence for the population referred from primary care. The committee acknowledged the EAG's expanded review criteria, which included studies in populations where the referral criteria were unclear. It noted that there was no evidence for 11 of the 14 technologies: Auto Lung Nodule Detection, ChestLink Quality, ChestView, Chest X‑ray, ClearRead X‑ray Detect, InferRead DR Chest, Lunit INSIGHT CXR, Milvue Suite, qXR, SenseCare-Chest DR PRO and VUNO Med-Chest X‑Ray. The committee recommended more research on these technologies (see section 1.3).
Generalisability of evidence to clinical practice
3.5 No studies looked at accuracy in the population of interest, that is, people referred from primary care. There were 6 studies that compared the accuracy of X‑ray review by a radiology specialist alongside AI-derived software with review by a radiology specialist alone, but it was unclear where the populations had been referred from. The primary care referral population is likely to differ from populations in other settings, such as inpatients and people presenting to the emergency department. These populations could have a higher prevalence of disease and present at a more advanced stage, making it easier to detect abnormalities that indicate cancer. Most chest X‑rays requested by GPs are done to rule out lung cancer in people who are unlikely to have it. So, AI-derived software trained and evaluated in populations from other or mixed settings is likely to perform differently in people referred from primary care. The committee agreed that the accuracy data was unlikely to be generalisable to a primary care population. It also acknowledged that only 1 UK study was identified and that diagnostic pathways may differ between countries. The UK study assessed the diagnostic accuracy of red dot (Behold.ai), but the population was unclear.
Accuracy for detecting lung cancer and nodules
3.6 Only 1 study, which assessed red dot (Behold.ai) in a UK population with unclear referral routes, reported accuracy for detecting lung cancer. This study found the AI‑derived software to have statistically significantly higher sensitivity than X‑ray review without the software. It also found the software had lower specificity, but the difference was not statistically significant. The other 5 studies, which reported accuracy for detecting nodules, did not report any statistically significant differences in accuracy. In practice, lower specificity would mean more false positive results. That is, more people who do not have lung cancer would go on to have CT scans, which are associated with anxiety and costs to the NHS. Higher sensitivity for detecting lung cancer would mean more true positive lung cancer cases would be detected and referred for CT scans, which could lead to treatment at an earlier disease stage and improved outcomes. In contrast, lower sensitivity could mean missing the opportunity to detect cancer early. Cancer would then be identified at a later stage when the disease is more advanced, which may be associated with worse outcomes and more costly treatment. No studies reported technical failure rates. Because none of the studies looked at more than 1 technology, a direct comparison between different software was not possible. Clinical experts highlighted the need for a benchmark test that could be used to compare the different software and identify those that reach a specified level of accuracy. The committee concluded that further research was needed on how using AI‑derived software alongside clinician review of chest X‑rays affects the accuracy of detecting lung cancer. Research is also needed on the technical failure rates of the software.
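To make the sensitivity and specificity trade-off concrete, the sketch below works through a purely illustrative calculation. The cohort size, prevalence and accuracy figures are hypothetical assumptions for illustration only; they are not taken from the evidence the committee reviewed.

```python
# Purely illustrative: how changes in sensitivity and specificity translate
# into CT referrals. All numbers are hypothetical and are not taken from
# the evidence reviewed by the committee.

def referrals(n_xrays, prevalence, sensitivity, specificity):
    """Return (true positives referred, false positives referred)."""
    with_cancer = n_xrays * prevalence
    without_cancer = n_xrays - with_cancer
    true_pos = with_cancer * sensitivity            # cancers correctly flagged
    false_pos = without_cancer * (1 - specificity)  # people without cancer flagged
    return true_pos, false_pos

# Hypothetical cohort: 10,000 primary care X-rays, 1% cancer prevalence.
reader_alone = referrals(10_000, 0.01, sensitivity=0.80, specificity=0.95)
reader_plus_ai = referrals(10_000, 0.01, sensitivity=0.90, specificity=0.93)

print(reader_alone)    # about (80, 495): 80 cancers found, 495 false alarms
print(reader_plus_ai)  # about (90, 693): 10 more cancers, 198 more false alarms
```

Under these assumed figures, a 10-percentage-point gain in sensitivity finds 10 more cancers per 10,000 X‑rays, while a 2-point loss in specificity sends about 198 more people without cancer to CT. Whether that trade-off is worthwhile is exactly what the recommended research would need to establish.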
Time to read and report a chest X-ray
3.7 The EAG's review included 2 studies that looked at time to read and report a chest X‑ray. The comparisons in these studies were done at least partly in laboratory-like conditions rather than in routine clinical practice, so the results may not be generalisable to current practice in the NHS. No studies suggested that reading and reporting was faster with AI‑derived software than without. Because AI-derived software would be used in addition to review by a radiologist or reporting radiographer, it may not reduce the time to read and report chest X‑rays. The time may also depend on how well the software integrates into the radiologists' workflow within the picture archiving and communication system (PACS). AI‑derived software may support less experienced or trainee radiologists and reporting radiographers, because it may identify nodules or abnormalities that a less experienced reader might miss, which could help to improve their skills. The committee was uncertain whether using AI‑derived software would speed up reading and reporting a chest X‑ray in a UK clinical setting. Further evidence should be collected on the impact of AI‑derived software on time to read and report chest X‑rays and whether this differs between readers with different levels of experience.
Triaging of images
3.8 No studies reported time to CT referral or time to diagnosis. Some software can triage chest X‑rays: images with features that suggest lung cancer can be prioritised for urgent review, and images that are likely to be normal could potentially be reviewed faster. This could improve workflow by focusing radiology resources on the more urgent chest X‑rays, potentially speeding up time to CT referral, diagnosis and treatment when needed. It could also enable a same day CT scan pathway in the population referred from primary care. The committee considered studies highlighted by Behold.ai that reported the accuracy of the red dot software in identifying high confidence normal X‑rays and X‑rays containing features suspicious for lung cancer. The EAG noted that these studies included a mixed population rather than a population referred from primary care only. The committee concluded that some software could meet the unmet need for faster chest X‑ray reports, leading to faster CT referral, diagnosis and treatment if needed. It further concluded that more research is needed on how using AI‑derived software to triage chest X‑rays from people referred from primary care affects time to CT referral and time to diagnosis.
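As a rough illustration of how triage could reorder a reporting worklist, the minimal sketch below assumes a hypothetical AI output that labels each image "suspicious", "indeterminate" or "normal". The labels and function names are illustrative assumptions; real products use their own categories, confidence scores and PACS integrations.

```python
# Minimal sketch of CAST-style triage, assuming a hypothetical AI output
# that labels each image "suspicious", "indeterminate" or "normal".
# Real products differ in categories, scoring and workflow integration.

PRIORITY = {"suspicious": 0, "indeterminate": 1, "normal": 2}

def triaged_worklist(images):
    """images: list of (image_id, ai_label) in order of arrival.
    Returns the review order: suspicious images first, likely-normal last.
    Python's sort is stable, so arrival order breaks ties."""
    return [image_id for image_id, label
            in sorted(images, key=lambda item: PRIORITY[item[1]])]

incoming = [("cxr-001", "normal"), ("cxr-002", "suspicious"),
            ("cxr-003", "indeterminate")]
print(triaged_worklist(incoming))  # ['cxr-002', 'cxr-003', 'cxr-001']
```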
Populations that could particularly benefit from the technologies
3.9 The committee considered groups of people who could particularly benefit from the software. It recognised that detecting lung nodules and other abnormalities can be difficult in people with underlying lung conditions such as asthma and chronic obstructive pulmonary disease (COPD), in people whose family background means they may be at higher risk of lung cancer, and in younger women who do not smoke. If using the software helped to improve lung cancer detection, it would be particularly beneficial for these groups.
Cost effectiveness
Conceptual model structure
3.10 The EAG developed a conceptual decision analytic model to inform a potential future full cost-effectiveness evaluation of AI‑derived software for analysing chest X‑rays to identify suspected lung cancer. The model structure was based on the chest X‑ray clinical pathway. The committee agreed that the conceptual model was a good basic framework, but it highlighted some issues that would need clarifying, for example, whether people with lung cancer and no abnormalities identified on a chest X‑ray would be picked up at a later point, such as presentation at an emergency department. Triaging and prioritising images with abnormal features that suggest lung cancer, and the impact of this on time to CT scan, diagnosis and treatment, would need to be captured in a future model. The committee agreed that a linked-evidence approach to economic modelling would be acceptable. That is, using diagnostic accuracy and time to diagnosis data linked to long-term outcome data from separate studies.
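The linked-evidence idea can be sketched as a simple four-branch decision tree: diagnostic accuracy determines how many people follow each branch, and the per-branch costs and quality-adjusted life years (QALYs) would come from separate long-term outcome studies. Every number below is a placeholder for illustration, not an EAG estimate.

```python
# Minimal sketch of a linked-evidence decision model: accuracy data from
# one source sets the branch probabilities; cost and QALY data per branch
# would be linked in from separate studies. All values are placeholders.

def expected_outcomes(prevalence, sensitivity, specificity, outcomes):
    """outcomes: dict mapping branch -> (cost, qalys) per person."""
    p = {
        "true_positive":  prevalence * sensitivity,
        "false_negative": prevalence * (1 - sensitivity),
        "false_positive": (1 - prevalence) * (1 - specificity),
        "true_negative":  (1 - prevalence) * specificity,
    }
    cost = sum(p[b] * outcomes[b][0] for b in p)
    qalys = sum(p[b] * outcomes[b][1] for b in p)
    return cost, qalys

# Placeholder linked outcome data (would come from long-term follow-up studies):
outcomes = {
    "true_positive":  (12_000, 6.0),  # earlier diagnosis and treatment
    "false_negative": (20_000, 4.5),  # later-stage diagnosis, worse outcomes
    "false_positive": (300, 7.9),     # unnecessary CT scan and anxiety
    "true_negative":  (0, 8.0),
}

print(expected_outcomes(0.01, 0.9, 0.93, outcomes))
# about (£149, 7.97 QALYs) expected per person tested, under these placeholders
```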
Software costs
3.11 The EAG considered the costs of introducing AI‑derived software alongside radiology specialist review of chest X‑rays by developing a simple budget impact analysis. This considered one-off set-up costs, an annual subscription fee based on a volume of 16,945 images, the total cost per year and the cost over the first 5 years. Because the literature reviews did not provide any evidence of changes in resource use resulting from AI‑derived software, these were not considered in the budget impact calculations. Test costs varied between companies, but the EAG cautioned against direct comparison because the AI‑derived software have varying capabilities and some may be used at different points in the diagnostic pathway.
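The structure of such a budget impact calculation can be illustrated in a few lines. In the sketch below, only the annual volume of 16,945 images comes from the EAG's analysis; the set-up and subscription costs are hypothetical placeholders, since actual prices vary by company.

```python
# Illustrative budget impact calculation mirroring the EAG's structure:
# a one-off set-up cost plus an annual subscription for a given image
# volume. Cost figures are hypothetical; only the image volume is from
# the EAG's analysis.

ANNUAL_IMAGES = 16_945        # volume used in the EAG's budget impact analysis
setup_cost = 25_000           # hypothetical one-off set-up cost (GBP)
annual_subscription = 40_000  # hypothetical annual fee for that volume (GBP)

year_1 = setup_cost + annual_subscription
five_year_total = setup_cost + 5 * annual_subscription
cost_per_image = five_year_total / (5 * ANNUAL_IMAGES)

print(f"Year 1: £{year_1:,}")                    # Year 1: £65,000
print(f"5-year total: £{five_year_total:,}")     # 5-year total: £225,000
print(f"Cost per image: £{cost_per_image:.2f}")  # Cost per image: £2.66
```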
Conclusions
Potential benefits and risks
3.12 The potential benefits associated with using AI-derived software could include:
A reduction in the time radiologists or diagnostic radiographers spend reviewing and reporting chest X‑rays, which could release staff resources and reduce the time to return a chest X‑ray report to a GP.
Improved workflow, resulting in a reduction of time from chest X‑ray to diagnosis and treatment, which could enable a same day CT pathway, improve patient outcomes and quality of life, and save resources in the NHS.
Higher sensitivity to detect cancerous nodules and other abnormalities that suggest cancer, which could result in more cancers being identified and treated at an earlier stage, improving patient outcomes and quality of life and saving resources in the NHS.
The potential risks associated with using AI-derived software could include:
The cost of the AI‑derived software, which includes a one-off set-up cost, may not be offset by cost and resource savings later in the pathway.
Lower specificity to detect cancerous nodules and other abnormalities that suggest cancer could result in more people without cancer having CT scans, which would have cost and disutility implications.
Lower sensitivity to detect cancerous nodules and other abnormalities that suggest cancer could result in lung cancer being missed and identified at a more advanced disease stage, which could lead to more costly treatment and worse patient outcomes.
Software may not reduce, or may increase, the time radiologists or diagnostic radiographers spend reviewing and reporting chest X‑rays, and so the workload and the time to return a chest X‑ray report to a GP would not be reduced.
The clinical effectiveness is unknown
3.13 The committee recalled that there was no evidence on the accuracy of AI‑derived software for detecting suspected lung cancer in a population referred from primary care and no evidence on technical failure rates. It noted that the evidence summarised came from populations with unclear referral criteria, so the results may not be generalisable to a primary care population. The committee concluded that the accuracy of AI‑derived software in identifying suspected lung cancer on chest X‑rays from people referred from primary care is uncertain. Therefore, it could not determine whether AI‑derived software is clinically effective, and it recommended that further evidence be generated on diagnostic accuracy and technical failure rates.
The cost impact is unknown
3.14 The diagnostic accuracy data on AI‑derived software for detecting suspected lung cancer in a population referred from primary care is uncertain. So, it is also unknown how these technologies would affect the number of people referred for chest CT scans and the number of lung cancer cases that would be identified. The committee was therefore unable to determine whether the benefits would outweigh the risks and was concerned that AI‑derived software may not be cost effective. It recommended further research on the impact of the software on clinical decision making, on the number of people referred for chest CT scans, and on how AI‑derived software affects healthcare costs and resource use.
Addressing unmet need
3.15 The unmet need concerns the speed with which chest X‑ray reports are returned to the referring GP, particularly for X‑rays that show abnormalities suggesting lung cancer, because a delay can affect time to diagnosis, treatment and patient outcomes. There was no evidence on the impact of AI‑derived software on time to CT scan or time to diagnosis in a population referred from primary care. So, the committee was uncertain whether any of the AI‑derived software could meet the unmet need, although it noted that some of the software could probably reduce delays. The committee recommended further research on the impact of AI‑derived software on review and reporting time, and on time to CT referral and diagnosis.
Use only in research
3.16 The committee recalled that the clinical effectiveness and cost impact of AI‑derived software could not be determined, and that it was uncertain whether the software would address the unmet need. It also could not determine whether the benefits would outweigh the risks if the software was adopted for use in the NHS alongside evidence generation. It was concerned that AI‑derived software for chest X‑rays could incur costs without improving clinical outcomes, so the software should not currently be used to inform clinical care in the NHS. The committee therefore decided that AI‑derived software should only be used in research, to resolve some of the uncertainty and allow reassessment in the future when further data is available. The committee was aware that some centres are already using AI‑derived software and concluded that these centres may continue, but only under appropriate research governance. AI‑derived software should only be used alongside clinician review.