
3 Approach to evidence generation

The Getting it Right First Time (GIRFT) review of stroke services noted that AI software is already widely used across stroke centres in the NHS, with estimates showing that 96% have access to it. The remaining centres are expected to be using it by the end of 2023, and new centres implementing the software may choose to complete local studies or audits.

3.1 Evidence gaps and ongoing studies

The ongoing Health Innovation Oxford & Thames Valley study of e-Stroke, due to report in March 2024, uses data from the Sentinel Stroke National Audit Programme (SSNAP). This compares the time periods before and after AI software was introduced, and may address evidence gaps relating to the impact of AI on time to treatment, and numbers of people having thrombolysis or thrombectomy.

Several additional studies have been identified that may partially address the impact of the technology on treatment decision making and time to treatment. But these studies are limited by small sample sizes and only including known stroke cases.

Table 1 Evidence gaps and ongoing or newly identified studies

| Evidence gap | e-Stroke | RapidAI | Viz |
| --- | --- | --- | --- |
| The impact of AI-derived software on a healthcare professional's ability to identify people for whom thrombolysis and thrombectomy is suitable | Limited evidence; ongoing studies | Limited evidence | No evidence |
| The impact of the software on how many people have thrombolysis or thrombectomy | Limited evidence; ongoing study | No evidence | No evidence |
| The impact of the software on time to thrombolysis or thrombectomy | Limited evidence; ongoing studies | Limited evidence | Suitable evidence |
| How often the software is unable to analyse CT brain scans, with reasons for this | No evidence; newly identified study | Limited evidence | Limited evidence |

3.2 Data sources

Several data collections, each with different strengths and weaknesses, could support evidence generation. NICE's real-world evidence framework provides detailed guidance on assessing the suitability of a real-world data source to answer a specific research question.

SSNAP is a comprehensive dataset covering all NHS stroke centres in England. It records data on inpatient care, outcomes and interventions for up to 6 months after a person has had a stroke. The Core Dataset collects patient-level data including the date, time, and modality of first imaging, whether AI supported interpretation of these images, what treatment was given and when. It is feasible that this could be used to address some evidence gaps. But this dataset does not currently report which AI technology was used, or whether the software was unable to analyse an image. It may be possible to modify the Core Dataset to record this additional information on a per-image basis, but this could take up to 2 years. Alternatively, these modifications could more quickly be included in the Comprehensive Dataset for specific sites of interest. They would not be mandatory so engagement would be needed with sites to promote and encourage completion.

The Diagnostic Imaging Dataset, which collects monthly extracts from local Radiology Information Systems, may provide useful stroke imaging data such as type of CT requested and time between request and reporting.

Linking routine data sources has been considered in previous studies and is viable but may be challenging.

The quality and coverage of real-world data collections are of key importance when they are used to generate evidence. Active monitoring and follow-up through a central coordinating point is an effective and viable approach to ensuring good-quality data with broad coverage.

3.3 Evidence collection plan

Before evidence generation, a user survey across all NHS stroke centres in England is proposed to elicit information about their use of AI software in the stroke pathway. This should include which software they use, when it was implemented, which staff groups or healthcare professionals use it, and on which CT scans (or where in the care pathway) it is used. Some of this information may be available directly from NHS England. From this, sites can be identified for the proposed studies to best represent acute stroke care in the NHS, and its variation across centres, to address confounders and produce the most generalisable evidence possible.

The suggested approaches to addressing the evidence gaps for AI software in stroke are an experimental concordance study using existing imaging data, plus an evaluation of SSNAP data. How each approach will address the evidence gaps is considered below, with strengths and weaknesses highlighted.

Concordance study

A concordance study is used to assess the agreement between 2 or more methods. The evidence gaps that this will address are:

  • The impact of the addition of AI-derived software on a healthcare professional's ability to identify people for whom thrombolysis and thrombectomy is suitable.

  • How often the software is unable to analyse CT brain scans, with reasons for this.

  • The impact of using the software on time to thrombolysis or thrombectomy (partially, since only speed of review can be assessed).

Each patient case would include clinical data available at the time of scanning, and may include unenhanced CT, CT angiography (CTA) and CT perfusion (CTP) images, for review alongside each other, as appropriate according to standard care.

This study will assess the concordance between the treatment decision reached for each included case of suspected stroke by:

  • Healthcare professionals assisted by AI software (intervention).

  • Healthcare professionals unassisted by AI software (comparator).

  • Consensus assessment by an expert panel, unassisted by AI software (reference standard).

Data should be selected carefully to be representative of existing sites, considering responses to the user survey. This is to make sure acute and comprehensive stroke centres, relevant staff roles and experience levels, and specific AI software are represented.

Representative image sets would be provided by stroke centres, anonymised, and processed by e-Stroke, RapidAI, Viz, and no AI software. They would then be presented to, and assessed retrospectively by, recruited healthcare professionals with appropriate qualifications and experience, recommending no treatment, thrombolysis, or thrombectomy. Cases that the software was unable to analyse would be recorded.

If every reader assessed every case, using every AI software and none, this would be a full factorial design. A more pragmatic approach, which optimises healthcare resource use, is a 'split plot' design in which each reader assesses only a subset of cases but each case is assessed using each AI software and none, enabling comparison. The sample size (number of patient cases and number of readers) should account for the following factors, and patient cases will be randomly allocated to readers to ensure each factor is fairly represented:

  • AI used: e-Stroke, RapidAI, Viz, none

  • staff role: radiologist and physician responsible for making treatment decisions

  • years of experience in staff role.
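The split-plot allocation described above can be sketched in a few lines of code. This is a hypothetical illustration only, not part of the proposed protocol: the reader names, case identifiers and condition labels are assumptions, and a real study would allocate using the full factor list (staff role, experience) from the survey.

```python
import random
from itertools import cycle

# Conditions under which each case must be read (assumed labels).
AI_CONDITIONS = ["e-Stroke", "RapidAI", "Viz", "none"]

def allocate(cases, readers, seed=0):
    """Split-plot allocation: every case is read once under every AI
    condition, but each reader sees only a subset of cases. Readings are
    dealt round-robin over a shuffled reader list, so workload is spread
    evenly; with at least 4 readers, the 4 readings of any one case go
    to 4 different readers (avoiding recall bias within a case)."""
    rng = random.Random(seed)
    shuffled = readers[:]
    rng.shuffle(shuffled)
    pool = cycle(shuffled)
    plan = []
    for case in cases:
        conditions = AI_CONDITIONS[:]
        rng.shuffle(conditions)  # vary the order conditions are read in
        for condition in conditions:
            plan.append({"case": case, "condition": condition,
                         "reader": next(pool)})
    return plan

cases = [f"case_{i:03d}" for i in range(6)]
readers = ["radiologist_A", "radiologist_B", "physician_A", "physician_B"]
plan = allocate(cases, readers)
# plan contains 24 readings: 6 cases x 4 conditions, 6 readings per reader.
```

In practice the random allocation would be stratified by the factors listed above rather than fully random, but the core property is the same: complete coverage of cases by conditions without every reader reading every case.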

The factors above would be controlled for in the design of the study, and the influence of other, explanatory, variables could be considered in cases of discordance. These variables may account for particular groups of interest, such as people aged over 80, people with cerebrovascular disease, specifications of different scanners used, type of stroke centre, and types of CT imaging included in the case.

Comparison between AI-assisted (intervention), and unassisted (comparator) readings, and the expert panel (reference standard) would allow assessment of concordance for each AI software, used as intended. By including 2 staff groups (radiologists and physicians), concordance could be measured within each group, and between groups. Discordant cases could be further explored to identify common characteristics, and reasons for discordance could be suggested by the expert panel.
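One common way to summarise the pairwise concordance described above is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is illustrative only and assumes a three-way decision label; the plan itself does not mandate a specific statistic.

```python
from collections import Counter

# Assumed treatment decision labels for the concordance comparison.
DECISIONS = ["no treatment", "thrombolysis", "thrombectomy"]

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two paired lists of decisions:
    kappa = (observed agreement - expected agreement) / (1 - expected)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[d] * freq_b[d] for d in DECISIONS) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters constant and identical
    return (observed - expected) / (1 - expected)

# Toy paired decisions: AI-assisted reading vs the expert-panel reference.
ai_assisted = ["thrombolysis", "thrombectomy", "no treatment", "thrombolysis"]
reference   = ["thrombolysis", "thrombectomy", "no treatment", "thrombectomy"]
kappa = cohens_kappa(ai_assisted, reference)
```

The same function could be applied to any pair of the three decision sets (intervention, comparator, reference standard), or within and between the two staff groups.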

A secondary outcome of this study would be the time spent assessing images to reach a treatment decision in each case, with and without AI assistance. Although this timing activity lacks some real-world validity by not reflecting all steps in the care pathway, this element of the proposed study may highlight differences in read times between different staff groups, and different experience levels.

Other concordance study designs are possible, which could better represent real-world use of the software, but may have less internal validity or need greater data collection.

Evaluation of SSNAP data

The evidence gaps that this will address are:

  • The impact of using the software on time to thrombolysis or thrombectomy.

  • The impact of using the software on how many people have thrombolysis or thrombectomy.

  • How often the software is unable to analyse CT brain scans, with reasons for this.

The work supported by Health Innovation Oxford & Thames Valley is using SSNAP data to capture use of e-Stroke across 26 sites in England. Because of variation in how time to outcome data was collected before April 2021, use of a 'before-and-after' study design is limited. But because sites implemented AI software at different time points, time between CT scan and treatment decision could be monitored over time to determine whether this decreased as use of AI increased. Although the work of Health Innovation Oxford & Thames Valley is restricted to e-Stroke, its methodology could be applied to studies of other software, including RapidAI and Viz.

A random selection of cases could be extracted from SSNAP. Completion of the data field relating to the use of AI software could be used as a factor in subsequent analysis. Time since AI implementation and the AI software in use at each site (taken from the user survey) could also be treated as factors. Additional data fields may be needed to address how often the software is unable to analyse CT brain scans, and the reasons why, which could be added to the Comprehensive Dataset for specific sites of interest.
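Because sites implemented AI at different dates, each extracted record can be labelled by whether the scan predated that site's implementation, giving a staggered before/after comparison. The sketch below is a hypothetical illustration: the field names, site names and values are assumptions, not the SSNAP schema.

```python
from statistics import median
from datetime import date

# Assumed per-site AI implementation dates (e.g. from the user survey).
implementation = {"site_A": date(2021, 6, 1), "site_B": date(2022, 1, 15)}

# Toy extract of patient-level records (field names are illustrative).
records = [
    {"site": "site_A", "scan_date": date(2021, 3, 10), "ct_to_treatment_min": 62},
    {"site": "site_A", "scan_date": date(2021, 9, 2),  "ct_to_treatment_min": 48},
    {"site": "site_B", "scan_date": date(2021, 11, 5), "ct_to_treatment_min": 70},
    {"site": "site_B", "scan_date": date(2022, 4, 20), "ct_to_treatment_min": 55},
]

def median_by_phase(records, implementation):
    """Median CT-to-treatment time pooled across sites, split by whether
    each scan happened before or after that site's own AI go-live date."""
    phases = {"before": [], "after": []}
    for r in records:
        phase = "after" if r["scan_date"] >= implementation[r["site"]] else "before"
        phases[phase].append(r["ct_to_treatment_min"])
    return {phase: median(times) for phase, times in phases.items()}

summary = median_by_phase(records, implementation)
```

A real analysis would adjust for the additional factors mentioned above (time since implementation, which software is in use at each site) rather than pooling naively, but the site-specific phase labelling is the key step that the staggered implementation makes possible.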

3.4 Data to be collected

The following data should be prioritised for collection within each of the studies described above.

Information to be collected by surveying all stroke centres using AI software

The following data items should be collected for each site:

  • AI software used, including version and date of implementation.

  • Description of the pathway followed by people suspected of having an acute stroke, including which CT images are taken and when, and which staff role reviews these.

  • Specification of CT scanner (or scanners) used.

Concordance study

  • Treatment decisions made using relevant clinical information and CT scans with and without AI support, and by the expert panel.

  • Whether or not the AI software was able to process each image.

  • Time spent assessing CT scans for each case, plus relevant clinical information, with and without AI support.

Evaluation of SSNAP data

The following data items should be extracted, or calculated from the SSNAP dataset, for periods before and after AI implementation:

  • time between CT scan and thrombolysis (if indicated)

  • time between CT scan and arterial puncture for thrombectomy (if indicated)

  • number and proportion of people having treatment with thrombolysis

  • number and proportion of people having treatment with thrombectomy

  • number and proportion of people referred to another site for thrombectomy

  • how often the software is unable to analyse CT brain scans, with reasons for this.

A subgroup analysis could consider the difference in time to treatment for people referred to another site for thrombectomy, or for people with a shorter or longer time since stroke symptom onset. For outcomes lacking comparable data from a 'before' phase, analysis could be repeated using quarterly extracts from SSNAP to determine whether increased use of the software led to changes over time.

Data collection should follow a predefined protocol and quality assurance processes should be put in place to ensure the integrity and consistency of data collection. See NICE's real-world evidence framework, which provides guidance on the planning, conduct, and reporting of real-world evidence studies.

3.5 Evidence generation period

Because 99 of 107 stroke units in England are already using AI technologies, with the rest expected to implement AI software by the end of 2023, it is feasible that sufficient robust evidence could be generated within the next 3 years.
