NICE health technology evaluations: the manual
3 Evidence
3.1 Assessment of the evidence
3.1.1 A comprehensive evidence base is fundamental to the evaluation process. Evidence of various types and from multiple sources may inform the evaluation. To ensure that the guidance issued by NICE is appropriate and robust, the evidence and analysis, and their interpretation, must be of the highest standard possible and transparent.
3.1.2 Evaluating effectiveness requires quantifying the effect of the technology under evaluation, and of the relevant comparators, on appropriate outcome measures.
3.1.3 For costs, evidence should quantify the effect of the technology on resource use in terms of physical units (for example, days in hospital or visits to a GP). These effects should be valued in monetary terms using appropriate prices and unit costs.
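For illustration only, the short sketch below shows how resource use measured in physical units might be valued in monetary terms. The quantities and unit costs are hypothetical placeholders, not reference values.

```python
# Illustrative only: quantities and unit costs are hypothetical placeholders.
resource_use = {
    "hospital_days": 3.2,  # mean days in hospital per patient
    "gp_visits": 1.5,      # mean GP visits per patient
}
unit_costs = {             # cost per physical unit, GBP
    "hospital_days": 450.0,
    "gp_visits": 39.0,
}

total_cost = sum(resource_use[item] * unit_costs[item] for item in resource_use)
print(f"Expected cost per patient: £{total_cost:.2f}")
```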
3.1.4 In addition to evidence on the technology's effects and costs, health technology evaluation should consider a range of other relevant issues. For example:
the impact of having a condition or disease, the experience of having specific treatments or diagnostic tests for that condition, the experience of the healthcare system for that condition
organisational issues that affect patients, carers or healthcare providers
NICE's legal obligations on equality and human rights
the requirement to treat people fairly.
3.2 Guiding principles for evidence
3.2.1 The evidence considered by the committee should be:
Relevant to the evaluation in terms of patient groups, comparators, perspective, outcomes and resource use as defined in the scope. It should include transparent reporting of data, study design, analysis, and results.
Clear in the rationale for the selection of outcomes, resource use and costs.
Assembled systematically and synthesised in a transparent way that allows the analysis to be reproduced.
Analysed in a way that is methodologically sound and, in particular, minimises any bias.
NICE has defined a 'reference case' that specifies the methods it considers to be most appropriate for estimating clinical effectiveness and value for money. This is to ensure that the evidence base for evaluations is consistent with these principles.
3.2.2 There are always likely to be limitations in the evidence available to inform an evaluation. There may be questions about internal validity of the evidence because of data quality or methodological concerns. Or there may be questions about the external validity because of, for example, the population and settings. It is essential that limitations in the evidence are fully described and the impact on bias and uncertainty fully characterised and ideally quantified. Committees will reach judgements about the acceptability of all the evidence according to the evaluation context (including, for example, the type of technology, evaluation or population).
3.3 Types of evidence
3.3.1 NICE considers all types of evidence in its evaluations. This includes evidence from published and unpublished data, data from non-UK sources, databases of ongoing clinical trials, end-to-end studies, conference proceedings, and data from registries, real-world evidence and other observational sources.
3.3.2 The preferred source of evidence depends on the specific use being considered. For relative treatment effects there is a strong preference for high-quality randomised controlled trials (RCTs). Non-randomised studies may complement RCTs when evidence is limited, or form the primary source of evidence when there is no RCT evidence. For diagnostic technologies, there is a preference for end-to-end studies. When there is insufficient evidence from these studies, a linked evidence approach should be taken. For clinical outcomes such as natural history, treatment patterns or patient experiences, real-world evidence may be preferred.
3.3.3 The need to search beyond RCTs for treatment effects should be informed by the residual uncertainties, the likelihood of this uncertainty being resolved through non-randomised evidence, and the practicalities of the evidence search. The search could be done in an iterative, hierarchical way, searching first for more robust forms of non-randomised evidence before searching for less reliable study designs.
3.3.4 Whatever the sources of evidence available on a particular technology and patient group, a systematic review of the relevant evidence relating to a technology should be done using a pre-defined protocol. This protocol should allow evidence to be included from all sources likely to inform the decision about using the technologies by the NHS. A systematic review attempts to assemble all the available relevant evidence using explicit, valid and replicable methods in a way that minimises the risk of biased selection of studies. The data from the included studies can be synthesised, but this is not essential. All evidence should be critically appraised, and potential biases must be identified (see section 6.2).
Randomised controlled trials
3.3.5 RCTs minimise potential external influences to identify the effect of 1 or more interventions on outcomes. Randomisation ensures that any differences in baseline characteristics between people assigned to different interventions at the start of the trial are because of chance, including unmeasured characteristics. Blinding (when applied) prevents knowledge of treatment allocation from influencing behaviours, and standardised protocols ensure consistent data collection. The trial should, in principle, provide a minimally biased estimate of the size of any benefits or risks associated with the technology relative to those associated with the comparator. RCTs are therefore considered to be most appropriate for measures of relative treatment effect.
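As an illustration of estimating a relative treatment effect from a two-arm RCT, the sketch below computes a risk ratio with an approximate 95% confidence interval, together with the absolute risk difference. All event counts are hypothetical.

```python
import math

# Hypothetical two-arm RCT: events and patients per arm.
events_treat, n_treat = 30, 200
events_ctrl, n_ctrl = 50, 200

risk_treat = events_treat / n_treat
risk_ctrl = events_ctrl / n_ctrl

# Relative effect: risk ratio, with an approximate 95% CI on the log scale.
log_rr = math.log(risk_treat / risk_ctrl)
se_log_rr = math.sqrt(1 / events_treat - 1 / n_treat
                      + 1 / events_ctrl - 1 / n_ctrl)
ci_low = math.exp(log_rr - 1.96 * se_log_rr)
ci_high = math.exp(log_rr + 1.96 * se_log_rr)

print(f"Risk ratio {math.exp(log_rr):.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
# Absolute effect: risk difference.
print(f"Risk difference {risk_treat - risk_ctrl:+.3f}")
```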
3.3.6 The relevance of RCT evidence to the evaluation depends on both the internal and external validity of each trial. Internal validity is assessed according to the design, analysis and conduct of a trial. It includes blinding (when appropriate; this is often not possible when trials use specific medical devices or diagnostics), the method of randomisation and concealment of allocation, and the completeness of follow up. Other important considerations are the size and power of the trial, the selection and measurement of outcomes and analysis by intention to treat. External validity is assessed according to the generalisability of the trial evidence, that is, whether the results apply to wider patient groups and to routine clinical practice.
3.3.7 When basket trials are used, they should be appropriately designed and analysed, include assessment of heterogeneity and allow borrowing between baskets. They should include relevant comparators, use a random allocation of treatments, use appropriate clinical endpoints (including a validated relationship with the overall survival and quality of life of the patients) and enrol all patient groups relevant to the indication.
3.3.8 High-quality RCTs directly comparing the technology being evaluated with relevant comparators provide the most valid evidence of relative efficacy. However, there are some key limitations of RCTs:
For some indications or technologies, RCTs may not provide enough evidence to quantify the effect of treatment over the course of the condition.
In some circumstances, or for particular conditions, RCTs may be unethical or not feasible.
For some evaluations the results may not be generalisable to the population of interest, either because of the relevance of the comparator or because of the population, setting and treatment pathway in which it was used.
For some medical devices there may be learning effects or behaviours associated with their use that may not be captured using an RCT.
Some technologies may also be better suited to alternative study designs (for example, histology-independent cancer treatments may be suited to being studied in basket trials including a heterogeneous population of patients).
When an RCT is not available or appropriate, justification should be provided for the source and methods used to generate evidence on the relative effects. Any potential bias arising from the design of the studies used in the evaluation should be explored and documented in a formal, transparent and pre-specified manner.
Non-randomised studies
3.3.9 Non-randomised studies can be interventional (but without randomisation) or observational. They include observational database studies with concurrent controls, and single-arm trials using external controls. Non-randomised studies tend to be at high risk of bias because the factors influencing treatment assignment may be predictive of the outcomes (that is, confounding). Other forms of bias may arise because of limitations in data quality, detection bias, or patient entry into or exit from studies (that is, selection bias). Inferences about relative effects drawn from studies without randomisation will often be more uncertain than those from RCTs. Technical support document 17 provides guidance on methods for adjusting for confounding using individual patient-level data from observational studies.
3.3.10 The potential biases of observational studies should be identified and quantified and adjusted for when possible. Choice of data, study design and analysis should be selected to minimise the risk of bias. Bias should be evaluated using validated tools specific to the study design and use case. It should be recognised that no single tool covers all relevant domains of bias. Stakeholders should take comprehensive approaches to assessing study quality and should note limitations of tools used when relevant.
3.3.11 Evidence from non-randomised studies may be beneficial in supplementing and supporting RCT data, or substituting for RCT data if there is none. Non-randomised data may also be used to contextualise results from RCTs by, for instance, understanding differences in patient populations, treatment patterns, or outcomes. For example, non-randomised evidence may be used to:
assess the generalisability of results from RCTs
show effectiveness of interventions over longer time horizons
describe the characteristics of real-world populations of interest
understand differences in treatment patterns or outcomes
provide information on the natural history of the condition to supplement trials
provide evidence on real-world safety and adverse events
provide estimates of resource use for populating economic models
provide information about the experience of people having treatments or using a medical device, diagnostic or digital technology.
3.3.12 Non-randomised studies are usually at higher risk of bias than RCTs because of confounding (that is, systematic differences between treatment groups, and association of those differences with the outcome of interest), selection bias, or informational biases from limitations of the data or differential data collection. It is therefore essential to assess the risk of bias in each study using a validated tool (for example, ROBINS‑I). Some tools need considerable knowledge and experience to apply correctly. Alternative tools are available for less experienced authors, but the justification for their use and any limitations should be presented.
3.3.13 An assessment of the quality of the data should consider completeness, validity, consistency and accuracy, which can be done using an appropriate checklist. As with RCT evidence, it is also important to consider the external validity of the evidence. When possible, more than 1 independent source of such evidence should be examined to gain some insight into the validity of any conclusions. The following principles should guide the generation of the highest quality evidence from non-randomised studies and when using real-world data:
1. Evidence should be developed in a fully transparent and reproducible way from study planning through study conduct to the reporting of results.
2. Data sources should be identified through systematic, transparent and reproducible approaches. The origin of any data source should be shown, and its quality and relevance in relation to the intended applications shown.
3. Data should be analysed using appropriate analytical strategies. Bias and uncertainty should be fully characterised and ideally quantified. Extensive sensitivity analyses should be done, covering all key risks of bias.
3.3.14 Additional guidance on the design, conduct and reporting of non-randomised and real-world studies is provided on the NICE website (see the preliminary version of the NICE real-world evidence framework; a link to the final version will be added when available).
3.3.15 Study quality can vary, and so systematic review methods, critical appraisal and sensitivity analyses are as important for review of this data as they are for reviews of data from RCTs.
Diagnostic accuracy studies
3.3.16 Diagnostic test accuracy studies compare test results of people with a disease or condition to those of people without it. Designs are generally prospective cohort or cross-sectional studies, or retrospective case-control studies. Most compare a single index test of interest with a reference standard to calculate the accuracy. Paired design studies compare 2 index tests with each other, and often also with a reference standard; because each person has both tests, these studies are less prone to bias resulting from confounding.
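For illustration, the sketch below derives the standard accuracy measures from a hypothetical 2x2 table of index test results against a reference standard.

```python
# Hypothetical 2x2 table: index test versus reference standard.
tp, fn = 90, 10   # people with the condition: test positive / test negative
fp, tn = 20, 80   # people without the condition: test positive / test negative

sensitivity = tp / (tp + fn)  # proportion of true cases detected
specificity = tn / (tn + fp)  # proportion of non-cases correctly ruled out
ppv = tp / (tp + fp)          # positive predictive value (prevalence dependent)
npv = tn / (tn + fn)          # negative predictive value (prevalence dependent)

print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")
```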
Impact of technology on clinical pathway
3.3.17 Devices or diagnostics may affect outcomes because of their effect on the clinical pathway. For example, the technology may produce results more quickly, reducing the need for the patient to attend extra appointments or reducing the time to treatment. These outcomes can be included in the evaluation but are sometimes associated with uncertainty. As such, clinical expert opinion or expert elicitation is likely to be important.
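A minimal sketch of how such a pathway effect might be quantified is shown below. The probabilities and unit cost are hypothetical values of the kind that might be obtained by expert elicitation.

```python
# Hypothetical pathway effect of a faster test; the probabilities and the
# unit cost are placeholders that might come from expert elicitation.
p_extra_appt_current = 0.40  # probability of an extra appointment, current test
p_extra_appt_new = 0.10      # probability with the faster technology
appointment_cost = 150.0     # unit cost per appointment, GBP

appointments_avoided = p_extra_appt_current - p_extra_appt_new
expected_saving = appointments_avoided * appointment_cost
print(f"Expected appointments avoided per patient: {appointments_avoided:.2f}")
print(f"Expected saving per patient: £{expected_saving:.2f}")
```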
Qualitative research
3.3.18 Qualitative research can explore areas such as values, preferences, acceptability, feasibility and equity implications. Many elements of the decision problem can be informed by qualitative evidence. When this evidence is submitted it can be particularly useful to assess aspects including, but not limited to:
patients' experience and quality of life as a result of having a disease or condition
patients' experience and quality of life as a result of having a treatment or test
any subgroups of patients who may need special consideration in relation to the technology
patients' view on the acceptability of different types of treatment, device or test
views of carers
views of people with experience using the device or a comparator device
views of treating clinicians
views on the feasibility of guidance implementation.
3.3.19 Qualitative data may be collected ad hoc or opportunistically, through formal qualitative research studies or from a systematic review of relevant qualitative research.
3.3.20 When qualitative evidence is extensive and is appropriate to inform decision making, recognised methods of analysing, synthesising and presenting qualitative evidence are preferred, for example, rapid review, framework synthesis, narrative summary and synthesis, meta-synthesis and thematic synthesis.
Expert elicitation or expert opinion
3.3.21 In the absence of empirical evidence from RCTs, non-randomised studies or registries, or when considered appropriate by the committee taking into account all other available evidence, expert elicitation can be used to provide evidence. Expert elicitation may use either structured or unstructured methods; evidence generated by either is subject to risk of bias and high uncertainty. Structured methods are preferred because they attempt to minimise biases and provide some indication of the uncertainty. Structured approaches should adhere to existing protocols (such as the Medical Research Council protocol). They typically involve eliciting probability distributions, usually after training the responders about the various types of common cognitive biases.
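As an illustration of one step in a structured approach, the sketch below fits a beta distribution to a set of elicited quantiles for a probability (for example, a response rate). The elicited values are hypothetical, and a full structured protocol would also cover expert selection, training, pooling across experts and feedback.

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical elicited quantiles for a probability (e.g. a response rate).
elicited = {0.05: 0.10, 0.50: 0.25, 0.95: 0.45}  # quantile level: elicited value
levels = np.array(list(elicited.keys()))
values = np.array(list(elicited.values()))

def loss(log_params):
    a, b = np.exp(log_params)            # keep both shape parameters positive
    return np.sum((stats.beta.ppf(levels, a, b) - values) ** 2)

result = optimize.minimize(loss, x0=np.log([2.0, 5.0]), method="Nelder-Mead")
a, b = np.exp(result.x)
print(f"Fitted beta({a:.2f}, {b:.2f}); mean {a / (a + b):.3f}, "
      f"95% interval {stats.beta.ppf(0.025, a, b):.3f} "
      f"to {stats.beta.ppf(0.975, a, b):.3f}")
```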
3.3.22 Clinical experts and patient experts can also provide opinions (both quantitative and qualitative). This is different to the methods applied for expert elicitation. This could be used to supplement, support, or refute any observed data from RCTs or non-randomised studies (including drug usage evaluations, cross-sectional studies or case studies). Expert opinion may include any information relevant to the evaluation, including the technology, the comparators and the conditions for which the technology is used. For devices or diagnostics, such information can relate to the technical characteristics, such as their design, if this might affect its capability in delivering the intended benefits; or the training and experience needed to use the technology; or organisational factors that might influence the technology's technical performance or use in clinical practice.
3.3.23 Clear reporting of the methods used for expert elicitation or expert opinion (quantitative) is needed from study planning to conduct. This includes the identification and selection of experts, and the reporting of results including the consensus of opinions or data aggregation. This should follow existing reporting guidelines when possible.
Care management
3.3.24 Clinical guidelines from NICE and other organisations can provide a good source of evidence for care management and the care pathway. When this is not clear or not available, expert clinical input of the usual care pathway can be used. Diagnostic before-and-after studies also provide useful information on any change in management after the introduction of an index test to clinical practice. However, these studies are often not available, especially when assessing a new test that is not in routine clinical use. As such, expert clinical input on the usual care pathway is likely to be important.
Unpublished and part-published evidence
3.3.25 To ensure that the evaluation does not miss important relevant evidence, it is important that attempts are made to identify evidence that is not in the public domain. Such evidence includes unpublished clinical trial data and clinical trial data that are in abstract form only or are incomplete, and post-marketing surveillance data. However, this evidence should still consider the key principles of design, analysis and reporting. Such information must be critically appraised, transparently reported and adjusted for bias. When appropriate, sensitivity analysis should examine the effects of its incorporation or exclusion.
Economic evaluations
3.3.26 Economic evaluations may be based on new analyses. However, a review of published, relevant economic evaluations of interventions should also be done. Search for economic evaluations using transparent and reproducible approaches until sufficient appropriate and relevant evidence has been identified. Reviews need not be exhaustive: searching can stop when additional studies would merely provide further support consistent with the evidence already identified, rather than necessarily identifying all relevant studies. Once identified, critically assess economic evaluations using a suitable tool and assess their external validity in relation to the decision problem. If no relevant economic evaluations are found, state this clearly and explain the basis for that conclusion.
3.3.27 Existing economic evaluations can be used as an alternative to de novo modelling if the existing economic evaluations are adequate and appropriate.
3.4 Synthesis of evidence
3.4.1 The aim of clinical-effectiveness analysis is to obtain precise, relevant and unbiased estimates of the mean clinical effectiveness of the technologies being compared. Consider all relevant studies in the assessment of clinical effectiveness and base analyses on studies of the best available quality. Consider the range of typical patients, normal clinical circumstances, clinically relevant outcomes, comparison with relevant comparators, and measures of both relative and absolute effectiveness with appropriate measures of uncertainty. NICE prefers RCTs directly comparing the intervention with 1 or more relevant comparators and, if available, these should be presented in the reference-case analysis.
Systematic review
3.4.2 Identify and quantify all health effects, and clearly describe all data sources. Evidence on outcomes should come from a systematic review, defined as systematically locating, including, appraising and synthesising the evidence to give a reliable and valid overview of the data related to a clearly formulated question.
3.4.3 Search strategies for reviews of diagnostic test accuracy tend to be longer and more complex than search strategies to identify treatment effects. Filters should not be used to narrow the search to diagnostic studies because indexing of these types of studies is often poor.
Study selection and data extraction
3.4.4 Do a systematic review of relevant studies of the technology being evaluated according to a previously prepared protocol to minimise the potential for bias. This should include studies investigating relevant comparators.
3.4.5 Compile a list of possible studies once the search strategy has been developed and the literature search completed. Each study must be assessed to determine whether it meets the inclusion criteria of the review. Keep a log of ineligible studies, with the rationale for their exclusion. More than 1 reviewer should assess all records retrieved by the search strategy to increase the validity of the decisions. Clearly report the procedure for resolving disagreements between reviewers.
Critical appraisal
3.4.6 The quality of a study's overall design, its execution, and the validity of its results determines its relevance to the decision problem. Critically appraise each study that meets the criteria for inclusion. Whenever possible, use the criteria for assessing published studies to assess the validity of unpublished and part-published studies.
Factors that affect the effectiveness
3.4.7 Many factors can affect the overall estimate of relative effectiveness from a systematic review. Some differences between studies happen by chance; others arise from differences in patient characteristics (such as age, sex or severity of disease), the choice and measurement of outcomes, care setting, additional routine care and the year of the study. Identify such potential effect modifiers before data analysis, either by a thorough review of the subject area or by discussion with experts in the clinical discipline.
Pairwise meta-analysis
3.4.8 The combination of outcome data through meta-analysis is appropriate if there are enough relevant and valid data using outcome measures that are comparable.
3.4.9 Fully report the characteristics and possible limitations of the data (that is, population, intervention, setting, sample size and validity of the evidence) for each study included in the analysis and include a forest plot.
3.4.10 Accompany statistical pooling of study results with an assessment of heterogeneity (that is, any variability in addition to that accounted for by chance). This can, to some extent, be taken into account using a random (rather than fixed) effects model. However, the degree of heterogeneity and the reasons for this should be explored as fully as possible. Known clinical heterogeneity (for example, because of patient characteristics) may be explored by using subgroup analyses and meta-regression. When there is doubt about the relevance of a particular study, a sensitivity analysis should exclude that study. If the risk of an event differs substantially between the control groups of the studies in a meta-analysis, assess whether the measure of relative effectiveness is constant over different baseline risks. This is especially important when the measure of relative effectiveness will be used in an economic model and the baseline rate of events in the comparator arm of the model is very different to the corresponding rates in the meta-analysis studies.
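For illustration, the sketch below carries out a random-effects pairwise meta-analysis using the DerSimonian-Laird estimator on hypothetical log odds ratios, reporting the between-study variance (tau-squared), the I-squared statistic and the pooled estimate with its 95% confidence interval.

```python
import numpy as np

# Hypothetical study-level log odds ratios and within-study variances.
y = np.array([-0.40, -0.15, -0.55, -0.20, -0.35])
v = np.array([0.04, 0.06, 0.09, 0.05, 0.07])

# Fixed-effect weights and Cochran's Q statistic.
w = 1 / v
mu_fe = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - mu_fe) ** 2)
df = len(y) - 1

# DerSimonian-Laird estimate of between-study variance (tau^2).
tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
i2 = max(0.0, (q - df) / q) * 100  # variability beyond chance, as a percentage

# Random-effects pooled estimate with 95% CI.
w_re = 1 / (v + tau2)
pooled = np.sum(w_re * y) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(f"tau^2 = {tau2:.3f}, I^2 = {i2:.0f}%")
print(f"Pooled log OR {pooled:.3f} "
      f"(95% CI {pooled - 1.96 * se:.3f} to {pooled + 1.96 * se:.3f})")
```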
Indirect comparisons and network meta-analyses
3.4.11 When technologies are being compared that have not been evaluated within a single RCT, data from a series of pairwise head-to-head RCTs should be presented together with a network meta-analysis if appropriate. Fully describe the network meta-analysis and present it as an additional analysis. The committee will consider the additional uncertainty associated with the lack of direct evidence when considering relative-effectiveness estimates derived from indirect sources only. NICE prefers the methods for network meta-analysis set out in the technical support document evidence synthesis series.
3.4.12 The term 'network meta-analysis' includes adjusted indirect comparisons, but also refers to more complex evidence analysis such as mixed treatment comparisons. An 'adjusted indirect comparison' refers to data synthesis from trials in which the technologies of interest have not been compared directly with each other but have been compared indirectly using a common comparator. Mixed treatment comparisons include both head-to-head trials of technologies of interest (both interventions and comparators) and trials that include 1 of the technologies of interest.
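For illustration, the sketch below implements the Bucher adjusted indirect comparison of technologies A and B via a common comparator C on the log odds-ratio scale. The direct estimates are hypothetical.

```python
import math

# Hypothetical direct estimates on the log odds-ratio scale.
d_ac, se_ac = -0.50, 0.15  # A versus common comparator C
d_bc, se_bc = -0.20, 0.18  # B versus common comparator C

# Indirect estimate of A versus B; within-trial randomisation is preserved.
d_ab = d_ac - d_bc
se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)  # variances add, so precision drops

ci_low, ci_high = d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab
print(f"Indirect log OR (A vs B) {d_ab:.2f} "
      f"(95% CI {ci_low:.2f} to {ci_high:.2f}); OR {math.exp(d_ab):.2f}")
```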
3.4.13 Ideally, the network meta-analysis should contain all technologies that have been identified either as an intervention or as appropriate comparators in the scope. Therefore, trials that compare at least 2 of the relevant (intervention or comparator) technologies should be incorporated, even if the trial includes comparators that are not relevant to the decision problem. Follow the principles of good practice for doing systematic reviews and meta-analyses when doing mixed and indirect treatment comparisons. In brief, a clear description of the synthesis methods and the rationale for how RCTs are identified, selected and excluded is needed. Document the methods and results of the individual trials included in the network meta-analysis and a table of baseline characteristics for each trial. If there is doubt about the relevance of a particular trial or set of trials, present sensitivity analysis in which these trials are excluded (or included if the trials are not in the base-case analysis).
3.4.14 In networks consisting of a small number of trials, indirect comparisons are highly vulnerable to systematic bias. Population adjustment methods in connected networks can be considered when effect modifiers between trials may be imbalanced. Population adjustment methods need individual patient data to be available from at least 1 trial in the comparison or network. Recognise the limitations of using these methods and, if possible, the likely size of any systematic bias reported (see technical support document 18).
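One commonly used population adjustment method is matching-adjusted indirect comparison (MAIC). The sketch below illustrates only its core weighting step: individual patient data are reweighted by the method of moments so that covariate means match published aggregate means from a comparator trial. All data are simulated placeholders, and the fall in effective sample size signals the loss of precision these methods can incur.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated IPD covariates (age, baseline score) for the trial with IPD.
rng = np.random.default_rng(1)
ipd = np.column_stack([rng.normal(60.0, 8.0, 200),   # age
                       rng.normal(20.0, 5.0, 200)])  # baseline score
target_means = np.array([65.0, 22.0])  # published means from comparator trial

x = ipd - target_means  # centre covariates at the target means

def objective(alpha):
    # Minimising sum(exp(x @ alpha)) makes the weighted covariate means
    # equal the target means (method of moments).
    return np.sum(np.exp(x @ alpha))

alpha = minimize(objective, x0=np.zeros(2), method="BFGS").x
weights = np.exp(x @ alpha)

weighted_means = (weights[:, None] * ipd).sum(axis=0) / weights.sum()
ess = weights.sum() ** 2 / np.sum(weights ** 2)  # effective sample size
print("Weighted covariate means:", np.round(weighted_means, 2))
print(f"Effective sample size: {ess:.0f} of {len(ipd)}")
```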
3.4.15 Report the heterogeneity between results of pairwise comparisons and inconsistencies between the direct and indirect evidence on the technologies. If inconsistency within a network meta-analysis is found, then attempt to explain and resolve these inconsistencies.
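For illustration, a simple node-splitting style check compares the direct and indirect estimates of the same contrast; the values below are hypothetical.

```python
import math

# Hypothetical direct and indirect estimates of the same contrast
# (log hazard-ratio scale).
d_direct, se_direct = -0.30, 0.12
d_indirect, se_indirect = -0.05, 0.20

diff = d_direct - d_indirect
se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
z = diff / se_diff
print(f"Inconsistency estimate {diff:.2f} (z = {z:.2f})")
# |z| > 1.96 would flag inconsistency needing explanation and resolution.
```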
3.4.16 Use external information to help estimate the between-study heterogeneity to improve the precision of the estimates. In networks with few included studies, it may be preferable to use informative prior distributions for the between-study heterogeneity parameter.
3.4.17 Prior distributions tailored to particular outcomes and disease areas are recommended.
3.4.18 Note the source of the prior distribution for the between-study heterogeneity and provide justification for its use. Present a sensitivity analysis assessing the impact of using different candidate prior distributions.
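For illustration, the sketch below shows such a sensitivity analysis for a normal-normal random-effects model with few studies, evaluated on a grid over the between-study standard deviation (tau) under two candidate priors. The study data and both prior specifications are hypothetical placeholders; empirically based priors should be sourced and justified as described above.

```python
import numpy as np
from scipy import stats

# Hypothetical sparse comparison: three studies.
y = np.array([-0.30, -0.10, -0.45])  # log odds ratios
v = np.array([0.05, 0.08, 0.06])     # within-study variances

taus = np.linspace(1e-3, 2.0, 400)   # grid over between-study SD (tau)

def pooled_mean(log_prior):
    log_post = np.empty_like(taus)
    mu_hats = np.empty_like(taus)
    for i, tau in enumerate(taus):
        w = 1 / (v + tau ** 2)
        mu_hats[i] = np.sum(w * y) / np.sum(w)
        # Marginal log-likelihood of tau, with a flat prior on mu.
        log_post[i] = (-0.5 * np.sum(np.log(v + tau ** 2))
                       - 0.5 * np.log(np.sum(w))
                       - 0.5 * np.sum(w * (y - mu_hats[i]) ** 2)
                       + log_prior(tau))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return np.sum(post * mu_hats)  # posterior mean of the pooled effect

informative = lambda t: stats.lognorm.logpdf(t, s=0.5, scale=np.exp(-1.5))
vague = lambda t: stats.halfnorm.logpdf(t, scale=1.0)

print(f"Pooled log OR, informative prior: {pooled_mean(informative):.3f}")
print(f"Pooled log OR, vague prior: {pooled_mean(vague):.3f}")
```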
3.4.19 Informative prior distributions for relative effectiveness are not recommended unless under very specific circumstances (for example, very sparse adverse event data) and need additional justification.
3.4.20 In all cases when evidence is combined using adjusted indirect comparisons or network meta-analysis frameworks, trial randomisation must be preserved. It is not acceptable to compare results from single treatment arms from different randomised trials. If this type of comparison is presented, the data will be treated as observational in nature and associated with increased uncertainty. Present evidence from a network meta-analysis in both tables and graphical formats such as forest plots. Clearly identify the direct and indirect components of the network meta-analysis and state the number of trials in each comparison. Present results from pairwise meta-analyses using the direct comparisons alongside those based on the full network meta-analysis.
3.4.21 Bias adjustments should be considered if there are concerns about the methodological quality or size of the studies included in a network meta-analysis (see technical support document 3). When there are not enough relevant and valid data for inclusion in pairwise or network meta-analyses, the analysis may have to be restricted to a narrative overview that critically appraises individual studies and presents their results. In these circumstances, the committee will be particularly cautious when reviewing the results and drawing conclusions about the relative clinical effectiveness of the options.
Evidence synthesis challenges
3.4.22 Evidence synthesis methods should be appropriate to the evaluation context. The purpose, underlying assumptions, strengths and limitations of the chosen method should be described and justified.
3.4.23 Meta-analysis of test accuracy data can be complicated because of the correlation between sensitivity and specificity. In addition, there are likely to be many sources of heterogeneity across test results, arising from differences in setting, patient population, reference standard, equipment, procedures and skill levels of test operators. The cut-off point at which test accuracy data is reported may also differ between studies. Several methods for meta-analysis of test accuracy data exist. They vary in complexity and in the assumptions that need to be made. The appropriate choice of method depends on the data available and should be justified.
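For illustration, the sketch below fits a bivariate random-effects model of the kind described in the methodological literature (for example, by Reitsma and colleagues) to hypothetical study counts by maximum likelihood, estimating summary sensitivity and specificity while allowing for their correlation. Within-study correlation is assumed to be zero, as is standard; in practice, established implementations should be preferred.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logit

# Hypothetical per-study counts against the reference standard.
tp = np.array([45, 80, 30, 60, 55, 70])
fn = np.array([5, 15, 10, 8, 12, 9])
tn = np.array([90, 60, 70, 100, 85, 95])
fp = np.array([10, 20, 15, 12, 18, 14])

y_sens, y_spec = logit(tp / (tp + fn)), logit(tn / (tn + fp))
v_sens = 1 / tp + 1 / fn  # within-study variances on the logit scale
v_spec = 1 / tn + 1 / fp

def nll(params):
    mu_se, mu_sp, log_t1, log_t2, z_rho = params
    t1, t2, rho = np.exp(log_t1), np.exp(log_t2), np.tanh(z_rho)
    total = 0.0
    for i in range(len(tp)):
        cov = np.array([[t1 ** 2 + v_sens[i], rho * t1 * t2],
                        [rho * t1 * t2, t2 ** 2 + v_spec[i]]])
        resid = np.array([y_sens[i] - mu_se, y_spec[i] - mu_sp])
        total += 0.5 * (np.log(np.linalg.det(cov))
                        + resid @ np.linalg.solve(cov, resid))
    return total

fit = minimize(nll, x0=[1.5, 1.5, -1.0, -1.0, 0.0], method="Nelder-Mead",
               options={"maxiter": 5000})
mu_se, mu_sp, _, _, z_rho = fit.x
print(f"Summary sensitivity {expit(mu_se):.2f}, specificity {expit(mu_sp):.2f}")
print(f"Estimated between-study correlation {np.tanh(z_rho):.2f}")
```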