
    The content on this page is not current guidance and is only for the purposes of the consultation process.

    6 Reviewing evidence

    Reviewing evidence is an explicit, systematic and transparent process that can be applied to both quantitative (experimental and observational) and qualitative evidence (see the chapter on developing review questions and planning the evidence review). The key aim of any review is to provide a summary of the relevant evidence to ensure that the committee can make fully informed decisions about its recommendations. This chapter describes how evidence is reviewed in the development of guidelines.

    Evidence reviews for NICE guidelines summarise the evidence and its limitations so that the committee can interpret the evidence and make appropriate recommendations, even where there is uncertainty.

    Most of the evidence reviews for NICE guidelines will present syntheses of evidence from systematic literature searches for primary research studies. Evidence identified during these literature searches and from other sources (see the chapter on identifying the evidence: literature searching and evidence submission) should be reviewed against the review protocol to identify the most appropriate information to answer the review questions. The evidence review process used to inform guidelines must be explicit and transparent, and involves 8 main steps:

    Any substantial deviations from these steps need to be agreed, in advance, with NICE staff with responsibility for quality assurance. Additional considerations for reviews using alternative methods not based primarily on literature reviews of primary studies (such as formal consensus methods, adapting recommendations from other guidelines or primary analyses of real-world data) are discussed in the section on presenting evidence for reviews other than reviews of primary studies.

    For all evidence reviews and data synthesis, it is important that the method used to report and evaluate the evidence is easy to follow. It should be written up in clear English and any analytical decisions should be clearly justified.

    Updating previous NICE reviews

    In many cases, the evidence reviews will be an update of a previous review done by NICE on the same or a similar topic, to include more recently published evidence. In these cases, a judgement should be made on what elements of the previous review can be reused, and which need to be redone, based on the level of similarity between the original and new review questions, protocols and methods. Examples of elements that can be considered for reuse include:

    • literature searches and literature search results

    • evidence tables for included studies

    • critical appraisal of included studies

    • data extraction and meta-analysis

    • previously identified information on equalities and health inequalities.

    6.1 Identifying and selecting relevant evidence

    The process of selecting relevant evidence is common to all evidence reviews based on systematic literature searches; the other steps are discussed in relation to the main types of review question. The same rigour should be applied to reviewing all data, whether fully or partially published studies or unpublished data supplied by stakeholders. Care should be taken to identify and remove multiple reports of the same study to prevent double-counting.
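    The de-duplication step can be sketched in code. This is an illustrative sketch only, not a NICE-specified procedure; the record fields ('doi', 'title') and the matching rule are assumptions made for the example.

```python
# Illustrative sketch only: removing multiple reports of the same study so
# that it is not double-counted. Field names and matching rule are hypothetical.

def normalise(title: str) -> str:
    """Lower-case a title and strip non-alphanumeric characters for matching."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each study, matching on DOI when present,
    otherwise on a normalised title."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        key = rec.get("doi") or normalise(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"doi": "10.1000/xyz1", "title": "Trial of A vs B"},
    {"doi": "10.1000/xyz1", "title": "Trial of A vs B (conference report)"},
    {"doi": "", "title": "Cohort study of C"},
    {"doi": "", "title": "Cohort Study of C."},
]
print(len(deduplicate(records)))  # 2 unique studies from 4 records
```

In practice, reference management software performs this step, but the principle is the same: a stable key per study, and only the first report retained.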

    Published studies

    Titles and abstracts of the retrieved citations should be screened against the inclusion criteria defined in the review protocol, and those that do not meet the criteria should be excluded. A percentage should be screened independently by 2 reviewers (that is, titles and abstracts should be double-screened). The percentage of records to be double-screened for each review should be specified in the review protocol.

    If reviewers disagree about a study's relevance, this should be resolved by discussion or by recourse to a third reviewer. If, after discussion, there is still doubt about whether or not the study meets the inclusion criteria, it should be retained. If there are concerns about the level of disagreement between reviewers, the reasons should be explored, and a course of action agreed to ensure a rigorous selection process. A further proportion of studies should then be double-screened to validate this new process until appropriate agreement is achieved.
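    Where the level of disagreement between reviewers needs to be explored, a simple agreement statistic such as Cohen's kappa can help quantify it. A minimal sketch (illustrative only; the screening decisions are invented, and any threshold for 'appropriate agreement' would need to be agreed as part of the review process, not taken from this example):

```python
# Illustrative sketch only: Cohen's kappa for 2 reviewers making
# include/exclude decisions on the same set of double-screened records.

def cohens_kappa(decisions_a: list[bool], decisions_b: list[bool]) -> float:
    """Chance-corrected agreement between two raters on binary decisions."""
    n = len(decisions_a)
    observed = sum(a == b for a, b in zip(decisions_a, decisions_b)) / n
    p_a = sum(decisions_a) / n  # proportion reviewer A included
    p_b = sum(decisions_b) / n  # proportion reviewer B included
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (observed - expected) / (1 - expected)

# hypothetical include/exclude decisions on 10 double-screened records
a = [True, True, False, False, False, True, False, False, True, False]
b = [True, False, False, False, False, True, False, False, True, False]
print(round(cohens_kappa(a, b), 2))  # 0.78
```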

    Once the screening of titles and abstracts is complete, full versions of the selected studies should be obtained for assessment. As with title and abstract screening, a percentage of full studies should be checked independently by 2 reviewers, with any differences being resolved and additional studies being assessed by multiple reviewers if sufficient agreement is not achieved. Studies that fail to meet the inclusion criteria once the full version has been checked should be excluded at this stage.

    The study selection process should be clearly documented and include full details of the inclusion and exclusion criteria. A flow chart should be used to summarise the number of papers included and excluded at each stage and this should be presented in the evidence review (see the PRISMA statement). Each study excluded after checking the full version should be listed, along with the reason for its exclusion. Reasons for study exclusion need to be sufficiently detailed for people to be able to understand the reason without needing to read the original paper (for example, avoid stating only that 'the study population did not meet that specified in the review protocol', but also include why it did not match the protocol population).

    Priority screening

    Priority screening refers to any technique that uses a machine learning algorithm to enhance the efficiency of screening. Usually, this involves taking information on previously included or excluded papers, and using this to order the unscreened papers from those most likely to be included to those least likely. This can be used to identify a higher proportion of relevant papers earlier in the screening process, or to set a cut‑off for manual screening, beyond which it is unlikely that additional relevant studies will be identified.

    There is currently no published guidance on setting thresholds for stopping screening where priority screening has been used. Any methods used should be documented in the review protocol and agreed in advance with the NICE team with responsibility for quality assurance. Any thresholds set should, at minimum, consider the following:

    • the number of references identified so far through the search, and how this identification rate has changed over the review (for example, how many candidate papers were found in each 1,000 screened)

    • the overall number of studies expected, which may be based on a previous version of the guideline (if it is an update), published systematic reviews, or the experience of the guideline committee

    • the ratio of relevant/irrelevant records found at the random sampling stage (if undertaken) before priority screening.
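    The first consideration above (how the identification rate changes over the review) can be tracked with simple arithmetic. An illustrative sketch, with invented counts and a hypothetical stopping rule; any real threshold must be pre-specified in the review protocol and agreed with the NICE team with responsibility for quality assurance:

```python
# Illustrative sketch only: tracking the identification rate during priority
# screening. Counts and the stopping rule are hypothetical examples.

def identification_rates(included_per_batch: list[int], batch_size: int = 1000) -> list[float]:
    """Relevant records found per batch of screened records, as proportions."""
    return [n / batch_size for n in included_per_batch]

# e.g. relevant papers found in each successive batch of 1,000 titles screened
batch_counts = [42, 18, 6, 2, 0]
rates = identification_rates(batch_counts)

# hypothetical rule: consider stopping once the rate in the latest batch
# falls below 1 per 1,000 records screened
stop = rates[-1] < 0.001
print(rates, stop)
```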

    The actual thresholds used for each review question should be clearly documented, either in the guideline methods chapter or in the evidence reviews. Examples of how this has been implemented can be found in NICE's guidelines on autism spectrum disorders in under 19s and prostate cancer.

    Ensuring relevant records are not missed

    Regardless of the level of double-screening, and whether or not priority screening was used, additional checks should always be made to reduce the risk that relevant studies are not identified. These should include, at minimum:

    • checking reference lists of identified systematic reviews, even if these reviews are not used as a source of primary data

    • checking with the guideline committee that they are not aware of any relevant studies that have been missed

    • looking for published papers associated with any key trial registry entries or published protocols that have been identified.

    It may be useful to test the sensitivity of the search by checking that it picks up known studies of relevance.

    Conference abstracts

    Conference abstracts seldom contain enough information to allow confident judgements about the quality and results of a study. It can be difficult to trace the original studies or additional data, and the information found may not always be useful. Also, good-quality studies are often published as full-text papers after the conference abstract, and these will be identified by routine searches. Conference abstracts should therefore not routinely be included in the search strategy and review, unless there are good reasons for doing so. If a decision is made to include conference abstracts for a particular review, the justification should be clearly documented in the review protocol. If conference abstracts are searched for, the investigators may be contacted if additional information is needed to complete the assessment for inclusion.

    National policy, legislation and medicines safety advice

    Relevant national policy, legislation or medicines safety advice may be identified in the literature search and used to inform guidelines (such as drug safety updates from the Medicines and Healthcare products Regulatory Agency [MHRA]). This evidence does not need critical appraisal in the same way as other evidence, given the nature of the source. National policy, legislation or medicines safety advice can be quoted verbatim as evidence (for example, the Health and Social Care Act [2012]), where needed, and a summary of any relevant medicines safety advice identified should be included in the evidence review.

    Unpublished data and studies in progress

    Any unpublished data should be quality assessed in the same way as published studies (see the section on assessing quality of evidence: critical appraisal, analysis, and certainty in the findings). If additional information is needed to complete the quality assessment, the investigators may be contacted. Similarly, if data from in-progress studies are included, they should be quality assessed in the same way as published studies. Confidential information should be kept to a minimum, and a structured abstract of the study must be made available for public disclosure during consultation on the guideline.

    Grey literature

    Grey literature may be quality assessed in the same way as published literature, although because of its nature, such an assessment may be more difficult. Consideration should therefore be given to the elements of quality that are most likely to be important (for example, elements of the study methodology that are less clearly described than in a published article, because of the lack of need to go through the peer-review process, or conflicts of interest in the study).

    6.2 Assessing evidence: critical appraisal, analysis, and certainty in the findings

    Introduction

    Assessing the quality of the evidence for a review question is critical. It requires a systematic process of assessing both the appropriateness of the study design and methods (critical appraisal) and the certainty of the findings (using an approach such as GRADE).

    Options for assessing the quality of the evidence should be considered by the developer. The chosen approach should be discussed and agreed with NICE staff with responsibility for quality assurance, where the approach deviates from the standard (described in critical appraisal of individual studies). The agreed approach should be documented in the review protocol (see the appendix on review protocol template) together with the reasons for the choice. If additional information is needed to complete the data extraction or quality assessment, study investigators may be contacted, although this is not something that is done routinely.

    Critical appraisal of individual studies

    Every study should be appraised using a checklist appropriate for the study design (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles for checklists). If a checklist other than those listed is needed, or the one recommended as the preferred option is not used, the planned approach should be discussed and agreed with NICE staff with responsibility for quality assurance and documented in the review protocol.

    The ROBINS-I checklist is currently only validated and recommended for use with non-randomised controlled trials and cohort studies. However, there may be situations where a mix of non-randomised study types is included within a review. It can then be helpful to use this checklist across all included study types to maintain consistency of assessment. If this is done, additional care should be taken to ensure all relevant risks of bias for study designs for which ROBINS-I is not currently validated (such as case-control studies) are assessed.

    In some evidence reviews, it may be possible to identify particular risk of bias criteria that are likely to be the most important indicators of biases for the review question (for example, conflicts of interest or study funding, if it is an area where there is known to be concern about the sponsorship of studies). If any such criteria are identified, these should then be used to guide decisions about the overall risk of bias of each individual study.

    Sometimes, a decision might be made to exclude certain studies at particularly high risk of bias, or to explore any impact of bias through sensitivity analysis. If so, the approach should be specified in the review protocol and agreed with NICE staff with responsibility for quality assurance.

    Criteria relating to key areas of bias may also be useful when summarising and presenting the evidence (see the section on summarising evidence). Topic-specific input (for example, from committee members) may be needed to identify the most appropriate criteria to define subgroup analyses, or to define inclusion in a review, for example, the minimum biopsy protocol for identifying the relevant population in cancer studies.

    For each criterion that might be explored in sensitivity analysis, the decision on whether it has been met or not (for example, which population subgroup the study has been categorised as), and the information used to arrive at the decision (for example, the study inclusion criteria, or the actual population recruited into the study), should be recorded in a standard template for inclusion in an evidence table (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles for examples of evidence tables).

    Each study included in an evidence review should be critically appraised by 1 reviewer and a proportion of these checked by another reviewer. Any differences in critical appraisal should be resolved by discussion or involving a third reviewer.

    Data extraction

    Study characteristics should be extracted to a standard template for inclusion in an evidence table (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles). Care should be taken to ensure that newly identified studies are cross-checked against existing studies to avoid double-counting. This is particularly important where there may be multiple reports of the same study.

    If complex data extraction is done for a review question (for example, situations where a large number of transformations or adjustments are made to the raw data from the included studies), data extraction should be checked by a second reviewer to avoid data errors, which are time-consuming to fix. This may be more common in reviews using more complex analysis methods (for example, network meta-analyses or meta-regressions) but decisions around dual data extraction should be based on the complexity of the extraction, not the complexity of the analysis.

    Analysing and presenting results for studies on the effectiveness of interventions

    Meta-analysis may be appropriate if treatment estimates of the same outcome from more than 1 study are available. Recognised approaches to meta-analysis should be used, as described in the Cochrane Handbook (Higgins et al. 2021) and in documents developed by the NICE Guidelines Technical Support Unit.

    There are several ways of summarising and illustrating the strength and direction of quantitative evidence about the effectiveness of an intervention, even if a meta-analysis is not done. Forest plots can be used to show effect estimates and confidence intervals for each study (when available, or when it is possible to calculate them). They can also be used to provide a graphical representation when it is not appropriate to do a meta-analysis and present a pooled estimate. However, the homogeneity of the outcomes and measures in the studies needs to be carefully considered: a forest plot needs data derived from the same (or justifiably similar) population, interventions, outcomes and measures.
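    The pooling step behind a forest plot can be illustrated with a minimal fixed-effect, inverse-variance meta-analysis on the log odds ratio scale. This is a sketch with invented study data, not a substitute for the recognised approaches referenced above:

```python
# Illustrative sketch only: fixed-effect, inverse-variance pooling of log
# odds ratios, of the kind summarised in a forest plot. Data are invented.

import math

def pooled_effect(effects: list[float], ses: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# log odds ratios and standard errors from 3 hypothetical trials
log_ors = [math.log(0.8), math.log(0.7), math.log(0.9)]
ses = [0.20, 0.25, 0.15]

pooled, se = pooled_effect(log_ors, ses)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(f"pooled OR {math.exp(pooled):.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

Note that a random-effects model, assessment of heterogeneity, and the homogeneity checks described above would all be needed before pooling in a real review.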

    Head-to-head data comparing the effectiveness of interventions are useful for a comparison between 2 active management options. A network meta-analysis (NMA) is a method that can include trials that compare the interventions of interest head-to-head, as well as trials that allow indirect comparisons via other interventions.
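    The logic of an indirect comparison can be shown with the simplest case (the Bucher method): given estimates for A versus B and A versus C, an indirect estimate of B versus C is formed on the log scale, with the variances of the two direct comparisons adding. A sketch with invented numbers:

```python
# Illustrative sketch only: the Bucher indirect comparison that underlies
# network meta-analysis. All numbers are invented.

import math

def bucher_indirect(d_ab: float, se_ab: float, d_ac: float, se_ac: float):
    """Indirect B-vs-C effect and SE from A-vs-B and A-vs-C comparisons,
    on the log scale (difference of log effects; variances add)."""
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab ** 2 + se_ac ** 2)
    return d_bc, se_bc

# log hazard ratios from 2 hypothetical pairwise meta-analyses
d_ab, se_ab = math.log(0.75), 0.10   # A vs B
d_ac, se_ac = math.log(0.60), 0.12   # A vs C

d_bc, se_bc = bucher_indirect(d_ab, se_ab, d_ac, se_ac)
print(f"indirect HR (B vs C) {math.exp(d_bc):.2f}, SE {se_bc:.3f}")
```

A full network meta-analysis generalises this principle across all trials in the network while preserving randomisation within trials.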

    The same principles of good practice for evidence reviews and meta-analyses should be applied when conducting network meta-analyses. The reasons for identifying and selecting the randomised controlled trials (RCTs) should be explained. This includes the reasons for selecting the treatment comparisons, and whether any interventions that are not being considered as options for recommendations will be included within the network to allow for indirect comparisons between interventions of interest. The methods of synthesis should be described clearly either in the methods section of the evidence review or the guideline methods chapter.

    When multiple competing options are being appraised, network meta-analysis is the preferred approach and should be considered in such cases. The data from individual trials should also be documented (usually as an appendix). If there is doubt about the inclusion of particular trials (for example, because of concerns about limitations or applicability), a sensitivity analysis in which these trials are excluded may also be presented. The level of consistency between the direct and indirect evidence on the interventions should be reported, including consideration of model fit and comparison statistics such as the total residual deviance, and the deviance information criterion (DIC). Results of any further inconsistency tests done, such as deviance plots or those based on node-splitting, should also be reported.

    In addition to the inconsistency checks described above, which compare the direct and indirect evidence within a network meta-analysis model, results from direct comparisons may also be presented for comparison with the results from a network meta-analysis (thus comparing the direct and overall network meta-analysis results to aid validity checks and interpretation, rather than direct and indirect to check consistency). These may be the results from the direct evidence within the network meta-analysis, or from direct pairwise comparisons done outside the network meta-analysis, depending on which is considered more informative.

    When evidence is combined using network meta-analyses, trial randomisation should typically be preserved. If this is not appropriate, the planned approach should be discussed and agreed with NICE staff with responsibility for quality assurance. A comparison of the results from single treatment arms from different RCTs is not acceptable unless the data are treated as observational and analysed as such.

    Further information on complex methods for evidence synthesis is provided by the documents developed by the NICE Guidelines Technical Support Unit. The methods described in these documents should be used as the basis for analysis, and any deviations from these methods clearly described and justified, and agreed with NICE staff who have responsibility for quality assurance.

    To promote transparency of health research reporting (as endorsed by the EQUATOR network), evidence from a network meta-analysis should usually be reported according to the criteria in the modified PRISMA‑NMA checklist in the appendix on network meta-analysis reporting standards.

    Evidence from a network meta-analysis can be presented in a variety of ways. The network should be presented diagrammatically with the available treatment comparisons clearly identified, and show the number of trials in each comparison. Further information on how to present the results of network meta-analyses is provided by the documents developed by the NICE Guidelines Technical Support Unit.

    There is no NICE-endorsed approach for assessing the quality or certainty of outputs derived from network meta-analysis. At minimum, a narrative description of the confidence in the results of the network meta-analysis should be presented, considering all the areas in a standard GRADE profile (risk of bias, indirectness, inconsistency and imprecision). Several other approaches have been suggested in the literature that may be relevant in particular circumstances (Phillippo et al. 2019, Phillippo et al. 2017, Caldwell et al. 2016, Puhan et al. 2014, Salanti et al. 2014). The approach to assessing confidence in results should take into account the particular questions the network meta-analysis is trying to address. For example, the approach to imprecision may be different if a network meta-analysis is trying to identify the single most effective treatment, compared to creating a ranking of all possible treatments.

    Dealing with complex interventions

    Analysing quantitative evidence on complex interventions may involve considering factors other than effectiveness. This includes:

    Different analytical approaches are relevant to different types of complexity and question (see table 1 in Higgins et al. 2019). The appropriate choice of technique will depend on the review question, available evidence, time needed to apply the approach and likely impact on guideline recommendations. The approach should be discussed and agreed with NICE staff who have responsibility for quality assurance.

    Further information on complex methods for evidence synthesis is provided by the documents developed by the NICE Guidelines Technical Support Unit and NICE's Decision Support Unit.

    Additional information is available from:

    Analysing and presenting results of studies of diagnostic test accuracy

    Information on methods of presenting and synthesising results from studies of diagnostic test accuracy is available in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. When meta-analyses of paired accuracy measures (such as sensitivity and specificity) are done, bivariate analysis should be used where possible, to preserve correlations between outcomes. Univariate analyses can still be used if there are insufficient studies for a bivariate analysis.

    Meta-analyses should not normally be done on positive and negative predictive values, unless the analysis takes account of differences in prevalence. Instead, analyses can be done on sensitivity and specificity and these results applied to separate prevalence estimates to obtain positive and negative predictive values, if these are outcomes specified in the review protocol.
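    Applying pooled sensitivity and specificity to separate prevalence estimates is a direct use of Bayes' theorem. An illustrative sketch with invented numbers, showing how the predictive values shift with prevalence:

```python
# Illustrative sketch only: deriving positive and negative predictive values
# from sensitivity, specificity and prevalence, rather than meta-analysing
# PPV/NPV directly. All numbers are invented.

def predictive_values(sens: float, spec: float, prev: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    npv = (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# pooled accuracy from a hypothetical meta-analysis, applied at 2 prevalences
sens, spec = 0.90, 0.80
for prev in (0.05, 0.30):
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

The same pooled accuracy gives very different predictive values at different prevalences, which is why PPV and NPV should not be pooled directly across settings.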

    If meta-analysis is not possible or appropriate (for example, if the differences between populations, reference standards or index test thresholds are too large), there should be a narrative summary of the results considered most important for the review question.

    Analysing and presenting results of studies of prognosis, or prediction models for a diagnosis or prognosis

    There is currently no consensus on approaches for synthesising evidence from studies on prognosis, or prediction models for diagnosis or prognosis. The approach chosen should be based on the types of data included (for example, prognostic accuracy data, prediction models, or associative studies presenting odds ratios or hazard ratios). For prognostic accuracy data, the same approach for synthesis can be taken as with diagnostic accuracy data, with the addition of the need to consider length of follow-up as part of the analysis. When considering meta-analysis, reviewers should consider how similar the prognostic factors or predictors and confounding factors are across all studies reporting the same outcome measure. It is important to explore whether all likely confounding factors have been accounted for, and whether the metrics used to measure exposure (or outcome) are universal. When studies cannot be pooled, results should be presented consistently across studies. For more information on prognostic reviews, see Collins 2015 and Moons 2015.

    Analysing, synthesising and presenting results of qualitative evidence

    Qualitative evidence occurs in many forms and formats and so different methods may be used for synthesis and presentation (such as those described by the Cochrane Qualitative & Implementation Methods Group).

    Qualitative evidence should be synthesised and then summarised using GRADE-CERQual (see GRADE-CERQual Implementation series). If synthesis of the evidence is not appropriate, a narrative summary may be adequate; this should be agreed with NICE staff with responsibility for quality assurance. The approach used may depend on the volume of the evidence. If the qualitative evidence is extensive, then a recognised method of synthesis is preferable (normally aggregative, thematic or framework synthesis type approaches). If the evidence is disparate and sparse, a narrative summary may be appropriate.

    The simplest approach to synthesising qualitative data in a meaningful way is to group the findings in the evidence tables (comprising 'first order' participant quotes and participant observations, as well as 'second order' interpretations by study authors), and then to write 'third order' interpretations based on the reviewers' interpretation of the first- and second-order constructs synthesised across studies. These third-order interpretations become the themes and sub-themes, or 'review findings'. This synthesis can be carried out if enough data are found, and the papers and research reports cover the same (or similar) context or use similar methods. These should be relevant to the review questions and could, for example, include intervention, age, population or setting.

    Synthesis can be carried out in several ways (as noted above), and each may be appropriate depending on the question type, and the evidence identified. Papers reporting on the same findings can be grouped together to compare and contrast themes, focusing not just on consistency but also on any differences. The narrative should be based on these themes.

    A more complex but useful approach is 'conceptual mapping' (see Johnson et al. 2000). This involves identifying the key themes and concepts across all the evidence tables and grouping them into first level (major), second level (associated) and third level (subthemes) themes. Results are presented in schematic form as a conceptual diagram and the narrative is based on the structure of the diagram.

    Data and thematic saturation

    There are differences between 'data saturation' and 'thematic saturation'.

    Data saturation

    Data saturation applies to a specific 'theme' identified from simple thematic analysis of included studies. Data saturation for a theme could be judged by the coherence and adequacy components from GRADE-CERQual: for example, is there enough data to judge that an identified 'theme' is fully coherent (that is, the data extracted from included studies are unambiguous, consistent across studies and sufficiently 'rich')?

    It is not appropriate to apply an arbitrary threshold on the number of studies required per theme to make the judgement that data saturation has been reached. Instead, it is about the depth of information relating to a particular theme provided by each individual study that contributes to it.

    The more complex the finding, the more detailed or rich the supporting data need to be. For simple findings, relatively superficial data could be considered adequate to explain and explore the phenomenon being described.

    If data saturation is judged to have been reached for a specific theme in the review, the reviewer can stop extracting data that support that theme from the rest of the included studies (but this does not mean the reviewer can stop going through the rest of the included studies to identify and extract data relevant to other themes).

    Thematic saturation

    Thematic saturation applies when no more potential themes can be identified from the review.

    If the committee has provided an 'anticipated framework of phenomena' during protocol development (for example, a list of 'phenomena' they are interested in, to investigate whether the evidence validates or refutes their hypotheses), this list of phenomena of interest needs to be highlighted in the review protocol with brief rationales or explanations. In this case, thematic saturation could be reached when the reviewer is satisfied that the data are sufficiently 'rich' and 'coherent' to validate (or refute) the committee's hypotheses. The reviewer can then stop the review (stop going through the rest of the included studies). Judgement needs to be applied by the reviewer and it is not appropriate to apply an arbitrary threshold based on the number of studies.

    If the review is fully 'exploratory', using simple thematic analysis to identify whatever themes the reviewer comes across in the included studies (so there is no theoretical framework or committee hypothesis on phenomena of interest), thematic saturation does not apply. This is because the reviewer can never be sure that new themes will not appear until the last included study has been assessed and its data extracted.

    Integrating and presenting results of mixed methods reviews

    If a mixed methods approach has been identified as needed (see the section on developing review questions and planning the evidence review), then the approach to integration needs consideration. Integration refers to either: A) how quantitative and qualitative evidence are combined after separate syntheses (convergent-segregated), or B) how quantitative and qualitative data that have been transformed are merged (convergent-integrated).

    • A) The convergent-segregated approach consists of doing separate quantitative and qualitative syntheses (as usual), followed by integration of the results derived from each of the syntheses. Integrating the quantitative and qualitative synthesised findings gives a greater depth of understanding of the phenomena of interest compared to doing two separate component syntheses without formally linking the two sets of evidence.

    • All qualitative evidence from a convergent-segregated mixed methods review should be synthesised and then summarised using GRADE-CERQual. If appropriate, all quantitative data (for example, for intervention studies) should be presented using GRADE. An overall summary of how the quantitative and qualitative evidence are linked should ideally be presented in either matrices or thematic diagrams. It should also be summarised in the review using the questions in the section on integration of quantitative and qualitative evidence to frame the integration evidence summary (JBI manual for evidence synthesis):

    Integration of quantitative and qualitative evidence 

    The integration section should provide a summary that represents the configured analysis of the quantitative and qualitative evidence. This can include matrices, look-up tables or thematic maps, but as a minimum should include statements that address all of the following questions:

    • Are the results and findings from individual syntheses supportive or contradictory?

    • Does the qualitative evidence explain why the intervention is or is not effective?

    • Does the qualitative evidence explain differences in the direction and size of effect across the included quantitative studies?

    • Which aspects of the quantitative evidence were or were not explored in the qualitative studies?

    • Which aspects of the qualitative evidence were or were not tested in the quantitative studies?

    This should be reported as a summary of the mixed findings after reporting on the effectiveness and qualitative evidence synthesis.

    • B) The convergent-integrated approach refers to a process of combining extracted data from quantitative studies (including data from the quantitative component of mixed methods studies) and qualitative studies (including data from the qualitative component of mixed methods studies) and involves data transformation.

    The convergent-segregated approach is the standard approach to adopt in most NICE mixed methods reviews. If convergent-segregated synthesis is not the planned approach, data transformation methods and outcome reporting should be discussed and agreed with NICE staff who have responsibility for quality assurance and documented in the review protocol.
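As an illustration, the integration matrix described above can be thought of as a cross-tabulation of quantitative outcomes against qualitative themes, with each cell recording the reviewer's judgement of how the two relate. This is a hypothetical sketch only: the outcome names, theme names and judgements below are invented for illustration, not drawn from any NICE review.

```python
# Hypothetical convergent-segregated integration matrix.
# Outcomes, themes and cell judgements are invented examples.
quant_outcomes = ["adherence at 12 weeks", "quality of life"]
qual_themes = ["treatment burden", "trust in practitioners"]

# Each cell records how the qualitative theme relates to the
# quantitative result: 'supportive', 'contradictory' or 'not explored'.
matrix = {
    ("adherence at 12 weeks", "treatment burden"): "supportive",
    ("adherence at 12 weeks", "trust in practitioners"): "not explored",
    ("quality of life", "treatment burden"): "contradictory",
    ("quality of life", "trust in practitioners"): "supportive",
}

# Print the matrix as a simple look-up table.
print("outcome / theme".ljust(24) + "".join(t.ljust(24) for t in qual_themes))
for outcome in quant_outcomes:
    cells = "".join(matrix[(outcome, t)].ljust(24) for t in qual_themes)
    print(outcome.ljust(24) + cells)
```

A structure like this makes the integration questions above easy to answer systematically: 'not explored' cells identify aspects of the quantitative evidence the qualitative studies did not cover, and 'contradictory' cells flag findings that need discussion in the integration summary.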

    Certainty or confidence in the findings of analysis

    Once critical appraisal of the studies and data analysis are complete, the certainty or confidence in the findings should be presented (for individual or synthesised studies) at outcome level using GRADE or GRADE-CERQual. Although GRADE has not been formally validated for all quantitative review types (such as prognostic reviews), GRADE principles can be applied and adapted to other types of questions. Any substantial changes made by the developer to GRADE should be agreed with NICE staff with responsibility for quality assurance before use.

    If using GRADE or GRADE-CERQual is not appropriate, the planned approach should be discussed and agreed with NICE staff with responsibility for quality assurance. It should be documented in the review protocol (see the appendix on review protocol template) together with the reasons for the choice.

    Certainty or confidence in the findings by outcome

    Before starting an evidence review, the outcomes of interest that are important to people using services and the public for the purpose of decision making should be identified, and the reasons for prioritising them documented in the evidence review. This must happen before the review starts and be clearly separated from discussion of the evidence, because selecting outcomes once the results are known can introduce bias. An example of this would be choosing only outcomes for which there were statistically significant results.

    The committee discussion section should also explain how the importance of outcomes was considered when discussing the evidence. For example, the committee may want to categorise prioritised outcomes as 'critical' or 'important'. Alternatively, they may consider all prioritised outcomes to be crucial for decision making, in which case no distinction is made between 'critical' and 'important'. The impact of this on the final recommendations should be clear.

    GRADE and GRADE-CERQual assess the certainty or confidence in the review findings by looking at features of the evidence found for each outcome or theme. GRADE is summarised in box 6.1, and GRADE-CERQual in box 6.2.

    Box 6.1 GRADE approach to assessing the certainty of evidence for intervention studies

    GRADE assesses the following features for the evidence found for each outcome:

    • study limitations (risk of bias) – the internal validity of the evidence

    • inconsistency – the heterogeneity or variability in the estimates of treatment effect across studies

    • indirectness – the extent of differences between the population, intervention, comparator and outcome of interest in the studies and those specified in the review protocol

    • imprecision – the level of certainty in the effect estimate

    • other considerations – publication bias, the degree of selective publication of studies.

    In a standard GRADE approach, the certainty or confidence of evidence is classified as high, moderate, low or very low. In the context of NICE guidelines, it can be interpreted as follows:

    • High – further research is very unlikely to change our recommendation.

    • Moderate – further research may have an important impact on our confidence in the estimate of effect and may change the strength of our recommendation.

    • Low – further research is likely to have an important impact on our confidence in the estimate of effect and is likely to change the recommendation.

    • Very low – any estimate of effect is very uncertain and further research will probably change the recommendation.
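The rating convention underlying box 6.1 can be illustrated with a short sketch. This is not NICE or GRADE software: the function name and structure are illustrative, following the widely used GRADE convention of starting randomised evidence at 'high' certainty and dropping one level for each serious concern in a domain (two levels for a very serious concern).

```python
# Illustrative sketch of GRADE-style certainty rating; not an official tool.
LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(start_level, concerns):
    """start_level: index into LEVELS (3 = 'high', typical for RCTs).
    concerns: dict mapping each GRADE domain to 0 (no serious concern),
    1 (serious) or 2 (very serious)."""
    level = start_level - sum(concerns.values())
    return LEVELS[max(level, 0)]  # cannot drop below 'very low'

# A body of randomised evidence with serious risk of bias
# and serious imprecision is downgraded twice: high -> low.
rating = grade_certainty(3, {
    "risk of bias": 1,
    "inconsistency": 0,
    "indirectness": 0,
    "imprecision": 1,
    "publication bias": 0,
})
print(rating)  # → low
```

This also shows why, in the piloted alternative approach described later in this section, removing the imprecision domain can only leave the overall rating the same or higher: one potential source of downgrading is taken out of the sum.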

    Box 6.2 GRADE-CERQual approach to assessing the confidence of evidence for qualitative studies

    GRADE-CERQual assesses the following features for the evidence found for each finding:

    • methodological limitations – the internal validity of the evidence

    • relevance – the extent to which the evidence is applicable to the context in the review question

    • coherence – the extent of the similarities and differences within the evidence

    • adequacy of data – the extent of richness and quantity of the evidence.

    In a standard GRADE-CERQual approach, the certainty or confidence of evidence is classified as high, moderate, low or very low. In the context of NICE guidelines, it can be interpreted as follows:

    • High – it is highly likely that the review finding is a reasonable representation of the phenomenon of interest.

    • Moderate – it is likely that the review finding is a reasonable representation of the phenomenon of interest.

    • Low – it is possible that the review finding is a reasonable representation of the phenomenon of interest.

    • Very low – it is unclear whether the review finding is a reasonable representation of the phenomenon of interest.

    The approach taken by NICE differs from the standard GRADE and GRADE-CERQual system in 2 ways:

    • it also integrates a review of the quality of cost-effectiveness studies (see the chapter on incorporating economic evaluation)

    • it does not use 'overall summary' labels for the quality of the evidence across all outcomes, or for the strength of a recommendation, but uses the wording of recommendations to reflect the strength of the evidence (see the chapter on writing the guideline).

    GRADE or GRADE-CERQual tables summarise the certainty in the evidence and data for each critical and each important outcome or theme and include a limited description of the certainty in the evidence. GRADE or GRADE-CERQual tables should be available (in an appendix) for each review question.

    For mixed methods findings there is no recognised approach to combining the certainty of evidence from GRADE and GRADE-CERQual. The certainty and confidence ratings should be reported for both evidence types within the evidence summary of integrated findings and their impact on decision making described in the relevant section of the review.

    Alternative approaches to assessing imprecision in GRADE

    NICE has carried out a pilot of an alternative approach to assessing imprecision using GRADE. In a standard GRADE approach, clinical decision thresholds (often based on minimal clinically important differences) are used as part of the assessment of imprecision. In the alternative approach piloted, imprecision was not assessed as part of the GRADE profile but instead was explicitly discussed in the committee discussion section of the evidence review. An explicit discussion of the magnitude of benefits and harms and their relative importance was also included. Examples of this approach can be found in the evidence reviews for the NICE guidelines on neonatal infection and depression in children.

    If this approach is adopted, the following modifications need to be made to the write-up in the evidence review:

    • Minimal clinically important differences will no longer be used as part of the GRADE process but should still be searched for and, where available, used in the interpretation of effect sizes.

    • GRADE tables should still be presented, but without the domain for imprecision (this will mean overall confidence ratings will be either the same or higher than if imprecision is included as a domain).

    • The committee discussion section of the evidence review should contain both a discussion of the committee's interpretation of imprecision and the clinical importance of the findings, and how both that information and the other factors contained within the GRADE tables fed into the final recommendations made.

    After the pilot, this approach to assessing imprecision can be used as part of developing NICE guidelines. However, this should be agreed in advance with NICE staff who have responsibility for quality assurance and, if this approach is adopted, it should be used for all quantitative review questions within a guideline or guideline update, and stated in the review protocol.

    6.3 Equality and diversity considerations

    NICE's equality and diversity duties are expressed in a single public sector equality duty ('the equality duty', see the section on key principles that guide the development of NICE guidance and standards in the introduction chapter). The equality duty supports good decision-making by encouraging public bodies to understand how different people will be affected by their activities. For NICE, much of whose work involves developing advice for others on what to do, this includes thinking about how people will be affected by its recommendations when these are implemented (for example, by health and social care practitioners).

    6.4 Health inequalities

    In addition to meeting its legal obligations, NICE is committed to going beyond compliance, particularly in terms of tackling health inequalities. Specifically, NICE considers that it should also take account of the 4 dimensions of health inequalities – socioeconomic status and deprivation, protected characteristics (defined in the Equality Act 2010), inclusion health groups (such as people experiencing homelessness and young people leaving care), and geography. Wherever possible, NICE's guidance aims to reduce and not increase identified health inequalities.

    Ensuring inclusivity of the evidence review criteria

    Any equality criteria specified in the review protocol should be included in the evidence tables. At the data extraction stage, reviewers should refer to the PROGRESS-Plus criteria (including age, gender/sex, sexual orientation, gender reassignment, disability, ethnicity, religion, place of residence, occupation, education, socioeconomic position and social capital; Gough et al. 2012) and any other relevant protected characteristics, and record these where reported, if specified in the review protocol. Review inclusion and exclusion criteria should also take the relevant groups into account, as specified in the review protocol.

    Equalities and health inequalities should be considered during the drafting of the evidence reviews, including any issues documented in the equality impact assessment. Equality and health inequality considerations should be included in the data extraction process and should be recorded in the committee discussion section.

    6.5 Summarising evidence

    Presenting evidence

    The following sections should be included in the evidence review:

    • an introduction to the evidence review

    • a description of the studies or other evidence identified, in either table or narrative format

    • evidence tables (usually presented in an appendix)

    • full GRADE or GRADE-CERQual profiles (in an appendix)

    • evidence summaries (of the results or conclusions of the evidence)

    • an overall summary of merged quantitative and qualitative evidence (either using matrices or thematic diagrams) and the integration questions for mixed methods reviews

    • results from other analysis of evidence, such as forest plots, area under the curve graphs, network meta-analysis (usually presented in an appendix; see the appendix on network meta-analysis reporting standards).

    The evidence should usually be presented separately for each review question; however, alternative methods of presentation may be needed for some evidence reviews (for example, where review questions are closely linked and need to be interpreted together).

    Any substantial deviations in presentation need to be agreed, in advance, with NICE staff with responsibility for quality assurance.

    Describing the included evidence

    A description of the evidence identified should be produced. The content of this will depend on the type of question and the type of evidence. It should also identify and describe any gaps in the evidence, and cover at minimum:

    • the volume of information for the review question(s), that is, the number of studies identified, included, and excluded (with a link to a PRISMA selection flowchart, in an appendix)

    • the study types, populations, interventions, settings or outcomes for each study related to a particular review question.

    Evidence tables

    Evidence tables help to identify the similarities and differences between studies, including the key characteristics of the study population and interventions or outcome measures.

    Data from identified studies are extracted to standard templates for inclusion in evidence tables. The type of data and study information that should be included depends on the type of study and review question, and should be concise and consistently reported.

    The types of information that could be included for quantitative studies are:

    • bibliography (authors, date)

    • study aim, study design (for example, RCT, case–control study) and setting (for example, country)

    • funding details (if known)

    • population (for example, source and eligibility, and which population subgroup of the protocol the study has been mapped to, if relevant)

    • intervention, if applicable (for example, content, who delivers the intervention, duration, method, dose, mode or timing of delivery, and which intervention subgroup of the protocol the study has been mapped to, if relevant)

    • comparator, if applicable (for example, content, who delivers the intervention, duration, method, dose, mode or timing of delivery)

    • method of allocation to study groups (if applicable)

    • outcomes (for example, primary and secondary and whether measures were objective, subjective or otherwise validated, and the timepoint at which these outcomes were measured)

    • key findings (for example, effect sizes, confidence intervals, for all relevant outcomes, and where appropriate, other information such as numbers needed to treat and considerations of heterogeneity if summarising a systematic review or meta-analysis)

    • inadequately reported data, missing data or if data have been imputed (include method of imputation or if transformation is used)

    • overall comments on quality, based on the critical appraisal and what checklist was used to make this assessment. When study details are inadequately reported, or absent, this should be clearly stated.

    If data are not being used in any further statistical analysis, or are not reported in GRADE tables, effect sizes (point estimates) with confidence intervals should be reported, or back calculated from the published evidence where possible. If confidence intervals are not reported, exact p values (whether or not significant), with the test from which they were obtained, should be described. When confidence intervals or p values are inadequately reported or not given, this should be stated. Any descriptive statistics (including any mean values and degree of spread such as ranges) indicating the direction of the difference between intervention and comparator should be presented. If no further statistical information is available, this should be clearly stated.
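Where a confidence interval needs to be back calculated as described above, one common approach (assuming the effect estimate is approximately normally distributed and an exact two-sided p value is reported) is to derive the implied standard error from the p value. The function below is an illustrative sketch of that calculation, not a prescribed NICE method.

```python
# Illustrative sketch: back calculating a 95% confidence interval from a
# reported effect estimate and exact two-sided p value, assuming normality.
from statistics import NormalDist

def ci_from_p(estimate, p, level=0.95):
    z_p = NormalDist().inv_cdf(1 - p / 2)        # z score implied by the p value
    se = abs(estimate) / z_p                      # implied standard error
    z_ci = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z_ci * se, estimate + z_ci * se

# A mean difference of -0.50 reported with p = 0.04 and no interval:
low, high = ci_from_p(estimate=-0.50, p=0.04)
print(f"95% CI: {low:.2f} to {high:.2f}")  # → 95% CI: -0.98 to -0.02
```

For ratio measures (risk ratios, odds ratios, hazard ratios), the same calculation should be done on the log scale and the resulting limits exponentiated.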

    The type of data that could be reported in evidence tables for qualitative studies includes:

    • bibliography (authors, date)

    • study aim, study design and setting (for example, country)

    • funding details (if known)

    • population or participants

    • theoretical perspective adopted (such as grounded theory)

    • key objectives and research questions; methods (including analytical and data collection technique)

    • key themes/findings (including quotes from participants that illustrate these themes or findings, if appropriate)

    • gaps and limitations

    • overall comments on quality, based on the critical appraisal and what checklist was used to make this assessment. When study details are inadequately reported, or absent, this should be clearly stated.

    Evidence summaries

    Full GRADE or GRADE-CERQual tables that present both the results of the analysis and describe the confidence in the evidence should normally be provided (in an appendix).

    Additionally, whether GRADE or GRADE-CERQual are used or not, a summary of the evidence should be included within the evidence review. This summary can be in any format (narrative, tabular, pictorial) but should contain sufficient detail to explain the key findings of the review without needing to refer to the full results in the appendices.

    Evidence summaries are structured and written to help committees formulate recommendations, and to help stakeholders and users of the guidance understand the reasons why those recommendations were made. They are separate from the committee's interpretation of the evidence, which should be covered in the committee discussion section. They can help to understand:

    • whether or not there is sufficient evidence (in terms of strength and applicability) to form a judgement

    • whether (on balance) the evidence demonstrates that an intervention, approach or programme is effective or ineffective, or is inconclusive

    • the size of effect and associated measure of uncertainty

    • whether the evidence is applicable to people affected by the guideline and contexts covered by the guideline.

    Structure and content of evidence summaries

    Evidence summaries do not need to repeat every finding from an evidence review, but should contain sufficient information to understand the key findings of the review, including:

    • Sufficient descriptions of the interventions, tests or factors being reported on to enable interpretation of the results reported.

    • The volume of and confidence in the evidence, as well as the magnitude and direction of effects.

    • Key strengths and limitations of the evidence that may not be obvious from overall confidence ratings (for example, the countries evidence came from, if that is expected to have a meaningful impact on the results).

    • For findings not showing a meaningful benefit or harm between multiple options, it should be clear whether these have been interpreted as demonstrating equivalence, or simply that it is not possible to tell whether there is a difference or not from the available evidence.

    • Any outcomes where evidence was searched for but no or insufficient evidence was found.

    These summaries can be done in a variety of formats (for example, evidence statement, narrative summaries, tables) provided they cover the relevant information. 'Vote counting' (merely reporting on the number or proportion of studies showing a particular positive or negative finding) is not an acceptable summary of the evidence.
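To illustrate why vote counting is not acceptable, the toy example below contrasts it with a fixed-effect inverse-variance pooled estimate. The study effects and standard errors are invented: vote counting reports the evidence as 'split', while the precision-weighted pooled estimate (which uses both effect size and precision) is close to null.

```python
# Toy contrast between vote counting and inverse-variance pooling.
# Study effect estimates and standard errors are invented for illustration.
import math

studies = [  # (effect estimate, standard error)
    (0.30, 0.25),   # 'positive' but imprecise
    (-0.05, 0.10),
    (-0.10, 0.08),
    (0.15, 0.30),   # 'positive' but imprecise
]

# Vote counting discards effect size and precision entirely:
votes = sum(1 for effect, _ in studies if effect > 0)
print(f"{votes} of {len(studies)} studies 'positive'")  # → 2 of 4

# A fixed-effect inverse-variance pooled estimate uses both:
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * e for (e, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled effect {pooled:.2f} (SE {pooled_se:.2f})")  # → pooled effect -0.05 (SE 0.06)
```

The vote count suggests an evenly split evidence base, but the 2 'positive' studies are the least precise; a precision-weighted summary tells a different story, which is why evidence summaries must report effect sizes and uncertainty rather than study counts.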

    Context- or topic-specific terms (for example, 'an increase in HIV incidence', 'a reduction in injecting drug use' and 'smoking cessation') may be used. Any such terms should be used consistently in each review and their definitions reported.