3 Evidence

The diagnostics advisory committee (section 7) considered evidence on the ARCHITECT and Alinity i Urine neutrophil gelatinase-associated lipocalin (NGAL) assays, BioPorto NGAL test and NephroCheck test for detecting emerging acute kidney injury from several sources. Full details of all the evidence are in the committee papers.

Clinical effectiveness

3.1 The external assessment group (EAG) did a systematic review to identify evidence on the diagnostic accuracy and clinical effectiveness of the ARCHITECT and Alinity i Urine NGAL assays, BioPorto NGAL test and NephroCheck test to help assess, and reduce, the risk of acute kidney injury for critically ill patients who are being considered for critical care admission. Although the population in the scope was people being considered for critical care admission, to maximise the available data the EAG included data from studies that enrolled patients already admitted to critical care.

3.2 In total, 56 studies (reported in 71 articles) were included. Of these, 46 enrolled adults only, 8 enrolled children only and 2 enrolled both adults and children. Twenty-eight studies were done in Europe (4 in the UK), 15 in North America, 9 in Asia, 2 in North America and Europe, 1 in Australia and 1 study did not provide details of location. In most studies data were collected prospectively.

3.3 The studies reported data on using the biomarkers either to detect or predict acute kidney injury, or to predict clinical outcomes (mortality or need for renal replacement therapy [RRT]) in critically ill patients admitted to hospital. No randomised controlled trials or controlled clinical trials were identified. No studies compared using the biomarkers with standard clinical care for clinical effectiveness outcomes.

3.4 The studies assessed using the tests in various clinical settings. The EAG divided the studies in adults and children into 3 groups based on clinical setting: people who had cardiac surgery, people who had major non-cardiac surgery and people admitted to critical care (including critically ill patients presenting to the emergency department, patients admitted to intensive care or patients considered for critical care for various medical conditions).

Evidence on accuracy to detect emerging acute kidney injury

3.5 Test accuracy was determined by the ability of the tests to identify the presence of acute kidney injury according to current clinical criteria (that is, using serum creatinine and urine output). A rise in serum creatinine levels or fall in urine output, or both, occurring within a certain time after the NephroCheck or NGAL test was done (this varied between studies, from within 12 hours to within 7 days) was used to indicate whether acute kidney injury had occurred (reference standard). The EAG could extract or derive the necessary data for calculating sensitivity and specificity estimates from 33 of the included studies.
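As a minimal sketch of how sensitivity and specificity follow from a study's 2x2 table of index test result against reference standard, the calculation can be written as below. The counts are hypothetical and chosen only for illustration; they are not taken from any included study, and the class name is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test result cross-tabulated against the reference standard."""
    tp: int  # test positive, acute kidney injury on reference standard
    fp: int  # test positive, no acute kidney injury
    fn: int  # test negative, acute kidney injury on reference standard
    tn: int  # test negative, no acute kidney injury

    def sensitivity(self) -> float:
        # Proportion of people with acute kidney injury who test positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of people without acute kidney injury who test negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=30, fp=40, fn=10, tn=120)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```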

3.6 The QUADAS‑2 tool was used for quality assessment of the studies. The EAG commented that it was not clear in most studies if the tests were interpreted without knowledge of the reference standard (unclear risk of bias). Studies that used NephroCheck were judged at low risk of bias for interpretation of the test because they used a common threshold for a positive result. However, for the NGAL studies a common threshold was not used. The EAG also commented that in the NGAL studies the threshold was not pre-specified before data were collected. Two studies were assessed as being at high risk of bias on the patient flow domain because more than 50% of the participants were excluded from the analysis (Jaques et al. 2019) or because of poor reporting (Asada et al. 2016). The EAG considered that the applicability of the index test results to the NHS was unclear in many studies because there was wide variation in the NGAL threshold used to define a positive test result and in the timing of the test sample collection. The EAG commented that it had no major concerns that the patient population, index test and reference standard were not applicable to the review question. However, in some of the included studies people were already admitted to critical care.

3.7 Because the threshold used for a positive test result varied in the identified studies, the EAG ran meta-analyses using the hierarchical summary ROC (HSROC) model to estimate summary values for sensitivity and specificity. If multiple thresholds were used in a study, the EAG selected 1 threshold to use in its analysis. Meta-analysis was only done if data from 4 or more studies were available.

NephroCheck test (adults)

3.8 All studies assessed used the NephroCheck test on urine samples. No studies were done in the UK. Two studies assessed using NephroCheck to detect acute kidney injury after cardiac surgery and 5 studies assessed its use in hospitalised patients admitted to intensive or critical care for various clinical reasons. No studies were identified in people who had major non-cardiac surgery. The summary estimate for sensitivity was 0.75 (95% confidence interval [CI] 0.58 to 0.87) and for specificity was 0.61 (95% CI 0.49 to 0.72). The EAG commented that there was heterogeneity across studies and noted that estimates of specificity were generally low.

ARCHITECT Urine NGAL assay (adults)

3.9 Two studies provided test accuracy data on using the ARCHITECT NGAL assay to detect acute kidney injury after cardiac surgery. Four studies assessed its use in hospitalised patients admitted to intensive or critical care for various clinical reasons. No studies were done in the UK or were identified in people who had major non-cardiac surgery. The summary estimate for sensitivity was 0.67 (95% CI 0.58 to 0.76) and for specificity was 0.72 (95% CI 0.64 to 0.79). The EAG commented that there was heterogeneity across studies.

BioPorto NGAL test – urine (adults)

3.10 Eight studies assessed using the BioPorto NGAL test with urine for detecting acute kidney injury: 1 study in people who had cardiac surgery, 1 study in people who had major non-cardiac surgery and 6 studies in hospitalised patients admitted to intensive or critical care for various clinical reasons. One study was done in the UK (Matsa et al. 2014). The summary estimate for sensitivity was 0.73 (95% CI 0.65 to 0.80) and for specificity was 0.83 (95% CI 0.64 to 0.93). The EAG commented that there was heterogeneity across studies.

BioPorto NGAL test – plasma (adults)

3.11 The EAG only identified studies in the critical care setting for the BioPorto NGAL test used with blood plasma (4 studies). One study was done in the UK (Matsa et al. 2014). The summary estimate for sensitivity was 0.76 (95% CI 0.56 to 0.89) and for specificity was 0.67 (95% CI 0.40 to 0.86). The EAG commented that there was heterogeneity across studies.

Children

3.12 Seven studies assessed using the NGAL assays with urine samples to detect acute kidney injury in children. No studies were done in the UK. No studies assessing the use of NephroCheck in children were identified.

ARCHITECT Urine NGAL assay (children)

3.13 Five studies assessed using the ARCHITECT Urine NGAL assay to detect acute kidney injury in children who had cardiac surgery. The summary estimate for sensitivity was 0.68 (95% CI 0.53 to 0.80) and for specificity was 0.79 (95% CI 0.63 to 0.89). The EAG commented that there was considerable heterogeneity across studies. No studies were identified in a population who had major non-cardiac surgery. One study assessed using the ARCHITECT Urine NGAL assay to detect acute kidney injury in children admitted to intensive or critical care for various clinical reasons. The sensitivity and specificity were 0.77 (95% CI 0.60 to 0.90) and 0.85 (95% CI 0.74 to 0.92), respectively.

BioPorto NGAL test – urine (children)

3.14 One study assessed using the BioPorto NGAL test with urine for detecting acute kidney injury in children who had cardiac surgery. NGAL was measured using a concentration normalised by units of creatinine. The sensitivity and specificity were 0.77 (95% CI 0.69 to 0.84) and 0.47 (95% CI 0.40 to 0.54), respectively.

Evidence on ability to predict intermediate outcomes

3.15 The EAG identified 11 studies with data on the ability of the tests to predict mortality, 4 studies with data on predicting the need for RRT and 3 studies that assessed the ability of the tests to predict worsening of acute kidney injury. All studies were in critically ill patients at risk of acute kidney injury. For predicting mortality, area under the curve (AUC) values varied from 0.55 to 0.91. For predicting the need for RRT, AUC values varied from 0.68 to 0.86. For predicting worsening of acute kidney injury, AUC values varied from 0.66 to 0.71.

3.16 The EAG commented that adding the tests to existing clinical models generally improved risk prediction of newly developed acute kidney injury, or worsening of acute kidney injury, and mortality. However, it cautioned that there were limited data available and the statistical models used varied between studies. Also, information on potential candidate variables considered in studies was often not provided.

3.17 No studies were identified that reported the effect of using the tests on clinical or patient-reported outcomes.

Cost effectiveness

Systematic review of cost-effectiveness evidence

3.18 The EAG did a systematic review to identify any published economic evaluations of the ARCHITECT and Alinity i Urine NGAL assays, the BioPorto NGAL test (plasma and urine) and the NephroCheck test for assessing people at risk of developing acute kidney injury. Two of the studies identified used modelling strategies that were similar, and that the EAG considered appropriate for the current decision problem. One of these (Hall et al. 2018) was done in the UK, and the EAG considered it a comprehensive and high-quality assessment. But because the setting was outside the scope of this assessment (people already admitted to intensive care units), the EAG adapted the model for critically ill patients who are at risk of acute kidney injury and being considered for admission to critical care.

Model structure

3.19 The EAG developed a de novo economic model designed to assess the cost effectiveness of using the tests (in addition to standard clinical monitoring) to help detect the risk of developing acute kidney injury and to help start early preventive care.

3.20 This was a 2-stage model using TreeAge Pro software. Limited direct evidence was identified that showed the effect of using the tests (compared with standard monitoring alone) on health outcomes (such as acute kidney injury status; mortality; development of chronic kidney disease). So the EAG used observational associations to infer how preventing or reducing the severity of acute kidney injury may affect changes in health outcomes (a linked-evidence approach). An initial decision-tree phase modelled:

  • The accuracy of the tests to identify people with emerging acute kidney injury.

  • For people with a positive biomarker test result, the effect of preventive measures (a Kidney Disease Improving Global Outcomes [KDIGO] care bundle) on reducing the probability that they develop acute kidney injury or reducing the severity of the condition if they develop it.

  • The effect of developing acute kidney injury, and its severity, on short-term outcomes (within 90 days): whether a person is admitted to intensive care, length of stay in intensive care or hospital, development of chronic kidney disease and 90‑day mortality.

After this initial 90‑day period, a longer-term Markov model was used to model the effect of developing acute kidney injury while in hospital on the risk of developing chronic kidney disease, and the effect of this condition on the rest of a person's life.

Population

3.21 The modelled population was people in hospital at risk of developing acute kidney injury, having their serum creatinine and urine output monitored. The EAG used the Grampian population register of hospitalisations to characterise this population. This dataset included 17,630 adults admitted to hospital in Grampian in 2003. It is the complete population of all patients who had an abnormal kidney function blood test on hospital admission and had at least an overnight stay in hospital, including all patients who developed acute kidney injury. The model starting base-case population was 63 years old, 54.3% women, with about 11% having chronic kidney disease (in the model, more people could develop this condition over time). The base-case prevalence of acute kidney injury (that is, people who will develop the condition while in hospital under standard monitoring) was assumed to be 9.2%.

Model inputs

3.22 The sensitivity and specificity of the tests to identify people who will develop acute kidney injury (as shown by a later increase in serum creatinine or drop in urine output, or both) were taken from the systematic review and meta-analysis referred to in the clinical effectiveness section. The EAG used values pooled from all studies identified for each of the tests across all clinical settings. The incidence of acute kidney injury and the effect of developing the condition on clinical outcomes (admission to intensive care, 90-day mortality) were estimated by the EAG largely using data from the Grampian observational dataset. The model could vary which clinical outcomes were affected by acute kidney injury status, and the size of this effect.
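To illustrate how accuracy and prevalence combine in the first branch of a decision tree, the sketch below applies the NephroCheck summary estimates from section 3.8 (sensitivity 0.75, specificity 0.61) to the base-case prevalence of 9.2% for a notional cohort of 1,000 people. The cohort size and the function name are assumptions made for illustration; this is not the EAG's model code, and it ignores the uncertainty around each estimate.

```python
def expected_classification(prevalence: float, sensitivity: float,
                            specificity: float, cohort: float = 1000.0) -> dict:
    """Split a cohort into expected true/false positives and negatives."""
    with_aki = cohort * prevalence
    without_aki = cohort - with_aki
    true_pos = with_aki * sensitivity
    false_neg = with_aki - true_pos
    true_neg = without_aki * specificity
    false_pos = without_aki - true_neg
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        # Share of positive results that are true positives.
        "ppv": true_pos / (true_pos + false_pos),
        # Share of negative results that are true negatives.
        "npv": true_neg / (true_neg + false_neg),
    }


# Prevalence from section 3.21; accuracy from section 3.8 (NephroCheck, adults).
result = expected_classification(prevalence=0.092, sensitivity=0.75, specificity=0.61)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these inputs most positive results are false positives, which is why the assumption of no harm from a false-positive result (section 3.28) and the cost of the care bundle matter in the model.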

3.23 The EAG assumed that a KDIGO care bundle would be the preventive care used if the tests were positive. It did a literature search to identify studies to estimate the effectiveness of this intervention for the model. The EAG did not include the identified studies in its clinical effectiveness review because the studies did not report the direct effect of using the tests on clinical outcomes. Instead the EAG included the studies in its cost-effectiveness review (as part of the rationale for parameter values used in the model). The EAG used data from Meersch et al. (2017) for the effect of the KDIGO care bundle in the model. This was a single-centre randomised controlled trial done in Germany in people who had cardiac surgery (n=276). People who had a positive NephroCheck test (using a score of over 0.3) were randomised to either standard care (less intensive care than with the KDIGO care bundle) or standard care plus a KDIGO care bundle. People having standard care followed the recommendations of the American College of Cardiology Foundation (2011), which included keeping mean arterial pressure over 65 mmHg and central venous pressure between 8 mmHg and 10 mmHg. The KDIGO care bundle included avoiding nephrotoxic agents, discontinuing angiotensin-converting enzyme inhibitors and angiotensin receptor blockers, close monitoring of urine output and serum creatinine, avoiding hyperglycaemia (for 72 hours), considering alternatives to radiocontrast agents, and optimising fluids. Although there was a significant reduction in occurrence of acute kidney injury by 72 hours for the KDIGO arm compared with standard care (odds ratio 0.48 [95% CI 0.29 to 0.80]), the EAG commented that this did not appear to translate to other clinical outcomes (need for RRT in hospital, 90‑day all-cause mortality and length of stay in intensive care or hospital).
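An odds ratio changes odds, not probabilities, so it has to be converted before it can be read as a change in risk. The sketch below shows the standard conversion for the odds ratio reported by Meersch et al. (0.48, 95% CI 0.29 to 0.80). Using the base-case prevalence of 9.2% as the baseline risk is an assumption made here for illustration; the source does not state how the EAG applied the odds ratio inside its model.

```python
def risk_after_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Convert a baseline probability to the probability implied by an odds ratio."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Assumption for illustration: baseline risk set to the base-case prevalence (3.21).
BASELINE_RISK = 0.092

for label, odds_ratio in (("point estimate", 0.48),
                          ("lower 95% limit", 0.29),
                          ("upper 95% limit", 0.80)):
    risk = risk_after_odds_ratio(BASELINE_RISK, odds_ratio)
    print(f"{label}: odds ratio {odds_ratio:.2f} -> risk {risk:.3f}")
```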

3.24 The EAG found 2 other studies reporting the effects of KDIGO care bundles: Gocze et al. (2018) and Schanz et al. (2018). Both were done in Germany and assessed the effect of NephroCheck-guided application of a KDIGO care bundle compared with standard care (no use of a care bundle). Gocze et al. was a smaller study (n=121) than Meersch et al. and reported that NephroCheck-guided care (after major non-cardiac surgery) showed a trend towards a lower probability of acute kidney injury. But the results were not statistically significant; the odds ratio for standard care compared with NephroCheck was 1.96 (95% CI 0.93 to 4.10). There was, however, a statistically significant increase in the odds of stage 2 or stage 3 acute kidney injury in the standard care group compared with NephroCheck: odds ratio 3.43 (95% CI 1.04 to 11.32). Schanz et al. (n=100) compared the effect of NephroCheck-triggered implementation of KDIGO recommendations for acute kidney injury with standard care alone in an emergency department. Acute kidney injury outcomes were similar in both groups. The probability of acute kidney injury stage 2 or stage 3 was 32.1% for the intervention group and 33.3% for the control group after 1 day. After 3 days this was 38.9% for the intervention group and 39.1% for the control group. The effect size from Gocze et al. was used in a scenario analysis. Data from intensive care registers, reports and studies were used for parameters in the longer-term Markov model.

Costs

3.25 Test-related costs are shown in table 1. In its base-case analysis, the EAG assumed that an Astute 140 meter would need to be purchased to use NephroCheck, so included the cost of this. The EAG assumed that the NGAL tests are run on platforms that are already available in hospital laboratories, so the cost of these analysers was assumed to be negligible and was not included in the analysis. A scenario analysis was done in which no capital costs (including an analyser) or training costs were included for the tests.

Table 1 Test-related costs

| Cost per test | NephroCheck | BioPorto NGAL test a | Abbott ARCHITECT NGAL assay | Abbott Alinity i Urine NGAL assay b |
| --- | --- | --- | --- | --- |
| Platform cost | £0.53 | - | - | - |
| Equipment cost | £49.80 | £20.00 | £25.71 | £28.29 |
| Maintenance/consumables | £4.23 | £1.90 | £3.51 | £3.51 |
| Staff costs | £37.62 | £37.62 | £37.62 | £37.62 |
| Staff training costs | £0.08 | £0.03 | £0.03 | £0.03 |
| Total cost | £92.26 | £59.55 | £66.87 | £69.44 |

a Costs assumed to be the same for plasma and urine samples.

b The Alinity NGAL assay was not included in the base-case analysis because of a lack of data for this assay.

Abbreviation: NGAL, neutrophil gelatinase-associated lipocalin.
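As a quick arithmetic check, the per-test totals in table 1 can be rebuilt from their components. The sketch below uses Python's decimal module to avoid floating-point rounding; the dictionary layout is an assumption for illustration, and the figures are those reported in table 1.

```python
from decimal import Decimal

# Component costs per test in pounds, as reported in table 1.
# Only NephroCheck carries a platform cost (the first entry in its list).
components = {
    "NephroCheck": ["0.53", "49.80", "4.23", "37.62", "0.08"],
    "BioPorto NGAL test": ["20.00", "1.90", "37.62", "0.03"],
    "Abbott ARCHITECT NGAL assay": ["25.71", "3.51", "37.62", "0.03"],
    "Abbott Alinity i Urine NGAL assay": ["28.29", "3.51", "37.62", "0.03"],
}

# Totals as reported in table 1.
reported_totals = {
    "NephroCheck": Decimal("92.26"),
    "BioPorto NGAL test": Decimal("59.55"),
    "Abbott ARCHITECT NGAL assay": Decimal("66.87"),
    "Abbott Alinity i Urine NGAL assay": Decimal("69.44"),
}

computed_totals = {
    name: sum(Decimal(value) for value in values)
    for name, values in components.items()
}

for name, total in computed_totals.items():
    difference = total - reported_totals[name]
    print(f"{name}: computed £{total}, reported £{reported_totals[name]}, "
          f"difference £{difference}")
```

The computed totals match the reported ones except for the Alinity assay, where the components sum to £69.45 against a reported £69.44. A 1p gap of this kind would be consistent with the components having been rounded before being tabulated, although the source does not say so.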

3.26 The EAG assumed that the KDIGO care bundle would be applied for an additional 3 days over and above standard care for people who tested positive on the NephroCheck or NGAL tests (based on clinical opinion and consistent with the primary outcome measure from Meersch et al. 2017). Resources costed in the care bundle included intravenous fluids (including nurse time), nephrologist and pharmacist review time and stopping blood pressure medication. The total additional cost of applying the KDIGO bundle was assumed to be £106.36 per person.

Health-related quality of life

3.27 The EAG updated the searches run in Hall et al. (2018) to identify any additional source of utility data for its model for both the initial decision-tree phase and longer-term Markov model. The age- and sex-matched EQ‑5D UK population norms were calculated using an equation published by Ara and Brazier (2010). These were used to derive age- and sex-adjusted utility multipliers from the raw pooled estimates from studies, based on the age and sex distribution of the source studies.

Base-case assumptions

3.28 The following assumptions (in addition to those described in previous sections) were applied in the base-case analyses:

  • Acute kidney injury, and more severe acute kidney injury, can be prevented by earlier NephroCheck or NGAL-guided use of a KDIGO care bundle (for people who would otherwise develop it with standard monitoring alone) in base case 1. In base case 2, NGAL-guided care cannot prevent acute kidney injury (but can reduce the severity of the condition).

  • In base case 1, the NephroCheck biomarkers and NGAL rise at similar times and the earlier identification of emerging kidney injury (relative to serum creatinine and urine output changes) is the same for both tests.

  • There are no adverse effects on health caused by a false-positive NephroCheck or NGAL test result.

  • No adaptations to standard monitoring were made for people testing negative on NephroCheck or NGAL tests (although standard monitoring done alongside would detect acute kidney injury for false-negative tests, just at a later time). This was because the EAG assumed that de-escalation of care would not occur solely because of a negative test result.

  • Everyone with a positive NephroCheck or NGAL test immediately had a KDIGO care bundle.

  • After 5 years post-transplant, mortality reverted to the general population all-cause mortality probability. The annual probability of transplant failure remained as that reported from years 3 to 5 in the UK renal registry.

  • People whose transplant failed returned to dialysis. Their probability of progressing from end-stage renal disease on dialysis to a second transplant was the same as for progressing to the first transplant.

Base-case results

3.29 No evidence for NGAL test-guided implementation of preventive care for acute kidney injury on clinical outcomes was identified. Therefore, the EAG did 2 base cases:

  • Base case 1: Using the NGAL test had the same effect as the NephroCheck test to prevent acute kidney injury and reduce severity of the condition if it occurred (based on Meersch et al. 2017).

  • Base case 2: Using the NGAL test could only reduce the severity of acute kidney injury (as for base case 1), not prevent it from occurring (NephroCheck effects were unchanged).

3.30 The results of base case 1 (probabilistic) are shown in table 2. Because of uncertainty about the extent of any effect of acute kidney injury on other clinical outcomes, the EAG did several scenario analyses (B, C and D) in addition to the base case. These varied which outcomes were affected by the occurrence (and severity) of acute kidney injury, and the size of this effect. Scenario C was the most pessimistic (no effect of preventing acute kidney injury, or reducing severity, on clinical outcomes) and scenario D was the most optimistic (full effect of preventing acute kidney injury, or reducing severity, on clinical outcomes).

Table 2 Cost-effectiveness results (probabilistic) for base case 1

| Test | Total cost | Total QALYs | Fully incremental ICER (probability cost effective at £20,000 per QALY gained) | ICER compared with standard monitoring (probability cost effective at £20,000 per QALY gained) |
| --- | --- | --- | --- | --- |
| BioPorto NGAL test (urine) | £22,887 | 6.07332 | (43.5%) | Dominant (54.6%) |
| BioPorto NGAL test (plasma) | £22,900 | 6.07332 | £2,694,918 (11.1%) | Dominant (47.6%) |
| Standard monitoring only | £22,901 | 6.07296 | Dominated (45.1%) | - |
| ARCHITECT NGAL | £22,912 | 6.07328 | Dominated (0.1%) | £32,131 (41.4%) |
| NephroCheck | £22,938 | 6.07332 | Dominated (0.2%) | £101,456 (31.9%) |

Abbreviations: NGAL, neutrophil gelatinase-associated lipocalin; QALYs, quality-adjusted life years; ICER, incremental cost-effectiveness ratio.

3.31 In scenario C, standard care dominated all the tests (that is, the tests had higher costs and fewer quality-adjusted life years [QALYs]), with all tests having 0% probability of being cost effective at a maximum acceptable incremental cost-effectiveness ratio (ICER) of £20,000 per QALY gained. See table 3 for the results for scenario D.

Table 3 Cost-effectiveness results (probabilistic) for scenario D (in base case 1)

| Test | Total cost | Total QALYs | Fully incremental ICER (probability cost effective at £20,000 per QALY gained) | ICER compared with standard monitoring (probability cost effective at £20,000 per QALY gained) |
| --- | --- | --- | --- | --- |
| Standard monitoring only | £22,959 | 6.08383 | (0.7%) | - |
| BioPorto NGAL test (urine) | £23,013 | 6.11006 | £2,052 (40.7%) | £2,052 (99.3%) |
| BioPorto NGAL test (plasma) | £23,028 | 6.11091 | £17,702 (47.5%) | £2,538 (99.1%) |
| ARCHITECT NGAL | £23,031 | 6.10799 | Dominated (1.1%) | £2,981 (98.8%) |
| NephroCheck | £23,065 | 6.11064 | Dominated (10.0%) | £3,955 (97.7%) |

Abbreviations: NGAL, neutrophil gelatinase-associated lipocalin; QALYs, quality-adjusted life years; ICER, incremental cost-effectiveness ratio.
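The pairwise ICERs against standard monitoring in table 3 can be approximated from the tabulated total costs and QALYs, as in the sketch below. The function name and the handling of dominance are assumptions for illustration. Because the reported ICERs come from the probabilistic analysis and the tabulated means are rounded, the recomputed values are expected to be close to, but not identical with, the reported ones.

```python
from typing import Union


def pairwise_icer(cost: float, qalys: float,
                  ref_cost: float, ref_qalys: float) -> Union[float, str]:
    """Return the ICER against a reference option, or a dominance label."""
    delta_cost = cost - ref_cost
    delta_qalys = qalys - ref_qalys
    if delta_cost == 0 and delta_qalys == 0:
        return "no difference"
    if delta_cost <= 0 and delta_qalys >= 0:
        return "dominant"    # no more costly and no less effective
    if delta_cost >= 0 and delta_qalys <= 0:
        return "dominated"   # no less costly and no more effective
    # Remaining cases: more costly and more effective, or less costly and
    # less effective. The ratio is returned for both; its interpretation
    # differs in the second case.
    return delta_cost / delta_qalys


# Total costs (pounds) and QALYs from table 3 (scenario D in base case 1).
STANDARD = (22959, 6.08383)
tests = {
    "BioPorto NGAL test (urine)": (23013, 6.11006),
    "BioPorto NGAL test (plasma)": (23028, 6.11091),
    "ARCHITECT NGAL": (23031, 6.10799),
    "NephroCheck": (23065, 6.11064),
}

icers = {name: pairwise_icer(cost, qalys, *STANDARD)
         for name, (cost, qalys) in tests.items()}

for name, value in icers.items():
    print(f"{name}: about £{value:,.0f} per QALY gained")
```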

3.32 The EAG also did 16 further scenario analyses (not all are discussed in this document). Changes made to several parameters improved the cost effectiveness of the tests, so that they all dominated standard care (in a pairwise comparison):

  • Increasing long-term costs and risk of mortality in the Markov model (scenario G) for people who were admitted to intensive care while in hospital (in the decision-tree phase).

  • For people having acute kidney injury while in hospital, extending the time of increased risk of developing chronic kidney disease from 1 year to the rest of a person's life (scenario H).

  • Increasing the prevalence of acute kidney injury to 23% (from 9.2% in base case; scenario K).

In contrast, assuming that false-positive tests increased mortality (scenario M) worsened the cost effectiveness of the tests.

3.33 In scenario Q, the EAG used alternative accuracy estimates from studies that enrolled children only. Data were only available for the ARCHITECT NGAL and the BioPorto NGAL (urine) tests. The EAG cautioned that the model was not configured for children but used parameters from an adult population. Because there were limited accuracy data for the tests in children and a lack of data for other parameters, the EAG considered the analysis to be exploratory only.

3.34 In base case 2 (probabilistic analysis), NephroCheck dominated all other tests, with an ICER of about £106,000 per QALY gained compared with standard monitoring. The probability of NephroCheck being the most cost-effective test across scenario analyses increased considerably.

3.35 In scenario T (provided in an addendum to the diagnostics assessment report), the EAG used Gocze et al. (rather than Meersch et al.) to inform estimates of the effect of a KDIGO care bundle on reducing the risk of developing acute kidney injury, or the severity of the condition if it did develop. This improved the cost-effectiveness estimates of the tests. In base case 1, all tests dominated standard monitoring. In base case 2, NephroCheck dominated all other tests and standard monitoring.