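4 Committee discussion

Clinical effectiveness

Knowing the likely course of the disease may help people with Crohn's disease and the NHS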

4.1 The patient expert explained that having Crohn's disease can substantially affect the quality of life of the person and their family. Currently the extent of inflammation is monitored using endoscopic imaging, faecal calprotectin tests and blood tests, but these do not predict disease progression or the likelihood of needing surgery in the future. People may not want invasive monitoring using colonoscopy because it is stressful to prepare for, has unpleasant side effects and may aggravate symptoms. The patient expert suggested that a test to predict long-term disease course could help give people a better understanding and acceptance of their condition, and make planning review appointments more efficient.

Studies on the prognostic ability of the tests are heterogeneous and have small sample sizes

4.2 The reviewed studies on prognostic ability had mixed populations, including people with ulcerative colitis. The number of people with Crohn's disease in each study was small, given the prevalence of the condition in the wider population. The committee noted that the small sample sizes could mean that the reviewed studies were underpowered to produce robust estimates of the prognostic ability of the tests. The committee also noted that there are other predictive studies for Crohn's disease with larger populations, showing that larger sample sizes are possible. The committee concluded that the heterogeneity in the population and the small population size added substantial uncertainty to the interpretation of study results.

There is no standard definition of a high or low risk of severe disease

4.3 The reviewed studies used different measures to define a person as being at high or low risk of severe Crohn's disease. IBDX studies used poor outcomes, such as surgery and complications, as a proxy for severe disease (see sections 3.5 and 3.6), whereas the PredictSURE IBD study used the need for multiple treatment escalations (see section 3.7). This inconsistency is a source of additional uncertainty.

The accuracy of PredictSURE IBD and IBDX in predicting severe disease is uncertain

4.4 Little data were identified on the prognostic accuracy of the tests. Sensitivity, specificity and negative predictive value were only reported for the PredictSURE IBD test, and in only 1 study (Biasci 2019). The clinical expert said that at the moment severe disease may be predicted by known risk factors such as age and smoking status. But there is no consensus on, or algorithm for, how these risk factors should be combined, and their predictive value is limited. The clinical expert also said that, based on the findings of the Biasci study, the PredictSURE IBD test appears to perform better than risk prediction based on clinical features or endoscopic findings, and therefore has the potential to be a useful test. The committee noted that it would help to understand if the tests can give a more accurate prognosis when used alongside clinical features rather than as a substitute. The committee concluded that overall, the evidence on the prognostic accuracy of PredictSURE IBD and IBDX is weak, and encouraged further research on their accuracy used alongside clinical features (see section 5).
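As an illustration of the accuracy measures named above, the following minimal Python sketch computes sensitivity, specificity and negative predictive value from a 2x2 table. The counts are hypothetical placeholders and are not taken from Biasci 2019 or any other reviewed study. The class and function names are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a prognostic test with the observed disease course."""
    true_pos: int   # test says high risk, severe disease observed
    false_pos: int  # test says high risk, no severe disease observed
    false_neg: int  # test says low risk, severe disease observed
    true_neg: int   # test says low risk, no severe disease observed


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity and negative predictive value."""
    return {
        # Proportion of people with severe disease whom the test flags as high risk
        "sensitivity": t.true_pos / (t.true_pos + t.false_neg),
        # Proportion of people without severe disease whom the test classes as low risk
        "specificity": t.true_neg / (t.true_neg + t.false_pos),
        # Proportion of low-risk results that are correct
        "npv": t.true_neg / (t.true_neg + t.false_neg),
    }


# Hypothetical counts for illustration only (not study data).
example = TwoByTwo(true_pos=30, false_pos=10, false_neg=10, true_neg=50)
measures = accuracy_measures(example)
print(measures)
```

With small samples, each of these proportions rests on few observations, which is one reason the estimates from a single study are uncertain.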

There is little evidence on how the tests affect treatment decisions

4.5 The proposed value of the tests is to categorise people with Crohn's disease according to their risk of severe disease. People predicted to have severe disease could have top-down treatment, which may help control the disease early, leading to better outcomes such as fewer flare-ups, and may prevent bowel damage and limit the need for surgery. The committee noted that currently there was no evidence on how the tests can help with decisions about personalised treatment plans. It concluded that it would help to have research on how the tests affect treatment decisions (see section 5). PROFILE, a randomised, multicentre, biomarker-stratified, open-label study, is ongoing in the UK with results expected in 2022. This trial uses PredictSURE IBD to assign people to top-down or step-up treatment, and may help address this evidence gap.

There is no evidence on how the tests affect clinical outcomes

4.6 The committee considered that there was no evidence to show that using the prognostic tests to identify people at high risk of severe disease and help guide treatment improves clinical outcomes. The committee encouraged studies assessing how the tests affect clinical outcomes (see section 5).

Cost effectiveness

Drug treatment for people with Crohn's disease varies across the NHS

4.7 The committee noted that the treatment sequences modelled by the external assessment group (EAG) may not reflect treatment in the NHS. The EAG said that in its model 30% of people who had a tumour necrosis factor (TNF)-alpha inhibitor, and 20% of people who had a biological treatment that was not an anti-TNF, also had an immunomodulator. This is because there is evidence to show that combination treatment reduces the chances of losing response to biologics (immunogenicity). However, clinical experts said there is no consensus on using monotherapy or combination therapy, and that it varies in clinical practice. There is also the option of having immunomodulators after biologics as part of treatment de-escalation in the top-down strategy (as modelled by the company – see section 4.11). Top-down treatment is not widely used in the NHS and so it is uncertain what the treatment pathway would look like. The company model included an immunomodulator step after biologics but the EAG base case did not. The EAG explored this as a scenario analysis (see section 3.30). In addition, the biologics modelled as second and third line can also be used as first line. The committee concluded that variation in clinical practice created an added level of uncertainty around the model structure.

It's not certain if top-down treatment has clinical benefits over step-up treatment

4.8 The committee heard from clinical experts that early rather than late treatment with biologics could improve outcomes for people likely to have more severe disease. The EAG noted that the evidence on the effectiveness of top-down compared with step-up treatment in the model was from the D'Haens study. This showed that people who had top-down treatment had a longer time to relapse than people who had step-up treatment. The hazard function (based on the assumption that time to relapse is a proxy for time to next treatment escalation) derived from D'Haens was applied only to the first step of the model (the anti-TNF compared with immunomodulator step). Later treatment steps in both the top-down and step-up strategies were assumed to have the same time to treatment escalation as anti-TNF in the top-down arm. This assumption was made because there was no evidence either way. The top-down treatment sequence modelled in D'Haens differed from the one described by the clinical experts because people did not carry on having maintenance treatment with infliximab but were allowed infliximab as needed (see section 3.19). This might have underestimated the benefits of top-down treatment. The EAG said that in the long term top-down treatment may not have an advantage over step-up treatment because the 10-year follow-up study of D'Haens (Hoekman 2018) showed no difference in hospitalisation, surgery and endoscopic remission between the 2 strategies. The clinical experts considered that early treatment with biologics does make a difference, but good-quality evidence generalisable to the NHS to support this is limited. Registry data could have been useful. The committee concluded that more evidence is needed on the effectiveness of top-down compared with step-up strategies. This is because if there is no evidence of benefit, there is no clinical rationale for identifying people at high risk of severe disease and treating them using a top-down strategy.

Because of the lack of data and the need for many assumptions, the model results are not certain

4.9 The committee noted that interpreting the modelling was difficult because of the very weak data feeding into it. There were limited data on the prognostic accuracy of the tests (see section 4.4), on the effectiveness of a top-down strategy compared with a step-up strategy (see section 4.8), and no information from studies on how these 2 steps would combine to affect clinical outcomes. The committee heard that the EAG had to make many assumptions to be able to link the evidence in the model. There was great variation in the results of the model. Base case results (see section 3.29) showed that standard care dominated the top-down strategy. This dominance was sustained in the majority of the scenario analyses, and the probabilistic sensitivity analysis scatter plot showed the top-down strategy was mostly more costly and less effective than standard care. Some of the scenario analyses produced incremental cost-effectiveness ratios (ICERs) in favour of the top-down strategy, although these were far higher than the range that NICE usually considers to be cost effective (see sections 3.30 to 3.34). One scenario (see section 3.35) combining multiple assumptions produced an ICER in the acceptable range in favour of the top-down strategy. Because of the limited data and assumptions that needed to be made, the cost effectiveness of the tests is highly uncertain.
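To make the terms "dominated" and "ICER" used above concrete, the following minimal Python sketch classifies a comparison between a new strategy and standard care. All costs and QALY values are hypothetical placeholders and are not results from the EAG's or the company's model. The function name and return labels are assumptions made for this example only.

```python
from typing import Optional, Tuple


def compare_strategies(cost_new: float, qaly_new: float,
                       cost_std: float, qaly_std: float) -> Tuple[str, Optional[float]]:
    """Classify a new strategy against standard care.

    Returns a label and, when neither strategy dominates, the incremental
    cost-effectiveness ratio (incremental cost per QALY gained).
    """
    d_cost = cost_new - cost_std
    d_qaly = qaly_new - qaly_std

    if d_cost == 0 and d_qaly == 0:
        return "equivalent", None
    # Dominated: no cheaper and no more effective, and worse on at least one
    if d_cost >= 0 and d_qaly <= 0:
        return "dominated", None
    # Dominant: no more costly and no less effective, and better on at least one
    if d_cost <= 0 and d_qaly >= 0:
        return "dominant", None
    # Otherwise there is a trade-off between costs and QALYs, so report the ratio
    return "trade-off", d_cost / d_qaly


# Hypothetical values for illustration only (not model outputs).
# Case 1: new strategy costs more and yields fewer QALYs.
label_1, icer_1 = compare_strategies(52_000, 14.8, 50_000, 15.0)
# Case 2: new strategy costs more and yields more QALYs.
label_2, icer_2 = compare_strategies(56_000, 15.1, 50_000, 15.0)
print(label_1, icer_1)
print(label_2, icer_2)
```

An ICER is only reported when one strategy is both more costly and more effective than the other. When a strategy is dominated, as in the first hypothetical case, no ratio is needed to reach a conclusion.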

Assuming that IBDX and PredictSURE IBD have the same prognostic ability is not appropriate

4.10 Only data on the prognostic ability of PredictSURE IBD were included in the base case. The EAG included IBDX in an exploratory analysis that assumed that the ability of IBDX to identify people at high or low risk was the same as PredictSURE IBD. The committee heard that the tools identify different markers and require different test samples. The committee also noted that there was 1 abstract (Lyons 2020), which compared both tools and showed that PredictSURE IBD predicted a shorter time to treatment escalation in people classed as high risk. IBDX did not predict a difference in time to treatment escalation between people positive for 2 or more markers and those positive for only 1 marker (see section 3.8). The committee concluded it was not appropriate to assume the tests had the same prognostic accuracy, and that more evidence is needed (see section 5).

Some of the key assumptions in the model are drivers of the model results

4.11 The model results were mainly driven by the assumption that step-up treatment has benefits over top-down treatment because of the proportion of people who respond to immunomodulators in the step-up arm. The EAG noted that having the immunomodulator step at the start of the step-up strategy meant that, for some high-risk people, their condition could respond to less costly immunomodulators. Another assumption that drove the model results was that after 2 years in remission with biologics, a proportion of people have mucosal healing and do not need more treatment escalations. A scenario in which some low-risk people were assumed to be misdiagnosed as high risk (see section 3.34) showed quality-adjusted life years (QALYs) being gained in favour of PredictSURE IBD because they did not need any more treatment escalation.

The EAG's model results are different from the company's model and the most relevant published economic model

4.12 The base case probabilistic and deterministic results of the EAG's model produced QALYs in favour of standard care. This suggests that a no-testing strategy with step-up treatment is better for people at high risk of severe Crohn's disease than top-down treatment using the prognostic tool. This result was not consistent with the company's model and the model reported by Marchetti (2013), both of which reported that a top-down strategy is associated with more QALYs. The EAG noted that the difference between its model and the company's was that the treatment sequence modelled by the company had an immunomodulator as a last treatment step in the top-down arm. This was not modelled in the EAG's base case but as a scenario analysis. This scenario produced an ICER in favour of top-down treatment that was much higher than what NICE normally considers a cost-effective use of NHS resources (see section 3.30). The company's model also assumed a constant relative treatment effect, whereas the EAG's model assumed a diminishing relative treatment effect (see further details in the addendum to the diagnostic assessment report). Marchetti modelled a different treatment sequence (see section 3.11) to the EAG's, and a different time horizon: 5 years compared with the EAG's 65 years. The EAG did not explore changing the time horizon so it was not clear if the time horizon influenced the different results. The difference in the results was likely due to the uncertainties in the top-down treatment pathway and the effectiveness of top-down compared with step-up strategies.

Evidence from a different starting cohort that includes children and teenagers would be useful

4.13 The committee heard that the average age in the EAG's model was 35. It considered that the model might not reflect other age groups in which Crohn's disease is first diagnosed. For example, one peak in diagnosis is in teenagers and another is at around 60. A clinical expert noted that the treatment pathway for children or teenagers would be different from that for adults because children often follow a more severe disease course and may need enteral nutrition. The committee heard that modelling this population may require an entirely new model rather than an adaptation of the model built by the EAG for the adult population.

Modelling adverse events or varying the cost of surgery may not have a huge impact on the results

4.14 The EAG did not model adverse events, to keep the model simple. It predicted that, if it had modelled adverse events, top-down treatment would have been dominated to an even greater extent. The committee thought the cost of surgery might have been underestimated and that its impact on the model results was not clear. The EAG noted that, although it did not vary the costs of surgery, the number of surgical events modelled was very small, so it did not anticipate a significant difference in results.

Multiple uncertainties make it difficult to determine cost effectiveness so the tests cannot be recommended for routine use in the NHS

4.15 Lack of evidence on the prognostic ability of the PredictSURE IBD and IBDX tests, and on their effect on treatment decisions and clinical outcomes (see sections 4.4 to 4.6), makes it difficult to assess the cost effectiveness of the tests for assigning people to top-down or step-up treatment. The base case model was based on data for PredictSURE IBD. IBDX was only included in an exploratory scenario analysis (see section 3.13). Issues in the modelling (see sections 4.7 to 4.10) relate to:

the effectiveness of the top-down strategy
the assumed equivalence in prognostic accuracy of both tools
how appropriate the sequence modelled is to all people with Crohn's disease.

These, and the many assumptions needed to link the data because of limited evidence, make the cost effectiveness of the tests to the NHS uncertain. In the absence of the evidence the committee would have liked to see (see section 5), changes to the model at this time would not change the overall conclusion.
