3 Evidence

3.1 The appraisal committee (section 7) considered evidence submitted by Janssen and a review of this submission by the evidence review group (ERG; section 8). See the committee papers for full details of all the evidence.

Clinical effectiveness

3.2 The clinical-effectiveness evidence presented in the company's submission came from COU‑AA‑302, a worldwide trial in which 9% of the trial population were from the UK. This randomised controlled trial compared abiraterone plus oral prednisone or prednisolone (referred to hereafter as abiraterone) with placebo plus prednisone/prednisolone (referred to hereafter as placebo) in 1,088 people; 546 people were allocated to the abiraterone arm (1,000 mg abiraterone daily plus 5 mg prednisone/prednisolone twice daily) and 542 people were allocated to placebo plus 5 mg prednisone/prednisolone twice daily. Patients in the trial stopped abiraterone or placebo at disease progression, if they had not already stopped for another reason (for example, because of adverse reactions). After disease progression, patients in the trial were followed up for up to 60 months after stopping treatment or until the patient was lost to follow-up, or withdrew consent; median follow-up was 27.1 months. The trial had co-primary end points of radiographic progression-free survival and death (overall survival).

3.3 The statistical plan for COU‑AA‑302 called for a single pre-planned analysis for radiographic progression-free survival after 378 events had occurred. For overall survival, the plan included 3 interim analyses and 1 final analysis, to be done after 15%, 40%, 55% and 100% of the 773 deaths had occurred; the company had determined that it would need this number of deaths to detect a difference between the 2 treatment arms. The company's statistical plan stated that, to be considered statistically significant, the p value for radiographic progression-free survival should be less than 0.01. Because of the repeated analyses of overall survival, the p values below which the results could be considered statistically significant were 0.0001, 0.0005, 0.0034 and 0.040 for the 4 analyses respectively. COU‑AA‑302 was unblinded by the company between the second and third interim analyses, based on advice from the Independent Data Monitoring Committee (IDMC). The IDMC considered abiraterone to have a 'highly significant advantage' for patients, despite the p value for overall survival not meeting the criteria for statistical significance. The company's submission presented data from the second interim analysis (December 2011; when the trial was still blinded) and the third interim analysis (May 2012; after the trial was unblinded and 3 people in the placebo group had crossed over to the abiraterone group). The company's additional evidence included data from the final analysis of overall survival (May 2014); by this time point, 93 people had crossed over from placebo to abiraterone.

3.4 COU‑AA‑302 included patients with metastatic hormone-relapsed prostate cancer whose disease had progressed after androgen deprivation therapy and who had no or mild symptoms, defined by a brief pain inventory (BPI) score of 0 to 3, reflecting the worst pain on a scale of 0 to 10 in the last 24 hours (with a score of 0 or 1 being no symptoms, and 2 or 3 being mild symptoms). Patients had an Eastern Cooperative Oncology Group (ECOG) score of 0 (no symptoms) or 1 (symptoms but able to walk). COU‑AA‑302 excluded people who had an estimated life expectancy of less than 6 months, people who had comorbidities for which they took more than 5 mg of corticosteroids twice daily and people who had visceral metastases.

3.5 The median treatment duration in COU‑AA‑302 was 13.8 months in the abiraterone arm and 8.3 months in the placebo arm, based on the third interim data cut. Treatment was continued until disease progression (defined by radiographic progression or unequivocal clinical progression, for example, need for alternative cancer therapy), or if the patient had adverse reactions, started a new anticancer treatment, had medications prohibited by the trial or withdrew consent to take part in the trial. By 10 cycles (28 days per cycle), 70% of people were taking abiraterone and 30% were taking placebo. By 20 cycles, 38% of people were taking abiraterone and 21% were taking placebo. By 40 cycles, 15% of people were taking abiraterone and less than 1% were taking placebo.

3.6 By the final analysis, 67% of people in the abiraterone group and 80% of people in the placebo group had had subsequent treatment after stopping the study drug (see table 1). Forty-four per cent of people in the placebo group had abiraterone; 17% had it before docetaxel and 27% had it after docetaxel.

Table 1 Summary of subsequent therapies taken by patients in COU‑AA‑302 (intention-to-treat population, final analysis)

Subsequent therapy    Abiraterone group (n=546)    Placebo group (n=542)
Docetaxel             311 (57.0%)                  331 (61.1%)
Cabazitaxel           100 (18.3%)                  105 (19.4%)
Abiraterone           69 (12.6%)                   238 (43.9%)
Sipuleucel-T          45 (8.2%)                    32 (5.9%)
Radium-223            20 (3.7%)                    7 (1.3%)
Enzalutamide          87 (15.9%)                   54 (10.0%)

3.7 Radiographic progression-free survival was defined as time from randomisation to 1 of the following: progression by bone, CT or MRI scan or death. An independent radiologist unaware of study group assignments determined radiographic progression, but only until unblinding, after which local radiologists determined progression. The company used intention-to-treat (ITT) analyses including all patients for efficacy analyses. By May 2012 (when the company did its third interim analysis of overall survival), 292 (53.5%) of people in the abiraterone group and 352 (64.9%) of people in the placebo group had had radiographic progression. The median duration of radiographic progression-free survival was 16.5 months (95% confidence interval [CI] 13.8 to 16.8 months) in the abiraterone group and 8.2 months (95% CI 8.0 to 9.4 months) in the placebo group (hazard ratio [HR] 0.52, 95% CI 0.45 to 0.62; p<0.0001).

3.8 At the third interim analysis (when 55% of the 773 deaths on which the study was powered had occurred), 200 (36.6%) people in the abiraterone group and 234 (43.2%) people in the placebo group had died. The median overall survival was 35.3 months (95% CI 31.2 to 35.3 months) in the abiraterone group and 30.1 months (95% CI 27.3 to 34.1 months) in the placebo group (HR 0.79, 95% CI 0.66 to 0.96, p=0.0151). This p value did not meet the pre-defined threshold for statistical significance at the third interim analysis (0.0034, see section 3.3). By the final data cut-off, 354 (65%) people in the abiraterone group and 387 (71%) people in the placebo group had died. The median overall survival was 34.7 months (95% CI 32.7 to 36.8 months) in the abiraterone group and 30.3 months (95% CI 28.6 to 33.3 months) in the placebo group (HR 0.81, 95% CI 0.70 to 0.93). The company stated that adjusting for subsequent treatments would reduce the hazard ratio to 0.74 but did not describe the methods of this adjustment.
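The comparison of the third interim p value with the pre-specified thresholds in section 3.3 can be sketched as follows. This is a minimal illustration rather than the company's statistical code: the names `ALPHA_BY_ANALYSIS`, `planned_deaths` and `is_significant` are hypothetical, and rounding the planned number of deaths to a whole number is an assumption.

```python
# Illustrative sketch only. Percentages and thresholds come from section 3.3;
# all names are hypothetical and not taken from the company's analysis plan.

TOTAL_DEATHS = 773  # number of deaths on which the study was powered

# Analysis number -> (percentage of planned deaths, p value threshold)
ALPHA_BY_ANALYSIS = {
    1: (15, 0.0001),
    2: (40, 0.0005),
    3: (55, 0.0034),
    4: (100, 0.040),
}


def planned_deaths(analysis: int) -> int:
    """Planned number of deaths at an analysis, rounded half up (assumption)."""
    pct, _ = ALPHA_BY_ANALYSIS[analysis]
    return (pct * TOTAL_DEATHS + 50) // 100


def is_significant(analysis: int, p_value: float) -> bool:
    """True if the p value is below the threshold for that analysis."""
    _, alpha = ALPHA_BY_ANALYSIS[analysis]
    return p_value < alpha


if __name__ == "__main__":
    # Third interim analysis of overall survival: p=0.0151 (section 3.8)
    print(planned_deaths(3), is_significant(3, 0.0151))
```

Under these assumptions, the third interim result is classed as not statistically significant because 0.0151 is above the 0.0034 threshold.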

3.9 The company presented safety data from the 'safety population' in COU‑AA‑302 (that is, 1,082 people who had had at least 1 dose of study medication). By the third interim analysis, more people had drug-related grade 3–4 adverse events with abiraterone than with placebo (relative risk 1.30, 95% CI 1.03 to 1.65). The most frequently reported adverse events affecting 5% or more people were fatigue, back pain, arthralgia, nausea, peripheral oedema, constipation and diarrhoea, and they were mostly grade 1 or 2. Compared with placebo, abiraterone was associated with more grade 3 or 4 increased alanine aminotransferase (5.5% compared with 0.7%), increased aspartate aminotransferase (3.1% compared with 0.9%) and dyspnoea (breathing difficulty) (2.6% compared with 0.9%) but less hydronephrosis (retention of urine in the kidney causing swelling) (0.2% compared with 1.5%).

3.10 The health-related quality of life of patients in COU‑AA‑302 was measured using the Functional Assessment of Cancer Therapy prostate cancer subscale (FACT‑P). The company presented the results as the median time to a decrease of 10 or more points and the hazard ratio of abiraterone relative to placebo. People randomised to abiraterone showed a longer median time to a 10‑point decrease in total FACT‑P score (12.7 months, 95% CI 11.1 to 14.0) than people randomised to placebo (8.3 months, 95% CI 7.4 to 10.6), hazard ratio 0.79 (95% CI 0.67 to 0.93, p=0.0046).

3.11 The ERG had concerns about how the company used data from the FACT‑P measure in its submission; it presented the results only as time-to-event data and did not provide scores by treatment group for baseline or follow-up. The ERG commented that the company stated that the main drivers of reduced health-related quality of life reported by patients with metastatic hormone-relapsed prostate cancer are bone pain, fatigue, sexual dysfunction and interrupted social relationships. Of these, the company only reported time to an increase in pain intensity (it did not report the differences in pain intensity between the 2 treatment groups). The time to an increase in the worst pain intensity (an increase in baseline BPI score of 30% or more on 2 consecutive occasions) showed no difference between the 2 treatment groups.

Cost effectiveness

3.12 The company submitted an individual time-to-event model (discrete event simulation), tracking patients at an individual level through a sequence of treatments until they reached a maximum age of 100 years, to reflect a lifetime horizon. Costs were considered from the NHS and personal social services perspective and a 3.5% discount rate was applied. The company's base case compared 2 treatment pathways:

  • abiraterone followed by docetaxel followed by best supportive care

  • best supportive care followed by docetaxel followed by abiraterone.

    Modelled patients passed through 3 treatment phases (pre-docetaxel, on-docetaxel and post-docetaxel). In each treatment phase, patients could have active treatment or best supportive care. Once the active treatment had stopped, patients had best supportive care until starting their next treatment or until death (if the patient did not have further treatment). The model assessed whether subsequent treatments were suitable after ending an active treatment. For example, if a patient's disease had progressed, the modelled patients were monitored in a phase (lasting over 6 months in the company's base case) of pre-docetaxel best supportive care to assess whether moving on to docetaxel was suitable. Patients who were too unwell to have docetaxel (people with a Karnofsky performance status of 60% or less [approximately an ECOG performance status of 2 and above]) transitioned to best supportive care and had no further treatment until death.
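The 3.5% discount rate in section 3.12 can be illustrated with a short sketch. The source does not state how the company's model applied discounting within the simulation, so annual discrete discounting with amounts counted at the start of each year is an assumption here, and the names `discount_factor` and `present_value` are hypothetical.

```python
from typing import Sequence

DISCOUNT_RATE = 0.035  # annual discount rate stated in section 3.12


def discount_factor(years: float, rate: float = DISCOUNT_RATE) -> float:
    """Factor that converts a value arising `years` from now to present value."""
    return 1.0 / (1.0 + rate) ** years


def present_value(amounts_by_year: Sequence[float],
                  rate: float = DISCOUNT_RATE) -> float:
    """Sum of yearly amounts (costs or QALYs), each discounted to year 0.

    Assumption: amounts_by_year[0] arises now and is not discounted.
    """
    return sum(amount * discount_factor(year, rate)
               for year, amount in enumerate(amounts_by_year))


if __name__ == "__main__":
    # Hypothetical stream: the same amount in each of 3 years
    print(present_value([1000.0, 1000.0, 1000.0]))
```

The same function applies to costs and to QALYs, because both are discounted at the stated rate.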

3.13 Some patients in COU‑AA‑302 had cabazitaxel after docetaxel. Because cabazitaxel has a survival benefit compared with best supportive care, but is not recommended in NICE's technology appraisal on cabazitaxel for hormone-refractory metastatic prostate cancer previously treated with a docetaxel-containing regimen, the company adjusted post-docetaxel survival estimates from COU‑AA‑302 to exclude the survival benefit associated with cabazitaxel. The company made this adjustment by modelling the survival benefit of abiraterone compared with best supportive care after docetaxel. It then adjusted the survival of people who had cabazitaxel after docetaxel in the abiraterone group of COU‑AA‑302 to exclude this benefit. It did not adjust the survival estimates of the placebo group. The company carried out a scenario analysis in which it did not include a survival adjustment for cabazitaxel (see section 3.20). The company did not adjust for other active treatments that were used by some patients in COU‑AA‑302 but are not used in the NHS after abiraterone, including sipuleucel‑T (the marketing authorisation has been withdrawn).

3.14 The model used 17 prediction equations to estimate the time to starting treatment, time to stopping treatment and time to death within the treatment phases and also to estimate the disease status of the patient at different times. The company constructed the equations in 3 steps:

  • First, it decided whether a separate equation was needed for the abiraterone and best supportive care arms. For most equations, the company used the same equation for both arms and used 'treatment' as a predictor. However, for 'time from stopping abiraterone or best supportive care to death', the company used a separate equation for each treatment arm.

  • Second, for 10 of the equations, the company chose a parametric distribution with which to extrapolate the trial data over a longer period of time. It chose the curve with the best fit to the survival curves from the ITT population from COU‑AA‑302.

  • Third, it determined which baseline variables (such as age) should be included in the equation. The company included covariates that had a statistically significant association with the event/outcome of interest at a 10% level of statistical significance. The covariates differed between equations. Two further covariates that did not meet the 10% level of statistical significance were also included. The company justified this by stating that it was better to 'be inclusive', that analyses may not have reached statistical significance because of small patient numbers, and that the inclusion of these 2 covariates was clinically justified. To derive the prediction equations, the company used data from patients who had complete data for the baseline variables of interest (meaning that the sample size differed between equations). Out of the 1,088 patients in the ITT population, 902 patients (83%) had complete data for all baseline variables, so the minimum sample size was 902.

    The company reported that all of the equations had a good fit to the trial data. In response to the second appraisal consultation document, the company provided further details of how it constructed the prediction equations, and stated that it followed a pre-specified analysis plan.

3.15 The company's base case used utility values from the company-sponsored 'UK mCRPC patient utility study'. This study was an online survey of 163 men with metastatic hormone-relapsed prostate cancer in the UK, all of whom, unless they had been surgically castrated, had previously taken anti-androgen tablets for more than 1 month but had since stopped. The study did not compare men taking abiraterone with men not taking abiraterone and assumed that patients had the same utility regardless of their treatment, provided that they were in the same treatment phase. Patients with metastatic hormone-relapsed prostate cancer after androgen deprivation therapy had failed were divided into the following subgroups:

  • No or mild symptoms; chemotherapy not yet clinically indicated (n=50). The mean EQ‑5D utility value was 0.83.

  • With symptoms; chemotherapy clinically indicated but not started (n=50). The mean EQ‑5D utility value was 0.63.

  • Having chemotherapy (n=17). The mean EQ‑5D utility value was 0.69.

  • After chemotherapy (n=46). The mean EQ‑5D utility value was 0.70.

    The utility value for people receiving best supportive care before death was assumed to be 0.5 based on Sandblom et al. (2004). The company did not apply a utility decrement for adverse events with different treatments.

3.16 The company also presented utility values derived from mapping FACT‑P to EQ‑5D from the data collected in COU‑AA‑302. The company used data from an observational study of patients with metastatic hormone-relapsed prostate cancer in 6 European countries to develop an algorithm to map FACT‑P data to EQ‑5D using an ordinary least squares regression model and the UK EQ‑5D tariff. The company applied this mapping algorithm to map FACT‑P data from patients in both treatment groups in the COU‑AA‑302 study to EQ‑5D utility values. From this, the company calculated a utility gain of 0.021 for people while they were taking abiraterone (either pre- or post-docetaxel).

3.17 The company grouped the use of medical resources into 'scheduled' and 'unscheduled'. Scheduled resources included disease-related tests such as imaging, diagnostic and clinical laboratory tests. To determine the frequency of scheduled appointments during the different stages of the disease pathway, the company surveyed 53 oncologists. The company applied higher resource use for patients having abiraterone than for patients on best supportive care in both the pre- and post-docetaxel setting for the first 3 months of abiraterone treatment to account for the additional monitoring as specified in the summary of product characteristics. Thereafter, the company assumed that patients incurred the same costs in both treatment arms.

3.18 The company estimated the frequency of unscheduled medical resource use (for example, adverse events while on treatment) using data from COU‑AA‑302 (for pre-docetaxel abiraterone or best supportive care) and COU‑AA‑301 (for post-docetaxel abiraterone or best supportive care). COU‑AA‑301, the key clinical trial in NICE's technology appraisal on abiraterone for castration-resistant metastatic prostate cancer previously treated with a docetaxel-containing regimen (hereafter referred to as TA259), compared abiraterone plus prednisone or prednisolone with placebo plus prednisone or prednisolone in people whose disease had progressed on or after docetaxel therapy. For people being treated with docetaxel, the company used the rates of grade 3 and 4 adverse events reported in the literature and consulted its clinical advisors on the costs of treating these events. The company also applied a one‑off cost of £3,598 to account for palliative care in the last 3 months of the best supportive care phase.

3.19 The company's model used the following costs:

  • £2,300.00 per 30 days for abiraterone (based on a 1,000 mg daily dose). To reflect the new complex patient access scheme (PAS), the cost of abiraterone was incurred only for the first 10 months of treatment.

  • £1,240.00 per month for docetaxel (based on 1 dose every 3 weeks for a patient of average weight based on the patient characteristics in COU‑AA‑302). The company calculated the cost of docetaxel by applying a 20% discount to the British national formulary (BNF; edition 67) price of £1,069.50, resulting in a cost of £855.60 per 160‑mg vial. In a sensitivity analysis, the company used the electronic medicines information tool (eMIT) price for docetaxel. An additional administration cost of £214.00 was applied for docetaxel.

    The company estimated that some patients would not take the full licensed dose of abiraterone ('non-adherence') and so reduced the cost of abiraterone prescribed before docetaxel by 2%. The company's base-case model did not include the training or administration costs associated with implementing the new complex PAS. It estimated that these costs would be £388 per year per hospital or homecare provider. In response to the second appraisal consultation document, the company submitted an analysis that included the administration costs of the complex PAS for abiraterone used both before and after docetaxel; this increased its incremental cost-effectiveness ratio (ICER) from £28,563 to £28,717 per quality-adjusted life year (QALY) gained.
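The price arithmetic in section 3.19 can be checked with a few lines. This is a sketch only: the helper name `apply_discount` is hypothetical, and treating the 2% non-adherence adjustment as a flat reduction in the 30-day abiraterone cost is an assumption, because the source does not say how the company implemented it.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_discount(price: Decimal, discount: Decimal) -> Decimal:
    """Reduce a price by a fractional discount and round to whole pence."""
    reduced = price * (Decimal("1") - discount)
    return reduced.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 20% discount on the BNF price of a 160-mg vial of docetaxel (section 3.19)
docetaxel_vial = apply_discount(Decimal("1069.50"), Decimal("0.20"))

# Assumption: the 2% non-adherence adjustment applied to the 30-day cost
abiraterone_30_days = apply_discount(Decimal("2300.00"), Decimal("0.02"))

# Change in the ICER when PAS administration costs are included (section 3.19)
pas_admin_icer_increase = 28717 - 28563

if __name__ == "__main__":
    print(docetaxel_vial, abiraterone_30_days, pas_admin_icer_increase)
```

Using `Decimal` rather than floating point keeps the currency amounts exact, so the discounted vial price reproduces the £855.60 quoted in the text.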

3.20 In the company's deterministic base-case analysis, abiraterone was associated with an incremental cost of £16,055, 0.62 life years gained and 0.56 QALYs gained compared with best supportive care. The estimated deterministic ICER was £28,563 per QALY gained. The company did not present a probabilistic ICER but presented the results of a probabilistic sensitivity analysis and cost-effectiveness acceptability curves. A scenario in which the survival estimate in the abiraterone arm was not adjusted for cabazitaxel use (see section 3.13) resulted in an ICER of £27,738 per QALY gained.
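The ICER is the incremental cost divided by the incremental QALYs. The sketch below applies that definition to the rounded figures in section 3.20; the function name `icer` is hypothetical. Dividing the rounded inputs gives approximately £28,670 rather than the reported £28,563, which is consistent with the published QALY gain having been rounded to 2 decimal places (the reported ICER implies a gain of about 0.562). This reading of the difference is an inference, not something the source states.

```python
def icer(incremental_cost: float, incremental_qalys: float) -> float:
    """Incremental cost-effectiveness ratio, in cost per QALY gained."""
    if incremental_qalys <= 0:
        raise ValueError("ICER is only meaningful here for a positive QALY gain")
    return incremental_cost / incremental_qalys


# Rounded base-case figures from section 3.20
rounded_icer = icer(16055, 0.56)

# QALY gain implied by the reported ICER of 28,563 per QALY gained
implied_qalys = 16055 / 28563

if __name__ == "__main__":
    print(round(rounded_icer), round(implied_qalys, 3))
```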

3.21 The company carried out a scenario analysis in which it replaced the log-logistic distribution for the equation 'time from starting to stopping first treatment with abiraterone or BSC' with a Weibull distribution. Using the Weibull distribution increased the ICER to £35,789 per QALY gained. In response to the second appraisal consultation document, the company submitted data showing the duration of abiraterone treatment in clinical practice in the UK and US. The UK data are commercial-in-confidence and cannot be reported here. The US data came from the Optum database of healthcare insurance claims, which contained records for 8,326 people who had abiraterone and had not had docetaxel. The US data showed that 1,171 (14%) of people were still taking abiraterone after 53 months (4.4 years). The company stated that these data support its choice of a log-logistic curve for predicting time on first treatment.

3.22 In response to the second appraisal consultation document, the company submitted a scenario analysis using a 'piecewise' method to predict time on first treatment. For abiraterone, the company used a log-logistic distribution for the first 2.5 years and a Weibull curve thereafter. For best supportive care, the company used a log-logistic distribution for the first 2.5 years and then it assumed that all patients stopped having best supportive care. This scenario increased the company's base-case ICER from £28,563 to £32,849 per QALY gained. The ERG stated that it was arbitrary to assume that all patients stopped best supportive care after 1,000 days.
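The choice between log-logistic and Weibull curves matters because a log-logistic curve has a heavier tail than a Weibull curve with the same median, so it predicts more people still on treatment at late time points. The sketch below illustrates this, together with one possible 'piecewise' construction. Everything in it is an assumption made for illustration: the shape parameters are hypothetical, the curves are anchored to the 13.8-month median treatment duration from section 3.5 only for convenience, and the splice (log-logistic up to 30 months, then Weibull hazards applied conditionally) is one common way to join curves, not the company's documented method.

```python
import math

MEDIAN = 13.8    # months; used only to anchor both illustrative curves
LL_SHAPE = 1.5   # hypothetical log-logistic shape parameter
WB_SHAPE = 1.2   # hypothetical Weibull shape parameter
CUT = 30.0       # months (2.5 years), switch point for the piecewise curve


def loglogistic_survival(t: float) -> float:
    """Proportion still on treatment at time t under a log-logistic curve."""
    return 1.0 / (1.0 + (t / MEDIAN) ** LL_SHAPE)


def weibull_cum_hazard(t: float) -> float:
    """Weibull cumulative hazard, scaled so that survival is 0.5 at MEDIAN."""
    return math.log(2.0) * (t / MEDIAN) ** WB_SHAPE


def weibull_survival(t: float) -> float:
    """Proportion still on treatment at time t under a Weibull curve."""
    return math.exp(-weibull_cum_hazard(t))


def piecewise_survival(t: float) -> float:
    """Log-logistic up to CUT, then Weibull hazards applied after CUT."""
    if t <= CUT:
        return loglogistic_survival(t)
    extra_hazard = weibull_cum_hazard(t) - weibull_cum_hazard(CUT)
    return loglogistic_survival(CUT) * math.exp(-extra_hazard)


if __name__ == "__main__":
    for months in (12, 30, 60, 120):
        print(months,
              round(loglogistic_survival(months), 4),
              round(weibull_survival(months), 4),
              round(piecewise_survival(months), 4))
```

With these illustrative parameters, the piecewise curve follows the log-logistic curve at first and then falls away from it, which is the direction of effect that would shorten predicted time on treatment.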

3.23 The ERG considered that it was appropriate for the company to develop a new model, but it did not think that using a discrete event simulation model was the simplest or most transparent approach because it was more complicated to assess face validity and internal validity than, for example, a Markov model of health states.

3.24 The ERG stated that the model structure lacked face validity because it did not allow the possibility of dying during abiraterone treatment, or during best supportive care before docetaxel treatment, or during post-docetaxel treatments. It noted that, in COU‑AA‑302, 5 patients had died before the end of abiraterone or placebo treatment.

3.25 The ERG commented that the model population was not the same as the entire COU‑AA‑302 population because the model equations used data from patients who had complete data for the baseline variables of interest. Out of the 1,088 patients in the ITT population, 902 patients (83%) had complete data for all baseline variables (the 'full covariate subgroup'), so the minimum sample size for deriving prediction equations was 902. The ERG commented that, for the abiraterone group, time on first treatment was longer in the full covariate subgroup than in the ITT population. In its response to the second appraisal consultation document, the company provided the characteristics of the full covariate subgroup. It also stated that there was no statistically significant difference between the ITT population and the full covariate subgroup in baseline characteristics, time on first treatment or overall survival.

3.26 The ERG agreed with the company that using the EQ‑5D utility values from the UK mCRPC utility study was the preferred approach given the uncertainty about the mapped utility values based on the FACT‑P responses from COU‑AA‑302. The ERG considered whether the utility value for the pre-docetaxel treatment phase would be expected to be different between treatment arms. In the base case, the ERG noted that the company had applied a utility increment for people taking abiraterone (see section 3.16), and that the company stated that this was based on the benefits experienced with abiraterone compared with best supportive care with respect to pain and fatigue. The ERG did not agree with this approach because, in COU‑AA‑302, abiraterone led to significantly more adverse events (both overall and grade 3–4) than best supportive care. The ERG considered it more appropriate to incorporate and apply separate utility decrements for each separate adverse event in the model.

3.27 The ERG noted that the company used a different utility increment for patients taking abiraterone (before or after docetaxel) in the current appraisal (0.021) than it did for patients taking abiraterone after docetaxel in its previous submission for TA259 (0.046). The ERG also preferred to apply a utility decrement to the baseline utility values for people not taking abiraterone, rather than adding on an increment to baseline utility values for people taking abiraterone.

3.28 The ERG stated that its preferred base case would:

  • include a utility increment of 0.046 applied in the post-docetaxel phase for patients having abiraterone

  • derive prediction equations for time to stopping treatment, time to starting treatment and time to death from the full ITT population in COU‑AA‑302, accounting for treatment effect only, and not including other risk predictors based on baseline characteristics

  • not adjust the cost of abiraterone for non-adherence because the NHS would not recover the cost of dispensed medication for people who do not take the full course of treatment.

    Applying the first assumption (post-docetaxel utility increment if having abiraterone) to the company's base case resulted in an ICER of £29,498 per QALY gained. Applying new risk equations based on the ITT population resulted in an ICER of £35,191 per QALY gained. Removing the cost adjustment for non-adherence to abiraterone resulted in an ICER of £29,307 per QALY gained. The combination of these 3 scenarios (the ERG's exploratory base case) resulted in an ICER of £35,486 per QALY gained.

3.29 The ERG noted that the post-docetaxel survival in the current model was much lower than at the same point in the care pathway in TA259, which had appraised the cost effectiveness of abiraterone taken after docetaxel compared with best supportive care. In a sensitivity analysis, the ERG modified the prediction equation so that the post-docetaxel survival was similar to that estimated in TA259. This increased the 'ERG exploratory base case' ICER from £35,486 to £39,722 per QALY gained.

3.30 The ERG did 3 additional sensitivity analyses:

  • The ERG stated that it was unclear how the company had adjusted for treatment with cabazitaxel in COU‑AA‑302 in the model (see section 3.13). Therefore, it tested a scenario without adjusting for cabazitaxel use. This decreased the ICER from the ERG's exploratory base-case estimate of £35,486 to £34,771 per QALY gained.

  • The ERG stated that a log-logistic model, as used for 2 prediction equations in the company's base case, is often criticised for its long tail, which may result in an unrealistic survival benefit. The ERG therefore used a Weibull model to extrapolate the data for time from starting to stopping treatment with abiraterone or best supportive care, and time from starting treatment with docetaxel to death while on docetaxel treatment. This increased the ICER to £55,616 per QALY gained.

  • The ERG stated that its criticisms of log-logistic models also apply to log-normal models. The ERG therefore used a Weibull model rather than a log-normal distribution to extrapolate time from stopping first treatment to starting docetaxel. This decreased the ICER from the ERG's exploratory base-case estimate of £35,486 to £34,928 per QALY gained.

3.31 Most analyses from the company and the ERG applied the new complex PAS to abiraterone used before and after docetaxel. Following a request from NICE, the ERG provided an additional analysis that applied the new complex PAS to abiraterone used before docetaxel and applied the existing simple PAS to abiraterone used after docetaxel (in the best supportive care arm of the model). The new scenario increased the ERG's base-case ICER from £35,486 to £37,859 per QALY gained. The ERG's scenario using Weibull rather than log-logistic curves for 2 prediction equations, and also applying the existing simple PAS to abiraterone used after docetaxel, resulted in an ICER of £59,567 per QALY gained.

Estimates of life expectancy for patients for whom abiraterone is indicated

3.32 In response to the first appraisal consultation document, the company presented survival data from 2 studies that it had not included in its original submission. One was a systematic literature review by Kirby et al. (2011) stating that median survival was between 9 months and 30 months for patients with castrate-resistant prostate cancer and between 9 months and 13 months for people with metastatic disease. The other study was an observational analysis of a trial population (Hussain et al. 2006) documenting an association between prostate-specific antigen levels and mortality in people with prostate cancer. The company reiterated that the 2012 European Association of Urology guidelines stated a mean survival of between 9 months and 27 months for metastatic disease.