Main

Biliary tract carcinomas (BTC) are uncommon but highly fatal malignancies in the United States and Europe. BTC comprise gallbladder carcinoma (GBC) and cholangiocarcinoma (CC) (bile duct cancer), which arise from the epithelial cells of the intrahepatic and extrahepatic bile ducts. The anatomic location of CC can be described as intrahepatic, distal extrahepatic, or hilar. Lesions can be described as mass-forming, periductal or intraductal, or as mixed mass-forming and periductal (Patel, 2006).

Approximately 5000 cases of GBC and 2500 cases of CC are diagnosed annually in the USA (de Groen et al, 1999). The incidence of CC (particularly, intrahepatic CC) has been rising over the past two decades in the United States, United Kingdom, and Australia (Rajagopalan et al, 2004). Worldwide, the highest prevalence of GBC is seen in India, Pakistan, Ecuador, Israel, Mexico, Chile, Japan, and among Native American women, particularly those living in New Mexico. Mortality rates in these areas can reach 5–10 times that in the United States (Lazcano-Ponce et al, 2001; Randi et al, 2006). Worldwide, CC accounts for 3% of all gastrointestinal cancers and is the second commonest primary hepatic tumour (Khan et al, 2005). Incidence of CC is highest in Israel, Japan, among Native Americans, and in Southeast Asia, where it can reach 87 per 100 000 (Rajagopalan et al, 2004). This indicates the global significance of both GBC and CC.

The reported incidence of ‘surprise’ or ‘incidental’ GBC varies from 0.35 to 2% (Misra and Guleria, 2006). Even in patients undergoing aggressive surgery, the general outcome of patients with BTC has been disappointing. Five-year survival rates are 5–10% for GBC and 10–40% for CC (de Groen et al, 1999). Unfortunately, most biliary tract carcinomas are diagnosed at advanced stages when the tumour is unresectable. Median survival of patients with advanced disease is in the range of only a few months.

Owing to the lack of randomised phase III studies, there is no standard regimen for palliative chemotherapy of GBC and CC. Depending on the patient's general condition, best supportive care, a clinical trial, 5-fluorouracil, or gemcitabine is recommended according to the guidelines of the National Comprehensive Cancer Network.

The aim of this study was to extensively analyse existing data of published clinical trials, even small and non-randomised, and, if possible, identify superior regimens, which may represent a standard of care of palliative chemotherapy in this disease.

Methods

Data for this analysis were identified by searches of PubMed and references from relevant articles using the search terms ‘biliary tract neoplasms’, ‘bile duct neoplasms’, ‘cholangiocarcinoma’, and ‘gallbladder neoplasms’. Only papers reporting the results of chemotherapy trials published in English from January 1985 to July 2006 were included. Abstracts from ASCO meetings presented from 1999 to 2006 were also included. Trials of intra-arterial hepatic chemotherapy or of chemoradiotherapy were excluded.

For a trial to be included in the analysis, at least the number of patients (included, treated, or evaluable) and the response rate (RR=CR+PR) had to be reported. Furthermore, tumour control rate (TCR=CR+PR+SD), time to tumour progression (TTP), and overall survival (OS) were recorded if available. In trials with more than one treatment arm, arms were analysed separately as single-arm trials. All trials included were analysed independently of the tumour classification for BTC. In addition, subgroup analysis was performed for GBC-only and CC-only for trials with sufficient data.

For subgroup analysis, results of the trials were compared and tested nonparametrically (Mann–Whitney, Kruskal–Wallis). Furthermore, RR and TCR data were pooled by summing the numbers of patients across trials. For example, a pooled RR was computed as the sum of the responders of a subgroup divided by the sum of the patients of this subgroup. The 95% confidence interval (CI) was calculated by the method of Clopper and Pearson. Proportions (e.g., pooled RRs) were compared by z-test. Nonparametric correlation was tested according to Spearman.
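The pooling step and the exact interval can be sketched as follows. This is a minimal illustration using only the Python standard library, not the software used for the analysis; the function names (`binom_cdf`, `clopper_pearson`, `pooled_rate`) are hypothetical, and the Clopper–Pearson bounds are found by bisection on the binomial distribution function rather than through a beta quantile. The only data used are the overall totals reported in the Results (634 responders among 2810 patients).

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect_decreasing(f, target, iterations=60):
    """Find p in [0, 1] with f(p) == target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for x responders out of n."""
    alpha = 1.0 - conf
    # Lower bound: P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2
    lower = 0.0 if x == 0 else _bisect_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1.0 - alpha / 2.0)
    # Upper bound: P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else _bisect_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2.0)
    return lower, upper


def pooled_rate(responders, patients):
    """Pooled rate: sum of responders divided by sum of patients."""
    return sum(responders) / sum(patients)


# Overall totals from the Results section, treated here as a single pool.
rate = pooled_rate([634], [2810])
low, high = clopper_pearson(634, 2810)
print(f"pooled RR {100 * rate:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f}%)")
```

Passing per-trial lists of responders and patients to `pooled_rate` gives the same result as passing the totals, which is why small trials contribute little to a pooled estimate.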

Results

One hundred and four trials comprising 112 trial arms were included in this analysis (for references see Appendix A). Only three were randomised trials, thereof two phase II (Kornek et al, 2004; Ducreux et al, 2005) and one phase III (Rao et al, 2005). No appropriate trial published in 1992 or earlier could be identified (the search was limited to publications from 1985 onwards). Seventeen (15%) trials were published from 1993 to 1999, whereas 95 (85%) trials were published from 2000 to July 2006. The 112 trials analysed comprise a total of 2810 patients treated.

The number of patients per trial ranged from 5 to 65 resulting in a mean number of patients per trial of 25.1 with a small range in various subgroups. The mean number of patients per trial (or per subgroup of a trial) for GBC-only and CC-only was smaller with 16.7 and 19.6, respectively.

Among all 2810 patients (25.1 per trial) analysed, 634 responders (5.7 per trial) were observed resulting in a pooled RR of 22.6% (95% CI 21.0–24.2%, n=2810). The RRs of all trials analysed sorted by the number of patients are shown in Figure 1. The RR of nine (8%) trials was above the upper limit of the 95% CI of 22.6% (Figure 1, ‘high’ RR). Twenty-two (20%) trials had RRs below the lower limit of the 95% CI of 22.6% (‘low’ RR) and 81 (72%) trials had RRs in the range of the 95% CI (‘middle’ RR). The nine ‘high’ RR trials evaluated gemcitabine plus platinum compounds (n=5), fluoropyrimidines plus platinum compounds (n=3), and gemcitabine alone (n=1). Among the 22 trials with ‘low’ RRs are trials evaluating docetaxel, paclitaxel, irinotecan, gemcitabine, and fluoropyrimidines as well as new drugs (erlotinib, lapatinib, exatecan, dolastatin, lanreotide). Figure 2 shows the RR and its 95% CI of all trials analysed sorted by the RR.

Figure 1

Response rates of all trials analysed sorted by the number of patients. Full papers are indicated by black rhombi and ASCO abstracts by empty triangles. The horizontal grey line represents the pooled response rate of all patients (22.6%). The limits of the 95% CI of the overall pooled RR are shown by dotted lines.

Figure 2

Response rate and 95% CI of all trials analysed sorted by the RR. The horizontal grey line represents the pooled RR of all patients (22.6%).

Ninety-six trials reported stable disease or TCR data. These 96 trials comprise 2386 patients, of whom 1368 achieved tumour control (14.3 patients with tumour control per trial), resulting in a pooled TCR of 57.3% (95% CI 55.3–59.3%, n=2386).

Survival

The median TTP and OS for all patients were 4.1 months (60 trials, 1543 patients) and 8.2 months (82 trials, 2197 patients), respectively. There was a highly significant correlation between RR and TCR (r=0.59, P=0.000), RR and TTP (r=0.52, P=0.000), TCR and TTP (r=0.66, P=0.000), and TTP and OS (Figure 3A). Furthermore, a significant but weak correlation between RR and OS as well as between TCR and OS was found (Figures 3B and C). Regression equations showed that a 10% increment in RR corresponded to an 8% increase of TCR, a 0.7-month increase of TTP, and a 0.6-month increase of OS. A 10% increment in TCR corresponded to a 0.7-month increase of TTP and a 0.7-month increase of OS, whereas a 1-month increase in TTP corresponded to a 1.3-month increase of OS.
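As an illustration of how such trial-level correlations and regression slopes are obtained, the sketch below computes a Spearman rank correlation and a least-squares slope in plain Python. The per-trial values are hypothetical and chosen only to make the example runnable; they are not data from the trials analysed, and the helper names (`average_ranks`, `spearman`, `ols_slope`) are assumptions of this sketch.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


def ols_slope(xs, ys):
    """Least-squares slope of y on x (change in y per unit change in x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# HYPOTHETICAL per-trial values: response rate (%) and median OS (months).
rr_percent = [5, 12, 20, 28, 35, 41]
os_months = [6.0, 7.1, 8.0, 7.9, 9.5, 10.2]

rho = spearman(rr_percent, os_months)
slope = ols_slope(rr_percent, os_months)
print(f"Spearman r = {rho:.2f}; OS change per 10% RR = {10 * slope:.2f} months")
```

Multiplying the slope by 10 expresses it in the form used in the text, that is, the change in months per 10% increment in RR.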

Figure 3

(A–C) Charts showing the correlation between RR, TCR, TTP and OS.

Subgroups

The RR of trials (subgroups) of patients with GBC was higher compared with CC (number of patients 500 vs 471, pooled RR 34.4 vs 20.2%, P=0.000; median RR of trials 35.5 vs 17.7%, P=0.008). For the TCR, there was no significant difference between GBC and CC (pooled TCR 60.5 vs 59.7%, P=0.904; median TCR of trials 60.0 vs 55.0%, P=0.784). In contrast, the OS was significantly longer in trials (subgroups) of patients with CC compared with GBC (median 9.3 vs 7.2 months, P=0.048).
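The comparison of two pooled proportions can be sketched with a two-sample z-test. Two assumptions are made here: the responder counts (172 of 500 for GBC, 95 of 471 for CC) are reconstructed from the rounded percentages above and are not reported as such, and the test uses the pooled-variance form with a two-sided P-value, which may differ in detail from the variant actually applied.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference of two independent proportions.

    Uses the pooled proportion for the standard error (an assumption of this
    sketch). Returns the z statistic and the two-sided P-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Responder counts reconstructed from rounded percentages (assumption):
# GBC 34.4% of 500 -> 172 responders; CC 20.2% of 471 -> 95 responders.
z, p_value = two_proportion_z_test(172, 500, 95, 471)
print(f"z = {z:.2f}, two-sided P = {p_value:.2g}")
```

A P-value printed as 0.000 by statistical software, as in the text, corresponds to P<0.0005 after rounding to three decimals.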

Comparison of regimens containing one or two drugs showed significant superiority of two-drug combinations over monotherapy concerning RR (number of patients 1499 vs 971, pooled RR 28.0 vs 15.3%, P=0.000; median RR of trials 25.8 vs 11.8%, P=0.000), TCR (pooled TCR 61.0 vs 50.4%, P=0.000; median TCR of trials 60.0 vs 48.0%, P=0.003), and TTP (median 4.4 vs 3.4 months, P=0.015), with a trend for OS (median 9.3 vs 7.5 months, P=0.061). Polychemotherapy (regimens of three or more drugs) resulted in a lower RR compared with two-drug combinations (number of patients 340 vs 1499, pooled RR 19.1 vs 28.0%, P=0.000; median RR of trials 19.2 vs 25.8%, P=0.065) but no difference in OS (median 9.0 vs 9.3 months). Compared with monotherapy, polychemotherapy showed a higher TCR (pooled TCR 58.9 vs 50.4%, P=0.028; median TCR of trials 62.8 vs 48.0%, P=0.098), a longer TTP (median 5.2 vs 3.4 months, P=0.016), and a trend towards longer OS (median 9.0 vs 7.5 months, P=0.086).

Further subgroup analysis focused on cytotoxic agents. Subgroups of patients treated with regimens containing a particular drug were compared with all other patients, who were treated with regimens that did not contain this particular drug, regardless of other drugs. Subgroups were defined by fluoropyrimidines (fluorouracil, capecitabine, tegafur), gemcitabine, platinum compounds (cisplatin, oxaliplatin, carboplatin), anthracyclines (adriamycin, epirubicin), mitomycin C, taxanes (paclitaxel, docetaxel), and irinotecan. RR and TCR were analysed by pooling all patients (bars with 95% CI) as well as all trials (boxplots) (Figure 4A–D). Results of treatment with fluoropyrimidines were very similar to the results of all fluoropyrimidine-free regimens and may represent the results of all patients and all trials analysed. In contrast, treatment with gemcitabine as well as with platinum compounds resulted consistently in highly significant superior RRs and TCRs compared with gemcitabine-free and platinum-free combinations, respectively (Figure 4A–D). However, apart from a trend for a longer TTP in the gemcitabine subgroup (4.6 vs 3.7 months, P=0.085), differences in survival times were small (platin vs no platin: TTP and OS 0.7 months each) and not significant.

Figure 4

(A–D) Fluoro: fluoropyrimidines (fluorouracil, capecitabine, tegafur); Gem: gemcitabine; Platin: platinum compounds (cisplatin, oxaliplatin, carboplatin); Anthra: anthracyclines (adriamycin, epirubicin); MMC: mitomycin C; Taxan: taxanes (paclitaxel, docetaxel); Irino: irinotecan. (A) Pooled RRs (RR=CR+PR) and 95% CIs of all patients included in the analysis and of subgroups of patients, defined by treatment with regimens containing a particular drug regardless of other drugs. The height of the bars correlates with the number of patients. The P-values apply to the comparison of a subgroup, defined by a particular drug, vs all other patients, who were not treated with this drug (e.g., patients treated with gemcitabine or gemcitabine-containing combinations vs patients treated with gemcitabine-free regimens). The RRs of the comparison subgroups are not shown. The vertical grey line represents the pooled RR of all patients (pts, 22.6%). (B) Boxplots of the RRs of all trials and of subgroups, defined by a particular drug. The height of the boxplots correlates with the number of trials. P-values for subgroup comparison as in Figure 4A. The vertical grey line represents the median RR of all trials (20.0%). For subgroups consisting of fewer than five trials, results of single trials are shown and no boxplots are provided. (C) Pooled TCRs (TCR=CR+PR+SD) and 95% CIs as in Figure 4A. The vertical grey line represents the pooled TCR of all patients (57.3%). (D) Boxplots of the TCRs as in Figure 4B. The vertical grey line represents the median TCR of all trials (59.6%).

For further investigation of the effects of fluoropyrimidines, gemcitabine, and platinum compounds, subgroups defined by treatment with these three agents and all combinations (regardless of other drugs) were analysed considering RR and TCR for all patients and all trials (Figure 5A–D). As shown in Figure 5A–B the RR of treatment with gemcitabine was not significantly higher compared with fluoropyrimidines. The addition of platinum compounds increased the RR of fluoropyrimidines as well as of gemcitabine. The increase of the RR by the addition of platinum compounds to gemcitabine was double the increase of the addition to fluoropyrimidines (17.0 vs 8.7%). The increase of the RR by the addition of gemcitabine to fluoropyrimidines was similar to the addition of platinum compounds to fluoropyrimidines.

Figure 5

(A–D) n/n: neither Fluoro nor Gem; Fluoro: fluoropyrimidines (fluorouracil, capecitabine, tegafur); Gem: gemcitabine; P: platinum compounds (cisplatin, oxaliplatin, carboplatin). (A) Pooled RRs (RR=CR+PR) and 95% CIs of subgroups of patients, defined by treatment with fluoropyrimidines, gemcitabine, and platinum compounds, regardless of other drugs. The height of the bars correlates with the number of patients. The vertical grey line represents the pooled RR of the Fluoro subgroup (17.1%). The Fluoro-Gem-P subgroup consists of only eight patients and is therefore not shown. Additional P-values: Fluoro vs Gem-P: 0.000; n/n vs all other subgroups: 0.000; n/n-P vs Gem-P: 0.012. (B) Boxplots of RRs of subgroups of trials, defined by treatment with fluoropyrimidines, gemcitabine, and platinum compounds. The height of the boxplots correlates with the number of trials. The vertical grey line represents the median RR of the Fluoro subgroup (19.2%). The Fluoro-Gem-P subgroup consists of only one trial and is therefore not shown. For subgroups consisting of fewer than five trials, results of single trials are shown and no boxplots are provided. Additional P-values: Fluoro vs Gem-P: 0.000; Fluoro-P vs Gem: 0.033; n/n vs all other subgroups: ⩽0.002; n/n-P vs Gem-P: 0.042. (C) Pooled TCRs (TCR=CR+PR+SD) and 95% CIs of subgroups of patients as in Figure 5A. The vertical grey line represents the pooled TCR of the Fluoro subgroup (50.9%). *P-value in comparison to the Fluoro subgroup. Additional P-values: n/n vs all other subgroups: 0.000. (D) Boxplots of TCRs of subgroups of trials as in Figure 5B. The vertical grey line represents the median TCR of the Fluoro subgroup (55.0%). Additional P-values: n/n vs all other subgroups: ⩽0.003.

In contrast to the analysis of RR, pooled TCR of the gemcitabine subgroup was significantly higher compared with fluoropyrimidines (P=0.024, Figure 5C). The addition of platinum compounds to fluoropyrimidines and gemcitabine increased the TCRs, but the difference was significant for the pooled TCR of the fluoropyrimidine subgroup only (9.7%, P=0.006). Just as for RRs, TCR was highest in the gemcitabine–platinum combination subgroup. Compared with the fluoropyrimidines subgroup the difference was significant for the pooled TCR (P=0.000, Figure 5C) as well as for the median TCR (P=0.025, Figure 5D). There was a trend for a longer TTP in the gemcitabine–platinum combination subgroup compared with the fluoropyrimidines–platinum combination subgroup (5.5 vs 3.7 months, P=0.072, 21 trials). All other differences of TTP and OS between subgroups were small and not statistically significant.

There were only a few trials evaluating new drugs, such as erlotinib, lapatinib, dolastatin, exatecan, rebeccamycin (one trial each, monotherapy), and raltitrexed (two trials, combination with gemcitabine and cisplatin/epirubicin, respectively). The numbers of trials and patients are too low for a separate analysis of a new-agent subgroup. Monotherapy trials of new agents are subsumed in the n/n subgroup (neither fluoropyrimidine nor gemcitabine without platinum compounds, Figure 5A–D).

Statistics

Only a minority of the trials reported statistical considerations such as sample size calculation, null and alternative hypothesis, significance level, and power. The preferred test design was the Simon two-stage design. Significance level (alpha) was mostly 0.05 (range 0.03–0.10) and the power was mostly 80% (range 80–95%). The null hypotheses tested ranged from an RR of p0 ⩽5 to ⩽20% with alternative hypotheses between an RR of pA⩾15 and ⩾40%. The numbers of trials analysed in this study that would have been negative if tested with RR or TCR as primary end point against different alternative hypotheses pA with different powers are listed in Table 1A, whereas Table 1B shows the numbers of trials that would have been positive if tested against different null hypotheses p0 with different significance levels (alpha).

Table 1 Number of (A) negative trials and (B) positive trials

Discussion

This pooled analysis of all published clinical trials since 1985 showed that chemotherapy with gemcitabine combined with cisplatin or oxaliplatin increases RR and TCR in GBC and CC. Our findings provide best possible evidence that this combination chemotherapy may improve survival in these diseases.

This is the first systematic review including a comprehensive statistical analysis of advanced GBC and CC. One hundred and four trials comprising 112 trial arms were included in this analysis. Pooled RR of all patients was 22.6% (95% CI 21.0–24.2%). RRs of single trials range from 0% to more than 80% and the median RR was 20.0% with a first and third quartile of 11.5 and 33.2%, respectively. In other words, one-fourth of all trials reported RRs less than or equal to 11.5% and another fourth RRs greater than or equal to 33.2%. The aims of this analysis were to identify superior regimens among this extreme range of RRs and thus to provide a standard of chemotherapy in advanced BTC, even if based on phase II trials, against the background of missing phase III trials.

The Cochrane Collaboration published a protocol to assess the beneficial and harmful effects of chemotherapy for gallbladder cancer (Pandey and Krishnan, 2004). Initially, the review was expected to be published in Issue 4, 2005. However, owing to the principles of the Cochrane Collaboration, this review should address randomised trials evaluating chemotherapy vs placebo/no chemotherapy and one type of chemotherapy vs another type of chemotherapy. As almost no randomised trials exist, this Cochrane review will not be finished at all.

Guidelines for the treatment of CC were published in 2002 by the BASL (British Association for the Study of the Liver) (Khan et al, 2002). Consensus conclusions from predominately phase II trials suggest: (i) RRs of 5-fluorouracil-based and (older) single agents are 10–20%, (ii) RRs of newer single agents, such as gemcitabine, vary from 20 to 30%, (iii) RRs of recent phase II combinations vary from 20 to 40%, and (iv) gemcitabine in combination with cisplatin shows 30–50% RRs. The results of the present analysis are somewhat different, but agree in principle concerning the combination of gemcitabine with cisplatin: (i) more than half of fluoropyrimidine-based trials reported RRs of more than 20%, (ii) more than half of single-agent gemcitabine trials and nearly all trials of newer single agents reported RRs of less than 20%, (iii) about 40% of combination trials reported RRs of 20% or less, and (iv) the middle half of gemcitabine plus platinum combinations resulted in RRs between 26 and 50%, that is, one-quarter of these combination trials reported RRs of 50% or greater (Figure 5B).

Three randomised trials, thereof only one phase III, were included in the present study. The 40955 EORTC phase II trial compared high-dose 5-FU with a combination of cisplatin, 5-FU, and folinic acid (Ducreux et al, 2005). The RR was higher in the combination arm (19 vs 7%), but there was no difference concerning disease stabilisation and toxicity was increased. Based on potential drug synergy a phase II trial compared two experimental arms: MMC combined with biweekly high-dose gemcitabine vs MMC combined with capecitabine (Kornek et al, 2004). The latter combination resulted in higher RR (31 vs 20%), TTP (5.3 vs 4.2 months), and OS (9.3 vs 6.7 months). A statistical comparison of the two groups including P-values was not published. The authors conclude that MMC combined with capecitabine seems to be superior, and further evaluation seems warranted. The only phase III trial of the present analysis compared etoposide, 5-FU, and folinic acid with epirubicin, cisplatin, and 5-FU (ECF) (Rao et al, 2005). As a result of poor recruitment (n=54) the trial was underpowered to detect a significant difference in OS. The ECF regimen was associated with less toxicity, but in conclusion, based on these data it is not possible to define a reference regimen for advanced BTC.

Owing to the lack of randomised phase III trials, there is a need to define treatment standards on predominately phase II trials. For this reason, we will necessarily act on imperfect evidence. This issue was discussed recently (Djulbegovic et al, 2005). For a treatment goal of prolongation of survival by days to months, highest standards of experimental evidence (well-designed and large-scale conducted RCTs) were proposed. The increasing number of publications of chemotherapy trials in this disease emphasises the need for a new standard beyond 5-FU. Hopefully, the increasing number of publications will be followed by an increasing quality of the trials. Only a minority of the trials analysed published statistical considerations, and results were frequently summarised as being promising. For well-designed phase II trials it is necessary to prospectively define a null hypothesis, an alternative hypothesis, the significance level (alpha), and the power (Tables 1A, B). By the use of the Simon MinMax two-stage design and reasonable parameters, the number of patients of a phase II trial will not exceed a total of 40 patients and 25 patients for the first stage.
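To make these four quantities concrete, the following sketch evaluates the operating characteristics of a given Simon two-stage design by direct binomial summation. The function names are hypothetical, and the example design (stop if at most 1 response among the first 10 patients; declare the regimen inactive if at most 5 responses among 29) is a design commonly tabulated for p0=10%, pA=30%, alpha=0.05, and 80% power. It is used purely as an illustration and is not taken from any trial in this analysis.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def simon_two_stage(r1, n1, r, n, p):
    """Operating characteristics of a two-stage design at true response rate p.

    Stage 1 enrols n1 patients and stops for futility if responses <= r1.
    Otherwise n - n1 further patients are enrolled, and the regimen is
    declared active only if the total number of responses exceeds r.
    """
    n2 = n - n1
    # Probability of early termination after stage 1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    prob_active = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        needed = r + 1 - x1  # responses still required in stage 2
        if needed <= 0:
            tail = 1.0
        else:
            tail = sum(binom_pmf(x2, n2, p) for x2 in range(needed, n2 + 1))
        prob_active += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1.0 - pet) * n2
    return {"pet": pet, "prob_active": prob_active, "expected_n": expected_n}


# Illustrative design for p0 = 0.10 vs pA = 0.30 (assumption, see lead-in).
under_null = simon_two_stage(r1=1, n1=10, r=5, n=29, p=0.10)
under_alt = simon_two_stage(r1=1, n1=10, r=5, n=29, p=0.30)

alpha = under_null["prob_active"]  # type I error: active declared although RR = p0
power = under_alt["prob_active"]   # probability of declaring active when RR = pA
print(f"alpha = {alpha:.3f}, power = {power:.3f}, "
      f"PET(p0) = {under_null['pet']:.2f}, E[N | p0] = {under_null['expected_n']:.1f}")
```

The same function can be applied to the design parameters of any published trial to check whether the stated alpha and power are actually met.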

The longer TTP and OS of multiple drug combinations may be due to more strict inclusion criteria of potentially more toxic regimens and may indicate selection bias. Furthermore, the proportion of patients with different localisations of their cancers may contribute to selection bias, as the present analysis showed higher RRs but shorter OS of GBC compared with CC.

Subgroup analysis concerning the three most important drugs demonstrated that gemcitabine alone is not superior to fluoropyrimidines. Platinum compounds increase the activity of both fluoropyrimidines and gemcitabine. The increase in activity from adding platinum compounds to gemcitabine is greater than that from adding them to fluoropyrimidines. Synergism of cisplatin and gemcitabine has been demonstrated in cell lines and is based on a direct inhibitory effect of gemcitabine on the repair of cisplatin intrastrand adducts and interstrand crosslinks (van Moorsel et al, 1999; Moufarij et al, 2003).

The present analysis demonstrated that gemcitabine combined with platinum compounds is superior concerning both RR and TCR. As RR and TCR significantly correlate with survival times (TTP and OS), RR and TCR represent a meaningful surrogate in BTC. In patients with colorectal cancer a meta-analysis of randomised phase III trials demonstrated highly significant correlation between RR and TTP, TTP and OS, and RR and OS (Louvet et al, 2001). A 10% RR increment corresponded to a 1-month increase in TTP and a 0.9-month increase in OS, whereas a 1-month increase in TTP corresponded to a 0.7-month increase of OS in colorectal cancer patients on first-line treatment. Our findings in BTC demonstrated a 10% RR increase corresponding to a 0.7-month increase in TTP and a 0.6-month increase in OS, whereas a 1-month increase in TTP corresponded to a 1.3-month increase of OS. Consequently, the data of highest experimental evidence in colorectal cancer are consistent with the results of our pooled analysis of clinical and predominately phase II trials in BTC.

The evidence level of this pooled analysis is limited as discussed above and it remains unclear which platinum compound is optimal and what schedule of administration should be used. Therefore, it is essential to perform randomised trials, such as the UK National Cancer Research Institute ABC-02 trial, to evaluate the definite role of platinum compounds in combination with gemcitabine compared with gemcitabine alone. This and similar trials are needed to establish reference regimens for this disease.

In conclusion, we suggest gemcitabine combined with cisplatin or oxaliplatin as the most active, and therefore a provisional standard regimen in BTC until a new evidence-based standard is defined.