Abstract
Background: Local relapse (LR) after breast-conserving therapy (BCT) is not accurately predicted by histological/clinical factors. Gene expression profiling was used here to discover LR-associated transcriptional alterations. Materials and Methods: Gene expression profiling was carried out on 81 early breast carcinomas obtained from 30 patients who developed a LR (LR+) after BCT and 51 who did not (LR–). LR+ and LR– samples were matched for known LR risk features. Results: LR was not associated with a given molecular subtype. Supervised analysis identified a 212-gene signature, which was not validated in independent tumors. No gene set or biological pathway was differentially expressed between LR+ and LR– groups. Twelve published prognostic expression signatures failed to distinguish these groups of carcinomas. The gene expression profiles of 9 cases of LR and the corresponding primary tumors were very similar despite the delivery of radiotherapy. Conclusion: In this series, the onset of LR was not predicted by gene expression alterations.
Breast cancer is the most frequent and lethal female cancer worldwide. In developed countries, most cases are diagnosed at an early stage (stages I and II). For these patients, locoregional treatment combines breast-conserving surgery (lumpectomy) followed by radiation therapy (whole breast irradiation and boost to the tumor bed). It is clearly established that the survival benefit of this breast-conserving therapy (BCT) is similar to that of mastectomy (1, 2). Nevertheless, local recurrences (LR) are more frequent after BCT than after mastectomy, with a local relapse rate of around 1% per year (1, 2). Local relapse leads not only to extensive surgical treatment (secondary mastectomy), with its esthetic and psychological consequences, but also increases the risk of distant metastatic relapse and reduces survival (3, 4). Thus, efforts to reduce the local relapse rate are crucial and should lead to a reduction of breast cancer mortality.
One solution is to better identify patients at high risk of local relapse so that they can be offered more aggressive local treatment, such as mastectomy or higher radiation doses. Common histological/clinical risk factors of local relapse after BCT include involvement of surgical margins by carcinoma, young age, multifocal disease, lymphovascular invasion, and lack of radiation boost (5-11). The use of adjuvant systemic therapy also reduces the risk (6). But these predictive features are imperfect. Local relapse cannot be explained by incomplete surgical resection alone since most patients with such resection never relapse after BCT, and conversely, local relapse occurs in patients treated with complete surgery (no tumor residue and margins greater than 5 mm) and irradiation including a boost to the tumor bed. Moreover, the impact of age suggests the existence of additional biological mechanisms that influence the onset of relapse.
For many years, the potential of molecular analyses for identifying features associated with the risk of local relapse has been suggested (12-17). However, to date, the results of such studies remain inconclusive and without clinical application. During the last decade, gene expression profiling (for review, see (18)) has been successfully used for identifying relevant subtypes of breast cancer (19-22), and determining signatures predictive of metastatic risk (23-28) or the response to systemic therapy (29-33). More recently, this approach has been applied to determine expression profiles predictive of local relapse after BCT (34-37); to date, however, only four studies have been reported by two teams and the results are rather disappointing. van de Vijver's team failed to identify a predictive signature in its first two studies (34, 37). In the third study, they reported a 111-gene signature, which lost its prognostic value in multivariate analysis (35). A Swedish team separately analyzed estrogen receptor-positive (ER+) and ER– samples (36). In ER+ tumors, they found an 81-gene predictive signature, which they did not attempt to validate. Unfortunately, this signature lost its predictive value when tested by Kreike et al. in their recent study (35). In all these studies, the two patient groups were not matched for classical histological/clinical features predictive of local relapse.
Here, we profiled a unicentric series of 81 early breast carcinomas treated by BCT using whole-genome DNA microarrays. In contrast with the aforementioned studies and to avoid identifying a local relapse predictor associated with a classical risk factor, we matched the cases with relapse and those without for the known risk features.
Materials and Methods
Breast cancer patients and samples. All samples were selected from our institutional breast cancer database from patients with non-metastatic, non-inflammatory invasive breast adenocarcinoma treated by BCT at the Paoli-Calmettes Institute (Marseille, France) between 1988 and 2005. Additional selection criteria included: available frozen tumor sample, no prior malignancy (notably in the contralateral breast), no prior or simultaneous metastasis, clinical tumor size ≤5 cm, and available histological/clinical data, including follow-up. A total of 30 patients who developed a local recurrence (same localization as the primary tumor in the breast) near the initial tumor bed with similar pathological characteristics were selected (hereafter designated LR+ group). We then chose 51 patients who did not develop any local recurrence (LR– group) as controls: they were matched with the 30 LR+ cases based on traditional histological/clinical features (age, margin status, pathological axillary lymph node involvement, tumor size, Scarff-Bloom-Richardson (SBR) grade, vascular invasion, ER status, radiation boost, and systemic treatment). Histological/clinical data for the 81 samples are summarized in Table I. The median age of patients was 55 years. Most tumors (80%) were invasive ductal carcinomas, 17% had margins microscopically involved by an in situ or invasive component, and 27% had pathologically involved axillary lymph nodes. BCT was defined as breast-conserving surgery with surgical lymph node evaluation (lymph node dissection or sentinel lymph node procedure) followed by whole breast irradiation of 46 to 50 Gy. A radiation boost to the tumor bed was delivered in 76 patients (median dose of 14 Gy delivered by external beam radiation). Most of the patients received a systemic adjuvant treatment according to the standard guidelines at the time of diagnosis: chemotherapy for 46 cases and hormone therapy for 33.
For nine patients who experienced a local relapse, the local relapse sample was also available for profiling. The study was approved by our Institutional Review Board, and all patients gave signed informed consent.
Gene expression profiling. Total RNA was extracted from frozen samples by using guanidinium isothiocyanate and cesium chloride gradient, as previously described (38). RNA integrity was confirmed using a Bioanalyzer (Agilent, Palo Alto, CA, USA).
Gene expression analyses were carried out with Affymetrix U133 Plus 2.0 human DNA chips (www.Affymetrix.com). Preparation of cRNA, hybridization, washing and detection were as previously described (39). Scanning was performed with an Affymetrix GeneArray scanner and quantification with Affymetrix GCOS software. Data were analyzed by the Robust Multichip Average method in R using Bioconductor and associated packages (40). Robust Multichip Average was used for background adjustment, quantile normalization, and summarization of 11 oligonucleotides per gene.
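As an illustration of the quantile normalization step only, the following is a minimal sketch in plain Python. It is not the Bioconductor implementation used in this study: background adjustment and probe-set summarization are omitted, ties are broken by position rather than averaged, and the function name and toy intensities are hypothetical.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize arrays given as equal-length lists (one list per array).

    Each value is replaced by the mean, across arrays, of the values
    sharing its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    # Sort every array, then average across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n)]

    normalized = []
    for col in columns:
        # Indices of this array's values in ascending order.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for two arrays and three probes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After this transformation, every array has the same empirical distribution of values, which is the property the normalization is meant to enforce before arrays are compared.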
Public expression dataset. To test the performance of our gene expression signature in independent breast cancer samples, we analyzed publicly available clinical and expression data from two sets of 165 (35) and 161 (37) patients. Data were collected from ArrayExpress (series E-NCMF-24) (35) or the authors' website (http://microarray-pubs.stanford.edu/wound_local_recurrence/Local_Recurrence_explore.htm) (37). Genes of our signature were mapped from Human Genome Oligo Set version 3.0 arrays (containing 34,580 probes and used by Dutch laboratories) to U133 Plus 2.0 using the 'best match' provided by Affymetrix (http://www.netaffx.com). Expression data for the matching U133 Plus 2.0 probe sets were then submitted to hierarchical clustering.
Gene expression data analysis. Unsupervised analyses were carried out by hierarchical clustering. Data were log2-transformed and submitted to the Cluster program (41) using data median-centered on probe sets, uncentered correlation as similarity metrics, and centroid linkage clustering. Results were displayed using the TreeView program (41).
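The two preprocessing choices named above, median-centering on probe sets and uncentered correlation as the similarity metric, can be sketched as follows. This is an illustrative sketch in plain Python, not the Cluster program itself; the function names and the small expression matrix are hypothetical, and the linkage step is not shown.

```python
from math import sqrt
from statistics import median


def median_center(row):
    """Subtract the row median (one row = one probe set across samples)."""
    m = median(row)
    return [x - m for x in row]


def uncentered_correlation(x, y):
    """Pearson-like similarity computed without subtracting the means."""
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return num / den if den else 0.0


# Hypothetical log2 expression matrix: rows are probe sets, columns are samples.
matrix = [[1.0, 2.0, 4.0],
          [3.0, 5.0, 4.0],
          [6.0, 2.0, 3.0]]
centered = [median_center(row) for row in matrix]

# Similarity between samples is computed on the columns of the centered matrix.
sample_1 = [row[0] for row in centered]
sample_3 = [row[2] for row in centered]
similarity = uncentered_correlation(sample_1, sample_3)
```

Because the means are not subtracted, two profiles with the same shape but different offsets are not treated as identical, which is why centering on probe sets is applied first.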
Molecular subtypes of breast carcinoma (luminal A, luminal B, basal, ERBB2-overexpressing, claudin-low) were defined using the Single Sample Predictor (SSP) (21) as previously described (26), and the 9-cell line claudin-low predictor (42). For non-claudin-low cases, the subtype was that defined by the SSP classifier.
We assessed the capacity of 12 published prognostic gene expression signatures to distinguish LR+ from LR– patients: the 70-gene signature (23), the genomic grade index (97 genes) (24), the 76-gene signature (25), the recurrence score (21 genes) (43), the hypoxia signature (163 genes) (44), the chromosomal instability signature (70 genes) (45), a proliferation-related signature (112 genes) (46), the wound-response signature (442 genes) (47), the radioresistance signature (52 genes) (48), the invasiveness gene signature (186 genes) (49), and two gene expression signatures related to local recurrence (35,36) containing 104 and 81 genes, respectively. All these signatures have been established to predict outcome or response to treatment in breast cancer. We matched each signature with the probe sets represented on our microarrays. We then classified our samples with the resulting common probe sets according to the methodology described in each corresponding study or by hierarchical clustering when this was not mentioned.
Finally, we applied supervised analyses to identify genes and pathways associated with the occurrence of local relapse in our series. To identify and rank genes discriminating the two subgroups of samples (with local relapse vs. without), analysis was applied to the probe sets with expression values greater than the background using the signal-to-noise ratio (SNR). The SNR was calculated for each gene (50) as SNR=(M1–M2)/(S1+S2), where M1 and S1, respectively, represent the mean and SD of expression levels of the gene in group 1, and M2 and S2 in group 2. Confidence levels were estimated by 100 random permutations of samples as previously described (51). A 'leave-one-out cross-validation' (LOOCV) procedure (50) was applied to estimate the predictive accuracy of the signature and the validity of our supervised analysis. To help in the interpretation, the lists of discriminator genes were interrogated using the Ingenuity Pathway Analysis software (version 5.5.1-1002; Ingenuity Systems, Redwood City, CA, USA). We also examined differential expression of pre-defined gene sets using Gene Set Enrichment Analysis (GSEA) (52, 53). GSEA determines whether members of a set of genes that correspond to a given biological pathway tend to occur towards the top or the bottom of a rank-ordered gene list, ordered here by differential expression between LR+ and LR– tumors. To test a broad range of biological processes, we tested the C2 Gene Set Collection version 2 (MSigDB, http://www.broadinstitute.org) containing 1,892 gene sets. GSEA was computed with 1,000 permutations and SNR as the metric for ranking genes. The same procedure was applied to the 165 tumors from a recently published study (35).
Statistical analysis. Correlations between sample groups and histological/clinical variables were calculated with Fisher's exact test or the Chi-square test, as appropriate. Follow-up was measured from the date of diagnosis to the date of last contact with live patients. Time to local recurrence was calculated from the date of diagnosis until the date of first local relapse. Survival was estimated using the Kaplan-Meier method and compared between groups with the log-rank test. All statistical tests were two-sided at the 5% level of significance. All statistical analyses were performed using R software version 2.8.1. This article is written in accordance with reporting recommendations for tumor marker prognostic studies (REMARK) criteria (54).
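The SNR statistic and a label-permutation test for a single gene can be sketched as below. This is a simplified illustration in plain Python rather than the procedure of references (50, 51): it assumes the sample SD (n–1 denominator), works on one gene at a time instead of estimating false positives across all probe sets, and uses hypothetical expression values and function names.

```python
import random
from statistics import mean, stdev


def snr(group1, group2):
    """Signal-to-noise ratio: (M1 - M2) / (S1 + S2), using sample SDs."""
    return (mean(group1) - mean(group2)) / (stdev(group1) + stdev(group2))


def permutation_p_value(group1, group2, n_perm=100, seed=0):
    """Share of label permutations whose |SNR| reaches the observed |SNR|."""
    rng = random.Random(seed)
    observed = abs(snr(group1, group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two groups at random
        if abs(snr(pooled[:n1], pooled[n1:])) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical log2 expression of one gene in LR+ and LR- tumors.
lr_pos = [8.1, 7.9, 8.4, 8.0]
lr_neg = [6.0, 6.3, 5.8, 6.1, 6.2]
observed_snr = snr(lr_pos, lr_neg)
p_value = permutation_p_value(lr_pos, lr_neg, n_perm=100)
```

With 100 permutations, the smallest attainable p-value under this counting convention is 1/101, which bounds the resolution of the confidence estimate.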
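For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched as follows. This is a minimal illustration in plain Python, not the R code used for the analyses reported here; the function name and the follow-up data are hypothetical, and the log-rank comparison is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated relapse-free fraction)] at each event time.

    times  : follow-up in months from diagnosis
    events : 1 if a local relapse was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0  # events observed exactly at time t
        leaving = 0   # all patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            leaving += 1
            i += 1
        if relapses > 0:
            surv *= 1 - relapses / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Hypothetical follow-up of five patients (months, relapse indicator).
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 0, 1, 0, 1])
```

Censored patients contribute to the risk set up to their last contact but never trigger a step in the curve, which is how unequal follow-up between groups is accommodated.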
Results
Patient characteristics. We profiled a series of 81 primary tumors using whole-genome DNA microarrays. Histological/clinical data are shown in Table I. Thirty tumors were from patients who developed a local relapse as first event (LR+ group), and 51 were from patients who did not develop any local relapse (LR– group). To avoid detecting expression differences related to differences in histological/clinical features, we matched the two groups of patients with respect to pathological and clinical criteria usually associated with the occurrence of local relapse. As shown in Table I, the two groups were balanced regarding age, tumor size, lymph node involvement, immunohistochemical (IHC) ER status, pathological grade, pathological subtype, margin status, vascular invasion, radiation boost, and adjuvant systemic therapy. The median time to local relapse for the 30 LR+ cases was 32 months after diagnosis, whereas the median follow-up for 51 control cases was 112 months.
Whole-genome expression profiling of breast cancer and local relapse. Whole-genome unsupervised hierarchical clustering of 81 samples identified two main clusters of tumors, I and II (Figure 1), which as expected, were strongly associated with the intrinsic features of breast cancer, ER status and grade. Ninety-six percent of cluster I samples were ER-positive, vs. only 20% of cluster II tumors (p=7.89E–03, Fisher's exact test), and 36% were grade 1 vs. 9% in cluster II (p=7.23E–13, Fisher's exact test). By contrast, no significant correlation existed with patients' age, margin status, pN, and pT, and with the incidence of local relapse, which concerned 34 and 41% of patients in clusters I and II, respectively (p=0.64, Fisher's exact test). Clusters were associated with the molecular subtypes, with 90% of luminal samples being in cluster I and 100% of basal, claudin-low and ERBB2-overexpressing tumors being in cluster II. Coherent gene clusters, related to specific cell types, biological pathways or chromosomal locations, were associated with the molecular subtypes. Some of them are highlighted in Figure 1B. Luminal samples displayed a low expression of the immune and basal gene clusters, and a strong expression of the luminal/ER gene cluster. Luminal B tumors displayed a stronger expression of the proliferation gene cluster than luminal A tumors. Basal and claudin-low tumors overexpressed immune, proliferation, and basal genes.
We then searched for a correlation between the molecular subtypes and the occurrence of local relapse. In the LR+ group, we identified 14 (47%) luminal A tumors, 5 (17%) luminal B, 3 (10%) basal, 4 (13%) claudin-low, and 4 (13%) ERBB2-overexpressing. In the control LR– group, 27 tumors were luminal A (52%), 9 (18%) luminal B, 9 (18%) basal, 2 (4%) claudin-low tumors, and 4 (8%) ERBB2-overexpressing. As expected given the previous matching of the two groups regarding the histological/clinical features, the correlation between the molecular subtypes and the occurrence of local relapse was not significant (p=0.46, Fisher's exact test).
Finally, we assessed the capacity of 10 gene expression signatures, previously published as being associated with survival or therapeutic response in breast cancer, to differentiate the LR+ cases from the LR– cases. Our samples were classified according to each expression signature into two classes: high risk and low risk. As shown in Table II, no classifier was associated with the occurrence of local relapse in our series. We also tested the two signatures recently reported as being predictive of local relapse (35, 36); in our dataset, their predictive value was not confirmed (see Table II).
Supervised analyses for prediction of local relapse. Neither the global unsupervised approach nor the previously reported molecular subtypes and other prognostic signatures were able to predict the risk of local relapse. Thus, we applied supervised analysis to search for genes or biological pathways discriminating LR+ and LR– samples.
Firstly, at the gene level using SNR combined with 100 permutation tests and a significance threshold producing fewer than 0.2% of false positives, we found 212 out of the 28,325 tested probe sets as being differentially expressed between the two groups (Supplementary Table I): 134 were up-regulated and 78 were down-regulated in LR+ samples. These 212 probe sets represented 172 characterized genes (166 different genes) and 40 ESTs (expressed sequence tags). Using LOOCV as internal validation method, we recorded a relatively low predictive accuracy (58%), suggesting low association with our endpoint. As external validation, this 212-gene signature was tested on 165 and 161 tumors from two independent data sets (35, 37). In both sets, the hierarchical clustering-based classification (data not shown) did not correlate with the occurrence of local relapse, with p-values of 0.21 and 0.29 (Fisher's exact test), respectively. A similar negative result was found when taking into account the time to local relapse, available in one study (37), with no significant difference detected between the Kaplan-Meier LR-free survival curves of high-risk vs. low-risk patients (p=0.219, log-rank test, Figure 2). These observations suggest that our 212-gene signature, established on our whole population, was not robust. Analysis of gene ontologies using Ingenuity revealed only one significant canonical pathway that included more than 5 genes and was specifically associated with the up-regulated genes of the signature. No similar pathway was significantly associated with the down-regulated genes (Supplementary Table II). To attempt to identify a robust signature, we performed the same analysis using more stringent selection criteria for samples. We compared the profiles of 17 patients with early relapse, i.e. women presenting a local relapse during the first three years after BCT, with those of 41 patients without local relapse after a long follow-up of at least 8 years.
We identified 74 probe sets (Supplementary Table III; theoretical number of false positives=55) differentially expressed between the two groups, with a predictive accuracy of 70% in LOOCV. However, this new signature was not validated in the two previous independent data sets, with p-values for local relapse prediction of 0.93 (37) and 0.18 (35) (Fisher's exact test), respectively.
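The logic of LOOCV, in which gene selection is repeated without the held-out sample in every fold, can be sketched as below. This is an illustration only: the classifier shown is a nearest-centroid rule chosen for brevity, which is an assumption and not necessarily the classifier of reference (50), and the function names and expression values are hypothetical.

```python
from statistics import mean, stdev


def snr(a, b):
    """Signal-to-noise ratio between two groups of expression values."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def loocv_accuracy(samples, labels, n_genes=1):
    """samples: expression vectors (one per tumor); labels: 1 = LR+, 0 = LR-."""
    n_features = len(samples[0])
    correct = 0
    for held_out in range(len(samples)):
        # Training set: every sample except the held-out one.
        pos = [x for i, (x, y) in enumerate(zip(samples, labels))
               if i != held_out and y == 1]
        neg = [x for i, (x, y) in enumerate(zip(samples, labels))
               if i != held_out and y == 0]
        # Rank genes on the training samples only, to avoid selection bias.
        scores = [snr([x[g] for x in pos], [x[g] for x in neg])
                  for g in range(n_features)]
        top = sorted(range(n_features), key=lambda g: abs(scores[g]),
                     reverse=True)[:n_genes]
        # Nearest-centroid prediction on the selected genes.
        c_pos = [mean(x[g] for x in pos) for g in top]
        c_neg = [mean(x[g] for x in neg) for g in top]
        test = [samples[held_out][g] for g in top]
        d_pos = sum((t - c) ** 2 for t, c in zip(test, c_pos))
        d_neg = sum((t - c) ** 2 for t, c in zip(test, c_neg))
        predicted = 1 if d_pos < d_neg else 0
        correct += predicted == labels[held_out]
    return correct / len(samples)


# Hypothetical data: six tumors, three genes; only the first gene is informative.
samples = [[8.0, 5.1, 3.0], [8.3, 4.8, 3.4], [7.9, 5.3, 2.9],
           [6.0, 5.0, 3.1], [6.2, 4.9, 3.3], [5.9, 5.2, 3.0]]
labels = [1, 1, 1, 0, 0, 0]
accuracy = loocv_accuracy(samples, labels, n_genes=1)
```

Selecting genes inside each fold matters: if the gene list were chosen once on all samples and then cross-validated, the accuracy would be optimistically biased.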
Secondly, because individual gene analysis suggested relatively low expression differences between LR+ and LR– tumors, we used GSEA to search for discriminating biological pathways. The idea was to detect coordinated, but relatively small-scale (not detectable at the level of gene analysis) differences in expression of genes representing biological functions. Whatever the population analyzed (all 81 samples or the 58 tumors from the more stringent subset), we failed to discover biological pathways significantly associated with local relapse. Among the 1,892 tested pathways, 230 gene sets were found as being up-regulated in the LR+ group and 1,409 in the LR– group, but none reached the significance threshold of a false discovery rate (FDR) below 25% (data not shown).
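The running-sum statistic at the core of GSEA can be sketched as follows. This is a simplified illustration in plain Python, not the GSEA software used here: it computes only the enrichment score for one gene set on a pre-ranked list, leaves out the permutation-based significance and FDR estimation, and uses hypothetical gene names and scores.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes : genes ordered by decreasing ranking metric (e.g. SNR)
    scores       : ranking metric of each gene, in the same order
    gene_set     : genes belonging to the pathway being tested
    p            : weight exponent applied to |score| for set members
    """
    members = set(gene_set)
    hit_weights = [abs(s) ** p if g in members else 0.0
                   for g, s in zip(ranked_genes, scores)]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for g in ranked_genes if g not in members)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running, best = 0.0, 0.0
    for g, w in zip(ranked_genes, hit_weights):
        if g in members:
            running += w / total_hit   # step up at a set member
        else:
            running -= 1.0 / n_miss    # step down otherwise
        if abs(running) > abs(best):
            best = running             # keep the maximum deviation from zero
    return best


# Hypothetical ranked list: positive scores = higher expression in LR+ tumors.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
snr_scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, snr_scores, {"g1", "g2"})
es_bottom = enrichment_score(genes, snr_scores, {"g5", "g6"})
```

A positive score indicates a set concentrated at the top of the list (here, up-regulated in LR+), and a negative score a set concentrated at the bottom; in the full method, each score is compared with scores obtained after permuting sample labels.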
Comparative expression analysis of primary tumors and local relapses. For 9 out of the 30 LR+ tumors, we were able to define the gene expression profiles of the corresponding local relapse. Unsupervised analysis of these 9 pairs (18 samples) and the most variant probe sets is shown in Figure 3. Interestingly, 8 of the 9 local relapse samples closely clustered with their corresponding primary tumor. Moreover, the molecular subtypes of most of the primary tumors were conserved in the corresponding local relapse (Figure 3). Finally, supervised analysis between primary tumors and local relapses identified only 74 probe sets as being differentially expressed, with a high expected false-positive number of 56 (Supplementary Table IV).
Discussion
Identification of tools to better tailor treatment for individual breast cancer patients is a major focus of research. Here, we have searched for a gene expression signature associated with local recurrence after BCT of early breast cancer. Using whole-genome DNA chips, we analyzed 81 primary invasive breast adenocarcinomas and 9 primary tumor-matched local recurrences. To our knowledge, this study is the fifth high-throughput molecular study addressing this issue in the literature after those reported by two other teams (34-37). However, it differs from the latter in that we matched the two groups of samples (with and without local recurrence) regarding the features classically associated with local relapse. Indeed, the absence of matching in the previous studies might be a reason for their relative failure to define robust signatures independent from classical histological/clinical variables. However, despite this methodological difference and the use of different technological platforms, our results are relatively similar to those previously reported.
Whole-genome unsupervised analysis was not able to separate LR+ and LR– tumors. This observation, also reported by others (34, 35), suggests that the transcriptional differences, if any, between the two groups are not important in terms of the number of altered genes, and are not associated with classical biological features, such as ER signaling, proliferation, or immune response, which usually govern the whole-genome clustering of breast cancer samples. In our series, local relapse was not dependent on the molecular subtypes. In fact, these two observations were rather expected given the initial matching of LR+ and LR– groups based on features classically associated with the subtypes. By contrast, in a larger and non-matched series (35), tumors from LR+ patients were significantly more often luminal B or ERBB2-overexpressing types. But the results remain conflicting. In a series of 753 breast carcinomas (55), patients with ERBB2-overexpressing and triple-negative (TN, i.e. ER, PR and ERBB2 negative) tumors did not present significantly more local relapses than those with luminal tumors (ER and/or PR positive). The 5-year local relapse rate was 2.3%, 4.6% and 3.2% for luminal, ERBB2 and TN breast cancer, respectively (p=0.36). In the same way, 117 triple-negative early breast carcinomas were compared to 365 non-triple-negative tumors (56), all being treated by BCT including radiation therapy with a boost to the tumor bed. No difference was observed in local relapse incidence with a 5-year local relapse-free survival of 83% in both groups. Of note, these studies did not include the recently described claudin-low subtype. Analysis of larger series including all molecular subtypes is warranted to address this issue.
We also showed that none of the 10 tested gene expression signatures with proven value in predicting survival in breast cancer or response to radiation therapy was able to distinguish LR+ and LR– tumors in our series. Relatively similar results have been reported by others. In an assessment of three signatures (70-gene signature (23), hypoxia signature (44), and wound-response signature (47)), it was found that only the wound-response signature significantly succeeded. However, this finding was not confirmed in a recent and larger study (35), in which all 8 tested signatures, except the 70-gene signature and the chromosomal instability signature, failed to distinguish LR+ and LR– samples. In a Swedish study (36), the wound-response signature did not predict the local relapse in the combined ER+/ER– group, nor in the ER– group, but performed well in the ER+ group. Of note, all the signatures assessed in these studies were among those tested in our series. We also showed that the two signatures recently reported as being predictive of local relapse (35, 36) were not predictive in our dataset. Similarly, Kreike et al. were also unable to validate the Nimeus-Malmstrom signature in their larger series (35). The reduced number of genes from each tested signature common with our platform might explain the loss of predictive performance. However, these results raise questions about the validity of these two signatures established from relatively small and non-matched series (98 tumors for Kreike's signature and 100 for the Swedish signature), without an independent validation set for one of them (36) and without independent prognostic value for the other one.
We also applied classical supervised analyses to search for gene signatures or biological processes associated with local relapse in our series. Using the GSEA software, we showed that, after correction for multiple testing, none of the 1,892 tested biological processes-based gene sets was significantly associated with the onset of relapse. This result was observed in our series whatever the population analyzed (all 81 samples or the 58 tumors from the more stringent subset), but also when we analyzed a larger public dataset containing 165 samples (35). A similar negative result was reported by others using GSEA applied to 504 predefined gene sets and a series of 161 tumors (37). At the gene level, our supervised analysis found 212 probe sets the expression of which was associated with local relapse in the whole population, and 74 in the restricted population. However, because of the high rate of associated false positives, the poor performance of the two gene lists in LOOCV, and more importantly, the failure of their validation in two public independent cohorts, their robustness remains questionable. No biological pathway was strongly represented within these signatures. However, some genes seem biologically relevant in the context of local relapse. For example, the top gene overexpressed in LR+ samples in the two lists is PIP5K1A, which encodes phosphatidylinositol-4-phosphate 5-kinase, type I, alpha. This protein is involved in metabolism, especially the metabolism of phosphatidylinositol. It may also contribute to the inhibition of apoptosis by activating phosphatidylinositol-4,5-bisphosphate (PIP2), a potent inhibitor of caspases 3, 8 and 9. Its potential involvement in resistance to radiotherapy was previously suggested in glioblastoma cell lines (57).
Failure to identify a robust local relapse signature has already been reported by van de Vijver's group (34, 37) in initial series of 50 and 161 tumors. More recently in a larger series, this group reported a signature that they validated in an independent dataset, but which lost its predictive value in multivariate analysis which included histological/clinical variables. The signature was enriched for genes involved in cell proliferation, likely due to the imbalance for the pathological grade between LR+ and LR– samples. The other published local relapse signature was identified in 100 ER+ tumors, but without any validation set (36), and was not confirmed as being predictive in the large Kreike series or in ours. It was enriched for ribosomal genes, likely for the same reason as Kreike's signature.
Altogether, these inconclusive results highlight the difficulties of gene expression profiling in relevantly linking genes or gene sets to local relapse onset. Several reasons may be proposed. Perhaps the most important concerns the preponderant influence of treatment-associated factors, such as the status of surgical margins or the lack of radiation boost, on local relapse when compared to intrinsic tumor parameters. Recent studies have indeed suggested that ER, PR, and ERBB2 IHC expression did not influence local relapse after BCT (55, 56). It is also possible that relevant molecular alterations are not present at the RNA level in the tumor, but at other post-transcriptional levels. Host factors may also contribute to the local relapse risk as recently reported for metastases (58). Other reasons are methodological: i) all reported series, including the present one, are relatively small, for both the learning and validation sets, with a rather low number of events; ii) series are heterogeneous, notably with regard to risk factors associated with local relapse and to the delivery of adjuvant systemic therapies that influence the risk; iii) they do not separate tumors with regard to the molecular subtypes, which represent distinct diseases with, for example, different gene expression signatures predictive of the metastatic risk.
Finally, we studied 9 sample pairs including the primary tumor and its corresponding local relapse. In agreement with the previous results (34, 35), unsupervised and supervised analyses clearly suggested very few transcriptional differences between primary tumors and local relapses. Whole-genome clustering showed that the relapse samples were closer to their corresponding primary tumor than to other samples (relapses and primary tumors). Supervised analysis comparing primary tumors and local relapses resulted in high FDR, suggesting no significant difference between them. We have also shown that molecular subtypes are conserved from the primary tumor to the local relapse. All these observations suggest that gene expression profiles are not deeply modified by local radiation therapy (46 Gy minimum), and for some of them by associated systemic treatments (chemotherapy and/or hormone therapy). They also confirm that our analyzed samples of local relapse are true local relapses and not a second primary tumor in the same breast. Similar results have been reported at the DNA level with analysis of single-nucleotide polymorphisms (59).
In conclusion, our study, the first one matched on histological/clinical risk factors, suggests that gene expression profiling may not give relevant information to predict local relapse occurrence after BCT. No specific transcriptional pathway involved in local relapse was identified. Based on our results and those of the four other studies published in the field, it seems that RNA expression profiles do not contain any robust signature predictive of the onset of local relapse after BCT. However, these conclusions may be accounted for by several reasons discussed above, including the small size and heterogeneity of series, which are perhaps not sufficiently powered to detect a weak predictive signature, if any exists. We think that the analysis according to molecular subtype of larger series is warranted to definitively address this issue at the RNA level.
Acknowledgments
This work was supported in part by Inserm, Paoli-Calmettes Institute, and grants from Ligue Contre le Cancer (Comité de Corse du Sud), and Association pour la Recherche sur le Cancer (RS: ARC 2003 - grant No. 3214).
- Received June 28, 2011.
- Accepted July 1, 2011.
- Copyright© 2011 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved