Abstract
Background/Aim: Mutational signatures reflect common patterns based on the counts of mutations and their sequence context. The prognostic value of these signatures, which mirror the various carcinogenetic processes of cancers, is unexplored in gastrointestinal cancers. Our aim was to evaluate the possible prognostic relevance of mutational signatures in gastrointestinal carcinomas after adjusting for the traditional prognostic factors. Materials and Methods: We used publicly available data from The Cancer Genome Atlas and Pan-Cancer Analysis of Whole Genomes to evaluate the associations between survival endpoints and the activity of mutational signatures in seven types of gastrointestinal cancers. Results: Most strikingly, in rectal adenocarcinomas, high activity of the age-related single-base substitution 5 (SBS5) and SBS40 signatures was associated with improved overall survival (OS) [SBS5: hazard ratio (HR)=0.130; 95% CI=0.03-0.56; SBS40: HR=0.072; 95% CI=0.012-0.44] and, similarly, with improved rectal cancer-specific survival. In patients with left-sided (but not right-sided) colon adenocarcinoma, high activity of the SBS2 signature, formed due to APOBEC activity, predicted shortened OS. In pancreatic cancer, high activity of SBS10b, caused by polymerase epsilon exonuclease proofreading defects, was associated with both longer OS (HR=0.44; 95% CI=0.205-0.96) and longer pancreatic cancer-specific survival (HR=0.32; 95% CI=0.112-0.91). Conclusion: Several mutational signatures seem to have clinically meaningful, cancer-specific associations with prognosis among gastrointestinal cancers.
Globally, six of the 14 most common cancer types are gastrointestinal (GI) cancers, and they account for more than one quarter of all cancer deaths (1). GI cancers are very diverse in terms of their risk factors, geographical incidence, and prognosis. What unites them is that large gaps remain in etiological research, and reliable prognostic factors beyond TNM classification are rare. The 5-year survival of GI cancers varies from up to 70% in rectal adenocarcinoma to very few long-term survivors in pancreatic adenocarcinoma (1). Survival rates among patients with cancer in the same anatomical location may still vary considerably, and more precise and reproducible prognostic factors are needed to optimize surgical and oncological treatments and surveillance.
Both endogenous and environmental sources of mutagenesis cause consistently identifiable patterns of mutations, mutational signatures, which reflect different carcinogenetic pathways (2). By examining the frequency of these signatures, insights into, e.g., past exposure to carcinogens and defects in DNA repair mechanisms can be gained. The Catalogue of Somatic Mutations in Cancer (COSMIC) database currently lists 53 distinct single-base substitution (SBS) mutational signatures (3). While the etiology of some SBSs is still unknown, some signatures are caused by specific DNA proofreading defects (e.g., SBS10), others are related to exposure to specific chemotherapies (e.g., SBS17) or may be secondary to tobacco chewing (e.g., SBS29) or smoking (e.g., SBS4) (3, 4). Doublet base substitutions (DBS), which are generated after the concurrent modification of two consecutive nucleotide bases, and signatures of small insertions or deletions, known as indels, were only recently introduced to COSMIC and are still quite unexplored (3, 4).
As targeted therapeutics advance in GI cancers, whole-genome and whole-exome sequencing data are likely to become more common in the future. Although single mutations may offer prognostic value, as exemplified by BRAF and RAS mutations in colorectal cancer, mutational signatures could provide reproducible and much more comprehensive insight into the aggressiveness of GI carcinomas by reflecting their carcinogenetic processes.
The study of mutational signatures is a rapidly emerging field of cancer research, but the association between mutational signatures and survival in patients with GI carcinomas has not thus far been assessed. To elucidate this, we used publicly available whole-exome data from The Cancer Genome Atlas (TCGA) to evaluate the possible prognostic relevance of COSMIC signatures after adjusting for the traditional prognostic factors. The analyses were complemented with whole-genome data from Pan-Cancer Analysis of Whole Genomes (PCAWG), where appropriate.
Materials and Methods
Data. Mutational signature activity data (3) were accessed from the ICGC data portal (5). The data comprised whole-genome sequenced tumors from the PCAWG consortium and whole-exome sequenced tumors from the TCGA. SBS, DBS and ID signatures were available for the PCAWG samples, whereas only SBS and ID signatures were available for the TCGA samples. Metadata on the PCAWG tumor samples were accessed from the ICGC data portal (6) and curated metadata for TCGA samples were accessed from supplementary Table I of (7). The data files were read and subjected to all further analysis using R, v. 4.0.2 (8). All statistical analyses of this study were performed by biomedical statisticians.
Colon cancers were divided into two groups for our analysis. Right-sided colon cancers included those of the caecum, ascending colon, hepatic flexure of colon and transverse colon, while left-sided colon cancers consisted of those of the splenic flexure of colon, descending colon, and sigmoid colon.
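The sidedness grouping described above can be sketched as a simple lookup. This is an illustrative sketch only: the site labels are written as they appear in the text, the function name `colon_sidedness` is hypothetical, and the exact strings used in the TCGA metadata may differ.

```python
# Site labels as listed in the text; the strings in the actual metadata are an assumption.
RIGHT_SIDED = {"caecum", "ascending colon", "hepatic flexure of colon", "transverse colon"}
LEFT_SIDED = {"splenic flexure of colon", "descending colon", "sigmoid colon"}


def colon_sidedness(site):
    """Return 'right', 'left', or None for an anatomical site label."""
    key = site.strip().lower()
    if key in RIGHT_SIDED:
        return "right"
    if key in LEFT_SIDED:
        return "left"
    return None  # site not covered by either group (e.g. unspecified)
```

Tumors whose site falls outside both sets return None and would need to be handled separately.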
Univariate survival analysis. The association between mutational signatures and overall survival (OS) was first tested in a univariate approach utilizing the R packages survival, v. 3.2-7 (9) and survminer, v. 0.4.8 (10). Only primary tumor samples and patients with available vital status and survival/follow-up time were included. TCGA cancer types and PCAWG cancer types were analyzed separately and, additionally, all TCGA samples were analyzed as one group and all PCAWG samples as one group. The association between signature activity and survival in each cancer type was analyzed if the following criteria were met: 1) at least 20 samples had both signature and survival data, 2) there were at least five death events among the patients, and 3) there were at least five samples with non-zero signature activity. The association with survival was then tested using the log-rank test between low-activity and high-activity tumors for the given signature. Low-activity tumors were defined as those with a median or lower activity of the signature within the cancer type, and high-activity tumors as those with above-median activity. For each analysis, a Kaplan-Meier curve was plotted using the function ggsurvplot.
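The univariate procedure can be illustrated with a minimal, dependency-free sketch. It is not the R code used in the study: the function names are hypothetical, and the log-rank statistic is the standard two-group version with one degree of freedom.

```python
import math
from statistics import median


def is_eligible(activity, died):
    """Check the three inclusion criteria for one signature in one cancer type."""
    return (len(activity) >= 20                      # samples with signature and survival data
            and sum(died) >= 5                       # death events
            and sum(a > 0 for a in activity) >= 5)   # samples with non-zero activity


def split_at_median(activity):
    """True marks a high-activity tumor (strictly above the cohort median)."""
    cut = median(activity)
    return [a > cut for a in activity]


def logrank_test(time, died, high):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 df."""
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, di in zip(time, died) if di}):
        n = sum(1 for ti in time if ti >= t)                        # at risk, both groups
        n1 = sum(1 for ti, h in zip(time, high) if ti >= t and h)   # at risk, high group
        d = sum(1 for ti, di in zip(time, died) if ti == t and di)  # deaths at t
        d1 = sum(1 for ti, di, h in zip(time, died, high) if ti == t and di and h)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

A signature would first be screened with `is_eligible`, then dichotomized with `split_at_median`, and the resulting grouping passed to `logrank_test` together with the follow-up times and death indicators.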
Multivariate survival analysis. After the univariate survival analysis, a multivariate survival analysis was performed for seven selected GI cancer types using the TCGA data (CHOL, COAD, ESCA, LIHC, PAAD, READ and STAD), again using the packages survival and survminer. This analysis was carried out as a cancer type-specific Cox proportional hazards regression with multiple variables: all signatures with a non-zero activity in at least 5% of the patients within the cancer type, and selected clinical variables. Four alternative endpoints were used, generating separate regression models: OS, disease-specific survival (DSS), progression-free interval (PFI) and disease-free interval (DFI). The function cox.zph was used to test the proportional hazards assumption and to plot the Schoenfeld residuals for each variable and the combined model. The function coxph was used to run the Cox regression. Forest plots illustrating the hazard ratios of each variable were generated using the function ggforest of the survminer package. The signature activities were treated as binary high vs. low variables thresholded at their median, as in the univariate analysis. In some cases, there were no non-censored patients (i.e., patients with a qualifying progression event) with a particular value of a clinical variable, so that the corresponding coefficients could not be estimated. In such cases, the problematic variables were left out, or patients with the particular values of that variable were left out or merged. Cross-tables for each of the seven GI tumors indicating the clinical variables used, the mutational signatures, and the number of patients with 1) low-activity and 2) high-activity status by each clinical variable value were produced using the R package gtsummary, v. 1.4.1 (11).
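Two steps of the multivariate workflow lend themselves to a short sketch: the 5% prevalence filter for signatures, and the relation between a Cox coefficient and the reported HR with its 95% CI. The sketch assumes that the reported intervals are Wald intervals on the log-hazard scale, which the text does not state explicitly; all function names are hypothetical.

```python
import math


def signatures_for_model(activity_by_signature, min_fraction=0.05):
    """Keep signatures with non-zero activity in at least min_fraction of patients."""
    kept = []
    for name, values in activity_by_signature.items():
        nonzero = sum(v > 0 for v in values)
        if values and nonzero / len(values) >= min_fraction:
            kept.append(name)
    return kept


def hazard_ratio_ci(beta, se, z=1.959964):
    """Wald interval (assumed): HR = exp(beta), limits = exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_scale_se(lower, upper, z=1.959964):
    """Recover the standard error of log(HR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)
```

Under this assumption, a reported interval such as 0.205-0.96 corresponds to a standard error of about 0.39 on the log scale, and an interval spanning several orders of magnitude signals a coefficient estimated from very few events.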
Signature-mutation count correlations. For the seven GI tumor types, the correlation between the activity of the SBS signatures included in the Cox regression and the total count of small somatic mutations was calculated across all patients of a given cancer type. The correlation coefficients and p-values for both Pearson’s linear correlation and Spearman’s rank correlation were calculated, and the association between signature activity and mutation count was also visualized using scatterplots generated by the ggscatter function of the R package ggpubr, v. 0.4.0 (10).
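The two correlation coefficients can be sketched without external libraries as follows. This is a minimal illustration with hypothetical function names; it computes the coefficients only and omits the p-values reported in the study.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Because many tumors have zero activity for a given signature, ties are common, which is why the rank function assigns mean ranks rather than arbitrary ones.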
Signature versus clinical variable associations. For the seven GI tumors, the association between the activity of the SBS signatures included in the Cox regression and the clinical variables was tested. For this, the patients were binarized into low- and high-activity groups as previously, and a p-value was calculated using Fisher’s exact test. A significant p-value indicates that the low- and high-activity patients fall into the categories of the clinical variable non-randomly. The statistical testing and result table generation were performed using the package gtsummary.
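For a clinical variable with two categories, the test reduces to Fisher's exact test on a 2x2 table, which can be sketched as below. This is a simplification under stated assumptions: clinical variables with more than two categories require the general r x c version, and the function name is hypothetical. The sketch needs Python 3.8 or later for `math.comb`.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows: low- vs. high-activity tumors; columns: two categories of a clinical variable.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x patients in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

The small tolerance factor guards against floating-point ties when comparing table probabilities.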
The definition of the endpoints. TCGA data have four different endpoints: OS, DSS, DFI and PFI. These endpoints are explained in detail in Liu et al. 2018 (7). Briefly, OS is the period from the date of diagnosis until the date of death from any cause, and DSS is the period from the date of diagnosis until the date of death from the specific cancer. PFI is the period from the date of diagnosis until the date of the first occurrence of any new tumor event. DFI is the period from the date of diagnosis until the date of the first new tumor progression after achieving disease-free status. PCAWG data include only OS as an endpoint.
Results
Several mutational signatures were associated with prognosis among gastrointestinal cancers. The Cox proportional hazards analyses are reported in more detail in Table I, and the results are summarized below and in Figures 1-6.
High SBS17b activity was associated with shorter OS in esophageal carcinomas (HR=11.47; 95% CI=1.187-110.8) (Figure 1). In contrast, high SBS18 activity predicted an improved disease-free interval in esophageal carcinomas (HR=0.18; 95% CI=0.038-0.86). Likewise, SBS18 was associated with longer OS in the PCAWG data in univariate analysis (p=0.043) (Figure 2). This was the only statistically significant survival association observed in the PCAWG dataset.
In stomach adenocarcinomas, high SBS1 activity was associated with shorter DFI (HR=2.55; 95% CI=1.132-5.8), while high SBS40 activity predicted improved stomach adenocarcinoma-specific survival (HR=0.50; 95% CI=0.25-0.98) (Figure 3).
In pancreatic cancer, high SBS10b activity was associated with both longer OS (HR=0.44; 95% CI=0.205-0.96) and longer pancreatic cancer-specific survival (HR=0.32; 95% CI=0.112-0.91) (Figure 4).
In patients with colon adenocarcinoma, high SBS2 activity predicted shortened OS (HR=2.42; 95% CI=1.04-5.6). When right-sided and left-sided colon adenocarcinomas were studied separately, this was observed only in the patients with left-sided colon adenocarcinoma (HR=4.97; 95% CI=1.005-24.6) (Figure 5). In addition, the patients with left-sided colon adenocarcinoma and high SBS5 (HR=6.9; 95% CI=1.039-45.8) or high SBS40 (HR=9.80; 95% CI=1.444-66.5) signature activity in their tumors had decreased OS.
In rectal cancer, high SBS5 and SBS40 activity was associated with notably improved OS and rectal cancer-specific survival (SBS5 and OS: HR=0.130; 95% CI=0.03-0.56; SBS5 and DSS: HR=0.0025; 95% CI=0.000042-0.15; SBS40 and OS: HR=0.072; 95% CI=0.012-0.44; SBS40 and DSS: HR=0.00098; 95% CI=0.000013-0.076) (Figure 6). In addition, high SBS17b activity (present in only nine patients) was associated with worse rectal cancer-specific survival (HR=1800; 95% CI=2.9-1.1×10⁷).
In the cohorts consisting of only right-sided colon adenocarcinomas, hepatocellular carcinomas or cholangiocarcinomas, no statistically significant associations between mutational signatures and any of the survival endpoints were observed in Cox regression analysis.
Signature-mutation count correlations. The total mutation count had a strong positive correlation with most of the studied mutational signatures (Figure 7). SBS5 activity and mutation count were very clearly correlated in all studied tumor types, with the exception of rectal adenocarcinoma and left-sided colon adenocarcinoma. SBS1 activity also correlated strongly with total mutation count in all but the smallest cohorts, those of hepatocellular carcinomas and cholangiocarcinomas.
Furthermore, several inverse correlations between mutation count and mutational signatures were recorded. SBS16 (p=0.037; R=–0.15) and SBS18 (p=0.024; R=–0.17) in esophageal carcinoma, SBS30 in hepatocellular carcinoma (p=0.0073; R=–0.15), SBS15 in pancreatic adenocarcinoma (p=0.0054; R=–0.21) and SBS2 in stomach adenocarcinoma (p=0.0049; R=–0.14) correlated inversely with mutation count. In right-sided colon adenocarcinoma, high SBS40 activity correlated with a lower mutation count (p=0.00023; R=–0.28).
The associations between the traditional prognostic factors and signature activity in each of the studied cancer types are presented in Supplementary Tables I-VII.
Discussion
The impact of mutational signatures on patient survival in GI cancers has not been systematically assessed previously. The data presented here suggest that COSMIC mutational signatures have cancer-specific associations with diverse prognostic groups among the major GI cancers.
Among all GI cancers, SBS1, SBS5 and SBS40 signatures have been most frequently found in colorectal adenocarcinomas, and SBS5 is also ubiquitous in benign GI tissues (3, 12). Both SBS5 and SBS40 are flat signatures, and their misattribution has not been excluded (3, 13). The activity of both SBS5 and SBS40 correlates with age, but the etiology, especially that of SBS40, is still unknown (3). According to our analysis, high activity of both SBS5 and SBS40 was associated with improved OS in patients with rectal adenocarcinoma. In addition, both SBS5 and SBS40 were associated very convincingly with long DSS, the upper limit of the 95% CI of the HR being as low as 0.15 for SBS5 and 0.076 for SBS40. These estimates by far exceed those of the traditional colorectal cancer prognostic factors, including stage. Nevertheless, rectal cancer-specific survival should be considered an approximation in TCGA data, while OS is more strongly recommended for use (7). Intriguingly, SBS5 activity did not reflect the total number of mutations in rectal adenocarcinomas, although SBS40 had such a correlation. Neither SBS5 nor SBS40 was associated with stage, sex, or the site of the primary carcinoma in the rectum. This may point to their occurrence in the early phases of carcinogenesis, but also suggests independent roles as novel prognostic factors in rectal adenocarcinomas. These results are at odds with the inherent clock-like nature of SBS5 and SBS40 and suggest that their association with less aggressive behavior of rectal adenocarcinomas should be assessed in more mechanistic studies.
SBS17b is considered an easily trackable signature and is marked by characteristic T>G substitutions, possibly caused by oxidative damage in the nucleotide pool (4). Although high SBS17b activity was observed in only 7% of the evaluable patients with rectal adenocarcinoma, it was associated very strongly with poor rectal adenocarcinoma-specific survival. Again, these DSS rates should be interpreted with caution due to relatively short follow-up (7). There is an enrichment of high SBS17b activity in tumors treated with 5-fluorouracil or capecitabine, which are widely used agents in stage III rectal adenocarcinomas requiring perioperative therapy (3, 14). The association between high SBS17b activity and poor survival was nevertheless independent of stage. Recent evidence suggests that high SBS17b activity is also connected to anti-EGFR antibody resistance in colorectal cancer (15).
Sidedness affects a wide spectrum of CRC features, including consensus molecular subtypes and the microbiome, and also has a clinically meaningful impact on prognosis and treatment in the metastatic setting (16). In contrast to what was observed in patients with rectal adenocarcinoma, we observed dismal OS in patients with left-sided colon adenocarcinoma who had either high SBS5 or high SBS40 activity. Such associations were not observed in patients with adenocarcinoma originating from the right side of the colon, and thus the clock-like nature of these signatures does not explain these results. Whether preoperative radiotherapy, used in rectal adenocarcinomas but not in colon adenocarcinomas, could induce SBS5 and SBS40 is currently unknown. High SBS2 activity predicted poor OS in patients with left-sided colon adenocarcinoma, but again not in those whose primary tumor was right-sided. The SBS2 signature is formed due to APOBEC activity and is one of the most well-defined mutational signatures (4, 17). Although the prognostic value of mutational signatures has hardly been described previously in any malignancy, APOBEC signatures seem to be associated with high mutational load and worse OS also in patients with multiple myeloma (18).
Pancreatic adenocarcinoma has a dismal prognosis and lacks reproducible genetic or molecular prognostic biomarkers. The SBS10 signature was recently split into distinct SBS10a and SBS10b signatures, which are both caused by polymerase epsilon exonuclease (POLE) proofreading defects (3, 19, 20). In endometrial carcinomas, specific POLE mutations define an ultramutated subtype with improved prognosis (21). In line with this, high activity of the SBS10b signature was present in 10% of the patients with pancreatic adenocarcinoma and was associated with longer DSS and OS, with a similar magnitude of effect for both endpoints. Both OS and DSS are considered reliable endpoints in the TCGA data of pancreatic adenocarcinomas (7). SBS10b was not associated with any of the studied traditional prognostic factors or with total mutation load. It is worth emphasizing that the stage distribution of pancreatic adenocarcinomas in TCGA does not represent the distribution seen in usual clinical practice, as 95% of cases in the TCGA dataset are stage I-II tumors. Thus, these results suggest that high activity of SBS10b, indicating POLE repair deficiency and a hypermutator phenotype, may be a novel prognostic factor in early pancreatic adenocarcinoma. This could offer a possibility to guide risk-based adjuvant therapy and surveillance after surgery with curative intent. In advanced endometrial cancers, high SBS10 signature activity has been connected to improved response to checkpoint inhibitors, probably due to increased neoepitope formation (4, 19, 20).
In contrast to other GI carcinomas, there are some preliminary data regarding the COSMIC signatures and survival in patients with esophageal and gastroesophageal junction (GEJ) carcinoma. High SBS17 activity (without separation into SBS17a and SBS17b) predicted worse survival in a cohort of 83 Chinese patients with GEJ adenocarcinoma (22). Also, in a small set of esophageal adenocarcinomas, the characteristic SBS context 5′-C[T>G]T-3′ predicted worse OS in univariate analysis (23). In line with these data, and with the current results from rectal adenocarcinomas, SBS17b activity predicted poor OS in esophageal carcinomas, which consisted almost solely of adenocarcinomas.
In stomach adenocarcinomas, SBS1 was associated with short DFI, which is considered a reliable endpoint in the TCGA stomach adenocarcinoma cohort (7). SBS1 correlates tightly with age and mutation load in most cancers, including stomach adenocarcinomas (3, 24), and age is one of the most important prognostic factors in stomach cancer (25).
Both SBS17b and SBS18 signatures arise after cellular stress, especially due to reactive oxygen species (4). In particular, SBS17 signatures may be a consequence of exposure to gastric acid or to 5-fluorouracil/capecitabine (4), which are among the most commonly used chemotherapeutic agents also in esophageal carcinomas. It is possible that the link between SBS17b and shorter survival in patients with esophageal carcinomas merely reflects the increased use of chemotherapy or (chemo)radiotherapy in the most aggressive esophageal carcinomas, which would consequently have led to increased SBS17b activity. Among all GI carcinomas, SBS18 is most prevalent in esophageal carcinomas (3, 26). High SBS18 activity was associated with improved DFI in the TCGA data and also with prolonged OS in the PCAWG dataset. This was the only result from the whole-exome TCGA dataset that could be confirmed in the whole-genome PCAWG data. Taken together, it seems that SBS17b and SBS18 differ in origin and consequently contribute differently to survival in esophageal carcinomas.
There are several limitations in our study, which mostly relate to the nature of TCGA data. We did not have treatment data available, although we were able to use various other clinically important prognostic factors as covariates in the multivariate analysis. Using PFI and DFI may be criticized because they are surrogates rather than clinically hard endpoints. On the other hand, in TCGA cohorts PFI and DFI are considered the most reliable endpoints due to generally sufficient follow-up (7). Again, DSS results should be interpreted with caution in most cancer types, as discussed above (7). As TCGA is based on exome data, only a small proportion of the mutations in the human genome is covered. This is also a likely reason why our final analyses included only SBSs, but not other types of signatures. The results from the PCAWG data were mainly not in concordance with the TCGA results. This is not surprising, since the PCAWG whole-genome data include only one endpoint, OS, largely lack clinical variables and, above all, have a limited sample size.
Conclusion
We conclude that several COSMIC mutational signatures seem to have an independent prognostic role among GI cancers. This is highlighted by the markedly improved survival of rectal adenocarcinoma patients with high SBS5 and SBS40 activity. Both SBS5 and SBS40 are rather poorly characterized signatures in terms of their activities at the molecular level, and more experimental studies are required to resolve this. Some mutational signatures had a similar prognostic impact at different sites of cancer origin, as exemplified by the poor outcome of patients with esophageal or rectal cancer and high SBS17b activity. It should still be kept in mind that the observed results are associations, not causations. In future studies, the carcinogenetic processes behind certain signatures should be clarified and the current results should be confirmed in a cohort with longer follow-up.
Acknowledgements
We thank Genevia Technologies for their help in statistical analyses and in composing the Figures and Tables.
Footnotes
Authors’ Contributions
PK drafted the manuscript. All Authors participated in planning the study, evaluating the results, and writing the final versions of the manuscript.
Supplementary Material
Available at: <https://www.dropbox.com/sh/9lyfv5s0bo8u0sk/AAAncv9tMsXBigA74sr_cl3Ya?dl=0>
Conflicts of Interest
The Authors have no conflicts of interest related to the manuscript.
- Received April 21, 2022.
- Revision received May 11, 2022.
- Accepted June 16, 2022.
- Copyright© 2022, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).