Abstract
Background/Aim: Genomic DNA copy number alterations (CNAs) are frequent in tumors and have been catalogued by The Cancer Genome Atlas (TCGA) project. Emergence of chemoresistance frequently renders drug therapies ineffective. Materials and Methods: We analyzed how CNAs recurrently found in the genomes of TCGA patients of thirty-one tumor types affect protein targets of antineoplastic (AN) agents. Results: CNA deletions affected the targets of AN agents more frequently than CNA amplifications. Interestingly, in seven tumors we observed signs of compensatory CNAs. For example, in glioblastoma multiforme, two target genes (FLT1, FLT3) of the experimental drug sorafenib were recurrently deleted, whereas another target (KDR) of sorafenib was recurrently amplified. In renal clear cell carcinoma, the target FLT1 of pazopanib, sunitinib, sorafenib, and axitinib was recurrently deleted, whereas FLT4, bound by the same drugs, was recurrently amplified. Conclusion: Deletions of AN target proteins can be compensated by amplification of alternative targets.
Tumor cells differ phenotypically from normal cells, for example, by showing increased levels of proliferation and evading apoptosis (1). At the genomic level, one common variation of tumor cells is DNA copy number changes that include both gene amplifications and deletions (2). When these changes occur in germline cells, they are referred to as DNA copy number variations (CNV). When they occur in somatic cells, they are termed copy number alterations (CNA) (3). It is believed that CNAs in genome sequences of cancer patients (4) may play important roles in oncogenesis and cancer therapy (5).
An important reference data set on CNAs in patients suffering from more than 30 different tumors was compiled by The Cancer Genome Atlas (TCGA) project. A pan-cancer study of these data analyzed the effect of CNAs on known oncogenic drivers and tumor suppressor genes (TSG) and identified potential new cancer drivers, TSGs and biomarkers (6). This study also analyzed the length and the distribution of somatic CNAs along the chromosomes, identified regions that recurred significantly often and compared the number of genes in amplified and deleted regions (6). Subsequent studies (7, 8) of CNA data from TCGA focused either on specific genes (e.g. PD-L1, CD247, IRS4, IGF2) or on the relationship between copy number events and gene expression (7, 9). From the 33 tumor types available at TCGA today, we processed the data from 31 tumors in this study (glioblastoma multiforme, kidney renal clear cell carcinoma, brain lower-grade glioma, lung squamous cell carcinoma, liver hepatocellular carcinoma, kidney renal papillary cell carcinoma, kidney chromophobe carcinoma, breast invasive carcinoma, ovarian serous cystadenocarcinoma, uterine carcinosarcoma, head and neck squamous cell carcinoma, thyroid carcinoma, prostate adenocarcinoma, colon adenocarcinoma, stomach adenocarcinoma, bladder urothelial carcinoma, cervical squamous cell carcinoma and endocervical adenocarcinoma, sarcoma, acute myeloid leukemia, esophageal carcinoma, pheochromocytoma and paraganglioma, rectum adenocarcinoma, adrenocortical carcinoma, cholangiocarcinoma, lymphoid neoplasm diffuse large B-cell lymphoma, uveal melanoma, mesothelioma, thymoma, testicular germ cell tumors, uterine corpus endometrial carcinoma, pancreatic adenocarcinoma).
The original publications on the datasets collected for these thirty-one tumors focused on the rate of copy number alterations, the identification of recurrently amplified/deleted CNAs, the distribution of CNAs along the chromosomes, the identification of oncogenes and TSGs, and the clustering of tumors into subtypes. Several follow-up studies of CNA data from TCGA have examined copy number changes (10-12), recurrent copy number variations/alterations (13-16), and the effect of CNAs on specific genes (17-21). Others identified putative new druggable cancer driver genes (22), tried to predict cancer relapse (23), or studied how cancer patients may be grouped into subtypes (12, 18).
Tumor therapy often involves chemotherapy (24). The current release of Drugbank (version 5.0.11, downloaded on January 12, 2018) lists 477 drugs as antineoplastic (AN) agents that are annotated to bind to 220 different protein targets. Mapping the targets of AN agents to the KEGG database of cellular pathways using the tool KEGG mapper (25) shows that 53 target proteins from this list belong to the PI3K-Akt signaling pathway, 39 to metabolic pathways, 32 to the Rap1 signaling pathway, 30 to Th17 cell differentiation, 32 to the Ras signaling pathway, and 38 to the MAPK signaling pathway.
The aim of this project was to analyze how protein targets of AN agents are affected by CNAs. To the best of our knowledge, no prior study has addressed this question. The most closely related work we are aware of is a study by Graham et al., who recently reported that recurrent patterns of DNA copy number alterations in tumors reflect metabolic selection pressures such as coordinated alteration of genes involved in glycolytic metabolism (26). For 31 tumor types from the TCGA dataset (see list above), we compared how recurrent CNAs affected the set of protein targets of chemotherapeutic drugs in comparison with a set of housekeeping genes and a set of cancer hallmark genes.
Materials and Methods
Figure 1 summarizes the main steps of our analysis.
Data on copy number alterations. As mentioned, we analyzed genomic data from the TCGA project on CNAs observed in patients suffering from 31 different forms of tumors (listed in the introduction section). Missing from this list are the data for lung adenocarcinoma and skin cutaneous melanoma as these could not be processed with the GISTIC2.0 tool (see below). The CNA data of these patients (start and end position, chromosome, and segment mean of CNA) were downloaded from the Genomic Data Commons Portal (GDC portal) on September 29, 2017 (27).
Clinical data. From the clinical data provided at GDC, we extracted information on which drug treatment was given to specific patients. Thereby, the presence of CNAs in individual patient genomes was associated with the drug treatment applied to these patients. In our work, only data from patients that had both CNA and clinical data available were used.
Antineoplastic agents and their targets. A list of 477 ANs together with their target proteins was extracted from Drugbank (28) (version 5.0.11, downloaded on January 12, 2018). We considered only those protein targets for which pharmacological action of the respective drug molecule is reported as “yes” in Drugbank. These 477 AN agents are reported to bind to 220 different protein targets (labeled here by their Uniprot accession numbers). After converting Uniprot accession numbers to gene symbols, we were left with 218 genes. As “tumor-specific” drugs, we considered those drugs that were applied to the patients of a particular tumor entity according to the TCGA data files. As shown in Tables I and II, for drugs against lung cancer or breast cancer, these sets comprise a representative subset of the FDA-approved drug treatments for these tumor types (8 out of 16 and 23 out of 31, respectively), see https://www.cancer.gov/about-cancer/treatment/drugs/cancer-type. The sets for lung squamous cell carcinoma and breast cancer also included eight further drugs each that are not FDA-approved, but were applied to TCGA patients, possibly during ongoing clinical trials. Here, such drugs are labeled as “experimental drugs”.
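The target-selection rule described above can be illustrated with a minimal Python sketch. The record layout (`drug`, `target`, `action` keys) is a hypothetical simplification and not the actual Drugbank export format; the gene `XYZ1` is a made-up placeholder, while the other drug-target pairs are ones named in this article.

```python
from collections import defaultdict

# Hypothetical, hand-made records; a real Drugbank export is laid out differently.
records = [
    {"drug": "sorafenib", "target": "KDR", "action": "yes"},
    {"drug": "sorafenib", "target": "FLT1", "action": "yes"},
    {"drug": "sorafenib", "target": "XYZ1", "action": "unknown"},  # placeholder gene
    {"drug": "erlotinib", "target": "EGFR", "action": "yes"},
    {"drug": "erlotinib", "target": "NR1I2", "action": "yes"},
]


def targets_with_known_action(rows):
    """Keep only drug-target pairs whose pharmacological action is 'yes'."""
    drug_targets = defaultdict(set)
    for row in rows:
        if row["action"].strip().lower() == "yes":
            drug_targets[row["drug"]].add(row["target"])
    return dict(drug_targets)


drug_targets = targets_with_known_action(records)
# Union of all retained targets across drugs.
all_targets = set().union(*drug_targets.values())
print(sorted(all_targets))
```

Storing targets as sets per drug makes the later intersection with amplified or deleted gene lists a one-line operation.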
Gene Sets. Besides the set of protein targets of AN agents, we also considered a set of 3,804 housekeeping genes (29) (i.e. at least one variant of these genes is expressed in all tissues uniformly; downloaded from https://www.tau.ac.il/~elieis/HKG/ on January 13, 2018) and a set of 2,338 “hallmark genes” of cancer. The latter set contains all human genes that are annotated in the Gene Ontology (30) to at least one of 37 Gene Ontology terms that were described as hallmarks of cancer (31) (downloaded from http://geneontology.org/page/download-annotations on January 13, 2018). After converting Uniprot accession numbers to gene symbols, this gave 2,321 gene symbols in the hallmarks of cancer gene set. Figure 2 shows the overlap of the three gene sets.
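An overlap of three gene sets, as in a three-way Venn diagram, can be computed with plain set operations. The following Python sketch uses placeholder identifiers (`G1` to `G6`) rather than real gene symbols, and the function name `venn_regions` is our own illustrative choice.

```python
def venn_regions(an_targets, housekeeping, hallmark):
    """Split three gene sets into the seven disjoint regions of a Venn diagram."""
    return {
        "AN only": an_targets - housekeeping - hallmark,
        "HK only": housekeeping - an_targets - hallmark,
        "HC only": hallmark - an_targets - housekeeping,
        "AN & HK only": (an_targets & housekeeping) - hallmark,
        "AN & HC only": (an_targets & hallmark) - housekeeping,
        "HK & HC only": (housekeeping & hallmark) - an_targets,
        "all three": an_targets & housekeeping & hallmark,
    }


# Placeholder identifiers, not real gene symbols.
an = {"G1", "G2", "G3", "G4"}
hk = {"G3", "G4", "G5"}
hc = {"G2", "G4", "G6"}

regions = venn_regions(an, hk, hc)
for name, genes in regions.items():
    print(f"{name}: {len(genes)}")
```

Because the seven regions are disjoint, their sizes sum to the size of the union of the three sets, which is a convenient sanity check.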
Genes affected by CNAs. Genes that are recurrently affected by CNAs were identified with the GISTIC2.0 tool version 2.0.22 (32) using segmentation files and marker files created from the CNA data of the tumor samples. Following Laddha et al. (33), we used 0.2 and -0.2 as thresholds for GISTIC2.0 to identify recurrent amplification and deletion peaks and the genes contained in those peaks. Uniprot accession numbers used by Drugbank were converted to gene symbols used by GISTIC2.0 by making use of data from the HUGO Gene Nomenclature Committee (HGNC database) (34) that were downloaded in January 2017. Information on genes (chromosome, start position, and end position) was based on data from Ensembl (data downloaded from http://rest.ensembl.org on January 16, 2018).
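To make the role of the two thresholds concrete, the following Python sketch labels individual segments by their segment mean. It illustrates only the cutoff step and does not reproduce the statistical model that GISTIC2.0 uses to call recurrent peaks. We assume that segment means are log2 copy-number ratios and that the comparisons are strict; the segment coordinates and values are made up for illustration.

```python
AMP_THRESHOLD = 0.2   # segment means above this count as amplified
DEL_THRESHOLD = -0.2  # segment means below this count as deleted


def classify_segment(segment_mean, amp=AMP_THRESHOLD, dele=DEL_THRESHOLD):
    """Label one segment by its (assumed log2-ratio) segment mean."""
    if segment_mean > amp:
        return "amplified"
    if segment_mean < dele:
        return "deleted"
    return "neutral"


# Hypothetical segments: (chromosome, start, end, segment_mean)
segments = [
    ("7", 55_000_000, 55_300_000, 0.85),
    ("13", 28_000_000, 29_000_000, -0.60),
    ("1", 1_000_000, 2_000_000, 0.05),
]

calls = [classify_segment(seg[3]) for seg in segments]
print(calls)  # ['amplified', 'deleted', 'neutral']
```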
Results
General statistics. The aims of this work were (1) to test the hypothesis that genomic CNAs observed in tumors affect the protein targets of AN agents significantly more often than expected by chance, (2) to test whether either amplifications or deletions are more common, and (3) to study the potential relevance for chemoresistance. In principle, one can expect that eventually all genes except for the essential genes will be affected by CNAs in some patients. Hence, to get more meaningful results, our analysis was focused on the set of recurrently occurring CNAs that appear statistically more often in each individual tumor entity than expected by chance. This strategy is similar to that used by Graham et al. (26).
Table III lists the number of recurrently amplified and deleted genes obtained by processing the raw CNA data for the 31 considered tumors with the GISTIC2.0 program. Also specified is how many of these amplifications/deletions affect hallmark genes, housekeeping genes, and protein targets of AN drugs. Note that, in this initial analysis, protein targets of all 477 considered AN drugs were considered irrespective of whether these drugs are actually being used to treat the particular subtype of cancer. In acute myeloid leukemia, 38 of 105 cases (26.57%) received treatment prior to the time when the CNA data were taken. For glioblastoma (22 of 590 cases) and kidney renal clear cell carcinoma (18 of 530 cases), the fraction of such cases was around 4%. In all other tumors, the fraction of pre-treated patients was below 3%. Hence, in all tumors except for acute myeloid leukemia, the detected amplifications and deletions are unlikely to reflect resistance phenomena occurring in response to treatment (Table IV). As shown in Table III, in twenty-nine out of thirty-one studied tumors (the exceptions are thyroid carcinoma and kidney chromophobe), the number of recurrently deleted genes exceeded the number of recurrently amplified genes. However, this difference between the lower number of amplifications and the higher number of deletions was equally significant for the sets of all genes, antineoplastic targets, hallmark genes, and housekeeping genes (p-values 8.501e-09, 1.721e-08, 9.196e-09 and 8.367e-09, respectively, Wilcoxon test) and, hence, does not reflect a peculiar property of AN target genes. Table V shows that a similar behavior is observed for genes annotated to specific cancer hallmarks.
For each disease, we then extracted from the GDC clinical data files the names of the drugs that were prescribed to the respective patients. The analysis was repeated with the same numbers of cases considered as in Table III but focused on the combined set of cancer-specific targets of these drugs, see Table VI. This set of target proteins was termed “specific drug targets”, meaning that these are targets of the drugs that are given to patients with this specific tumor entity. By construction, the resulting numbers of affected genes were now far smaller. In 18 tumors, no CNA amplifications affected the specific drug targets. Sarcoma was an outlier at the other extreme, with eight amplified targets. In the 12 remaining tumors, only one or two amplified targets were observed. In contrast, CNA deletions affected the specific drug targets in 23 tumors. Among the three tumors (brain lower grade glioma, sarcoma, and mesothelioma) showing the largest number of CNA-deleted targets (10, 11, and 14, respectively), only mesothelioma showed significantly more deletions than amplifications (adjusted p-value of 0.001, Fisher's exact test). When taking all tumor data together, the difference between the numbers of specific amplified and deleted targets for the 31 tumors was significant (p-value of 0.00016, Wilcoxon rank test).
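For readers unfamiliar with Fisher's exact test, the following Python sketch implements a two-sided version for a 2×2 table using only the standard library. The layout of the contingency table (rows: amplification versus deletion; columns: target affected versus not affected) is our assumption, since the exact table used in the analysis is not spelled out here, and the counts are illustrative rather than taken from Table VI. No multiple-testing adjustment is applied in this sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are at most as likely as the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Illustrative counts only (not from Table VI): of 20 hypothetical targets,
# 1 is amplified (19 not) and 14 are deleted (6 not).
p_example = fisher_exact_two_sided(1, 19, 14, 6)
print(f"p = {p_example:.2e}")
```

The small relative tolerance in the comparison guards against floating-point ties between equally likely tables.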
Following up on Table VI, Table VII lists the number of patient genomes where tumor-specific AN targets were affected by CNA mutations. These data show that, although the absolute number of CNA-affected AN target proteins is quite small (Table VII), the proportion of patients harboring these CNAs is in fact rather high. The respective target amplifications and deletions occur recurrently in a fraction of patients that ranges from 0 to 90%.
To gain more insight into the molecular mechanisms at play, Tables VIII and IX list the gene symbols of the tumor-specific AN targets that were affected by CNA amplifications and deletions (Table VI) and the respective drugs that were applied to patients of these tumors. Experimental drugs are marked by the suffix “EXP”, e.g. docetaxelEXP. For acute myeloid leukemia, which contains a sizeable fraction of pre-treated patients (26.57%), no information about the applied drugs is provided in the TCGA clinical data files, so we could not identify recurrent CNA amplifications or deletions of cancer-specific drug targets in this case.
Comparison of Tables VIII and IX reveals that for some tumors, there exist targets of the same drugs that were both recurrently deleted and amplified in patients of the same tumor type. Table X lists all such pairs.
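The search for such pairs can be sketched in a few lines of Python. The function name `compensatory_pairs` and the data layout are illustrative assumptions. The example input uses the glioblastoma case from the abstract, with the target set of sorafenib restricted to the three genes named there.

```python
def compensatory_pairs(drug_targets, amplified, deleted):
    """List (drug, deleted_target, amplified_target) triples for one tumor type.

    A triple is reported when a drug has one target among the recurrently
    deleted genes and a different target among the recurrently amplified genes.
    """
    pairs = []
    for drug, targets in sorted(drug_targets.items()):
        amp_hits = sorted(targets & amplified)
        del_hits = sorted(targets & deleted)
        for del_gene in del_hits:
            for amp_gene in amp_hits:
                if amp_gene != del_gene:
                    pairs.append((drug, del_gene, amp_gene))
    return pairs


# Glioblastoma example from the abstract (target set deliberately truncated).
drug_targets = {"sorafenib": {"FLT1", "FLT3", "KDR"}}
amplified = {"KDR"}
deleted = {"FLT1", "FLT3"}

pairs = compensatory_pairs(drug_targets, amplified, deleted)
print(pairs)
# [('sorafenib', 'FLT1', 'KDR'), ('sorafenib', 'FLT3', 'KDR')]
```

A gene that appears in both the amplified and the deleted list is skipped as a pair with itself, so that case has to be reported separately.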
Discussion
In this project, CNA and clinical data for 31 types of tumors from the TCGA project were combined with information on AN drugs from Drugbank. As shown in Table III, in 29 of the studied tumors, the number of recurrently deleted genes exceeded the number of recurrently amplified genes. This finding is generally concordant with the results of the TCGA consortium, which reported in its pan-cancer study that the 70 peak amplification regions contained a median of 3 genes each, whereas 70 peak regions of CNA deletions contained a median of 4 genes (6). Earlier studies (6, 9) reported that CNAs promote carcinogenesis and/or tumor progression by deleting tumor suppressor genes (TSGs). In agreement with this, in the dataset studied here the patient genomes of 29 tumors contained at least one of 71 known TSGs (13) in their list of genes recurrently deleted by CNAs. In the case of uterine corpus endometrial carcinoma and lymphoid neoplasm diffuse large B-cell lymphoma, as many as 22 of the 71 known TSGs were recurrently affected by CNA deletions (Table XI).
The recurrently amplified/deleted genes of the 31 tumor types had no protein-coding gene in common. This is not unexpected, as we argue in the following. As shown in Table III, recurrent CNA deletions affected on average 4,150 genes, which is roughly 20% of all genes. If we assume that the 31 considered tumors are unrelated, we would expect that, by chance, $(0.2)^{31} \times 20{,}000 \approx 4 \times 10^{-18}$ genes would be affected in all tumor groups. This number is even smaller for amplified genes. This led to the expected result that all three gene sets (AN targets, housekeeping genes, and hallmarks of cancer genes) had no gene in common that is affected by CNAs in all types of tumors.
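The chance estimate above can be evaluated numerically. The short Python sketch below takes the rounded inputs from the text (a deleted fraction of 0.2, 31 tumor types, about 20,000 genes) and assumes, as in the text, that the tumor types are independent. The variable names are our own.

```python
n_tumors = 31            # number of tumor types considered
n_genes = 20_000         # approximate number of protein-coding genes
fraction_deleted = 0.2   # average fraction of genes recurrently deleted

# Probability that one gene is deleted in all tumor types, times the gene count.
expected_shared = fraction_deleted ** n_tumors * n_genes
print(f"{expected_shared:.1e}")  # 4.3e-18
```

Because the expected count is many orders of magnitude below one, an empty intersection across all tumor types is the anticipated outcome under this simple model.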
Then, we compared how CNAs affect gene subsets comprising antineoplastic (AN) target genes, housekeeping (HK) genes, hallmark of cancer (HC) genes, or tumor-specific AN target genes. Importantly, in all these gene sets, significantly more genes were affected by deletions than by amplifications. Hence, this observation is not specific to AN target genes nor to tumor-specific AN target genes.
The tumor-specific AN target genes recurrently affected by CNA amplifications are epidermal growth factor receptor (EGFR), FLT4, TYMS, TOP2A, KDR, VEGFA, BRAF, KIT, PDGFRA, HDAC2, TUBB1, PTGS2 and FGFR1. These amplifications were found in 13 types of tumors (Tables III and VIII). In the 18 remaining tumor types, no tumor-specific AN target gene was amplified. As an example, amplifications of EGFR gene copy numbers and overexpression of EGFR are known to be among the most common alterations in non-small-cell lung carcinoma (NSCLC) cells (35-38) and are associated with a poor prognosis and chemoresistance. Among the histological subtypes of NSCLC, EGFR is most frequently expressed in squamous cells (39).
On the other hand, in 23 tumors, CNA deletions affected specific drug targets of these tumor types. As shown in Table IX, CNA deletions of AN targets affected (1) the two enzymes bifunctional purine biosynthesis protein PURH (gene name ATIC) (40) and a subunit of ribonucleotide reductase (RRM1), which are both important for cell replication (41); (2) the nuclear receptor NR1I2 that regulates the metabolism and efflux of xenobiotics via CYP3A4 and MDR1 (42); (3) the mitochondrial and nuclear DNA topoisomerases TOP1MT and TOP2A; (4) the members of the vascular endothelial growth factor receptor family VEGFA, FLT1, and FLT3; (5) the fibroblast growth factor receptor FGFR2; (6) the estrogen receptor ESR2; (7) the signaling MAP kinase MAPK11; (8) the B-Raf proto-oncogene BRAF that regulates the MAP kinase/ERK signaling pathway (43); (9) the inhibitory cell surface receptor PDCD1 that is involved in the regulation of T-cell function (44); and finally (10) beta tubulin TUBB and the microtubule-associated protein MAP1A that is almost exclusively expressed in the brain (45, 46) (and was CNA-deleted in glioblastoma). As all of these proteins have important roles in promoting carcinogenesis, they have likely been selected as targets of antineoplastic agents for this reason. As argued above, the CNA mutations existed before the onset of the therapy.
These findings of rare CNA amplifications but frequent CNA deletions of tumor-specific drug targets have clear consequences for drug development. In the future, considering CNA frequencies should become a standard element of drug design efforts. These data also suggest that genomes of tumor patients may contain “compensating” mutations where one target protein of a drug is deleted and another target protein of the same drug is amplified. For reasons of space, we discuss only a few of these cases in more detail.
In renal clear cell carcinoma patients who were subsequently treated with the drug molecules pazopanib, sunitinib, sorafenib, and axitinib, the target protein FLT1 of these drugs was recurrently deleted (in 55 samples), whereas another target protein, FLT4, of the same drugs was recurrently amplified (in 337 samples). Overall, 36 samples had both deleted FLT1 and amplified FLT4. FLT4 encodes a receptor tyrosine kinase for vascular endothelial growth factors C and D. In agreement with what is expected from the observed CNA amplification, FLT4 was previously reported to be overexpressed in kidney clear cell carcinoma (47). Besides being a recurrent target of CNA deletions here, FLT1 was also reported to be frequently silenced through promoter hypermethylation in renal clear cell carcinoma (48).
In lung squamous cell carcinoma patients subsequently treated with the drug erlotinib, one of its targets, NR1I2, was recurrently deleted (in 20 samples) and another target, epidermal growth factor receptor (EGFR), was recurrently amplified (in 186 samples). Nine samples had NR1I2 deleted and EGFR amplified at the same time. In brain lower grade glioma, NR1I2 and EGFR were also deleted and amplified, respectively. Besides these two genes, the target KIT of sorafenib was amplified, while its targets FLT4 and FGFR1 were deleted.
There also exist cases where the same target protein can be either amplified or deleted. For example, Table X shows that FLT4 (target of sorafenib and pazopanib) and PTGS2 (target of sulindac) were observed to be either amplified or deleted in different sarcoma samples. FLT4 was amplified in 57 samples and deleted in 36 samples. PTGS2 was amplified in 63 samples and deleted in 32 samples.
The aim of this work was to test the hypothesis that the protein targets of AN agents in tumors are affected by genomic copy number alterations (CNAs) more strongly than expected by chance. Based on CNAs and clinical data from the TCGA repository, we found that the genome sequences of tumor patients generally contain more recurrently deleted CNAs than recurrently amplified CNAs. This is also the case for CNAs affecting target genes of the AN agents specific to the respective tumor. Interestingly, we observed certain signs of apparently compensating effects of CNAs. The data available for this study enabled us to identify CNA alterations that existed prior to therapy and that may render certain chemotherapies more or less effective. In the future, it would be desirable to also collect time-series CNA data of tumor patients at the time of diagnosis and at later time points. This would point to CNA alterations caused by application of certain chemotherapies and thus reflect chemoresistance.
Acknowledgements
Ha Vu Tran was supported by a Ph.D. scholarship from DAAD.
Footnotes
This article is freely accessible online.
- Received June 29, 2018.
- Revision received July 14, 2018.
- Accepted July 17, 2018.
- Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved