Abstract
Background/Aim: Chemotherapy with gemcitabine and cisplatin remains the cornerstone of treatment for advanced urothelial carcinoma (UC), yet response rates vary significantly among patients. Predicting treatment response is crucial to avoid unnecessary toxicity and optimize therapeutic strategies. This study aims to develop a deep learning model leveraging RNA sequencing data to predict chemotherapy response in UC patients.
Materials and Methods: We developed a deep learning model using RNA sequencing gene expression data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus to predict chemotherapy (gemcitabine and cisplatin) response in UC patients. The model was externally validated using an independent cohort from the Pusan National University Yangsan Hospital. Model interpretation was performed through gene ontology and survival analyses using predictions from TCGA samples not included in the training set.
Results: The deep learning model demonstrated excellent predictive performance, achieving 94.7% accuracy in the training dataset and 90.0% accuracy in external validation. Gene ontology analysis revealed four key functional clusters associated with chemotherapy response: DNA damage response, cell cycle regulation, kinesins/microtubule dynamics, and mitotic cytokinesis. Notably, the model showed significant prognostic value in early-stage (Stage I/II) patients, with predicted responders displaying markedly better survival outcomes (p=0.019).
Conclusion: Our transcriptome-based deep learning approach offers a promising computational strategy for predicting chemotherapy response in urothelial carcinoma. By integrating high-dimensional RNA-seq data and advanced machine learning techniques, we provide a potential decision-support tool for personalized treatment planning.
- Urothelial carcinoma
- cisplatin
- gemcitabine
- chemotherapy response
- deep learning
- RNA sequencing
- precision oncology
Introduction
Urothelial carcinoma (UC), also known as bladder cancer, is one of the most common urological cancers. Over 90% of UC cases originate in the uroepithelium (1). New treatment agents for UC, including immune checkpoint inhibitors, fibroblast growth factor receptor inhibitors, and enfortumab vedotin, have demonstrated efficacy and are being applied in clinical practice (2-4). Nevertheless, cisplatin-based chemotherapy (CBCT), particularly with gemcitabine and cisplatin (GC), remains the most important first-line chemotherapy option for advanced UC (5, 6) and is used as neoadjuvant chemotherapy (NAC) for localized UC (7). While CBCT has a response rate of nearly 60%, it can also cause toxicity without efficacy in some patients (5). Predicting treatment response to chemotherapy is crucial for developing treatment strategies that minimize unnecessary chemotherapy, particularly in the NAC setting. Avoiding unnecessary chemotherapy helps preserve quality of life and prevents delays in radical cystectomy.
The heterogeneity of UC prognoses is due to the varying aggressiveness and proliferative capacity of cancer cells, and this nature is known to be determined by genetic information in UC (8, 9). However, identification of patient candidates at highest risk for recurrence or progression is currently based on clinical information that may not reflect the entire biology of UC (10). The specific effects of genetic alterations that have been identified in UC need to be explored further.
Several studies have been conducted to identify molecular biomarkers for prognostic or predictive purposes in UC (11-13). Prior research has primarily focused on DNA repair genes as predictive biomarkers for CBCT response (14-17). However, these studies were limited to specific target molecules or pathways (18-20), and despite advances in molecular profiling (11, 21), comprehensive studies integrating whole transcriptome analysis for chemotherapy response prediction remain limited. RNA sequencing (RNA-seq) data enables comprehensive transcriptome profiling, allowing simultaneous evaluation of expression levels across human genes, thereby providing an extensive dataset for integrated transcriptomic analysis (22, 23). Deep learning models excel at capturing complex relationships among multiple features within large-scale datasets. This capability makes them particularly effective for analyzing gene expression data and improving prediction accuracy for personalized treatment strategies (24-27).
In this study, we developed a deep learning model to predict GC response in bladder cancer patients and evaluated its performance through external validation using independently collected patient data (Figure 1).
Figure 1. Overview of the study design and analytical workflow. Bladder cancer cases with gemcitabine and cisplatin response data were collected from three sources: TCGA-BLCA (n=81), GSE247185 (n=13), and PNUYH (n=10). Gene expression data from the TCGA and GSE datasets underwent quantification and augmentation to create the training dataset (12,961 genes, n=4,794), which was used to develop a 5-layer fully connected neural network for drug response classification. The PNUYH cohort served as an independent external validation set. The model was evaluated through internal validation, external validation, biological interpretation of predictive features, and survival analysis. PNUYH: Pusan National University Yangsan Hospital; TCGA-BLCA: The Cancer Genome Atlas Bladder Cancer cohort.
Materials and Methods
Data sources. We conducted a study using three independent datasets: The Cancer Genome Atlas (TCGA) bladder cancer cohort, Gene Expression Omnibus (GEO) dataset (GSE247185), and a clinical cohort from Pusan National University Yangsan Hospital (PNUYH). The study was approved by the Institutional Review Board of PNUYH (IRB Number: 05-2020-074).
Training dataset preparation. Primary training data were obtained from the TCGA bladder cancer cohort using TCGAbiolinks (version 2.30.4) in R (version 4.3.3). We extracted transcriptome quantification data (Transcripts Per Million, TPM) and matched clinical data from patients with bladder cancer who received the GC regimen. Treatment response was categorized as responders (complete response or partial response) or non-responders (stable disease or progressive disease). Additional training data were acquired from GSE247185 (GSE), which included RNA sequencing data from 13 pre-treatment formalin-fixed paraffin-embedded (FFPE) tumor biopsies of muscle-invasive bladder cancer patients who underwent neoadjuvant cisplatin-based chemotherapy. Response in this dataset was determined by pathologic complete response (pCR) status after treatment. Gene expression values were normalized using log2 transformation of TPM values. To enhance model robustness, we implemented data augmentation by generating 50-fold more samples than the original dataset, where each new sample was created by introducing a 5% random variation to individual gene expression values while preserving the underlying biological signal (28).
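The augmentation step can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes that the 5% variation is a multiplicative perturbation drawn uniformly from [-5%, +5%] and applied independently to each gene, and that the 50 perturbed copies are kept alongside each original sample (94 × 51 = 4,794, which matches the reported training-set size). The function name `augment` and its parameters are hypothetical.

```python
import random

def augment(samples, labels, n_copies=50, rel_noise=0.05, seed=0):
    """Return the originals plus n_copies perturbed copies of each sample.

    Assumption: every expression value is scaled by a factor drawn
    uniformly from [1 - rel_noise, 1 + rel_noise]. The source only says
    "5% random variation" and does not name the distribution.
    """
    rng = random.Random(seed)
    out_x, out_y = [], []
    for x, y in zip(samples, labels):
        out_x.append(list(x))  # keep the original sample
        out_y.append(y)
        for _ in range(n_copies):
            out_x.append([v * (1.0 + rng.uniform(-rel_noise, rel_noise)) for v in x])
            out_y.append(y)    # each copy inherits the label of its source
    return out_x, out_y

# Toy data: 94 samples with 3 "genes" each (arbitrary log2 TPM values).
toy_x = [[1.0 + i, 2.0, 0.5] for i in range(94)]
toy_y = [i % 2 for i in range(94)]
aug_x, aug_y = augment(toy_x, toy_y)
print(len(aug_x))  # 94 * 51 = 4794
```

Because every copy carries the label of its source sample, the class ratio of the augmented set equals that of the original 94 samples.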
Internal validation. To assess model performance and stability, we implemented a 5-fold stratified cross-validation strategy. The augmented training dataset was divided into five equally sized folds using StratifiedKFold (scikit-learn, version 1.2.2), maintaining the same proportion of responders and non-responders in each fold. The stratification ensured balanced representation of response classes across all folds. For each iteration, four folds were used for training and one fold for validation, with random shuffling. Model performance was evaluated using multiple metrics including accuracy, sensitivity, specificity, precision, and F1-score. The receiver operating characteristic (ROC) curves and area under the curve (AUC) values were calculated to assess discriminative ability. Additionally, we monitored the training and validation loss curves across epochs to ensure proper model convergence and absence of overfitting.
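To make the stratification concrete, the sketch below is a dependency-free stand-in for what a stratified 5-fold split does. The study itself used scikit-learn's StratifiedKFold, so the helper `stratified_folds` is hypothetical and not the library implementation. The class counts are derived from the figures reported in this paper: 42 non-responders and 52 responders before augmentation, each multiplied by 51 under the assumption that the originals are kept with their 50 augmented copies.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each sample index to a fold so class proportions match across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_splits)]
    for idxs in by_class.values():
        rng.shuffle(idxs)                      # random shuffling within a class
        for pos, idx in enumerate(idxs):
            folds[pos % n_splits].append(idx)  # deal indices round-robin
    return folds

# 0 = non-responder, 1 = responder (counts derived as described above).
labels = [0] * (42 * 51) + [1] * (52 * 51)
folds = stratified_folds(labels)
for k, val_idx in enumerate(folds):
    n_responders = sum(labels[i] for i in val_idx)
    print(k, len(val_idx), n_responders)
```

Each fold serves once as the validation set while the remaining four are used for training.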
External validation dataset preparation. The external validation cohort included patients diagnosed with advanced UC at PNUYH (from 2016 to 2019). Inclusion criteria were receipt of palliative GC chemotherapy, available response assessment based on RECIST version 1.1 criteria, and sufficient FFPE tissue for RNA-seq. Total RNA was extracted from FFPE tumor samples using the RNeasy FFPE kit (Qiagen, Hilden, Germany) and quantified using NanoDrop (ThermoFisher Scientific, Waltham, MA, USA). RNA-seq was performed on the Illumina platform using the SureSelectXT RNA Direct Reagent Kit (Agilent, Santa Clara, CA, USA) for paired-end sequencing. Raw sequencing data underwent quality control and trimming using Trimmomatic-0.39-1 (29). Sequence alignment was performed using STAR 2.7.8a (30), followed by expression quantification to TPM using RSEM 1.3.3 (31). Gene expression values were normalized using log2 transformation of TPM values. Patients were classified as responders (complete or partial response), or non-responders (stable or progressive disease) based on their treatment response.
Deep learning model architecture and training. Using gene expression data, we constructed a fully connected neural network (FCNN), a fundamental deep learning architecture (32, 33), to predict patient response to chemotherapy. The network architecture consisted of an input layer with 12,961 nodes corresponding to gene features, followed by five fully connected layers: four hidden layers with decreasing dimensions (1,024, 512, 128, and 32 nodes) and a single output node with sigmoid activation. This architecture resulted in approximately 13.9 million trainable parameters. To mitigate potential batch effects and internal covariate shift arising from diverse data sources (TCGA, GSE, and PNUYH) and different sample preservation methods (fresh frozen vs. FFPE), hidden layers employed layer normalization and SiLU activation functions (34, 35). Layer normalization was specifically chosen over batch normalization to ensure stable feature representation across heterogeneous datasets by normalizing each sample independently. In addition, a dropout layer was implemented before the first dense layer to prevent overfitting (36). The model was implemented in PyTorch (2.5.1+cu124) and trained using stochastic gradient descent optimization with a dynamic learning rate scheduler. Binary cross-entropy was used as the loss function, and training was conducted using mini-batch gradient descent. To ensure robust performance and assess model stability, training was repeated 100 times with independent initializations, and the resulting models were used as an ensemble (37).
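The reported parameter count can be checked with a short calculation. The sketch reads the architecture as four hidden layers of 1,024, 512, 128, and 32 nodes followed by one output node, that is, five fully connected layers in total. It further assumes that every dense layer has a bias term and that each hidden layer's normalization carries one learnable scale and one learnable shift per unit; the text does not state either detail.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# Input, four hidden layers, output.
layer_sizes = [12961, 1024, 512, 128, 32, 1]
dense_total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

# Assumption: layer normalization adds a scale and a shift per hidden unit.
norm_total = sum(2 * n for n in layer_sizes[1:-1])

print(dense_total)               # 13867713
print(dense_total + norm_total)  # 13871105
```

Both totals round to approximately 13.9 million, and more than 95% of the parameters sit in the first layer, which maps the 12,961 genes to 1,024 units.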
Biological interpretation of the prediction model. To interpret the complex relationships between gene expressions and chemotherapy response in the prediction model, we performed correlation analysis between model predictions and gene expression levels using the TCGA bladder cancer cohort. The absolute values of Pearson correlation coefficients were calculated between predicted probabilities and expression levels of individual genes. The top 300 genes showing the highest absolute correlation values were selected for functional enrichment analysis. Gene ontology analysis was performed using DAVID Bioinformatics Resources 6.8, focusing on two annotation categories: Gene Ontology Biological Process (GOTERM_BP_DIRECT) and UniProt Keywords Biological Process (UP_KW_BIOLOGICAL_PROCESS) (38). Representative genes were selected from each enriched cluster based on their presence in multiple significant ontology terms. The relationships between genes and enriched ontology terms were visualized using GOplot package in R (39).
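The gene-ranking step described above can be sketched in a few lines. This is an illustration on invented toy values, not the study's pipeline: the gene names, expression values, and the helper `top_correlated_genes` are hypothetical, and genes with constant expression are assigned a correlation of zero as a simplifying assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def top_correlated_genes(expr_by_gene, predictions, k=300):
    """Rank genes by |r| between expression and predicted probability."""
    scored = [(abs(pearson(v, predictions)), g) for g, v in expr_by_gene.items()]
    scored.sort(reverse=True)
    return [g for _, g in scored[:k]]

# Toy example: four samples, three hypothetical genes.
predictions = [0.1, 0.4, 0.6, 0.9]
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.5],  # rises with the prediction
    "GENE_B": [5.0, 3.9, 3.1, 1.0],  # falls with the prediction
    "GENE_C": [2.0, 2.1, 1.9, 2.0],  # essentially flat
}
selected = top_correlated_genes(expr, predictions, k=2)
print(selected)
```

Ranking on the absolute value keeps genes whose expression falls as the prediction rises as well as those that rise with it, so the selected set mixes both directions of association.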
Prognostic value of model predictions. We conducted survival analysis to evaluate the prognostic value of our model’s predictions. Using the TCGA bladder cancer cohort samples not included in the training set, patients were stratified into two groups based on their predicted response probabilities. Kaplan-Meier survival analysis was performed using the survminer package in R to compare overall survival between the predicted responder and non-responder groups. Statistical significance was assessed using the log-rank test, and survival curves were generated with risk tables showing the number of patients at risk at different time points.
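For readers unfamiliar with the estimator, the sketch below computes a Kaplan-Meier curve from scratch on invented follow-up data. It is a stand-in for illustration only: the study used the survminer package in R, and the log-rank comparison between groups is not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - deaths / at_risk               # product-limit update
        curve.append((t, surv))
    return curve

# Toy cohort: six patients, two of them censored (at times 3 and 8).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the at-risk count up to their last follow-up but never trigger a drop in the curve, which is why the risk tables under each plot matter for interpretation.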
Results
Dataset characteristics. Our study utilized two distinct datasets for training: the TCGA bladder cancer cohort and the GSE dataset. The training set included a total of 94 samples, with 81 samples from TCGA (37 non-responders, 44 responders) and 13 samples from GSE (5 non-responders, 8 responders). For external validation, we used an independent cohort of 10 patients with stage IV bladder cancer from PNUYH, consisting of 7 responders and 3 non-responders to GC chemotherapy. The baseline characteristics of PNUYH patients are shown in Table I. To enhance model robustness, we implemented data augmentation, resulting in a total of 4,794 augmented samples used for model development. As input features, we identified 12,961 genes that were consistently detected across all three datasets, ensuring comprehensive transcriptomic representation while maintaining cross-dataset compatibility.
Table I. Clinicopathologic features of patients with advanced urothelial carcinoma (N=10).
Model performance in internal and external validation. The model training converged successfully over 200 epochs, with consistent decreases in loss values for both training and validation sets (Figure 2B, E). In internal validation, the model demonstrated excellent discriminative ability with ROC curves showing high performance across multiple cross-validation folds (Figure 2A). We analyzed precision, recall, and F1-score metrics across different threshold values and determined 0.4 as the optimal prediction threshold (Figure 2C). The 5-fold cross-validation results showed consistent performance with average accuracy, sensitivity, specificity, precision, and F1-scores of 0.914, 0.906, 0.920, 0.902, and 0.904 for training, and 0.905, 0.892, 0.915, 0.895, and 0.893 for validation, respectively (Figure 2F).
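A threshold sweep of this kind can be sketched as below. The text does not state which criterion defined the optimum, so the sketch assumes the threshold with the highest F1-score is chosen. Scores at or above the threshold are called positive, which in this paper corresponds to a predicted non-responder. The labels and scores are invented toy values, and `prf` and `best_threshold` are hypothetical helper names.

```python
def prf(y_true, scores, threshold):
    """Precision, recall and F1 when score >= threshold is called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_threshold(y_true, scores, grid):
    """Assumption: the optimal threshold is the one that maximizes F1."""
    return max(grid, key=lambda t: prf(y_true, scores, t)[2])

# Toy data: 1 = non-responder (positive class), 0 = responder.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.55, 0.45, 0.42, 0.3, 0.2, 0.1]
grid = [i / 10 for i in range(1, 10)]
chosen = best_threshold(y_true, scores, grid)
print(chosen, prf(y_true, scores, chosen))
```

Lowering the threshold trades precision for recall, so the selected value depends on the criterion and should be fixed on the training data before any external cohort is scored.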
Figure 2. Performance evaluation of the deep learning model in internal and external validation. (A-C) Internal validation results: (A) ROC curves are presented for the training and internal validation sets; (B) Training and validation loss curves over 200 epochs showing stable convergence; (C) Precision, recall, and F1-score metrics across different threshold values with optimal threshold (0.4) indicated by the vertical dashed line. (D-G) External validation results: (D) ROC curves for training and external test sets with respective AUROCs; (E) Loss curves during external validation showing good generalization; (F) Detailed performance metrics for each fold during 5-fold cross-validation; (G) Final performance metrics and confusion matrix of the model in training and external validation datasets. AUROC: Area under the receiver operating characteristic curve; ROC: Receiver operating characteristic.
In external validation using an independent cohort of stage IV bladder cancer patients (PNUYH, n=10), the model maintained robust performance with an area under the receiver operating characteristic curve (AUROC) of 0.90 (Figure 2D). When applying the optimal threshold of 0.4, the model achieved an accuracy of 90.0% (95% CI=55.5-99.7), with a sensitivity for non-responders of 66.7% (95% CI=9.4-99.2), specificity of 100.0% (95% CI=59.0-100.0), and precision of 1.000 (Figure 2G), demonstrating strong generalizability in identifying treatment response in a completely independent patient population.
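The external-validation figures follow from a single confusion matrix. With non-responders as the positive class, the reported values imply 2 true positives, 1 false negative, 7 true negatives, and 0 false positives. The sketch below recomputes the confidence intervals under the assumption that they are exact (Clopper-Pearson) binomial intervals. The text does not name the method, but this choice reproduces the reported bounds. The helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve(f, target, increasing):
    """Bisection on [0, 1] for f(p) = target, with f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial interval for k successes out of n."""
    # Lower bound solves P(X >= k | p) = alpha / 2 (increasing in p).
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= k | p) = alpha / 2 (decreasing in p).
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper

tp, fn, tn, fp = 2, 1, 7, 0  # positive class = non-responder
results = {
    "accuracy": (tp + tn, tp + fn + tn + fp),
    "sensitivity": (tp, tp + fn),
    "specificity": (tn, tn + fp),
}
for name, (k, n) in results.items():
    lo, hi = clopper_pearson(k, n)
    print(f"{name}: {100 * k / n:.1f}% ({100 * lo:.1f}-{100 * hi:.1f})")
```

The width of these intervals reflects the cohort size: with only three non-responders, the sensitivity estimate is compatible with values from under 10% to over 99%.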
Biological functions associated with chemotherapy response prediction. Gene enrichment analysis of the top 300 correlated genes revealed four major functional clusters. The significance of enriched terms was visualized using negative log2-transformed p-values (Figure 3A). The first cluster contained terms related to DNA damage response and repair, with DNA damage and DNA repair being significantly enriched across multiple ontology databases. Key genes in this cluster included BARD1, BRCA2, RIF1, SMC3, USP1 and MSH6. The second cluster comprised cell cycle regulation terms including cell cycle, cell division, and mitosis. This cluster included genes KIF11, CDK8, PRC1 and CENPE. The third cluster was associated with kinesins and microtubule-based movement, featuring genes KIF11, CENPE, KIF4B, KIF1B and KIF23. The fourth cluster contained terms related to mitotic cytokinesis and RHOA GTPase cycle regulation. The relationships between representative genes and their associated biological processes were visualized using a chord diagram (Figure 3B), illustrating how many genes were involved in multiple functional pathways, particularly connecting cell cycle regulation with DNA repair mechanisms.
Figure 3. Gene ontology enrichment analysis of chemotherapy response-associated genes. (A) Bar plots showing enriched biological process terms in four major functional clusters. The x-axis represents negative log2-transformed p-values, and bars are colored by gene ontology category. (B) Chord diagram illustrating the relationships between representative genes and their associated biological processes.
Prognostic significance and stage-specific predictions. We next investigated whether our model’s predictions correlated with patient survival outcomes across different disease stages (Figure 4). Kaplan-Meier analysis revealed that in early-stage (Stage I/II) bladder cancer patients, the model’s predictions demonstrated significant prognostic value (p=0.019), with predicted responders showing markedly better survival than predicted non-responders (Figure 4A). This prognostic distinction was not statistically significant in Stage III (p=0.69, Figure 4B) or Stage IV patients (p=0.2, Figure 4C). Notably, we observed a gradual increase in prediction values across disease stages, with the highest prediction scores observed in Stage IV patients (Figure 4D). The median prediction values were 0.275, 0.314, and 0.527 for Stage I/II, Stage III, and Stage IV patients, respectively, while mean prediction values increased from 0.376 in Stage I/II to 0.414 in Stage III and 0.514 in Stage IV. This stage-dependent pattern suggests that our model captures molecular features that may be more prevalent in advanced disease, potentially reflecting differences in tumor biology and treatment response mechanisms across disease stages.
Figure 4. Stage-specific survival analysis and prediction distributions. (A-C) Kaplan-Meier survival curves stratified by model predictions (Negative: predicted responders, Positive: predicted non-responders) across different disease stages: (A) Stage I/II patients showing significant survival difference (p=0.019); (B) Stage III patients showing no significant difference in survival outcomes (p=0.69); (C) Stage IV patients showing a non-significant trend toward better survival in predicted responders (p=0.2). Numbers of patients at risk are shown below each graph. (D) Box plot showing the distribution of prediction values across disease stages, with individual data points, medians (blue lines), and means (red dots) displayed. Higher prediction values indicate greater likelihood of non-response to chemotherapy.
Discussion
Our deep learning model demonstrated strong predictive performance, achieving 94.7% accuracy in the training dataset and 90.0% accuracy in external clinical validation from PNUYH. Notably, the model maintained 100.0% specificity, indicating its reliability in identifying true responders. The high accuracy in an independent cohort, derived from FFPE tissue samples, underscores its robustness across different patient populations and clinical settings. To justify the use of a deep learning architecture (FCNN) despite its complexity, we conducted a benchmarking analysis against traditional machine learning models, including logistic regression, random forest, and support vector machines. Our FCNN outperformed these simpler models, which showed limited generalizability on the external cohort (Table II). This suggests that the multi-layered deep learning approach is essential for capturing the high-order, non-linear interactions within the high-dimensional transcriptomic space (12,961 genes) that simpler models fail to distill. Our comprehensive transcriptomic approach offers advantages by integrating information from thousands of genes, capturing complex gene interaction patterns that contribute to treatment response, rather than relying on single-gene markers that may not fully reflect the heterogeneous mechanisms of gemcitabine and cisplatin resistance.
Table II. Performance comparison between the proposed deep learning model and traditional machine learning algorithms in the external validation cohort (N=10).
Functional analysis of the top 300 associated genes revealed four major clusters: DNA damage response, cell cycle regulation, kinesins/microtubule dynamics, and mitotic cytokinesis, all of which are closely linked to the mechanism of action of chemotherapy. Among these, genes involved in DNA damage response, including BARD1, BRCA2, RIF1, and USP1, emerged as key predictive markers, reinforcing the established role of DNA repair pathways in chemotherapy sensitivity (40-43) and suggesting that transcriptomic profiling of these genes could enhance response prediction. Particularly intriguing was our observation of increasing prediction scores with advancing disease stage, suggesting that molecular features associated with chemotherapy resistance become more prevalent in advanced tumors. This progressive molecular shift may explain why early-stage tumors showed significant survival stratification based on our model’s predictions, while advanced tumors exhibited generally higher resistance signatures regardless of individual response outcomes.
The ability to predict chemotherapy response before treatment initiation has significant clinical implications, particularly in the neoadjuvant setting where unnecessary chemotherapy can delay definitive surgery and impact quality of life (7, 44). In the field of UC, various deep learning models have been developed for early detection (45), and RNA-based approaches have also been explored for therapeutic target identification (46). Because no responder was misclassified as a non-responder in external validation (specificity 100.0%, precision 1.000), a non-responder prediction from our model could be valuable in helping clinicians avoid ineffective treatments in patients unlikely to benefit from chemotherapy. The prognostic value demonstrated in early-stage patients further indicates potential utility in risk stratification and treatment planning. For patients predicted to have high resistance to the GC regimen, alternative approaches such as immune checkpoint inhibitors or antibody-drug conjugates could be considered earlier in the treatment course (47, 48). Additionally, the model’s performance on FFPE samples, the standard specimen type in clinical practice, enhances its translational potential, suggesting feasibility for integration into routine pathological workflows after further validation. The consistent performance across different gene expression platforms also indicates robust technical transferability, which is essential for clinical implementation.
Despite the promising results, our study has several limitations. First, while our model demonstrated high accuracy, the sample size remains modest (n=94 for discovery, n=10 for PNUYH validation). To ensure stable model convergence given the high-dimensional transcriptomic features and limited sample size, data augmentation was applied to the discovery cohort prior to cross-validation. While we acknowledge that this strategy may lead to potential data leakage and subsequent inflation of internal validation metrics, the robust performance observed in the completely independent, non-augmented PNUYH external validation cohort (accuracy 90.0%, AUROC 0.90) confirms the model’s genuine generalizability and clinical utility. This indicates that the model successfully learned robust biological features associated with treatment response rather than overfitting to augmented noise. Nonetheless, prospective multicenter studies with larger cohorts are warranted to further validate these findings. Second, our current model relies on bulk RNA sequencing, which may not fully capture tumor heterogeneity or the influence of the tumor microenvironment on treatment response. Integration of spatial transcriptomics or single-cell approaches could provide deeper insights into resistance mechanisms. Third, the model’s interpretability, while improved through our functional analysis, still presents challenges for direct clinical translation. Future work should focus on developing more transparent predictive models that maintain performance while providing clearer biological insights.
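The leakage concern raised in the first limitation can be illustrated with a small simulation. The sketch compares two ways of assigning the 4,794 augmented samples to five folds: per sample, as happens when augmentation precedes the split, and per source patient, a group-aware alternative that was not used in this study and is shown only for contrast. Fold assignment here is random and ignores stratification for brevity, and the helper `leakage_fraction` is hypothetical.

```python
import random

def leakage_fraction(source_ids, fold_of, n_splits=5):
    """Fraction of validation samples whose source patient is also in training."""
    leaked = total = 0
    for k in range(n_splits):
        train_sources = {s for s, f in zip(source_ids, fold_of) if f != k}
        for s, f in zip(source_ids, fold_of):
            if f == k:
                total += 1
                leaked += s in train_sources
    return leaked / total

rng = random.Random(0)
n_patients, copies = 94, 51  # original plus 50 augmented copies (assumed)
source_ids = [p for p in range(n_patients) for _ in range(copies)]

# (a) Split after augmentation: every sample gets its own fold.
sample_level = [rng.randrange(5) for _ in source_ids]

# (b) Group-aware split: all copies of a patient share one fold.
patient_fold = [p % 5 for p in range(n_patients)]
group_level = [patient_fold[s] for s in source_ids]

frac_sample = leakage_fraction(source_ids, sample_level)
frac_group = leakage_fraction(source_ids, group_level)
print(frac_sample, frac_group)
```

Under the per-sample split, essentially every validation sample has a near-duplicate from the same patient in the training folds, which is why the internal metrics are best read alongside the external validation result.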
Conclusion
This study demonstrates the potential of deep learning-based transcriptomic analysis in predicting GC chemotherapy response in UC patients. By leveraging high-dimensional RNA-seq data, our model effectively identified molecular features associated with treatment outcomes and maintained high predictive accuracy in external validation. These findings highlight the feasibility of integrating transcriptomic deep learning models into clinical workflows, emphasizing their potential as a clinically viable decision-support tool to guide personalized treatment strategies.
Footnotes
Data Availability Statement
The sequencing datasets have been deposited in the Gene Expression Omnibus (GEO) repository under accession number GSE290327.
Conflicts of Interest
Juwon Kang, Jihoon Kang, and Yi Rang Kim are employees of ONCOCROSS Co., Ltd.
Authors’ Contributions
Conceptualization, K.P. and YR.K.; methodology, J.K. and HJ.L.; software, JH.L.; validation, JH.L. and J.C.; formal analysis, J.K. and JH.L.; investigation, J.K. and J.C.; resources, SB.O., JK.N., TU.K., H.R., and YK.K.; data curation, J.K., HJ.L., and J.K.; writing – original draft preparation, J.K.; writing – review and editing, HJ.L., SB.O., JK.N., TU.K., H.R., YK.K., YR.K., J.C., and K.P.; visualization, J.K. and JH.L.; supervision, J.C. and K.P.; funding acquisition, K.P. All Authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 202107620001) and by a grant (21153MFDS601) from the Ministry of Food and Drug Safety in 2023.
Artificial Intelligence (AI) Disclosure
During the preparation of this manuscript, a large language model (ChatGPT, OpenAI) was used solely for language editing and stylistic refinement in select paragraphs. No content related to the generation, analysis, or interpretation of research data was produced by generative AI. All scientific content and conclusions were developed and verified by the authors. No figures or visual data were generated or modified using generative AI or machine learning-based image enhancement tools.
- Received January 14, 2026.
- Revision received February 15, 2026.
- Accepted February 24, 2026.
- Copyright © 2026 The Author(s). Published by the International Institute of Anticancer Research.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).










