Abstract
Background/Aim: Lung adenocarcinoma (LUAD) is the most common subtype of non-small cell lung cancer (NSCLC) and remains associated with poor clinical outcomes due to pronounced molecular heterogeneity and limited prognostic biomarkers. Long non-coding RNAs (lncRNAs) have emerged as important regulators of cancer biology, yet their systematic association with disease progression and survival in LUAD remains incompletely defined. This study aimed to identify lncRNAs that robustly associate with LUAD progression and prognosis.
Materials and Methods: Pre-processed lncRNA expression data for 488 LUAD tumors and 58 normal lung tissues were obtained from The Cancer Genome Atlas (TCGA) via the TANRIC platform. Following expression filtering, stage-wise differential expression analysis was performed using Welch’s t-test with false discovery rate correction. Kaplan–Meier survival analysis was used to evaluate prognostic relevance. Rank-based trend analyses using Spearman correlation were conducted to assess monotonic expression changes across tumor stage, lymph-node status, and tumor size.
Results: We identified 68 lncRNAs consistently upregulated across all LUAD stages relative to normal lung tissue. Survival analysis revealed that higher expression of several lncRNAs was associated with poorer overall survival. Among these, FAM83A-AS1, CYTOR, and MIR4435-2HG emerged as prominent candidates, exhibiting consistent tumor-associated overexpression and adverse survival association. Importantly, these three lncRNAs also demonstrated significant monotonic trends across increasing lymph-node involvement and primary tumor size, indicating a close association with tumor burden and disease aggressiveness.
Conclusion: This integrative analysis identifies FAM83A-AS1, CYTOR, and MIR4435-2HG as robust poor-prognosis-associated lncRNAs in LUAD. Their coordinated behavior across expression, survival, nodal status, and tumor size highlights their potential utility as prognostic biomarkers and provides a framework for lncRNA-based risk stratification in lung adenocarcinoma.
- Long non-coding RNA
- long ncRNA
- LncRNA
- prognostic biomarker
- LUAD
- cancer transcriptomics
- functional enrichment analysis
Introduction
Lung adenocarcinoma (LUAD) is the most prevalent subtype of non-small cell lung cancer (NSCLC), accounting for approximately 40% of all lung cancer cases worldwide (1, 2). Despite advancements in diagnostic and therapeutic strategies, LUAD remains a major cause of cancer-related mortality, largely due to late-stage diagnosis, high metastatic potential, and the absence of reliable prognostic biomarkers (3, 4). Current prognostic indicators, including TNM staging and histopathological classification, provide essential clinical insights but fail to capture the molecular heterogeneity of LUAD comprehensively (5, 6). This limitation reduces their accuracy in predicting patient outcomes and guiding targeted therapies. Consequently, there is a pressing need to identify novel molecular biomarkers that can enhance LUAD classification, improve risk stratification, and facilitate personalized treatment approaches (7).
In recent years, long non-coding RNAs (lncRNAs) have emerged as critical regulators in cancer biology (8). LncRNAs, defined as RNA transcripts longer than 200 nucleotides that do not encode proteins, influence gene expression, chromatin remodeling, and cellular signaling pathways (9). Studies have demonstrated that lncRNAs contribute to several hallmarks of cancer, including uncontrolled proliferation, evasion of apoptosis, and metastasis (10). For example, lncRNA XIST has been shown to promote proliferation, migration, and invasion in esophageal squamous cell carcinoma by regulating the miR-186-5p/ZEB1 axis, highlighting a conserved oncogenic competing endogenous RNA (ceRNA) mechanism across epithelial cancers (11). Similarly, bioinformatic analyses have revealed that lncRNA H19 regulates broad gene expression networks involved in tumor growth and signaling pathways in glioma, underscoring the capacity of lncRNAs to act as master regulators of cancer-associated transcriptional programs (12). Consistent with these cross-cancer observations, lncRNAs such as MALAT1 and HOTAIR have been implicated in LUAD progression by promoting epithelial-to-mesenchymal transition (EMT) and modulating oncogenic signaling pathways (13-15). Additionally, LINC01234 and LINC01116 have been identified as prognostic indicators in LUAD, with their elevated expression correlating with poor survival outcomes (16, 17). Similarly, SNHG3, LOC100506691 and SNHG7 have been associated with LUAD metastasis, playing crucial roles in tumor invasion and migration (18-20). These findings highlight the potential of lncRNAs as both biomarkers and therapeutic targets in LUAD.
At the molecular level, lncRNAs exert their effects through diverse mechanisms, including acting as ceRNAs, scaffolds for protein complexes, and regulators of transcriptional and post-transcriptional processes (8). Many oncogenic lncRNAs function by sponging tumor-suppressive microRNAs, thereby enhancing the expression of oncogenic target genes (9). For example, LINC00336 promotes LUAD progression by functioning as a ceRNA for miR-6852, leading to the upregulation of the oncogene cystathionine-β-synthase (21). Other lncRNAs interact with chromatin-modifying complexes to reshape the epigenetic landscape; for instance, HOTAIR recruits the PRC2 complex to silence tumor suppressor genes, facilitating LUAD progression (15). Additionally, some lncRNAs regulate transcriptional and post-transcriptional processes, such as FEZF1-AS1, which enhances β-catenin signaling and activates the Wnt pathway, thereby promoting tumorigenesis (22). In addition to these molecular mechanisms, large-scale integrative analyses across multiple cancer types have demonstrated that subsets of lncRNAs are closely associated with tumor immune infiltration and patient prognosis, underscoring their emerging role as immunogenic biomarkers in cancer (23). The biochemical versatility of lncRNAs underscores their critical role in LUAD pathogenesis and highlights their potential as diagnostic and prognostic biomarkers.
Nevertheless, the systematic association of lncRNAs with disease progression and survival in LUAD remains incompletely defined. To address this gap, there is a need for integrative analyses that combine differential expression, survival association, and progression-aware evaluation across multiple clinical dimensions. Such approaches can distinguish lncRNAs that are merely dysregulated from those that consistently track with disease severity and patient outcome. In particular, identifying lncRNAs whose expression increases in a graded manner with advancing tumor stage, nodal involvement, and tumor size may provide insights into molecular drivers of LUAD aggressiveness and facilitate improved risk stratification.
In this study, we performed a comprehensive analysis of lncRNA expression in LUAD using The Cancer Genome Atlas (TCGA)-derived transcriptomic and clinical data. We systematically identified lncRNAs that are consistently dysregulated across tumor stages, evaluated their association with overall survival, and examined their expression dynamics in relation to pathological stage, lymph-node involvement, and primary tumor size. By integrating these complementary analytical layers, we aimed to identify lncRNAs that robustly reflect LUAD progression and prognosis. This strategy led to the identification of a small subset of lncRNAs with coherent behavior across multiple clinical axes, highlighting their potential utility as prognostic biomarkers and providing a framework for future lncRNA-based stratification in LUAD.
Materials and Methods
Data acquisition and processing. Pre-processed lncRNA expression data for the TCGA-LUAD cohort were downloaded from the TANRIC platform, which quantifies lncRNA expression as reads per kilobase per million mapped reads (RPKM) from TCGA RNA-seq BAM files (24). The dataset included 488 LUAD tumors and 58 normal lung tissue samples, with >12,000 lncRNAs assessed for differential expression. This sample distribution reflects the inherent structure of the TCGA-LUAD dataset, where tumor samples outnumber normal tissues due to the primary focus on tumor characterization.
Expression filtering and normalization. To restrict analyses to robustly expressed transcripts and reduce noise associated with low abundance lncRNAs, an expression filter was applied before differential expression testing. Only lncRNAs with RPKM >1 in at least 20% of all samples (≥110 of 546 total samples) were retained. This filtering step reduced the dataset from >12,000 to 959 lncRNAs and was applied uniformly across all subsequent analyses. Filtered RPKM values were transformed using a single variance-stabilizing transformation, log2(RPKM + 1), which was used consistently for all tumor-normal and stage-wise comparisons.
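The filtering and transformation step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' actual code; the function name, the dictionary layout, and the toy values are assumptions made for the example.

```python
import math

def filter_and_transform(rpkm, min_rpkm=1.0, min_fraction=0.20):
    """Keep lncRNAs with RPKM > min_rpkm in at least min_fraction of samples,
    then return log2(RPKM + 1) values for the retained lncRNAs.

    rpkm: dict mapping a lncRNA identifier to a list of RPKM values,
          one value per sample (tumor and normal pooled).
    """
    kept = {}
    for gene, values in rpkm.items():
        n_expressed = sum(1 for v in values if v > min_rpkm)
        if n_expressed >= min_fraction * len(values):
            kept[gene] = [math.log2(v + 1.0) for v in values]
    return kept

# Hypothetical toy input: two lncRNAs measured in five samples.
toy_rpkm = {
    "lnc_A": [7.0, 3.0, 0.2, 0.0, 5.0],  # 3 of 5 samples above 1 RPKM: kept
    "lnc_B": [0.1, 0.0, 0.3, 0.2, 0.5],  # no sample above 1 RPKM: dropped
}
filtered = filter_and_transform(toy_rpkm)

# With 546 samples, a 20% threshold means at least 110 samples.
min_samples = math.ceil(0.20 * 546)
```

Applying the filter to the pooled tumor and normal samples, as in this sketch, matches the stated denominator of 546 total samples.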
Identification of dysregulated lncRNAs. Differential expression between LUAD tumors and normal lung tissues was assessed independently for each tumor stage. Given the unequal sample sizes between tumor and normal groups, differential expression testing was performed using a two-sided Welch’s t-test (unequal variances). Nominal p-values were adjusted for multiple testing using the Benjamini–Hochberg false discovery rate (FDR) procedure, and lncRNAs with FDR <0.05 were considered significantly dysregulated. For each stage, lncRNAs were classified as upregulated or downregulated relative to normal lung tissue using an absolute log2 fold-change threshold ≥1 (corresponding to ≥2-fold change on the linear scale). Among the 959 filtered lncRNAs, 168 were upregulated in stage I, 155 in stage II, 143 in stage III, and 96 in stage IV. Intersection of these stage-wise upregulated sets yielded 68 lncRNAs that were consistently upregulated across all four LUAD stages. Similarly, 92, 101, 109, and 78 lncRNAs were downregulated in stages I-IV, respectively, with 67 lncRNAs showing consistent downregulation across all stages.
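As an illustration of the stage-wise selection logic, the sketch below computes the Welch statistic with its Welch–Satterthwaite degrees of freedom, applies the Benjamini–Hochberg adjustment, and intersects the per-stage upregulated sets. It uses only the Python standard library, so conversion of the statistic to a p-value is left out (in practice a routine such as `scipy.stats.ttest_ind(..., equal_var=False)` would supply it). All function names, gene labels, and numbers are hypothetical and do not come from the study data.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Return (t, df) for a two-sample Welch test (unequal variances)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (q-values) in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

def upregulated(genes, pvals, log2fc, fdr=0.05, min_lfc=1.0):
    """Genes with q < fdr and log2 fold change >= min_lfc versus normal."""
    q = benjamini_hochberg(pvals)
    return {g for g, qi, lfc in zip(genes, q, log2fc)
            if qi < fdr and lfc >= min_lfc}

# Hypothetical p-values and log2 fold changes for two stages.
genes = ["lnc_a", "lnc_b", "lnc_c", "lnc_d"]
stage_I = upregulated(genes, [0.001, 0.02, 0.04, 0.5], [2.0, 1.5, 0.4, 1.2])
stage_II = upregulated(genes, [0.002, 0.3, 0.01, 0.01], [1.8, 1.1, 1.3, 0.2])

# Core set: upregulated in every stage considered.
core_set = stage_I & stage_II

# Welch statistic for a toy tumor-versus-normal comparison.
t_stat, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Because the intersection requires a gene to pass both thresholds in every stage, the core set can only shrink as stages are added, which is why the smallest stage-wise set bounds its size.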
Trend analysis across LUAD pathological stages. To formally assess whether lncRNA expression followed a monotonic trend across ordered tumor stages (I < II < III < IV), a non-parametric trend test was applied to the 68 lncRNAs consistently upregulated across all stages. For each lncRNA, expression values across all tumor samples were ranked, and stage-specific average ranks were computed to account for unequal sample sizes (stage I: n=170; stage II: n=70; stage III: n=62; stage IV: n=17). Spearman’s rank correlation coefficient (ρ) was then calculated between the ordered stage ranks and the corresponding expression ranks, providing a measure of monotonic association between lncRNA expression and tumor stage. Statistical significance of the trend was evaluated using a two-sided t-distribution approximation for Spearman’s ρ. Resulting p-values were adjusted for multiple testing across the 68 lncRNAs using the Benjamini–Hochberg procedure, and lncRNAs with trend q-values <0.05 were considered to exhibit a significant monotonic expression trend.
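One way to read the trend test described above is as a per-sample Spearman correlation between the ordinal stage code and expression, with ties assigned their average rank, followed by the usual approximation t = ρ·sqrt((n − 2)/(1 − ρ²)) on n − 2 degrees of freedom. The sketch below follows that reading, which is an assumption, since the exact computation is not spelled out in the text. Function names and toy values are hypothetical, and the p-value lookup is omitted to keep the example free of third-party dependencies (`scipy.stats.spearmanr` would return both ρ and a p-value).

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def trend_t_statistic(rho, n):
    """t approximation for Spearman's rho; undefined when |rho| == 1."""
    return rho * math.sqrt((n - 2) / (1.0 - rho ** 2))

# Hypothetical data: stage codes (I=1 ... IV=4) and log2 expression values.
stage = [1, 1, 2, 2, 3, 3, 4, 4]
expr = [1.0, 1.2, 1.5, 1.4, 2.0, 2.2, 2.1, 3.0]

rho = spearman_rho(stage, expr)
t_stat = trend_t_statistic(rho, len(stage))
```

A positive ρ here indicates that expression tends to rise with stage; the resulting p-values across all tested lncRNAs would then be passed through the Benjamini–Hochberg adjustment.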
Trend analysis across lymph-node status and tumor size. To evaluate whether lncRNA expression tracked with lymph-node involvement and primary tumor size, additional rank-based trend analyses were performed using the same statistical framework described above. LUAD samples were stratified according to pathological nodal status (Normal, N0, N1, N2, and N3) and tumor size categories (Normal, T1, T2, T3, and T4), as defined by TCGA clinical annotations. For each lncRNA, expression values across all samples within a given analysis were ranked, and ordered clinical categories were treated as ordinal variables. Spearman’s rank correlation coefficient (ρ) was calculated between expression ranks and ordered nodal or tumor-size categories to quantify monotonic expression changes. Statistical significance was assessed using a two-sided test for Spearman’s correlation and resulting p-values were reported as indicators of trend strength. These analyses were applied specifically to the prognostically relevant lncRNAs highlighted in the study.
Sex-based expression comparison. To assess whether lncRNA expression differed between male and female LUAD patients, sex-stratified expression analyses were performed for the selected prognostic lncRNAs. Patients were grouped according to biological sex as annotated in the TCGA clinical dataset. For each lncRNA, expression levels between male and female groups were compared using a two-sided Welch’s t-test to account for potential differences in group size and variance. Statistical significance was evaluated at a nominal p-value threshold of 0.05, and results were annotated as non-significant where appropriate.
Cytoplasmic-nuclear localization analysis. The subcellular localization of MIR4435-2HG, CYTOR, and FAM83A-AS1 was assessed using Cytoplasmic-Nuclear Relative Concentration Index (CN-RCI) data from the lncATLAS database (V24). CN-RCI values quantify the relative enrichment of transcripts in cytoplasmic versus nuclear compartments across multiple human cell lines. Positive CN-RCI values indicate cytoplasmic enrichment, whereas negative values indicate nuclear localization. Localization patterns were interpreted to infer potential functional roles, with cytoplasmic enrichment suggesting involvement in post-transcriptional regulation and nuclear localization implicating transcriptional or chromatin-associated functions.
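For orientation, a relative concentration index of this kind is commonly computed as a log2 ratio of transcript abundance in the two compartments. The short sketch below assumes that definition (log2 of cytoplasmic over nuclear abundance) and uses hypothetical function names and values; it is meant only to illustrate the sign convention described above, not to reproduce the lncATLAS pipeline.

```python
import math

def cn_rci(cytoplasmic_abundance, nuclear_abundance):
    """Assumed definition: log2(cytoplasmic / nuclear) transcript abundance."""
    return math.log2(cytoplasmic_abundance / nuclear_abundance)

def classify(rci):
    """Map the sign of the index to a localization label."""
    if rci > 0:
        return "cytoplasmic"
    if rci < 0:
        return "nuclear"
    return "balanced"

# Hypothetical abundances for two transcripts in one cell line.
example_cyto = cn_rci(8.0, 2.0)   # four-fold higher in cytoplasm
example_nuc = cn_rci(1.0, 4.0)    # four-fold higher in nucleus
labels = [classify(example_cyto), classify(example_nuc)]
```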
Survival analysis. Kaplan–Meier survival analyses were performed to evaluate associations between lncRNA expression levels and overall survival in LUAD patients. For each lncRNA, patients were stratified into high- and low-expression groups based on median expression values. Survival curves were generated using ENCORI/starBase, which integrates TCGA clinical follow-up data with transcriptomic profiles. Statistical significance of survival differences between groups was assessed using the log-rank test, and hazard ratios were calculated to estimate relative risk.
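The survival curves in this study were generated with ENCORI/starBase, so the following is only an illustrative sketch of the underlying logic: a median split on expression followed by a two-group log-rank test. The implementation is a plain standard-library version written for this example, all data are hypothetical, and hazard-ratio estimation is not shown.

```python
import math
from statistics import median

def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank test; returns (chi2, p) on 1 degree of freedom.

    event flags: 1 = death observed, 0 = censored.
    """
    data = [(t, e, 0) for t, e in zip(time_a, event_a)]
    data += [(t, e, 1) for t, e in zip(time_b, event_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for s, _, _ in data if s >= t)               # at risk, all
        n_a = sum(1 for s, _, g in data if s >= t and g == 0)  # at risk, A
        d = sum(1 for s, e, _ in data if s == t and e)         # deaths at t
        d_a = sum(1 for s, e, g in data if s == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of chi-square with 1 df: erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical cohort: expression, follow-up time (months), event flag.
expr = [0.5, 0.8, 1.1, 1.3, 2.0, 2.4, 2.9, 3.5]
time = [30, 42, 50, 60, 5, 9, 14, 20]
event = [1, 0, 1, 0, 1, 1, 1, 0]

cutoff = median(expr)
high = [i for i, x in enumerate(expr) if x > cutoff]
low = [i for i, x in enumerate(expr) if x <= cutoff]

chi2, p_value = logrank_test(
    [time[i] for i in high], [event[i] for i in high],
    [time[i] for i in low], [event[i] for i in low],
)
```

In this sketch, samples equal to the median are assigned to the low-expression group; how ties at the median are handled by the platform used in the study is not stated.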
Visualization and statistical tools. Data processing, tabulation, and graphical formatting were carried out using Microsoft Excel 2019 and Microsoft PowerPoint 2019 (Microsoft Corporation, Redmond, WA, USA). Data are presented as mean±standard deviation (SD) unless otherwise specified.
Results
Identification and characterization of differentially expressed lncRNAs across LUAD stages. To systematically identify prognostic lncRNAs associated with LUAD, we implemented a stepwise, objective analytical pipeline using transcriptomic and clinical data from the TCGA-LUAD cohort (Figure 1A). This cohort comprised 58 normal lung tissue samples and 488 LUAD tumor samples spanning clinical stages I-IV, with expression data available for approximately 12,000 annotated lncRNAs. As an initial quality-control step, we applied an expression-based filter to remove low-abundance transcripts that are more susceptible to technical noise. Specifically, only lncRNAs with RPKM >1 in at least 20% of samples were retained. This predefined and objective filtering criterion reduced the dataset to 959 lncRNAs, ensuring sufficient expression signal for robust downstream statistical analysis. Differential expression analysis was then performed independently for each tumor stage (Stage I, II, III, and IV) by comparing LUAD tumor samples to normal lung tissues. Expression values were assessed for statistical significance using a two-sided Welch’s t-test, which does not assume equal variance between groups. To control for multiple hypothesis testing, p-values were adjusted using the Benjamini–Hochberg FDR correction. LncRNAs meeting the predefined thresholds of FDR <0.05 and a minimum of 2-fold change relative to normal samples were considered significantly dysregulated. Using these criteria, we identified 168 upregulated lncRNAs in Stage I, 155 in Stage II, 143 in Stage III, and 96 in Stage IV (Figure 1B). To avoid subjective prioritization of stage-specific candidates and to focus on robust tumor-associated signals, we adopted an intersection-based strategy across tumor stages. Only lncRNAs that were consistently upregulated in all four LUAD stages relative to normal lung tissue, while meeting the same statistical thresholds in each comparison, were retained.
This approach yielded a core set of 68 lncRNAs that displayed reproducible and stage-independent upregulation across LUAD progression (Figure 1B). All subsequent survival and prioritization analyses were strictly restricted to this predefined candidate set.
Identification and characterization of differentially expressed long non-coding RNAs (lncRNAs) across lung adenocarcinoma (LUAD) stages. (A) Schematic overview of the analytical pipeline used for the objective identification and prioritization of prognostic lncRNAs in LUAD. lncRNA expression profiles from TCGA-LUAD (58 normal lung samples and 488 tumor samples) were analyzed in a stepwise manner, beginning with expression filtering, followed by stage-wise differential expression analysis, intersection-based candidate selection, and survival-based prioritization. (B) Flowchart depicting the stepwise filtering and selection of differentially expressed lncRNAs. From ~12,000 annotated lncRNAs, transcripts with low expression were removed by retaining only those with RPKM >1 in at least 20% of samples, yielding 959 ENSGs. These were then compared independently between normal samples and each tumor stage (Stage I-IV) using log2 (RPKM + 1) transformed values, Welch’s t-test, and Benjamini–Hochberg false discovery rate (FDR) correction (FDR <0.05, ≥2-fold change). This analysis identified 168 upregulated lncRNAs in Stage I, 155 in Stage II, 143 in Stage III, and 96 in Stage IV. Intersection analysis across all four stages yielded a core set of 68 lncRNAs consistently upregulated in all tumor stages relative to normal lung tissue, which were carried forward for subsequent analyses. (C) Selection of poor-prognosis lncRNAs from the 68 consistently upregulated candidates. Kaplan–Meier overall survival analysis was systematically performed for all 68 lncRNAs by stratifying patients into high and low-expression groups based on median expression. Five lncRNAs were selected for display based solely on predefined survival criteria: hazard ratio (HR) ≥1.5, log-rank p<0.05, and prior inclusion in the 68-lncRNA core set. These lncRNAs represent tumor-upregulated candidates whose higher expression is associated with poorer overall survival. 
(D) Identification of prognostic lncRNAs that are downregulated in LUAD tumors relative to normal tissue. Using the same statistical framework, stage-wise comparisons identified 92 downregulated lncRNAs in Stage I, 101 in Stage II, 109 in Stage III, and 78 in Stage IV. Intersection analysis across all four stages yielded 67 lncRNAs consistently downregulated in tumors. Kaplan–Meier survival analysis of these 67 lncRNAs identified four candidates whose higher expression was associated with significantly improved overall survival (HR ≤0.65, log-rank p<0.05), indicating potential tumor-suppressive or protective roles. These four lncRNAs’ survival plots are shown in Supplementary Figure 1. (E) Stage-wise expression of representative upregulated lncRNAs across LUAD progression. Samples are color-coded by stage: normal (green), stage I (yellow), stage II (blue), stage III (purple), and stage IV (red).
We next evaluated the prognostic relevance of these 68 consistently upregulated lncRNAs by performing Kaplan–Meier overall survival analysis using TCGA clinical data. Patients were stratified into high- and low-expression groups based on median lncRNA expression levels, and survival differences were assessed using the log-rank test, with corresponding hazard ratios calculated for each lncRNA. From this systematic and unbiased survival screen, five lncRNAs (AC245595.1, FAM83A-AS1, CYTOR, MIR4435-2HG, and AP001453.2) were selected based solely on predefined quantitative criteria: (i) hazard ratio ≥1.5, (ii) log-rank p-value <0.05, and (iii) prior inclusion in the 68-lncRNA core set (Figure 1B). Importantly, no additional biological assumptions or manual curation were applied at this stage. Kaplan–Meier plots for the five selected lncRNAs are presented in Figure 2, and the survival results for all 68 lncRNAs are provided in Supplementary Table I.
High expression of selected long non-coding RNAs (lncRNAs) correlates with poor overall survival in lung adenocarcinoma (LUAD) patients. (A-E) Kaplan–Meier overall survival plots for representative lncRNAs selected from the objectively defined candidate set described in Figure 1. Survival analysis was performed for all 68 lncRNAs consistently upregulated across all four LUAD stages relative to normal lung tissue. Patients were stratified into high- and low-expression groups based on median expression, and survival differences were assessed using the log-rank test, with corresponding hazard ratios (HRs) calculated. Panels (A-E) show lncRNAs that met predefined survival-based selection criteria, including significant association with overall survival (log-rank p<0.05) and elevated hazard ratios, indicating poorer prognosis in patients with higher expression. These lncRNAs were subsequently highlighted in Figure 1C as poor-survival–associated candidates. (F) Cytoplasmic–Nuclear Relative Concentration Index (CN-RCI) values for FAM83A-AS1, CYTOR, and MIR4435-2HG were obtained from the lncATLAS database (V24), which compiles RNA-seq-based nuclear and cytoplasmic fractionation data across multiple human cell lines. The dataset includes both lung-derived and non-lung cell lines representing diverse tissue origins, including A549 and NCI-H460 (lung cancer), GM12878 and K562 (hematopoietic/lymphoblastoid), H1-hESC (human embryonic stem cells), HeLa (cervical adenocarcinoma), HepG2 (hepatocellular carcinoma), HT1080 (fibrosarcoma), HUVEC (endothelial cells), MCF-7 (breast cancer), NHEK (normal human epidermal keratinocytes), SK-MEL-5 (melanoma), and SK-N-SH (neuroblastoma). Violin plots depict the distribution of CN-RCI values for lncRNAs (blue) alongside protein-coding RNAs (orange) within each cell type. Positive CN-RCI values indicate cytoplasmic enrichment, whereas negative values indicate nuclear enrichment.
The distributions shown reflect global localization tendencies across cell types rather than lung-specific localization. CN-RCI data were not available for AC245595.1 and AP001453.2, and these lncRNAs were therefore not included in the analysis.
In parallel, we also examined lncRNAs that were consistently downregulated in LUAD tumors relative to normal lung tissue using the same statistical framework. Stage-wise comparisons identified 92, 101, 109, and 78 downregulated lncRNAs in Stages I-IV, respectively. Intersection analysis across all four stages yielded 67 lncRNAs that were consistently downregulated in tumors (Figure 1D). Kaplan–Meier survival analysis of this group revealed four lncRNAs whose higher expression was significantly associated with improved overall survival (hazard ratio ≤0.65, log-rank p<0.05), suggesting potential tumor-suppressive or protective roles (Figure 1D). These four candidates are shown in Figure 1D, and their corresponding Kaplan–Meier survival plots are provided in Supplementary Figure 1. In addition, pairwise comparisons between LUAD tumor stages were performed to assess stage-to-stage lncRNA dysregulation as shown in Supplementary Figure 2.
Finally, to visualize expression dynamics across disease progression, we examined the stage-wise expression patterns of representative upregulated lncRNAs across normal lung tissue and LUAD Stages I-IV (Figure 1E). This analysis demonstrated that selected lncRNAs are consistently elevated across LUAD stages relative to normal lung tissue, supporting their relevance as tumor-associated and prognostically informative candidates.
High expression of selected lncRNAs in LUAD correlates with poor survival outcomes. Following the objective screening pipeline described in Figure 1, we next assessed whether the 68 lncRNAs that were consistently upregulated across LUAD Stages I-IV (relative to normal lung) also carried prognostic significance. Briefly, lncRNAs were filtered for sufficient expression (RPKM >1 in ≥20% of samples), transformed as log2(RPKM + 1), and tested stage-wise for differential expression by two-sided Welch’s t-test with Benjamini–Hochberg FDR correction (FDR <0.05; ≥2-fold change), followed by an intersection step to retain only those robustly upregulated across all stages. These steps were adopted to ensure statistical control and fully reproducible candidate selection.
To determine clinical relevance, Kaplan–Meier overall survival analyses were performed systematically for all 68 lncRNAs using TCGA clinical follow-up data. Patients were stratified into high- and low-expression groups based on the median expression of each lncRNA, and survival differences were evaluated by the log-rank test, with hazard ratios calculated (Supplementary Table I). From this complete survival screen, five lncRNAs (AC245595.1, FAM83A-AS1, CYTOR, MIR4435-2HG, and AP001453.2) are shown as representative candidates that satisfied the predefined selection criteria (hazard ratio ≥1.5 and log-rank p<0.05), with higher expression associated with poorer overall survival in LUAD patients (Figure 2A-E).
Importantly, several of these candidates have independent support in the published literature, consistent with the biological plausibility of the pipeline-derived hits. FAM83A-AS1 has been reported as an oncogenic lncRNA in LUAD, where its higher expression associates with worse survival and functional studies implicate it in tumor progression (25). CYTOR has also been described as upregulated in LUAD and linked to aggressive behavior and poor prognosis (26, 27). Similarly, MIR4435-2HG has been reported to promote lung cancer progression and has been repeatedly associated with unfavorable outcomes across cancers, including lung cancer-related contexts (28). For the less-characterized candidates, published prognostic-signature studies in LUAD have also included AC245595.1 in immune-related prognostic models and AP001453.2 in LUAD survival signatures (e.g., necroptosis-related lncRNA prognostic models), supporting their recurrence as clinically relevant transcriptomic markers (29, 30).
Finally, to add functional context to the survival-prioritized lncRNAs, we examined subcellular localization using lncATLAS CN-RCI data (Figure 2F). CYTOR and MIR4435-2HG show predominantly positive CN-RCI values across the displayed cell lines, consistent with cytoplasmic enrichment and potential roles in post-transcriptional regulation. FAM83A-AS1 is included for comparison in the same panel, while CN-RCI localization data were not available for AC245595.1 and AP001453.2, and thus, they were not assessed in this analysis.
Progressive upregulation of lncRNAs across LUAD stages identified by trend-test analysis. Having identified a robust set of lncRNAs that are consistently dysregulated across LUAD stages using the objective pipeline described in Figure 1, we next asked whether a subset of these candidates exhibits progressive expression changes with advancing tumor stage. To address this, we evaluated monotonic expression behavior across ordered LUAD stages (Stage I < Stage II < Stage III < Stage IV) using a trend-test framework with FDR control. This approach allowed us to specifically identify lncRNAs whose expression changes track with disease progression, rather than relying on multiple pairwise stage comparisons. The trend analysis was performed on the filtered, log2(RPKM + 1)-transformed expression values and was restricted to the candidate lncRNAs defined by consistent tumor-associated dysregulation. Using this ordinal framework, 59 lncRNAs showed statistically significant monotonic trends across LUAD stages (FDR-adjusted q<0.05), indicating that their expression levels change in a consistent direction as tumors advance from early to late stages. These results are summarized in Figure 3; the stage-wise tumor sample sizes used in the trend analysis were Stage I: n=170, Stage II: n=70, Stage III: n=62, and Stage IV: n=17.
Trend-based identification of long non-coding RNAs (lncRNAs) showing progressive expression changes across lung adenocarcinoma (LUAD) stages. (A) Rank-based trend analysis of lncRNA expression across ordered LUAD stages. Following the objective selection of 68 lncRNAs that were consistently upregulated across all four tumor stages relative to normal lung tissue (as described in Figure 1), a formal trend test was performed by treating tumor stage as an ordinal variable (Stage I < Stage II < Stage III < Stage IV). For each lncRNA, expression values from all LUAD tumor samples were ranked, and stage-wise average expression ranks were calculated. The strength and direction of monotonic expression change across stages were quantified using a rank-correlation–based trend coefficient, and statistical significance was assessed with FDR correction. Panel A displays lncRNAs that showed statistically significant monotonic trends (FDR-adjusted q<0.05), ordered according to the strength of the trend. Positive values indicate increasing expression with advancing tumor stage. (B) Stage-wise expression patterns of representative lncRNAs showing strong monotonic trends across LUAD progression. From the set of trend-positive lncRNAs identified in panel A, a subset of 12 lncRNAs with the strongest positive trends (≥0.8) was selected for visualization. Expression levels are shown across normal lung tissue and LUAD Stages I-IV, illustrating a progressive increase in expression with tumor advancement. Each data point represents an individual sample, and samples are grouped by clinical stage. This subset was chosen solely to facilitate visualization of the trend-based behavior and does not alter the conclusions derived from the full trend analysis. Sample sizes for each stage were as follows: Stage I (n=170), Stage II (n=70), Stage III (n=62), and Stage IV (n=17).
A representative subset of 12 lncRNAs (from BCASL3 to GAS5) displayed the strongest positive monotonic trends (trend coefficient >0.8, as indicated in the figure), illustrating a clear and progressive increase in expression from Stage I through Stage IV (Figure 3B). The identification of these strongly trending lncRNAs suggests that, beyond being differentially expressed in tumors, a subset of lncRNAs may serve as molecular indicators of LUAD progression, reflecting increasing disease severity. Notably, some of the lncRNAs identified in this trend-based analysis have prior support in the literature for roles in cancer biology and disease progression. For example, GAS5 is a well-characterized lncRNA reported to function as a tumor suppressor, with altered expression linked to poor prognosis and disease progression in multiple cancers, including lung cancer (31). Conversely, several other lncRNAs within the trending set, such as BCASL3, BCAN-AS2, TSNAX-DT, and LINC02067, remain relatively underexplored in the context of LUAD. Their emergence from an ordinal, FDR-controlled trend analysis highlights them as high-confidence, progression-associated candidates warranting further functional and clinical investigation.
Together, these findings demonstrate that a subset of lncRNAs shows progressive, stage-dependent expression changes across LUAD, complementing the differential expression and survival analyses presented earlier. This trend-based perspective adds an additional layer of biological insight, prioritizing lncRNAs whose expression patterns consistently track with tumor advancement and may contribute to LUAD progression or serve as markers of disease severity.
Expression dynamics of prognostic lncRNAs across lymph-node status and tumor size in LUAD. To further examine whether the prognostically significant lncRNAs identified in earlier analyses also track with indicators of tumor burden and disease spread, we evaluated their expression patterns across increasing lymph-node involvement and primary tumor size. Relative expression of AC245595.1, FAM83A-AS1, CYTOR, MIR4435-2HG, and AP001453.2 was assessed across ordered nodal stages (Normal, N0-N3) and tumor size categories (Normal, T1-T4) (Figure 4A, B).
Expression and monotonic trend analysis of prognostic long non-coding RNAs (lncRNAs) across lymph-node status and tumor size in lung adenocarcinoma (LUAD). (A) Relative expression of selected lncRNAs across lymph-node status in LUAD patients (Normal, N0-N3). Sample sizes were as follows: Normal (n=58), N0 (n=316), N1 (n=180), N2 (n=72), and N3 (n=2). Statistical significance was assessed using a rank-based trend test with FDR correction, with significance levels indicated (*p<0.05; **p<0.01; ***p<0.001; ns, not significant) (see Supplementary Figure 3A). Data are shown as mean±SD. (B) Relative expression of selected lncRNAs across primary tumor size categories (Normal, T1-T4). Sample sizes were: Normal (n=58), T1 (n=154), T2 (n=261), T3 (n=42), and T4 (n=18). Statistical significance is annotated as in panel A (see Supplementary Figure 3B). (C) Comparison of lncRNA expression between male and female LUAD patients. Sample sizes were: Male (n=223) and Female (n=265). Statistical comparisons were performed using a two-sided Welch’s t-test. No statistically significant sex-specific differences were observed for any of the analyzed lncRNAs; ns: not significant.
To determine whether the observed pattern represented a true monotonic change rather than isolated group differences, nodal status was treated as an ordinal variable and evaluated using rank-based trend analysis. This analysis revealed statistically significant positive monotonic trends for FAM83A-AS1, CYTOR, and MIR4435-2HG across increasing nodal stages, as quantified by Spearman rank correlation (Figure 4A; Supplementary Figure 3A). Rank-based trend testing across ordered tumor size categories further supported these observations, revealing significant positive correlations between expression rank and tumor size for these lncRNAs (Figure 4B; Supplementary Figure 3B). Furthermore, expression of the selected lncRNAs was compared between male and female LUAD patients, and no significant differences were observed, suggesting that their expression does not differ appreciably by sex in this cohort (Figure 4C).
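As an illustration of the rank-based trend testing described above, the following minimal sketch computes a Spearman correlation between an ordinal clinical code and expression, then applies a false discovery rate adjustment across transcripts. It is not the authors' code. The source does not state how p-values were derived or which FDR procedure was used, so the permutation p-value and the Benjamini-Hochberg adjustment shown here are assumptions, and the data and the names `lncRNA_up` and `lncRNA_flat` are hypothetical.

```python
import random
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (assumed method)."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy data: ordinal nodal codes (0=Normal, 1=N0, ..., 4=N3).
stage = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
expression = {
    "lncRNA_up":   [1.0, 1.2, 0.9, 1.8, 2.1, 1.7, 2.6, 2.9, 2.4,
                    3.3, 3.8, 3.1, 4.2, 4.0],
    "lncRNA_flat": [2.0, 1.1, 3.2, 1.5, 2.8, 2.2, 1.9, 3.0, 1.3,
                    2.5, 1.7, 2.9, 2.1, 1.6],
}

rhos = {name: spearman_rho(stage, vals) for name, vals in expression.items()}
pvals = [permutation_p(stage, vals) for vals in expression.values()]
qvals = benjamini_hochberg(pvals)
for (name, rho), q in zip(rhos.items(), qvals):
    print(f"{name}: rho = {rho:.3f}, FDR-adjusted p = {q:.4f}")
```

Because clinical stage codes are heavily tied, the ranks are averaged within ties before correlation. In practice, a library routine such as SciPy's `spearmanr` would typically be used to obtain the coefficient and an asymptotic p-value.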
Together, these results indicate that a subset of poor-prognosis–associated lncRNAs, most notably FAM83A-AS1, CYTOR, and MIR4435-2HG, exhibit progressive increases in expression with advancing lymph-node involvement and tumor size. These findings extend the stage-based analyses presented earlier by demonstrating that expression of specific lncRNAs also tracks with clinically relevant measures of tumor burden and metastatic spread.
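The male versus female comparison reported above (Figure 4C) used a two-sided Welch's t-test. As a minimal sketch of that statistic, and not the authors' implementation, the following computes the Welch t value and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. The expression values are hypothetical, and the p-value is deliberately omitted because it requires the t-distribution, which is not available in the Python standard library.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, df

# Hypothetical relative-expression values for one lncRNA.
male = [2.1, 2.4, 1.9, 2.6, 2.2]
female = [2.0, 2.5, 2.3, 2.1, 2.4, 2.2]

t_stat, dof = welch_t(male, female)
print(f"Welch t = {t_stat:.3f}, df = {dof:.1f}")
```

In practice, SciPy's `ttest_ind` with `equal_var=False` returns the same statistic together with the two-sided p-value.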
Discussion
In this study, we adopted a stringent and unbiased analytical approach to identify lncRNAs that show robust associations with LUAD biology as well as clinical outcomes. By integrating tumor-normal expression analyses with survival associations, measures of disease progression, and multiple independent clinical stratifications, we aimed to move beyond purely descriptive differential expression. This integrated framework allowed us to narrow the candidates to a small subset of lncRNAs whose expression consistently reflects tumor aggressiveness across diverse clinical parameters, thereby supporting their potential relevance as prognostic markers.
A key observation arising from our analysis is the consistent overexpression of FAM83A-AS1, CYTOR, and MIR4435-2HG in LUAD tumor samples when compared with normal lung tissue. This elevated expression is not restricted to a particular tumor subgroup but is observed broadly across patients, suggesting that these lncRNAs are closely associated with the tumor state itself rather than reflecting isolated or context-specific signals. The reproducibility of this tumor-associated upregulation across independent clinical stratifications further indicates that these lncRNAs are likely embedded within core regulatory programs active in LUAD.
Notably, increased expression of these three lncRNAs is accompanied by unfavorable clinical outcomes. Patients with higher expression levels consistently exhibit poorer overall survival, highlighting the clinical relevance of these transcripts. This association with survival supports the view that these lncRNAs are not merely passive correlates of tumor biology but are linked, either directly or indirectly, to disease aggressiveness. CYTOR has been repeatedly reported as an oncogenic lncRNA in LUAD and other cancer types, with elevated expression associated with poor prognosis and enhanced tumor progression (32-34). MIR4435-2HG has similarly been implicated in lung cancer progression and adverse clinical outcomes across multiple independent studies (28, 35, 36). FAM83A-AS1, although comparatively less characterized, has been reported as a poor-prognosis-associated lncRNA in LUAD and has been included in several transcriptome-based prognostic signatures (3).
Beyond survival outcomes, the expression of these lncRNAs shows a clear relationship with disease progression. The expression of CYTOR and MIR4435-2HG increases in a graded manner with advancing tumor stage, indicating that they track disease severity rather than merely distinguishing tumor tissue from normal lung.
The relevance of these three lncRNAs is further supported by their association with specific components of tumor progression. Their expression patterns parallel increasing lymph-node involvement and larger primary tumor size, both of which are clinical features strongly linked to metastatic potential and patient prognosis. Previous studies have connected CYTOR and MIR4435-2HG with enhanced migratory and invasive capacities in lung and other epithelial cancers, providing a biological context for the association with nodal spread observed in the present analysis (32, 37). Although mechanistic evidence for FAM83A-AS1 remains limited, its consistent association with advanced clinical parameters across independent analyses suggests a possible role in pathways regulating tumor growth or dissemination (26, 38).
Taken together, tumor-associated overexpression and adverse survival association converge for all three lncRNAs. CYTOR and MIR4435-2HG additionally show consistent progression-linked trends across pathological stage, nodal status, and tumor size, whereas FAM83A-AS1 exhibits associations primarily with nodal involvement and tumor size. This convergence positions these lncRNAs as particularly compelling candidates. The strength of this study does not lie in any single observation but rather in the coherence of these patterns across multiple, independent readouts. Such consistency is a key requirement for biomarker identification, as it reduces the likelihood that observed associations are driven by statistical artifacts or cohort-specific effects.
From a biological standpoint, these findings raise important questions regarding the functional roles of these lncRNAs in LUAD. CYTOR and MIR4435-2HG have been reported to participate in regulatory networks influencing proliferation, migration, epithelial–mesenchymal transition, and cellular metabolism, frequently through cytoplasmic mechanisms involving RNA-protein interactions (32, 33, 36). The expression patterns observed in this study are consistent with these reported functions, particularly given their association with tumor burden and nodal involvement. FAM83A-AS1 remains less well characterized; however, its consistent alignment with aggressive disease features suggests that it may act as a regulatory modulator of oncogenic signaling or transcript stability, a hypothesis that merits further investigation (25, 38).
While the present study provides strong evidence for the prognostic relevance of these lncRNAs, it is important to acknowledge that the findings are derived from transcriptomic and clinical association analyses. Functional validation and confirmation in independent cohorts will therefore be essential to further establish their mechanistic roles and translational potential. Nevertheless, the robustness of the associations observed across multiple clinical dimensions provides a solid foundation for future studies. In conclusion, this work identifies FAM83A-AS1, CYTOR, and MIR4435-2HG as a core set of lncRNAs that consistently mark aggressive LUAD biology. Their coordinated behavior across tumor-normal comparisons, survival outcomes, lymph-node involvement, and tumor size highlights their potential value as prognostic markers and as entry points for mechanistic investigation. More broadly, these findings underscore the utility of integrative, multi-dimensional analyses for prioritizing lncRNAs that are both biologically informative and clinically relevant in lung adenocarcinoma.
Acknowledgements
We acknowledge the facilities of JNU and project numbers 91/04/2020-TFGTR/BMS, 52/01/2020-BIO/BMS, and 52_29_2022-BIO_BMS from the Government of India.
Footnotes
Supplementary Material
The complete set of supplementary data supporting the findings of this study has been made publicly available via Figshare at https://doi.org/10.6084/m9.figshare.31102162
Data Availability
Transcriptomic analysis was conducted utilizing The Cancer Genome Atlas (TCGA) datasets for TCGA-LUAD, obtained from the TANRIC database (https://tanric.org/tanric/_design/basic/download.html). Corresponding clinical data for this analysis were retrieved from the National Cancer Institute’s Genomic Data Commons (GDC) data portal for TCGA-LUAD (https://portal.gdc.cancer.gov/projects/TCGA-LUAD).
Conflicts of Interest
The Authors declare that they have no conflicts of interest with the contents of this article.
Authors’ Contributions
M.S. is the principal investigator of this study who conceptualized the project, designed the overall analytical framework, performed all statistical and bioinformatic analyses, and wrote the manuscript. M.S. designed the primary lncRNA screening strategy that led to the objective identification of dysregulated lncRNAs in the TCGA-LUAD dataset, including MIR4435-2HG, CYTOR, and FAM83A-AS1. M.S. conducted integrative cancer transcriptomic analyses, rank-based trend analyses across clinical parameters, statistical testing with false discovery rate control, and data visualization. P.P., N.C., M.G., S.N., and S.S.S. helped with data processing and analysis. S.S. contributed to the overall conceptual framework of the study, guided the analytical strategy, and provided critical input in the interpretation of results, particularly with respect to clinical relevance, prognostic stratification, and disease progression. S.S. also contributed to shaping the study design, evaluating the biological significance of the identified lncRNAs, and critically revising the manuscript for important intellectual content. P.K.D., A.N., and D.P.S. provided supervision. All Authors reviewed the manuscript.
Funding
This research was supported by grants from the Indian Council of Medical Research (ICMR), Government of India (project numbers 91/04/2020-TFGTR/BMS, 52/01/2020-BIO/BMS, and 52_29_2022-BIO_BMS).
Artificial Intelligence (AI) Disclosure
No artificial intelligence (AI) tools, including large language models or machine learning software, were used in the preparation, analysis, or presentation of this manuscript.
- Received October 16, 2025.
- Revision received February 19, 2026.
- Accepted March 2, 2026.
- Copyright © 2026 The Author(s). Published by the International Institute of Anticancer Research.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).