Abstract
Background/Aim: In this study, we provide a comprehensive characterization of HPV-positive primary cervical cancers (CC) and HPV-positive head and neck squamous cell carcinomas (HNSCC) through whole genome next-generation sequencing. Human papillomavirus (HPV) infection, recognized as a definitive human carcinogen, is increasingly acknowledged for its role in development of human cancers. HPV-driven cervical cancers are among the leading causes of cancer-related deaths worldwide, while HPV-driven head and neck cancers exhibit distinct biological and clinical characteristics. Recent data has provided convincing evidence that HPV-related cervical cancer, like HPV head and neck cancer also predict better outcomes, with viral integration patterns further predicting disease related outcomes.
Materials and Methods: We designed an experimental study that encompasses four pairs of HPV-positive patient samples with controls, utilizing state-of-the-art Next Generation Sequencing (NGS) technology including whole genome sequencing, transcriptome sequencing and virus integration.
Results: Multiple mutated genes, including TTN, COL6A3, and FLNA, were identified shared between CC and HNSCC. Additionally, we observed a notable proportion of pathways affected by oncogenic alterations, particularly in the RTK-RAS and NOTCH pathways, in both CC and HNSCC. Furthermore, we discovered a shared down-regulation of the Hedgehog signaling pathway based on transcriptome expression analysis in KEGG. We also identified RUNX2 and TFPI as sites of virus integration, and upstream as well as downstream pathway modulators, and represent potential targets for therapeutic interventions.
Conclusion: Overall, this study showed a thorough comparison between CC and HNSCC from multiple aspects, including gene variations, oncogenic pathways, KEGG enrichment and virus integration sites. However, further studies, which involve larger patient cohorts should be undertaken to further support these findings.
- HPV-positive cancers
- cervical cancer (CC)
- head and neck squamous cell carcinoma (HNSCC)
- whole genome sequencing
- next-generation sequencing (NGS)
- viral integration
Introduction
Human papillomavirus (HPV) is a diverse family of viruses with numerous distinct types capable of infecting humans (1, 2). While many HPV types are relatively benign, several have been identified as significant causative agents in various forms of cancers (3). HPV-related cancers account for 4.5% of all cancers worldwide (4). Among these, HPV is a primary etiological factor in cervical carcinoma (CC), one of the most prevalent and life-threatening malignancies affecting women globally (5). Cervical cancer is the eighth most common cancer worldwide, with more than 600,000 new cases diagnosed annually, predominantly in regions such as Sub-Saharan and East Africa (6). High-risk HPV types, particularly 16 and 18, are established as the leading causes of nearly all cervical cancers (7). Beyond the cervix, HPV has also been implicated in head and neck squamous cell carcinoma (HNSCC), originating from various anatomical sites within the head and neck region (8). Notably, HPV infection has overtaken smoking and alcohol as the primary risk factor for oropharyngeal cancers (9), which are more frequently diagnosed in males (10). However, recent trends indicate a rising incidence of these cancers among females. The true prevalence of HPV-related head and neck cancers remains uncertain, primarily due to inconsistencies in testing practices (4). Despite this, cervical cancer and head and neck cancers are recognized as the most common HPV-related malignancies. Several studies have been conducted to characterize the genomic landscapes of both CC and HNSCC (9, 11). These efforts are driven by the significant differences in clinical outcomes between HPV-related and non-HPV-related head and neck cancers, while nearly all cervical cancers are HPV-related (12, 13).
A study of 228 primary cervical cancer cases revealed notable APOBEC mutagenesis patterns, with SHKBP1, ERBB3, CASP8, HLA-A, and TGFBR2 identified as significantly mutated genes in cervical cancer (7). Amplifications of immune targets, including CD274 (commonly known as PD-L1) and PDCD1LG2 (commonly known as PD-L2), were also observed, along with the discovery of BCAR4, a long non-coding RNA associated with response to lapatinib (7). HPV integration was detected in all HPV18-related samples, while 76% of HPV16-related samples were linked to structural aberrations and increased expression of target genes (7). Additionally, a unique subset of endometrial-like cervical cancers, predominantly HPV-negative, was identified. These tumors exhibited relatively high frequencies of KRAS, ARID1A, and PTEN mutations (11).
The Cancer Genome Atlas (CGA) profiled 279 HNSCCs to provide a comprehensive landscape of somatic genomic alterations. The study identified key distinctions in HPV-associated tumors, highlighting frequent helical domain mutations in the oncogene PIK3CA, novel alterations involving loss of TRAF3, and amplification of the cell cycle gene E2F. In contrast, smoking-related HNSCCs exhibited widespread TP53 loss-of-function mutations and CDKN2A inactivation, alongside common copy number alterations like amplification in 3q26/28 and 11q13/22 (14). Notably, a subset of oral cavity tumors with favorable prognoses showed limited copy number variations but harbored activating mutations in HRAS or PIK3CA, combined with inactivating mutations of CASP8, NOTCH1, and TP53. Additional subgroups featured alterations such as NSD1 inactivation, WNT pathway disruptions involving AJUBA and FAT1, and NFE2L2 activation – primarily observed in laryngeal cancers (14).
Besides, alterations in the Hedgehog (HH) signaling pathway have been found to correlate with local recurrence of cervical cancer and, head and neck cancer progression (15, 16). The HH pathway appears to be one of the HPV-mediated up-regulated gene pathways, and the expression of the E7 oncogene is associated with advanced stage of cervical cancer as well as HH pathway overexpression (17, 18).
Notably, the underlying mechanisms behind HPV-positive cancers remain unclear, and the relationship between HPV-positive CC and HPV-positive HNSCC remains to be more carefully explored (19). Therefore, there is a compelling need to unravel the shared and unique aspects of these two malignancies for better treatment strategies. To address this gap in our understanding, we analyzed and compared the oral and cervical genomes in HPV-positive samples to provide insights into the similarities and differences between HPV-associated CC and HNSCC.
Despite advancements in understanding HPV-associated cancers, the mechanisms underlying HPV-positive CC and HNSCC remain inadequately defined, and the interplay between these malignancies is not fully elucidated. This study aims to address these gaps by leveraging state-of-the-art Next Generation Sequencing (NGS) technologies, including whole genome sequencing (WGS) and transcriptome sequencing, to comprehensively compare the genomic landscapes of HPV-positive CC and HNSCC. By identifying shared and distinct molecular features, this research seeks to enhance our understanding of the pathophysiology of these cancers and to uncover potential targets for improved treatment strategies.
Materials and Methods
Sample preparation. Tumor and control samples are from deidentified pathology samples from the Pathology Laboratory at the Ministry of Health Botswana under IRB approval number (#820159). The sequencing data generated and analyzed in the study came from two sources, HiSeq3000 1×50 and MiSeq 2×150, respectively. The HiSeq3000 1×50 data included 12 oral and cervical specimens, 4 for each cancer type totaling 8 samples altogether, and 2 controls each with a total of 4 controls altogether. All 12 samples were pooled into a single lane for HiSeq and each sample was dual indexed. Six oral specimens included 4 tumor samples and 2 control samples, while 6 cervical specimens included 4 tumor samples and 2 control samples. For MiSeq 2×150, the data corresponds to CC and HNSCC specimens used for transcriptomics analyses. A total of 4 samples for each cancer type and 2 controls each were used for the transcriptomics RNA-Sequencing analyses. The tumor samples were run together for sequencing, and the two control samples were run together for sequencing. Therefore, for each cancer type there are six samples, four tumors and two controls, which were sequenced together for whole genome DNA-sequencing and similarly for RNA-Sequencing. The entire program from sample preparation to data analysis is shown in a schematic in Figure 1.
A schematic showing the work pipeline. The tumors and controls cohorts and next generation sequencing and analysis pipeline. DNA-seq analysis shown of the left and RNA-seq analysis is shown on the right.
DNA sequencing and mutation calling. Whole genome data pre-processing was conducted in accordance with the GATK Best Practices using GATK 4.0 (20). Briefly, aligned BAM files were processed by read group, with alignments and attributes removed using ‘RevertSam’ to generate one unaligned BAM (uBAM) file per group (20). Uniform read groups were then assigned using ‘AddOrReplace Readgroups’. These uBAM files were converted to interleaved FASTQ format with ‘SamToFastq’ and underwent quality control via ‘FastQC.’ Adapter sequences were identified and trimmed using ‘Trim_galore’. The cleaned FASTQ files were aligned to the b37 genome with ‘BWA MEM,’ followed by attribute restoration through ‘MergeBamAlignment’. Aligned BAM files from multiple read groups were merged, and PCR/optical duplicates marked using ‘MarkDuplicates.’ Next, base recalibration was performed using ‘BaseRecalibrator’ and ‘ApplyBQSR’. Coverage metrics were generated with ‘CollectHsMetrics,’ while alignment quality was assessed using ‘Validate SamFile’ and summarized with ‘MultiQC’. Finally, ‘Crosscheck Fingerprints’ verified sample integrity by confirming all read groups belonged to the same individual, ensuring consistency across samples. Any identified mismatches were flagged and excluded from further analysis (20). The mapping qualities of the samples are summarized in Supplementary Table I. The average reads per sample is approximately 5 million and uniquely mapped reads percentage is 65% on average for Miseq 2×150 sequenced samples as shown in Table I.
Mapping quality shown for the sequencing reads of the CC and HNSCC samples to hg38.
Variant detection followed the GATK Best Practices using GATK4. Germline variants were identified from control samples with Mutect2 in artifact-detection mode and compiled into a cohort-wide panel of normal samples. Somatic variants in tumor samples were detected using Mutect2 in matched normal mode, incorporating parameters such as the matched normal sample, reference FASTA, panel of normal variants, and gnomAD germline resources for additional filtering. Cross-sample contamination was assessed with ‘GetPileupSummaries’ and ‘CalculateContamination’ for both tumor and matched normal samples. Read orientation artifacts were analyzed using ‘CollectF1R2Counts’ and modeled through ‘LearnReadOrientationModel.’ Finally, additional filtering, including artifact-in-normal and contamination thresholds, was applied using ‘FilterMutectCalls’. The filtering strategies are default settings with allele frequency in tumor at a minimum frequency of 5% or higher. In addition, gnomAD was used to filter out common germline variants from the somatic calls, with a threshold of 1% allele frequency in population datasets.
RNA-seq pre-processing. RNA data preprocessing followed the mRNA analysis pipeline established by the National Cancer Institute (21). Quality control was first performed on raw FASTQ files using ‘FastQC,’ and low-quality bases were trimmed with ‘Trim_galore’. The cleaned FASTQ files were aligned to the b37 genome using ‘STAR’ to produce BAM files. Subsequent quality assessment of these BAM files utilized ‘Collect RNASeqMetrics,’ generating metrics that detailed base distribution across transcripts.
Differential gene expression analysis of RNAseq data. After alignment, BAM files were analyzed using the RNA Expression Workflow to quantify RNA expression levels. Gene-level read counts were obtained with ‘HTSeq-Count’ (22). These raw counts were normalized using ‘DESeq2,’ which applies a negative binomial distribution model and its own statistical algorithm. DESeq2 outputs included base means across samples, log2 fold changes, standard errors, test statistics, p-values, and adjusted p-values. Significant genes were visualized using a volcano plot to highlight differential expression patterns.
Gene set enrichment analysis. Pathway analysis was performed using ‘fgsea’, a fast implementation of preranked gene set enrichment analysis (GSEA) (23). Significant genes were ranked and analyzed alongside the KEGG pathway dataset as inputs. ‘fgsea’ produced outputs including pathway names, enrichment scores, normalized enrichment scores (NES), and adjusted p-values. Results were prioritized based on the NES values.
Virus integration sites. Virus integration detection was conducted using ‘Virus-Clip’, a fast and memory-efficient viral integration site detection tool at single-base resolution with annotation capability (24). First the DNA-seq was aligned to both virus reference genome using BWA-MEM and human reference genome hg38 using BLASTN. Then we extracted soft-clipped reads (potentially containing sequences of virus-integrated human loci) using SAMTools (25). All integration sites were automatically annotated with the affected human genes and their corresponding gene regions using ANNOVAR (26).
Protein-protein interaction. We applied STRING (version 12) for protein-protein interaction analysis, a database of known and predicted protein-protein interactions (27). The interactions encompass both direct (physical) and indirect (functional) associations. These relationships are derived from computational predictions, knowledge transfer across species, and data compiled from primary interaction databases.
Statistical analysis. All statistical analyses were conducted in R version 4.1, and data visualization was performed using Maftools version 2.22 (28). Results were considered statistically significant at a p-value <0.05 in differential expression analysis and pathway enrichment analysis. Statistical significance was determined based on a false discovery rate (FDR) threshold of 0.05 using the Benjamini-Hochberg correction (29).
Results
Frequently of mutated somatic genes of HPV-positive CC and HNSCC. In the study cohort, genomic analysis identified 3129 somatic variations (including SNVs and small indels) in CC, and 1597 somatic variations in HNSCC. Somatic mutation types were summarized in Table II.
Somatic mutation types identified in CC and HNSCC tumor samples.
Of the somatic variations, multiple genes have 100% somatic mutation enrichment, which is likely due to the size of the samples as shown in Figure 2. The average number of mutations including single nucleotide variants (SNVs) and small insertions and deletions (INDELs) are 657 and 490 for CC and HNSCC, respectively. Among the top enriched genes of CC and HNSCC, many of them have been reported to be significantly associated with cancers. The top genes with 100% significant mutations in CC were BPTF, COL5A1, EEF2, KDMP5B, KLF10, KMT2C, LRP1, RNF213, RYR1, SPEN, TERF, TTN and UBR4. In addition, the top 100% mutated genes in HNSCC were TTN and WNK1, while COL6A3 had a 75% mutation rate (Figure 2). We also identified TTN, COL6A3, and FLNA as significantly mutated shared genes in CC and HNSCC with greater than 50% mutation rates (Figure 2).
Mutational oncoplots of most frequently mutated genes with driver mutations in HPV-positive cervical cancer (CC) (A) and head and neck squamous cell carcinomas (HNSCC) (B). The oncoplots show a summary of the most frequent driver mutations of the CC and HNSCC cohorts based on the exome gene list. The oncoplots show the patients in a horizontal orientation and the gene and the corresponding driver mutations per patient in the vertical orientation. The upper area is the tumor mutation burden including all somatic single nucleotide variants (SNVs) and small insertions and deletions (INDELs) per tumor.
Oncogenic pathway analysis in CC and HNSCC. We expanded our analysis of mutated genes to encompass recognized oncogenic signaling pathways. As illustrated in Figure 3A-B, both CC and HNSCC exhibited a notable proportion of pathways that were affected by oncogenic alterations. CC showed that 100% samples were affected in five pathways, including RTK-RAS with 23/85 genes affected in the pathway, NOTCH (19/71), HIPPO (10/38), MYC (3/13) and PI3K (5/29) pathways. HNSCC showed that 100% of the samples were affected in two cellular pathways, RTK-RAS (14/85) and NOTCH (13/71). Notably, CC tended to exhibit a greater proportion of genes in each pathway compared to their HNSCC counterparts. However, we also noticed that many oncogenic pathways had gene mutations in both CC and HNSCC, including RTK-RAS, NOTCH, HIPPO, WNT, TP53, PI3K, MYC, Cell cycle and the TGF-Beta pathways. It should be noted that the NRF2 oncogenic pathway was exclusively affected in CC (Figure 3).
Overview of oncogenic pathway alterations in the tumor cohort. (A-B) left plots show the oncogenic pathways affected in the HPV-positive cervical cancer (CC) and head and neck squamous cell carcinomas (HNSCC) patients, with the number of affected genes in relation to the total number of genes linked to each pathway. A-B right plots display the fraction of tumors bearing mutations in a given pathway. The squared pathways have a 100% fraction of samples affected; the asterisk marked pathways are found in both CC and HNSCC. (C-F) For each pathway, Maftools allows visualization of the mutated genes for each tumor sample to easily spot the recurrently mutated genes. Oncogenes and tumor suppressor genes are indicated in blue and red, respectively. The mutations exclusively mutated in CC and HNSCC are marked by yellow and blue arrows, respectively.
Key oncogenes and tumor-suppressor genes in CC and HNSCC. In addition to pathways, we also explored genomic alterations, focusing particularly on identifying potential driver mutations. We depicted gene mutations in the RTK-RAS and NOTCH pathways (Figure 3C-F) in both CC and HNSCC. This analysis unveiled several pivotal genes involved in the pathogenesis of CC and HNSCC, as well as some distinct genes in each group marked by arrows. In the RTK-RAS pathway, tumor suppressor genes IGF1R and PDGFRA were mutated in both groups, while numerous oncogenes, including EGFR, ERBB2, ERBB3, MAPK1, NTRK2, FGFR2, FGFR4, KIT, and NRAS, were mutated exclusively in CC (Figure 3, yellow arrows). Conversely, tumor suppressor genes, including ERBB4, FGFR1, JAK2, and NTRK3 (Figure 3, blue arrows), were mutated in HNSCC. Regarding the NOTCH pathway comparison, both CC and HNSCC showed mutations in a large number of genes, such as SPEN, CREBBP, EP300, NCOR1, and NCOR2, while NOTCH2, NOTCH3, FBXW7, and NOTCH1 mutations were observed only in CC (Figure 3). To investigate the functional consequences of somatic variants in the most frequently mutated genes in HPV-positive CC and HNSCC, we depicted their relative positions, counts, and types within annotated protein domains (Figure 4). We observed that all HPV-positive CC samples harbored missense mutations in SPEN, which disrupts the hormone-inducible transcriptional repressor (30). Particularly noteworthy is a somatic mutation found within the RNA recognition motif region, which is known to bind to a steroid receptor RNA coactivator (31). This binding can modulate the activity of both liganded and nonliganded steroid receptors (Figure 4A). Additionally, we found a 50% somatic mutation rate in EP300, encompassing frame shift deletions, nonsense and missense mutations, as well as in-frame deletions. Several of these mutations were identified in KAT11, KIX, DUF902, and CREB binding domains (Figure 4A). Apart from these genes, we also observed missense mutations in the EGFR Recep L domain and ERBB2 Phospho kinase Tyr domain (Figure 4B-D). In HNSCC samples, we observed a missense mutation in JAK2 within the PTK-Jak2-Jak3-rpl1 domain. Moreover, a nonsense mutation and a missense mutation were harbored in ERBB4 cytoplasmic fragment (Figure 4E-F).
Somatic mutated variants in genes. (A-D) SPEN, EP300, EGFR, and ERBB2 in HPV-positive cervical cancer (CC). (E-F) JAK2 and ERBB4 in head and neck squamous cell carcinomas (HNSCC). The type of mutation and anatomical localization are annotated below each figure. The variant prevalence and spectrum of these genes are shown in the plots. Green circles indicate missense mutations, blue circles indicate frame shift deletion, red circles indicate nonsense mutations and yellow circles indicate in frame deletion.
Transcriptomic sequence analysis reveals key pathways in HPV-positive CC and HNSCC. To expand our investigation, we conducted RNA-seq analysis to discern genes exhibiting alterations between tumor samples and their corresponding normal tissues (Supplementary Material 2). The CC and HNSCC samples showed a closer relationship compared to their control, indicating similar gene expression patterns (Figure 5A). In addition, this analysis identified 212 up-regulated and 1346 down-regulated differentially expressed genes (DEGs) out of 6,514 detectable genes in CC, which met the criteria of |logFC| >3. Similarly, 297 up-regulated and 1014 down-regulated DEGs were identified in HNSCC (Supplementary Material 3-4). Nineteen up-regulated genes were shared between CC and HNSCC, including ABO, ADGRF4, ANGPTL3, CIDEC, DCAF12L2, GPC4, IFNA21, IL37, KCNF1, KERA, KIF7, KRBOX4, MYOZ2, NDUFAF4, NUDT15, OR5K1, PEX7, PIGA, and ZNF223. Among these genes, only PIGA was reported as an oncogene causing paroxysmal nocturnal hemoglobinuria (32). Furthermore, 514 down-regulated genes were shared between CC and HNSCC, where 24 of them are previously reported oncogenes, including CEBPA, CLP1, EPOR, FEV, FGF3, FH, FOLH1, FOXA1, GATA1, H3-3A, HIRA, HOXC13, HRAS, KMT2B, LMO1, LYL1, MLF1, MN1, NKX2-1, NTHL1, PDCD1, S1PR2, SERPINB4, and SH2D1A (Figure 5B-C).
(A) Sample relation based on 33245 Genes after PCA dimension reduction for oral tumor (OralT), oral control (OralC), bot-cervical tumor (BCT) and bot-cervical control (BCC). X-axis refers to PC1, and y-axis is PC2. (B-C) Venn diagram showing the number of overlap of up-regulated and down-regulated genes between HPV-positive cervical cancer (CC) and head and neck squamous cell carcinomas (HNSCC). (D-E) Enrichment analysis of CC and HNSCC in KEGG pathways. Dotplot with x-axis as normalized enrichment score and y-axis as the name of the pathways. Up-regulated pathways facing right and down-regulated pathways are facing left. The size scale is based on number of affected genes.
Our study conducted an in-depth analysis of biological pathways using Kyoto Encyclopedia of Genes and Genomes (KEGG) (33), with the aim of identifying the specific pathways enriched in CC and HNSCC. This investigation provides insight into the molecular mechanisms underlying these malignancies (see Supplementary Material 5-8, Figure 5D-E), with the pathways selected based on an adjusted P-value of less than 0.1 with false discovery rate (FDR) adjustment. We found three up-regulated pathways in CC, including JAK/STAT, complement and coagulation cascades, and valine, leucine and isoleucine degradation; three down-regulated pathways, including Huntington’s disease, drug metabolism and other enzymes, and the hedgehog signaling pathway (Figure 5D-E). On the other hand, we found four up-regulated pathways in HNSCC, including purine metabolism, cell adhesion molecules, cell cycle and oxidative phosphorylation; as well as five down-regulated pathways, including cytokine-receptor interaction, hedgehog signaling, leukocyte trans-endothelial migration, starch and sucrose metabolism and MTOR signaling. We showed that hedgehog-signaling pathway was down-regulated in both CC and HNSCC, while many other pathways are either immune related or tumor proliferation related (Figure 5D-E).
Viral integration analysis. In our study, an investigation into the virome composition of CC and HNSCC samples revealed a complex and diverse viral landscape. The analysis of virus composition uncovered shared viral entities between both CC and HNSCC samples, while also highlighting notable differences. Some viral entities that were detected in both CC and HNSCC samples were HHV-4, HHV-5, HHV-6A, HHV-6B, HHV-7, HHV-8, and a number of HPV types including HPV-108, HPV-16, HPV-18, HPV-2, HPV-9, and HPV-4, highlighting the presence of HPV infections at these anatomical sites (Figure 6A-B). Notably, HPV-18 and HPV-9 were found to have higher counts in CC samples compared to HNSCC samples, indicating a potentially more significant role in CC malignancies. In CC, FMN1, RUNX2, EYS, EHBP1, and NUP93 were identified as viral integration sites. In HNSCC, viral integration sites were detected in TFPI, ADAMTS13, GLRA1, NEDD4, CD83, PLA2G2A, RIN2, LRMDA, and CDH13 (Figure 6A-B).
Virus integration sites cycle plots in HPV-positive cervical cancer (CC) (A) and head and neck squamous cell carcinomas (HNSCC) (B). The circle contains 22 chromosomes as well as the X and Y chromosomes. Integration sites facing inside are from tumors while ones facing outside are from the control group. 13 different virus types are indicated by color by the right side. (C-E) Protein-to-protein interaction maps for RUNX2 and TFPI with JAK/STAT, Hedgehog signaling and cytokine-receptor intersections. Colored nodes are query proteins and first shell of interactors; white nodes are second shell of interactors; edges are assoications based on known or predicted interactions.
Integrated analysis reveals RUNX2 and TFPI as key factors in HPV-positive CC and HNSCC. Our study delved into the intricate relationships among the genome, transcriptome, and viral integrations in both CC and HNSCC samples. To explore the regulatory mechanisms underlying the expression of significant genes, we utilized the QIAGEN Ingenuity Pathway Analysis (5) program. This analytical approach allowed us to identify key upstream regulators of transcriptionally significant genes, shedding light on the intricate interplay between viral integrations and gene expression. In CC samples, a notable finding was the insertion of the HPV-18 virus on chromosome 6, coinciding with the presence of the RUNX2 gene (Figure 5A). This gene was identified both as an upstream regulator in the IPA analysis and as a site of viral detection within the CC cohorts. Several genes regulated by RUNX2 support various aspects of tumor progression. Therefore, we plotted the interaction map between RUNX2 and JAK/STAT, and the hedgehog signaling pathway (Figure 5C-D). RUNX2 interacts with IL1A and ADAMTS5 directly, which affects the expression of IL1RAP, IL1RL2 and IL36B; it also has direct regulation of WNT1 and WNT10B, which showed down regulation in their transcription expression. Similarly, although TFPI is not upstream, its recognized inhibitor behavior showed strong interaction with the cytokine-receptor intersection pathway through interaction with KLK11 (Figure 5E). KLK11 was initially isolated from the human hippocampus and is considered a potential biomarker for prostate, ovarian, and breast cancers (34). Its expression in peripheral tissues might be stimulated by tumor necrosis factor α (TNFα) secretion from tumor cells. TNFα, a key inflammatory cytokine, is notably overexpressed in advanced breast cancer (35). Notably, its involvement in a range of different biological responses were previously described. For example, increased expression of metalloproteases, epithelial–mesenchymal transition, angiogenesis, inflammation, and drug resistance (34-36). The role of TFPI in HPV-positive oral cancer prognosis, and its underlying mechanism will be more clearly understood with additional investigations, which utilizes a larger study sample size.
Discussion
Cervical cancer (CC) and head and neck squamous cell carcinoma (HNSCC) are prevalent and deadly malignancies. Despite the availability of effective prophylactic vaccines targeting key carcinogenic HPV types, vaccine uptake remains low (11). Furthermore, the impact of vaccines will only be appreciated 3 decades after initiation (37). Nonetheless, the countries where the diseases are most prevalent are the ones with the lowest levels of vaccine uptake (38). The interaction between the host genome and virus drives the clinical presentation and therapeutic response that is distinct from the non-virally associated cancers. Therefore, understanding the molecular mechanisms and genetic predisposition of these cancers are of paramount importance for clinical intervention. Over the last few decades, oral HPV infection has surpassed smoking and alcohol as the primary risk factor for oropharyngeal cancers and are more common in males. However, there appears to be a recent increase in incidence among females. Our study aims to provide a comparison between oral and cervical cancers in women with positive HPV tests to a negative HPV control group, providing insights into the similarities and differences between HPV-associated CC and HNSCC.
In this study, we did a thorough analysis based on somatic mutations, oncogenic pathways, enrichment pathways, and virus integrations between HPV-positive CC and HNSCC. We found both similarities and unique characteristics between these two types of cancers. Broadly, the genes with high mutation rates are shown in Figure 2, among which many are reported as oncogenes or tumor suppressors (14, 39-46). We also found several important oncogenic pathways which were previously reported to be associated with HPV-positive cancer progression (47-49). The mutated genes in the RTK-RAS and NOTCH pathways were different between CC and HNSCC, which were reported to be the leading cause for cancer progression in these cancers (Figure 2 and Figure 3). Moreover, we found that the CC and HNSCC relationship is closer compared their control groups based on gene expression. Nineteen up-regulated shared genes and 514 down-regulated genes were found between the CC and HNSCC tumors. Enrichment analysis using the KEGG pathway analyses showed a greater number of up-regulated pathways and down-regulated pathways in CC and HNSCC (Figure 4). In our virus integration analyses, we found several key factors that were linked to cancer progression and their interactions with the regulation of the pathways identified (Figure 5).
Integration across multiple platforms revealed a diverse array of mutations in CC. Among the top 10 mutated genes, with mutation rates varying from 12% to 33% were TTN, PIK3CA, MUC4, KMT2C, MUC16, KMT2D, SYNE1, FLG, DST, and EP300 (50). EP300 regulates cAMP-gene expression by binding specifically to phosphorylated CREB protein and functions as a histone acetyltransferase, controlling transcription through chromatin remodeling, which is crucial in the processes of cell proliferation and differentiation. Additionally, we observed mutations in KLF10, which acts as a tumor suppressor through the TGFβ signaling pathway (46); KMT2C, a frequently mutated epigenetic modifier gene essential in DNA replication fork restart (51); and RNF213, whose role in CC necessitates further investigation and validation as a prognostic factor (51). SPEN mutation showed a significant correlation with tumor mutation burden (TMB) and microsatellite instability (MSI) in pan-cancer and exhibited promise as a biomarker, especially in patients undergoing Immune Checkpoint Inhibitors (ICI) therapy, regardless of cancer type (52). In HNSCC, the top mutated genes include P53, CDKN2A, CASP8, FAT1, NOTCH1, HRAS, PIK3CA, MLL2, and FBXW7 (53). Additionally, we detected a mutation in WNK1, suggesting a potential involvement of WNK family of genes in cancers (54). Most of these top mutated genes aligned with the previously published results including NOTCH1, HRAS and WNK1 as seen in the Cancer Genome Atlas Project (9).
Furthermore, the oncogenic pathways enriched due to the mutated genes may offer some insights into signaling pathways associated with CC and HNSCC that can be therapeutically targeted such as PI3K/AKT and NOTCH pathways. Mutations affecting RTK signaling often result in cell transformation, observed in several different malignancies. These mutations impact RTKs or downstream pathways like MAP kinase and PI3K/AKT, leading to increased cell proliferation, survival, invasion, and metastasis (47). NOTCH activity alteration, either gain or loss, promotes HPV-associated cancer development. Additionally, HPV oncogenes likely disrupt NOTCH tumor-suppressive activities, enabling the oncogenic pathways activated by NOTCH to support tumor growth (55). NOTCH signaling can both promote and inhibit tumor development in different cancers. In CC progression, partially differentiated progenitors, which proliferate faster than basal cells, are likely the origin of tumor cells (48). In addition, the NOTCH pathway plays a crucial role in maintaining oral tissue homeostasis and that of other organ systems (49). Therefore, HNSCC-specific NOTCH pathway therapy would need to target specific NOTCH isoforms to avoid systemic and gastrointestinal toxicities. Apart from NOTCH, deregulation of RTK-RAS/PI3K was reported in HPV-associated CC and HNSCC, leading to further deregulation in MYC and HIPPO pathways (14). Additionally, the NRF2 pathway is highly mutated in both CC and HNSCC. NRF2 has been linked to treatment resistance, with significant down-regulation of NRF2-signaling observed in the subtype of HPV-positive HNSCC (56).
KEGG enrichment pathway analyses showed that CC and HNSCC have significantly different enriched pathways. In CC, we observed an up-regulation of JAK-STAT signaling. Previous studies have shown that inhibiting JAK2 reduces STAT3 activation, leading to decreased cell proliferation and increased sensitivity to Cisplatin (57). Additionally, we noted a down-regulation of Hedgehog signaling, which is known to be activated in various cancers, including human hepatocellular carcinoma (58). In HNSCC, we observed an up-regulation of cell cycle pathway, consistent with prior studies (59). For instance, Erufosine has been shown to down-regulate cell cycle processes in OSCC cells, highlighting its potential as a cell cycle inhibitor in treatment of HNSCC, either alone or in combination with other therapies (59). Furthermore, we also observed activation of oxidative phosphorylation (OXPHOS). Resistant cancer cells often exhibit reprogrammed metabolism, leading to an energy metabolism shift towards OXPHOS, mediated by certain oncogenes (18). Similarly, we found down-regulation of Hedgehog signaling in HNSCC. Moreover, we also found down-regulation of cytokine-receptor interaction in both CC and HNSCC. These cytokine-mediated processes were shown to be significantly associated with positive regulation of the immune response in various immune cells (60). Importantly, engineered cytokines such as IFN, IL-2, and IL-4 are well-known to have therapeutic potential in regulating the immune response (61).
Using a multi-omics approach, we identified RUNX2 as a crucial gene associated with HPV integration sites. Previous studies have demonstrated an indirect modulation of the PI3K/AKT signaling pathway by RUNX2 in both prostate and breast cancer cells (16). Our research demonstrates interactions between RUNX2, the JAK/STAT and Hedgehog signaling pathways through various mediators, including IL1A, WNT1, and WNT10B. Notably, WNT1 was found to be down-regulated in CC, supporting our conclusions that WNT1 is involved in driving the oncogenic phenotype in this cancer. Additionally, we discovered that TFPI, a tissue factor pathway inhibitor (62), was adjacent to inserted HPV-18 on chromosome 17 in HNSCC. TFPI exhibits anti-thrombotic actions and can associate with lipoproteins in plasma (63). We now show the involvement of TFPI, and the cytokine-receptor interactions are likely through KLK11, which is important for maintaining tumor environment stability, and inhibition of invasiveness, neoplasm growth, and metastasis formation. TFPI has also been shown to induce apoptosis and inhibit angiogenesis, significantly contributing to tumor growth inhibition (64).
In this study, we explored the potential genetic and gene-reprogramming connections that occur in CC and HNSCC, emphasizing their interactions to the somatic mutations, enriched pathways, and HPV oncogenes present in these cancers. Our findings shed light on oncogenic pathways including RTK/RAS and NOTCH, which were present in both cancer types, and corroborated in part by other previous studies (47-49, 55, 65). Moreover, we underscore the importance of the RUNX2 and TFPI genes as insertion sites for oncogenic HPV genomes, influencing downstream transcriptional expression, and pathways such as JAK/STAT, Hedgehog-signaling, and cytokine-receptor interactions. These discoveries underscore the intricate nature of gene reprogramming activities in these cancers and revealing numerous areas that were previously unknown. This opens new avenues for future research. Notably, we did not explore the link between PI3K/AKT and NOTCH mutations with patient survival outcomes in our cohort due to lack of sufficient clinical data. However, our plan is to expand our study in the future, where we can do follow-up studies on patients to assess the impact of these gene regulatory changes on outcomes. Consequently, further experimentation is imperative to refine treatment strategies and to chart future research avenues important for successful interventions.
Footnotes
Supplementary Material
Available at: https://figshare.com/articles/dataset/Supplementary/27911886
Conflicts of Interest
The Authors declare that they have no conflicts of interest related to this study.
Authors’ Contributions
E.R. conceived and designed the study. J.R. performed the experiments and data analysis. J.R., S.G., T.S., S.B., N.Z., and N.M. contributed to data interpretation and manuscript drafting. E.R. and Z.W. supervised the study and critically revised the manuscript. All Authors read and approved the final version of the manuscript.
- Received November 26, 2024.
- Revision received December 24, 2024.
- Accepted January 10, 2025.
- Copyright © 2025 The Author(s). Published by the International Institute of Anticancer Research.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).













