Gene

Volume 692, 15 April 2019, Pages 119-125

Research paper
Identification of biomarkers associated with diagnosis and prognosis of colorectal cancer patients based on integrated bioinformatics analysis

https://doi.org/10.1016/j.gene.2019.01.001

Highlights

  • Colorectal cancer-related differentially expressed genes were identified based on the GEO and TCGA databases.

  • The top 10 hub genes with relatively high diagnostic value for CRC patients were identified.

  • A 9-gene prognostic signature was identified as a potential prognostic predictor for CRC patients.

Abstract

Background

The current study aimed to identify potential diagnostic and prognostic gene biomarkers for colorectal cancer (CRC) based on the Gene Expression Omnibus (GEO) datasets and The Cancer Genome Atlas (TCGA) dataset.

Methods

Microarray gene expression profiles of CRC from GEO and an RNA-sequencing dataset of CRC from TCGA were downloaded. After overlapping differentially expressed genes (DEGs) were screened with R software, functional enrichment analyses of the DEGs were performed using the DAVID database. Then, the STRING database and Cytoscape were used to construct a protein–protein interaction (PPI) network and identify hub genes. Receiver operating characteristic (ROC) curves were constructed to assess the diagnostic value of the hub genes. Cox proportional hazards regression was performed to screen for potential prognostic genes. Kaplan-Meier curves and the time-dependent ROC curve were used to assess the prognostic value of these genes in CRC patients.
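The screening of overlapping DEGs can be illustrated with a minimal sketch. The authors report using R; the Python below is only an illustration, and it assumes that a gene counts as "common" when it is called in the same direction in every dataset. The function name `common_degs`, the dataset labels, and the gene names are hypothetical toy values, not data from the study.

```python
from typing import Dict, Set


def common_degs(deg_calls: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Return the genes called 'up' (or 'down') in every dataset.

    deg_calls maps a dataset label to {gene: "up" | "down"}.
    Assumption: overlap means the same direction in all datasets.
    """
    result: Dict[str, Set[str]] = {}
    for direction in ("up", "down"):
        # One set of genes per dataset for the current direction.
        per_dataset = [
            {gene for gene, call in calls.items() if call == direction}
            for calls in deg_calls.values()
        ]
        result[direction] = set.intersection(*per_dataset) if per_dataset else set()
    return result


# Toy DEG calls (hypothetical genes and datasets).
toy_calls = {
    "dataset_1": {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"},
    "dataset_2": {"GENE_A": "up", "GENE_B": "down", "GENE_C": "down"},
    "dataset_3": {"GENE_A": "up", "GENE_B": "down", "GENE_D": "up"},
}
overlap = common_degs(toy_calls)
```

In this toy input, GENE_C is excluded because its direction differs between datasets, and GENE_D is excluded because it is not called in all of them.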

Results

Integrated analysis of the GEO and TCGA databases revealed 207 common DEGs in CRC. A PPI network consisting of 70 nodes and 170 edges was constructed, and the top 10 hub genes were identified. The areas under the curve (AUCs) of the ROC curves of the hub genes were 0.900, 0.927, 0.869, 0.863, 0.980, 0.682, 0.903, 0.790, 0.995, and 0.989 for CCL19, CXCL1, CXCL5, CXCL11, CXCL12, GNG4, INSL5, NMU, PYY, and SST, respectively. A prognostic signature consisting of 9 genes (SLC4A4, NFE2L3, GLDN, PCOLCE2, TIMP1, CCL28, SCGB2A1, AXIN2, and MMP1) was constructed and performed well in predicting the overall survival of CRC patients. The AUC of the time-dependent ROC curve was 0.741 for 5-year survival.
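As a reading aid for the AUC values above: the AUC of a single-gene classifier equals the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The sketch below computes it that way on toy numbers. The function name `roc_auc` and the expression values are hypothetical, and this is not the software the authors used.

```python
from typing import Sequence


def roc_auc(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """AUC as the fraction of (tumor, normal) pairs where tumor > normal.

    Ties contribute 0.5. Equivalent to the Mann-Whitney U statistic
    divided by len(tumor) * len(normal).
    """
    if len(tumor) == 0 or len(normal) == 0:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


# Toy expression values for one hypothetical gene.
toy_tumor = [5.1, 6.3, 7.0, 4.2]
toy_normal = [3.0, 4.5, 2.8, 3.9]
toy_auc = roc_auc(toy_tumor, toy_normal)  # 15 of 16 pairs favor tumor
```

For a gene that is lower in tumors than in normal tissue, this orientation gives a value below 0.5; swapping the two arguments yields the corresponding AUC above 0.5.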

Conclusion

The results of this study may offer guidance for further exploration of potential biomarkers for the diagnosis and prognosis prediction of CRC patients.

Introduction

Colorectal cancer (CRC) is one of the most common cancers, with high morbidity and mortality worldwide. Nearly 1.4 million new cases of CRC and 700,000 CRC-related deaths are reported worldwide each year (Torre et al., 2015). Brenner et al. (2014) found that the 5-year survival rate of CRC patients was >90% when the disease was diagnosed at an early stage. However, owing to the lack of adequate diagnostic methods, CRC is often diagnosed at an advanced stage. Despite significant improvements in diagnosis and treatment, the 5-year survival rate for patients diagnosed with metastatic CRC remains low, at approximately 12% (Siegel et al., 2015). Thus, there is an urgent need to understand the molecular mechanisms of CRC development and to identify novel biomarkers for the early detection and prognostic evaluation of CRC.

Over the past decades, advances in molecular biology have increased our understanding of the pathogenesis of CRC. Previous research has indicated that CRC is a genetic disease that depends on alterations in numerous oncogenes and tumor suppressor genes (Bogaert and Prenen, 2014). A growing number of CRC-related genes and their encoded proteins have been explored. Previous studies revealed that they play crucial roles in a large number of physiological and pathological processes, including cell proliferation, differentiation, apoptosis, and metastasis (Hisamuddin and Yang, 2006; Testa et al., 2018). However, the precise molecular mechanisms of CRC are still far from fully understood. Recently, several studies have discovered groups of CRC-related candidate genes by bioinformatics analysis. For example, Huang et al. (2018) identified hundreds of CRC-associated differentially expressed genes (DEGs) based on the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, five of which can be used as diagnostic biomarkers for CRC patients. Hou et al. (2018) found a collection of DEGs and DNA methylation aberrations in CRC. Their results indicated that combining DEGs, DNA methylation aberrations, and tumor stage results in a more effective prognostic evaluation of patients with CRC. In addition, lncRNAs and the associated regulatory network consisting of transcription factors, microRNAs, mRNAs, and RNA-binding proteins can also be identified using bioinformatics analysis (Zhang et al., 2018). Nevertheless, it remains important to find more potential biomarkers for the effective diagnosis and prognostic assessment of CRC patients.

In this study, we downloaded large-scale gene expression datasets for CRC from the GEO and TCGA databases. After an integrated analysis of the two databases, we identified 10 hub genes from the common DEGs by constructing a protein-protein interaction (PPI) network. Receiver operating characteristic (ROC) curves showed that the top 10 hub genes had high diagnostic value for patients with CRC. We then constructed a prognostic gene signature for CRC patients using univariate and multivariable Cox regression analyses; the signature performed well in predicting the overall survival of CRC patients.
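The text shown here does not state how the signature is turned into a per-patient score. A common convention for Cox-based gene signatures, assumed here purely for illustration, is a linear risk score (the sum of each gene's regression coefficient times its expression value) followed by a median split into high-risk and low-risk groups. In the sketch below the function names, gene names, coefficients, and expression values are all hypothetical.

```python
from statistics import median
from typing import Dict


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label each patient 'high' if the score is above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients for a two-gene toy signature.
toy_coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}

# Hypothetical expression values per patient.
toy_patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 2.0},
    "P3": {"GENE_A": 1.0, "GENE_B": 8.0},
    "P4": {"GENE_A": 6.0, "GENE_B": 0.0},
}

toy_scores = {pid: risk_score(expr, toy_coefficients) for pid, expr in toy_patients.items()}
toy_groups = split_by_median(toy_scores)
```

The resulting groups are what a Kaplan-Meier comparison would then be drawn on; the survival analysis itself is outside the scope of this sketch.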

Section snippets

Data collection

Series matrix files of GSE32323, GSE74602, and GSE113513 were downloaded from the GEO (http://www.ncbi.nlm.nih.gov/geo/) database. These datasets were based on the platforms GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array), GPL6104 (Illumina humanRef-8 v2.0 expression beadchip), and GPL15207 (Affymetrix Human Gene Expression Array), respectively. The GSE32323 dataset contained 17 cancer tissues and paired adjacent non-cancerous tissues. The GSE74602 dataset comprised 30 paired normal and

Identification of DEGs in CRC

In the present study, 1445 DEGs in GSE32323, 1484 DEGs in GSE74602, 1284 DEGs in GSE113513, and 2150 DEGs in TCGA were identified. Among the DEGs, 700, 815, 511, and 1205 genes were upregulated, while 745, 669, 773, and 945 genes were downregulated in GSE32323, GSE74602, GSE113513, and TCGA, respectively (Fig. 1A). We used volcano plots to visualize the DEGs in each dataset. Red dots represent the upregulated genes and green dots represent the downregulated genes. The consistently

Discussion

In this study, we performed an integrated analysis of three microarray datasets from GEO and RNA-sequencing data from TCGA. A total of 207 DEGs, consisting of 57 upregulated and 150 downregulated DEGs, were identified between CRC tissues and normal tissues. The functional enrichment analyses demonstrated that the DEGs were enriched in biological processes such as ECM organization, angiogenesis, cell adhesion, cell differentiation, and cell migration. The results consistent with previous knowledge

Acknowledgments

This study was supported by the Zhejiang Provincial Natural Science Foundation of China (No. LY16H160004) and the Natural Science Foundation of Ningbo (No. 2018A610381). The authors thank the contributors to the TCGA (https://tcga-data.nci.nih.gov/) and GEO (http://www.ncbi.nlm.nih.gov/geo/) databases for openly sharing their data.

Conflict of interest

The authors declare that they have no conflict of interest.

References (44)

  • Y. Zhang et al. The regulatory network analysis of long noncoding RNAs in human colorectal cancer. Funct. Integr. Genom. (2018)

  • J. Bogaert et al. Molecular genetics of colorectal cancer. Ann. Gastroenterol. (2014)

  • S. Cheng et al. Crk-like adapter protein regulates CCL19/CCR7-mediated epithelial-to-mesenchymal transition via ERK signaling pathway in epithelial ovarian carcinomas. Med. Oncol. (2015)

  • C.H. Chin et al. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst. Biol. (2014)

  • J. Hamanishi et al. Activated local immunity by CC chemokine ligand 19-transduced embryonic endothelial progenitor cells suppresses metastasis of murine ovarian cancer. Stem Cells (2010)

  • P.J. Heagerty et al. Survival model predictive accuracy and ROC curves. Biometrics (2005)

  • I.M. Hisamuddin et al. Molecular genetics of colorectal cancer: an overview. Curr. Color. Cancer Rep. (2006)

  • X. Hou et al. Genome-wide network-based analysis of colorectal cancer identifies novel prognostic factors and an integrative prognostic index. Cell. Physiol. Biochem. (2018)

  • D.W. Huang et al. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. (2009)

  • Z. Huang et al. Identification of critical genes and five prognostic biomarkers associated with colorectal cancer. Med. Sci. Monit. (2018)

  • M. Itakura et al. High CC chemokine receptor 7 expression improves postoperative prognosis of lung adenocarcinoma patients. Br. J. Cancer (2013)

  • J. Lu et al. Antitumor efficacy of CC motif chemokine ligand 19 in colorectal cancer. Dig. Dis. Sci. (2014)

1 Feng Xu and Xianpeng Li are co-corresponding authors of this work.
