Abstract
Background: The stage of colorectal cancer (CRC) at the time of diagnosis has the greatest influence on survival rate. Thus, for CRC, which is mainly identified as advanced disease, non-invasive molecular blood or stool tests could improve early diagnosis and lower mortality. Evaluation of miRNA expression levels in the serum of patients diagnosed with CRC is a potential tool for early screening. Screening can be supported by machine learning (ML) as a tool for developing a cancer risk predictive model based on genetic data. Materials and Methods: miRNA was isolated from the serum of 8 patients diagnosed with CRC and 10 patients from a control group matched for age and sex. The expression of 179 miRNAs was determined using a serum/plasma panel (Exiqon). Determinations were conducted using the real-time PCR technique on an Applied Biosystems QuantStudio3 device in 96-well plates. A predictive model was developed through the Azure Machine Learning platform. Results: A wide panel of 29 up-regulated miRNAs in CRC was identified and divided into two subgroups: 1) miRNAs with a significantly higher serum level in cancer patients vs. controls (24 miRNAs) and 2) miRNAs detected only in cancer patients and not in controls (5 miRNAs). Re-analysis of published miRNA profiles of CRC tumours or CRC exosomes revealed that only 2 out of 29 miRNAs were up-regulated in all datasets including ours (miR-34a and miR-25-3p). Conclusion: Our research suggests the potential role of overexpressed miRNAs as diagnostic or prognostic biomarkers among CRC patients. Such clustering of miRNAs may be a potential direction for discovering new diagnostic panels of cancer (including CRC), especially using ML. The low correspondence between deregulation of miRNAs in serum and tumour tissue revealed in our study is consistent with previously published reports.
Colorectal cancer (CRC) is one of the most common cancers and the third leading cause of cancer-related deaths, especially in developed countries (1). Morbidity from CRC is strongly related to certain environmental factors, such as low-fibre diet, addictions (alcohol consumption and smoking) and lack of physical activity (2, 3). Because of the high frequency of CRC and its impact on general public health, studies of its genetic background include the search for early genetic markers of this cancer at the chromosomal, molecular, and epigenetic levels (4, 5). Although some non-invasive techniques for CRC screening exist, colonoscopy remains the gold standard (6). As some patients avoid or refuse this procedure, establishing a new blood-based genetic screening test would be greatly beneficial for CRC prevention and early diagnosis.
The stage of CRC at the time of diagnosis plays the largest role in survival rates. Notably, when CRC is diagnosed at a localized stage, the 5-year survival rate ranges between 89% and 92%. Overall survival is significantly shorter when CRC is diagnosed at an advanced stage (7). Thus, for CRC, which is mainly diagnosed after specific symptoms of advanced disease are noted (e.g. blood in stool) and then confirmed by biopsy during colonoscopy, non-invasive molecular tests from blood or stool could increase the percentage of cases recognized at an early stage and lower the mortality (5, 8).
The role of microRNA (miRNA) in cancer initiation, development and metastasis has been widely described, alongside the known genetic background, in various cancers including colorectal cancer (3, 4, 9-12). MiRNAs are small 19- to 25-nucleotide noncoding RNAs that, by binding to target messenger RNAs (mRNAs), can regulate the expression of different genes. They can act either as oncomirs, which are responsible for down-regulation of tumour suppressor genes, or as suppressor miRNAs down-regulating oncogenes (3). It is estimated that about one-third of human genes are regulated in this way. Therefore, for cancers in which this regulation is aberrant, miRNAs are good potential biomarkers suitable for patient screening and diagnosis even in early stages (3, 13-15).
The role of some miRNAs affecting genetic pathways involved in oncogenesis of CRC has been previously identified (3). It was revealed that the miR-135 family and miR-34 family were involved in the Wnt signalling pathway together with miR-145, miR-29b and miR-146a (16, 17). Also, miR-224 was reported to activate this pathway (18). Another signalling pathway crucial for CRC, the EGFR pathway, can be activated in different ways by the miRNAs let-7, miR-143, miR-145, miR-18a, miR-126, miR-21, miR-32, miR-92a and miR-181a (19, 20). Moreover, miR-520a and miR-525a acting on PIK3CA can modulate activation of Akt-dependent signalling (21). KRAS expression can also be inhibited by let-7, miR-18a, miR-30b, miR-143 and miR-145 (3, 11). For the TGF-β signalling pathway, miR-21 and miR-135b as well as miR-106a and miR-301a were reported to play an important role, while changes in expression of miR-25 and miR-187 influenced this pathway via the SMAD7 gene (22, 23). Additionally, in TP53 deficiency the miR-34 family, miR-125b and miR-215 were reported to be dysregulated, promoting cancer cell growth and proliferation (3, 11).
Machine learning, a subdiscipline of artificial intelligence, has been previously described as a useful tool for analysis of large, complex data sets (24, 25). Computer algorithms involved in machine learning are expected to improve with experience. In the field of genomics, the method can be used, for example, to recognize the transcription start sites in genome sequence or to analyse the miRNA expression patterns (26, 27). Azure Machine Learning (Azure ML, Microsoft, Redmond, WA, USA; https://azure.microsoft.com/en-us/services/genomics/) is a cloud service that enables the machine learning process to be performed. This website provides a workspace that helps users build and test predictive models.
The aim of our study was to evaluate the expression level of a miRNA panel in the serum of patients diagnosed with CRC in comparison with the serum of healthy controls and to find potential, clinically applicable biomarkers.
Materials and Methods
Genetic analysis. miRNA was isolated from the serum of 8 patients diagnosed with CRC (Table I) and 10 patients from a control group matched for age and sex. The study was approved by the Ethical Committee of Wroclaw Medical University (approval number 644/2014).
Clinicopathological characteristics of patients.
Serum samples from the study and control groups were used for total RNA isolation (using EXIQON miRCURY, Qiagen, Hilden, Germany). Extraction efficacy was checked with two internal standards, UniSp2 and UniSp4, and cDNA synthesis was checked with UniSp6. A signal (Cp) above 37 for any of the listed controls indicated a weak yield during material preparation. To assess the level of possible haemolysis in the material and the impact of reaction inhibitors in each sample, the levels of miR-451 and miR-23a expression were determined. Samples with a ΔCp (miR-23a - miR-451) above 7 were rejected from further analysis due to high haemolysis. Expression of 179 circulating miRNA isoforms in serum was determined using a kit provided by EXIQON (Serum/plasma miRNA miRCURY LNA, Qiagen). Determinations were conducted using the real-time PCR technique on an Applied Biosystems QuantStudio3 device in 96-well plates (Thermo Fisher Scientific, Waltham, MA, USA). The expression profile in cancer samples compared to the expression levels of miRNAs in healthy individuals’ serum was calculated by:
From the obtained log ratio score (LR) a base 2 logarithm was calculated. LR values over 1 (overexpression) and lower than -1 (suppression) were taken into consideration in further analysis.
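The quality-control cut-offs and the log ratio thresholds described above can be illustrated with a short Python sketch. It is not the authors' pipeline: the function names are hypothetical, and the ratio is assumed here to be the mean relative expression in cancer samples divided by the mean in controls, on a linear scale. Only the cut-off values (Cp of 37, ΔCp of 7, LR of ±1) come from the text.

```python
import math
from statistics import mean

CP_MAX = 37.0         # control Cp above this indicates weak yield (from the text)
HAEMOLYSIS_MAX = 7.0  # delta Cp (miR-23a - miR-451) above this -> reject (from the text)


def passes_qc(control_cps, cp_mir23a, cp_mir451):
    """Return True if all spike-in controls and the haemolysis indicator are acceptable."""
    controls_ok = all(cp <= CP_MAX for cp in control_cps)
    haemolysis_ok = (cp_mir23a - cp_mir451) <= HAEMOLYSIS_MAX
    return controls_ok and haemolysis_ok


def log_ratio(cancer_values, control_values):
    """Base-2 log of mean cancer expression over mean control expression.

    Assumption: inputs are linear-scale relative expression values.
    """
    return math.log2(mean(cancer_values) / mean(control_values))


def classify(lr):
    """Apply the +1 / -1 thresholds used to select miRNAs for further analysis."""
    if lr > 1:
        return "overexpressed"
    if lr < -1:
        return "suppressed"
    return "unchanged"


# Hypothetical values, for illustration only.
print(passes_qc([30.0, 31.0, 29.0], cp_mir23a=28.0, cp_mir451=24.0))
print(classify(log_ratio([8.0, 8.0], [2.0, 2.0])))
```

A log ratio of 1 corresponds to a two-fold difference, so the ±1 thresholds keep only miRNAs whose level at least doubles or halves between groups.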
Machine learning. We built a supervised machine learning classification model using Azure Machine Learning (Microsoft, Redmond, WA, USA) (Figure 1). First, to reduce data dimension we used Principal Component Analysis to select 20 miRNAs, for which we observed a statistically significant difference in expression levels between the control and cancer groups. Subsequently, the columns of the dataset were normalised to zero mean.
Workflow of modelling using Azure ML service showing the steps of bioinformatics analysis using Two-Class Bayes Point Machine machine learning algorithm.
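Normalising columns to zero mean amounts to subtracting each column's mean from its entries. The sketch below shows this step in plain Python under the assumption that rows are samples and columns are miRNAs; it does not reproduce the Azure ML module used in the study, and `center_columns` is a hypothetical helper name.

```python
from statistics import mean


def center_columns(rows):
    """Subtract the column mean from every entry so each column has zero mean.

    `rows` is a list of samples; each sample is a list of expression values,
    one per miRNA (assumed layout).
    """
    n_cols = len(rows[0])
    col_means = [mean(row[j] for row in rows) for j in range(n_cols)]
    return [[row[j] - col_means[j] for j in range(n_cols)] for row in rows]


# Hypothetical 2-sample, 2-miRNA matrix, for illustration only.
centered = center_columns([[1.0, 10.0], [3.0, 14.0]])
print(centered)
```

Centering removes differences in baseline level between miRNAs, so that a classifier responds to deviations from each miRNA's average rather than to its absolute scale.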
We constructed predictive models using various two-class algorithms, such as Decision Forest, Decision Jungle, Bayes Point Machine, Support Vector Machine, and Neural Network. After accuracy and precision analysis we chose the Bayes Point Machine method to train the model for tumour prediction based on the expression of selected serum miRNAs. Bayes Point Machine was run with 30 iterations, bias inclusion and allowance of unknown values. The small sample size limited our ability to split the data into training and test sets; therefore, we used 10-fold cross-validation, a procedure suited to evaluating machine learning models on a limited data sample. The data set was randomly divided into 10 subsets of equal size. In each iteration, one subset was held out and used to test the model trained on the remaining 9 subsets (Figure 1). The experiment has been published at: https://gallery.cortanaintelligence.com/Experiment/New-CRC-miRNA
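The cross-validation scheme described above can be sketched in a few lines of Python. This is a generic k-fold splitter, not the Azure ML implementation; the function name, the fixed seed and the sample count in the example are assumptions made for illustration. When the number of samples is not divisible by the number of folds, fold sizes differ by at most one.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # random assignment to folds
    folds = [indices[i::k] for i in range(k)]    # k near-equal subsets
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Hypothetical sample count, for illustration only.
for train_idx, test_idx in k_fold_indices(19, k=10):
    print(len(train_idx), len(test_idx))
```

Every sample is used for testing exactly once and for training in the other nine iterations, which is why the approach is attractive when too few samples are available for a separate test set.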
miRNA data re-analysis. Data on previously published miRNA expression profiles of CRC and healthy adjacent tissues were obtained from Gene Expression Omnibus (GEO), accession numbers: GSE35834 and GSE41655 (28). GSE35834 included 23 normal tissues and 31 tumours, whereas GSE41655 included 15 normal tissues and 33 tumours. In addition, we re-analysed GSE40247, which includes miRNA profiles obtained from serum exosomes from 11 healthy volunteers and 68 CRC patients (we excluded individuals with stage I disease) (29). To assess which miRNAs were differentially expressed, we first limited the data to the intersection with the 29 miRNAs that were up-regulated in our initial study (see Results). Subsequently, we employed nonparametric ANOVA-type analysis using the npmv 2.4.0 R package to assess the significance of differences. A plot was generated using the NMF 0.23.0 R package.
Results
Genetic analysis. Congruent with previously published studies, our findings revealed the up-regulation of dozens of miRNAs in the serum of CRC-diagnosed patients in comparison to healthy controls. We identified a wide panel of 29 up-regulated miRNAs in CRC divided into two subgroups: 24 with a statistically significantly higher serum level in cancer patients and 5 detected repeatedly only in cancer patients and not in the control group (full list in Table II).
Two groups of up-regulated miRNAs in analysed CRC samples (p<0.05).
miRNA data re-analysis. Results of the re-analysis of previously published miRNA data are visualized in Figure 2. Out of 29 up-regulated miRNAs (see Table II), 23 miRNAs were present in all three re-analysed datasets. Significant up-regulation in CRC was detected in all three datasets for miR-34a and miR-25-3p. We also detected significant down-regulation of several miRNAs in colorectal tumours, including miR-324-3p, miR-125b-5p, miR-199a-5p, miR-29c-3p and miR-30e-5p. Interestingly, three of these miRNAs (miR-125b-5p, miR-199a-5p and miR-29c-3p) were up-regulated in serum exosomes of CRC patients (see Figure 2, GSE40247). The remaining 16 miRNAs (out of 23) displayed no significant changes in expression between normal and CRC status.
Heatmap illustrating the results of re-analysis of three miRNA datasets using nonparametric ANOVA. Stars indicate miRNAs that were detected exclusively in the serum of CRC patients in our study. Note that GSE40247 includes miRNA profiles obtained from serum exosomes.
Machine learning. We developed a predictive model with 19 cases through the Azure ML platform (Figure 1). First, we tested models using various algorithms, such as Two-class Decision Forest, Two-class Decision Jungle, Two-class Bayes Point Machine, Two-class Support Vector Machine, and Two-class Neural Network. Two-class Bayes Point Machine showed the best accuracy and precision, which were 0.947 and 0.917, respectively. The AUC of the ROC curve was 0.955 (Figure 3). The threshold was set to the optimum cut-off value of 0.4.
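For readers less familiar with these metrics, the following Python sketch shows how accuracy, precision and recall follow from predicted scores once a threshold is applied. The scores and labels are hypothetical and were chosen only so that the arithmetic for 19 cases gives 18/19 and 11/12; they are not the study's data, and the study's confusion matrix is not reported here. Treating a score equal to the threshold as positive is also an assumption.

```python
def binary_metrics(scores, labels, threshold=0.4):
    """Compute accuracy, precision and recall for scores cut at `threshold`."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)  # true negatives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical scores: 11 positives scored high, 8 negatives of which one
# is scored above the threshold (a single false positive).
scores = [0.9] * 11 + [0.5] + [0.1] * 7
labels = [1] * 11 + [0] * 8
metrics = binary_metrics(scores, labels, threshold=0.4)
print({name: round(value, 3) for name, value in metrics.items()})
```

With so few cases, a single misclassified sample shifts accuracy by about five percentage points, which is worth keeping in mind when comparing such figures across studies.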
Results of the validation of Two-class Bayes Point Machine. (A) Accuracy, Precision, Recall, F1 Score and Threshold. (B) ROC curve.
Discussion
Recently, the dysregulated expression of various miRNAs was highlighted in terms of CRC diagnostics (30). It has been shown that miRNAs contribute to the initiation and progression of numerous oncogenic molecular events (10). The involvement of miRNAs in CRC development may be bidirectional: tumour-promoting or tumour-suppressing (31). This means that miRNAs can induce cell proliferation and inhibit apoptosis, leading to tumour initiation, or act as suppressors by reducing tumour susceptibility to metastasis and invasion. Transcripts up-regulated in cancers, called oncogenic miRNAs (oncomirs), usually inhibit the expression of tumour-suppressor genes and stimulate carcinogenesis (31). Thus, miRNAs may be utilized as clinical cancer biomarkers (CRC included), since they exhibit high tissue specificity, altered expression in tumour cells and relative stability (32). MiRNAs that are expressed at significantly higher levels, or only, in CRC patients can be used for the early detection of this cancer in non-invasive screening tests (33, 34). Moreover, miRNA-oriented diagnostics offers flexibility in the choice of clinical material, because miRNA can be obtained from tumour tissue, the patient’s blood serum or stool. Examples of widely described CRC oncomirs are miR-21, 92a, 96, 135a/b, 155, 224, 214, 31, 210, 182/503, 200c, 301a (10, 35). Well-identified tumour-suppressive CRC miRNAs include let-7, miR-194, 143/145, 34a, 126, 27b, 7, 18a-3p, 26b, 101, 144, 320a, 330, 455, 149 (3, 10). Nevertheless, the function of multiple miRNAs can be ambiguous, and some transcripts are described as both oncomirs and tumour suppressors, such as miR-155 (36, 37).
Most identified miRNAs are involved in various pathophysiological conditions, such as cancer (38), chronic inflammatory diseases (39, 40), osteoporosis and diabetes (41), by interfering in different molecular and metabolic pathways: targeting tumour-suppressor genes and activating oncogenic transcription factors (42), affecting cell cycle arrest, pluripotency (43), glucose metabolism (44) and angiogenesis (45). Some miRNAs have already been associated with CRC risk, patient prognosis or treatment outcomes (11). Interestingly, many sources indicate that the above-mentioned CRC miRNAs have contrasting functions.
Our research also suggests the potential role of overexpressed miRNAs as a diagnostic or prognostic panel among CRC patients. We revealed a group of 29 miRNAs with significantly higher expression in CRC patients than in the healthy population. Compared to other studies, our project developed a wider panel of 29 overexpressed miRNAs, found regularly in every analysed cancerous sample, with high precision and a high accuracy rate of up to 0.947. In the study of Kanaan et al., a panel of 8 miRNAs (miR-532-3p, miR-331, miR-195, miR-17, miR-142-3p, miR-15b, miR-532, and miR-652) was detected with a lower accuracy rate of 0.868 (46). In the project of Tan et al., a panel of 3 miRNAs (miR-144-3p, miR-425-5p, and miR-1260b) was identified with sensitivity and specificity of 93.8% and 91.3%, respectively, while Zhang et al. detected a seven-miRNA signature for CRC diagnosis (miR-103a-3p, miR-127-3p, miR-151a-5p, miR-17-5p, miR-181a-5p, miR-18a-5p and miR-18b-5p) with an accuracy of up to 0.895 (47, 48). Thus, our study developed an expanded and specific potential model for CRC diagnostics. Moreover, all the above-mentioned projects are based mostly on overexpression or dysregulated expression of various miRNAs, while our panel combines overexpressed CRC miRNAs with miRNAs detectable only in cancer patients.
Although all of the identified miRNAs were overexpressed in the analysed samples, some of them have been reported to act as CRC oncomirs (such as miR-19a-3p), whereas others have been reported to act as tumour suppressors [such as miR-34a (49) or let-7 (50)]. In line with other research on CRC miRNA panels, we show the overexpression of the following transcripts: miR-144-3p (47), miR-19a-3p (51), miR-103a-3p and miR-151a-5p (48).
We also proposed a predictive model using machine learning, which could be a useful and readily available tool for identifying a colorectal cancer risk group using serum miRNA tests. Using machine learning, it is possible to analyse many biological variables that may affect the process of cancer formation (52). On the basis of the entered data, the tool creates an in silico model of the probability of cancer occurrence according to a given biological profile, in our case a specific serum miRNA panel. To assess the suitability of this model, it needs to be tested on a larger group of people.
Finally, by re-analysing published miRNA profiles of CRC tumours or CRC exosomes we revealed that only miR-34a and miR-25-3p were up-regulated in all datasets including ours. Such low correspondence between miRNA deregulation in serum and in tumour tissue has been previously reported by others. For example, Zhu et al. found that only 10 out of ~100 deregulated miRNAs are shared between serum and corresponding breast tumours (53). Recently, Gmerek et al. (2019) found no overlap in miRNAs regulated in CRC tissue and serum of CRC patients. This suggests that serum deregulated miRNAs may not be directly secreted from colorectal tumour cells but rather correspond to other cancer-related conditions, such as systemic inflammation and oxidative stress (54).
The small sample size in the discovery dataset limited our ability to split data into training and test sets, a limitation that we sought to partially mitigate through the application of cross-validation. Another limitation is that the public miRNA data sets included in this analysis were profiled on different platforms, which may lead to poor extrapolation of the findings.
Conclusion
Our preliminary results show that numerous miRNAs of various functions discovered to date can be overexpressed in CRC. As described, the identified panel consists of multiple miRNAs presenting different spectra of molecular involvement, which together appear to be a predictive panel with high diagnostic efficiency for CRC. To confirm our results, the study should be continued on a larger group of samples. However, even for the present group, the analysis clearly indicated repeatable co-occurrence of the detected miRNAs. The detectability of the above-mentioned miRNAs in CRC may be related to interference or functional synergism of individual miRNAs in the mediation of carcinogenesis. Such clustering of miRNAs may be promising for discovering new diagnostic panels of cancer, including CRC.
Footnotes
# These Authors contributed equally to this study.
Authors’ Contributions
IL, LL, PK and PZ carried out the molecular genetic studies. DP, IL, LL and SS drafted the manuscript. DP, PK, JP and BK created a database including the clinical data of patients. DP, IL and LL conceived of the study, participated in its design and coordination. LL and PK prepared bioinformatic analyses. WW provided financial support. All authors read and approved the final manuscript.
Conflicts of Interest
The Authors declare no conflicts of interest.
- Received March 1, 2022.
- Revision received April 15, 2022.
- Accepted April 19, 2022.
- Copyright © 2022, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).