Abstract
Background: Lung cancer (LC) is the leading cause of cancer death worldwide. A diagnostic test for LC is needed for monitoring high-risk populations. Patients and Methods: Fifty-seven markers were measured using multiplex immunoassays of plasma from patients with non-small cell lung cancer (NSCLC) (245 men, 114 women, 1 unknown) or asthma (67 men, 111 women, 2 unknown) and from healthy controls (165 men, 122 women, 1 unknown). Mass spectrometry was used for biomarker discovery. A support vector machine (SVM) was used for data analysis. Results: When all biomarkers and both genders were co-analyzed, SVM classified NSCLC and asthma with an accuracy of 0.94. Restricting to NSCLC versus healthy using best subsets of variables (males: epidermal growth factor (EGF), interleukin-8 (IL-8), soluble Fas (sFas), matrix metalloproteinase-9 (MMP-9), plasminogen activator inhibitor-1 (PAI-1); females: EGF, soluble cluster of differentiation 40 (sCD40) ligand, IL-8) yielded sensitivity and specificity of 1. Expression of eleven mass spectrometric biomarkers differed between pathologies. Conclusion: Significant inter-pathology and gender differences between biomarkers may improve diagnosis of LC.
- Biomarker
- non-small cell lung cancer
- asthma
- plasma
- multiplex assay
- autoantibodies
According to the American Cancer Society estimates, in 2011 approximately 220,000 new cases of lung cancer (LC) will be diagnosed and LC will kill over 157,000 people in the U.S. Non-small cell lung cancer (NSCLC) constitutes about 79% of all LC cases (1-2). The state of the art for the diagnosis of LC is low-dose spiral computed tomography (CT) (3). The efficacy of CT screening for early LC detection in heavy smokers is currently being investigated in clinical trials (4). However, concerns over the effects of repeated radiation exposure (5) and variability of scans (6) have led to recommendations to minimize the use of diagnostic CT. The 5-year survival rate of LC when the cancer is still localized is 46%, but only 15% of LC cases are discovered while the tumor is in the early stages of development.
The American Lung Association reports that almost 20 million Americans suffer from asthma. Asthma diagnostic tests are typically performed long after the presentation of symptoms, such as recurrent wheezing, coughing, and chest tightness. The current methods of detecting asthma are typically restricted to lung function tests, such as spirometry, or challenge tests, which are often performed along with other tests to rule out other pathologies or reactive airway diseases. The increased risk of LC in never smoking individuals with asthma suggests a relation between asthma and LC (7).
There is no simple, reliable method for diagnosing pathologies of human lung tissues early in their development. Despite some disenchantment with the current state of research on blood molecular markers, reflected in a statement that they are currently unsuitable for clinical application (8), diverse approaches in the search for biomarkers of aggressive lung disease are slowly yielding promising results. MicroRNA expression profiles of lung tumors, normal lung tissues and plasma samples from cases with different prognoses identified in a completed spiral-CT screening trial produced cancer signatures that were validated in an independent cohort from a second randomized spiral-CT trial (9). Protein blood markers for diagnosis of human diseases including cancer are coming of age (10); however, no proteomic blood test available today can indicate the presence of particular lung pathologies. As yet, studies on proteomic markers for LC are limited by low numbers of tested analytes or by relatively small and statistically underpowered cohorts of examined subjects. A recent study employed multiplex assays for the analysis of tumor-associated auto-antibodies in 117 NSCLC, 30 chronic obstructive pulmonary disorder (COPD)/asthma, 13 nonmalignant lung nodules and 31 normal controls to develop a blood test for the detection of NSCLC (11). Altered levels of three circulating adhesion molecules were reported to have diagnostic value for LC patients (12). A larger study that employed mass spectrometric methods identified a few biomarkers that discriminated NSCLC from non-cancer controls (13). Yet another study that evaluated expression of six serum proteins in 50 controls and 50 LC sera identified four biomarkers for potential diagnosis of LC (14).
Our study used one of the largest cohorts of specimens representing LC, asthma and healthy controls reported to date, as well as the largest number of simultaneously measured analytes (15). We employed two independent methods, multiplex immunoassays and mass spectrometry, to identify panels of plasma biomarkers that can detect NSCLC and asthma, and evaluated the power of these biomarkers to discriminate asthma from NSCLC and each of these pathologies from healthy controls.
Patients and Methods
Specimens. Plasma specimens were purchased from Virginia BioFluids, Inc. (Virginia Beach, VA, USA), Bioserve Biotechnologies Ltd (Beltsville, MD, USA) and Cureline Inc. (Burlingame, CA, USA). All samples were processed, stored and analyzed under identical conditions. Demographic information is summarized in Table I.
Multiplex immunoassays. The following analytes were assayed using assay kits from Millipore (St. Charles, MO, USA): leptin, GLP-1, C-peptide, insulin, amylin (total), sE-selectin, sVCAM-1, soluble inter-cellular adhesion molecule-1 (sICAM-1), MPO, CRP, serum amyloid A, serum amyloid P, soluble Fas ligand (sFasL), sFas, MIF, adiponectin, resistin, PAI-1, interleukin-1α (IL-1α), IL-1β, IL-1 receptor antagonist (IL-1RA), IL-2, IL-4, IL-5, IL-6, IL-7, IL-8, IL-10, IL-12(p40), IL-15, IL-17, fractalkine, granulocyte colony-stimulating factor (G-CSF), granulocyte macrophage colony-stimulating factor (GM-CSF), interferon γ (IFNγ), monocyte chemotactic protein (MCP-1), macrophage inflammatory protein-1α (MIP-1α), MIP-1β, transforming growth factor-α (TGF-α), tumor necrosis factor-α (TNF-α), and vascular endothelial growth factor (VEGF). In addition, IL-12(p70), IL-13, MMP-2, MMP-1, MMP-3, eotaxin, leptin, interferon γ-induced protein 10 (IP-10), MMP-9, MMP-13, PAI-1, MMP-8, interferon inducible T-cell α chemoattractant (I-TAC), MMP-12, hepatocyte growth factor (HGF), MMP-7, EGF, and sCD40L were assayed using Procarta kits from Affymetrix (Fremont, CA, USA). All specimens were assayed in duplicate following kit manufacturer protocols. Two analytes, leptin and PAI-1, were assayed twice in separate kits from the two different vendors.
Quality control samples were obtained from whole-blood preparations purchased from The South Texas Blood and Tissue Center from which plasma samples were separated using BD Vacutainer CPT Cell Preparation Tubes (Becton Dickinson, Franklin Lakes, NJ, USA). Multiplex immunoassays were performed using a Luminex 100 IS System (Luminex Corp., Austin, TX, USA). Analyte concentrations were calculated from standard curves using Bio-Plex Manager 4.1.1 (Bio-Rad Laboratories, Hercules, CA, USA).
Western blotting. Four pools of eight randomly selected asthma, NSCLC, and control specimens were used to evaluate expression levels of selected target proteins following a modified protocol (16). Briefly, plasma samples were thawed on ice and cleared of insoluble material by centrifugation. Protein concentration was measured using micro bicinchoninic acid assay (BCA) protein assay kit (Pierce Biotechnology, Rockford, IL, USA). Samples containing 20 μg protein were boiled in reducing sodium dodecyl sulfate (SDS) buffer and resolved in 7.5% SDS polyacrylamide gels (Bio-Rad Laboratories, Hercules, CA, USA) with biotinylated protein ladder (Cell Signaling Technology, Beverly, MA, USA) and Kaleidoscope pre-stained protein standard (Bio-Rad Laboratories). The gels were transblotted to Hybond-C nitrocellulose (Amersham Biosciences, Piscataway, NJ, USA) and the membranes were blocked in 5% enhanced chemiluminescence (ECL) Advance Blocking Reagent (Amersham Biosciences). The membranes were incubated overnight at 4°C with the following primary rabbit anti-human antibodies (all from Abcam, Cambridge, MA, USA; dilutions as listed): syntaxin 11 (1:10,000), PDE7B (1:2000), arginase (1:2000), carcinoembryonic antigen (1:250), α1-antitrypsin (1:1000), retinol-binding protein (1:1000). Anti-rabbit IgG horseradish peroxidase (HRP) conjugate was used as secondary antibody (Cell Signaling, Danvers, MA) plus anti-biotin HRP conjugate (1:10,000) to visualize the protein ladder. All antibodies were diluted with 2.5% ECL Advance Blocking reagent. Target proteins were visualized by enhanced chemiluminescence with ECL Advance Western Blot detection kit and captured on Hyperfilm-ECL film (both from Amersham Biosciences). Molecular masses of target proteins were verified against standards. Images were obtained with a densitometer and quantified using ImageQuant software (Molecular Dynamics, Sunnyvale, CA, USA).
Mass spectrometry. The same pooled plasma samples used for western blotting were used for mass spectrometric analyses. Plasma aliquots were diluted 1:100 in 0.05 M ammonium bicarbonate buffer and digested with proteomic grade trypsin for 2 hours. Digested samples were further diluted 1:100 in 50% acetonitrile for mass spectral analysis. Separations were performed using a ternary water/acetonitrile/0.05% heptafluorobutyric acid gradient on a custom-packed 75 μm × 100 mm column with Supelcosil ABZ+ stationary phase. Plasma proteins were analyzed using liquid chromatography electrospray ionization mass spectrometry (LC-ESI MS/MS) to identify biomarkers expressed in a significantly different manner in individuals with NSCLC or asthma in comparison with control subjects. Data were analyzed using Mascot software (Matrix Science Inc, Boston, MA, USA). Selection of characteristic ion signals was performed by comparing the mass spectral data for tryptic peptide digests of subjects in different physiological states. The data comprised the masses of peptide fragments, represented graphically as the intensities of the proteins containing those fragments across time in a single dimension. Thousands of proteins were compared, resulting in the selection of 11 proteins, which were expressed in substantially different intensities between populations of individuals free of any lung tissue pathologies and populations of individuals having asthma as diagnosed by a physician.
Data analysis. Immunoassay data were reduced using inter-pathology comparisons with statistical significance determined using the Kruskal-Wallis test with p<0.05 considered significant and p<0.001 highly significant. Scalar sums of inter-pathology percentage differences (NSCLC versus control, asthma versus control and NSCLC versus asthma) in marker expression were used for hierarchically ranking the markers. Support vector machine (SVM) was applied for prediction. Prediction was a two-step process. In the first step a classifier was built describing a pre-determined set of data; this learning step was carried out on a training database. In the second step, the classifier was applied in a validation database and various measures of accuracy, including sensitivity and specificity, were observed. In this SVM application, we used a kernel function known as the Gaussian Radial Basis Function (RBF) (17), defined by $k(x,x') = \exp(-\sigma \lVert x - x' \rVert^{2})$, where $x$ and $x'$ are two $k$-tuples. The RBF is often used when no a priori knowledge is available with which to choose from a number of other defined kernel functions, such as the polynomial or sigmoid kernels (18). The RBF projects the original space into a new space of the same dimension. All statistical computations were performed using R 2.10.0 (19). SVMs were fitted using the ksvm function in the kernlab package. To reduce the number of biomarkers used by the SVM to classify the observations, the F_SSFS method of Lee (20) was extended to an arbitrary number of groups. With this method, a set of variables which are good candidates to be kept in the model is determined. Candidates are selected on the basis of their F-score, which quantifies the separation between the values of the variable across the groups; in its definition, $g$ is the number of groups and $n_j$ is the number of observations from group $j$.
Forward model selection was applied to this variable set, with variables added to the model on the basis of their improvement in the accuracy of the SVM. In this application, variables were biomarkers and groups were pathology categories.
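The selection procedure described above can be illustrated with a minimal Python sketch. It is not the authors' R/kernlab code. The RBF kernel follows the formula given in the text. The multi-group F-score shown here is an assumed generalization (squared deviations of group means from the overall mean, divided by the sum of within-group sample variances), because the exact formula is not reproduced in this text. The function names and the `accuracy_of` callback, which stands in for fitting an SVM and measuring its accuracy, are hypothetical.

```python
import math
from statistics import mean


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel k(x, y) = exp(-sigma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


def multigroup_f_score(groups):
    """Assumed multi-group F-score for a single biomarker.

    groups: list of g lists; list j holds the n_j values from group j.
    Larger values indicate better separation between the groups.
    """
    overall = mean(v for grp in groups for v in grp)
    # Between-group term: spread of the group means around the overall mean.
    between = sum((mean(grp) - overall) ** 2 for grp in groups)
    # Within-group term: sum of the sample variances of the groups.
    within = sum(
        sum((v - mean(grp)) ** 2 for v in grp) / (len(grp) - 1)
        for grp in groups
    )
    return between / within


def forward_select(candidates, accuracy_of, max_vars):
    """Greedy forward selection over candidate variables.

    accuracy_of: hypothetical callback that takes a list of variable names
    and returns the classifier accuracy obtained with those variables.
    A variable is added only if it strictly improves the accuracy.
    """
    chosen, best = [], float("-inf")
    while len(chosen) < max_vars:
        scored = [(accuracy_of(chosen + [c]), c)
                  for c in candidates if c not in chosen]
        if not scored:
            break
        top_acc, top_var = max(scored, key=lambda t: t[0])
        if top_acc <= best:
            break
        chosen.append(top_var)
        best = top_acc
    return chosen, best
```

In practice `accuracy_of` would wrap the fitting and evaluation of an SVM on the chosen biomarkers; here it is left abstract so that the sketch does not depend on any particular SVM library.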
Results
Multiplex immunoassays. In order to develop a proteomic profile for lung pathologies we cast a wide net to identify analytes for inclusion in a screen of biomarkers that may differentiate AST and NSCLC from each other and from normal controls. We selected analytical targets through searches of the scientific literature databases for proteins involved in processes relevant to etiology of asthma and LC and/or other types of human cancer, such as inflammation (cytokines, chemokines, bioactive peptides, growth factors and hormones), apoptosis, invasiveness (adhesion molecules, matrix metalloproteinases), and endocrine/obesity (adipokines, bioactive peptides, growth factors and hormones). Ultimately we selected 57 discrete blood serum or plasma proteins and peptides implicated in the pathogenesis of AST and LC. Most of the selected proteins and peptides were not recognized at the time as diagnostic markers for LC or asthma. Subsequently, we searched for commercially available multiplex immunoassay kits that encompassed the largest numbers of analytes per kit to perform the screen.
In the first stage of data analysis, we compared levels of the markers, expressed as percentage differences, between asthma and controls, NSCLC and controls, and asthma and NSCLC (Figure 1A). Plasma specimens from subjects with NSCLC had prominently (>10-fold) higher levels of epidermal growth factor (EGF) and matrix metalloproteinase 9 (MMP-9) in comparison with controls. In asthma specimens, interferon inducible T-cell α chemoattractant (I-TAC), EGF and MMP-9 levels were >10-fold higher than those in controls. When AST was compared to NSCLC, we found much lower levels of interleukin-10 (IL-10) (>10-fold). Subsequently, we segregated and analyzed biomarker expression by sex. When NSCLC was compared to controls (Figure 1B), more proteins were up-regulated in men than in women. In men, levels of hepatocyte growth factor (HGF) and MMP-8 were >300-fold higher in NSCLC versus controls, MMP-1 and plasminogen activator inhibitor (PAI-1) were >10-fold higher, and MIF and MPO were >5-fold higher. In women, the number of differentially expressed markers and the magnitude of their up-regulation were smaller. Among the ≥5-fold up-regulated markers, we noted SAA, EGF, and MMP-9. When we compared differences between women and men with asthma versus controls (Figure 1C), again a greater number of proteins were up-regulated in men: HGF (>200-fold), I-TAC (>150-fold), PAI-1 and MPO (∼20-fold), MMP-8 and MMP-9 (>10-fold). For women, MMP-9 was up-regulated >20-fold and I-TAC was up-regulated nearly 10-fold. Interestingly, relatively few markers and small quantitative differences in marker expression were observed in men and women when we compared asthma versus NSCLC (Figure 1D).
Multiplex immunoassays data mining. For data mining and selection of statistically significant disease biomarkers, only specimens with complete records were used (Table II). The database comprised 824 records with one record per subject; each record contained a lung pathology category (asthma, NSCLC, control), sex, and the values of 59 biomarkers (two markers assayed twice with different kits). One NSCLC, one control and two asthma cases were removed due to the lack of gender information. A further 37 (36 male controls, 1 asthmatic male) were removed from the analysis due to missing biomarker levels. The records were then randomly split into a training set (N=398) and a test set (N=389).
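The record counts above can be reconciled with the group sizes given in the abstract. The short Python check below is a sketch under one assumption: that the figure of 824 records refers to the subjects remaining after the four of unknown sex were excluded. The variable names are illustrative only.

```python
# Group sizes as reported in the abstract: (men, women, sex unknown).
reported = {
    "NSCLC": (245, 114, 1),
    "asthma": (67, 111, 2),
    "control": (165, 122, 1),
}

total = sum(sum(counts) for counts in reported.values())
unknown_sex = sum(counts[2] for counts in reported.values())

# Assumption: the 824 records are those with known sex.
with_sex = total - unknown_sex

# 36 male controls and 1 asthmatic male lacked biomarker values.
missing_biomarkers = 36 + 1
complete = with_sex - missing_biomarkers

train_n, test_n = 398, 389
print(total, with_sex, complete, train_n + test_n)
```

Under that assumption the totals agree: 828 subjects overall, 824 with known sex, and 787 complete records, which equals the sum of the training and test sets.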
An application of the SVM methodology to the entire test set (Table III) yielded accuracy=0.94 (standard error=0.012); the classifier erroneously assigned 12 asthmatics and 9 controls to the NSCLC lung pathology category.
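The reported standard error is consistent with the usual binomial formula for a proportion, sqrt(p(1-p)/n), applied to the test set of 389 records. Whether the authors computed it this way is an assumption; the Python sketch below only shows that the numbers agree, and the function name is hypothetical.

```python
import math


def binomial_se(p, n):
    """Standard error of a proportion p estimated from n independent cases."""
    return math.sqrt(p * (1.0 - p) / n)


# Accuracy and test-set size as reported in the text.
se_accuracy = binomial_se(0.94, 389)
print(round(se_accuracy, 3))
```

The same formula can be applied to sensitivity and specificity, using the number of true cases or true controls in the test set as n.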
Excluding the asthma category, individuals were classified with regard to all biomarkers and subsets of biomarkers on the whole test set, with restriction to males, and with restriction to females (Table IV). Without restriction by gender and using all available data, the sensitivity (SE) and specificity (SP) were 0.99 (0.004) and 0.95 (0.013), respectively. SE and SP were reduced on the best subset of four and the best subset of five biomarkers [best subset of four: (SE, SP) 0.93 (0.014) and 0.87 (0.02); the best subset of five biomarkers: 0.95 (0.103) and 0.79 (0.024)]. The performance of the classifier was improved with restriction by gender [males: 1.0, 0.94 (0.018) and females: 1.0, 0.97 (0.016)], and became perfect (both SE and SP=1.0) with restriction to the best subset of four biomarkers among males and three biomarkers among females.
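For reference, sensitivity and specificity in a two-class comparison such as NSCLC versus control are computed from the confusion counts of the test set. The Python sketch below uses the standard definitions; the counts in the usage line are hypothetical and are not taken from Table IV.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: NSCLC cases classified as NSCLC
    fn: NSCLC cases classified as control
    tn: controls classified as control
    fp: controls classified as NSCLC
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
se, sp = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
print(se, sp)
```

A sensitivity and specificity of 1.0, as reported for the sex-specific best subsets, corresponds to no false negatives and no false positives in the test set.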
Mass spectrometric discovery. Through semi-quantitative MS analyses followed by Mascot searches, we identified 11 proteins that were differentially expressed in the two pathologies and control specimens, including three putative proteins identified by gene sequencing and representing yet unnamed protein products and a protein corresponding to chromosome X open reading frame 38. In addition, we discovered several known proteins whose role as biomarkers for asthma or LC has not been investigated to date: FERM domain-containing protein 4, which is involved in intra-cytoplasmic membrane anchorage, JCC1445; a proteasome endopeptidase complex chain C2, AAK13093; UL16 binding protein 3 expressed on the surface of natural killer (NK) cells and T cells, phosphodiesterase 7 (PDE7), arginase, syntaxin 11, and phospholipase D. We used western immunoblots to investigate the presence of four of these proteins, for which human-specific antibodies were commercially available. As shown in Figure 2, the levels of PDE7B and arginase were elevated in asthma and NSCLC. In contrast, syntaxin 11 was present at high levels in control plasma but its expression was lower in plasma from asthma and NSCLC. Phospholipase D was undetectable by the antibody used in our test.
In addition, we examined plasma expression levels of three selected markers from the panel assembled by the Duke team (14). In our study, carcinoembryonic antigen was undetectable, retinol-binding protein showed no bands at specified molecular mass and the α1-antitrypsin antibody specificity was low as evidenced by multiple bands (data not shown).
Discussion
The diagnostic methodologies used to identify lung pathologies are in need of improvement. NSCLC is usually clinically silent until the disease becomes advanced. The current best practice is the regular use of spiral CT to monitor persons at risk. The radiation exposure of monitoring carries the risk of inducing malignancies. The diagnosis of asthma rests on functional testing, with little reference to biochemical parameters. Asthma is among the differential diagnoses for LC. In addition, there is evidence of a predisposition to LC among asthma sufferers. These facts taken together with numerous cases of misdiagnosed asthma and the hazards of CT for the early detection of LC warrant a search for better diagnostic tests for these lung diseases. We used multiplex immunoassays and mass spectrometry to identify plasma biomarkers capable of distinguishing asthma from NSCLC.
Large quantitative differences in expression levels of the two markers, leptin and PAI-1, which were assayed twice using different sets of immunobead-coupled antibodies, as well as lack of detection of several protein targets identified by MS in immunoblots, underscore a dependence of marker detection both on the analytical method used and on the specificity of antibodies for target proteins.
The study provides a proof of concept for detecting and quantifying diagnostic protein biomarkers in plasma, which may allow for screening of multiple biomarkers in patient samples. We have identified several panels of markers that correctly classify the physiologic state of an individual with regard to NSCLC and asthma. Our multiplex immunoassays identified some protein markers that had previously been associated with asthma or NSCLC, but none of them had been found to have sufficient diagnostic power. In contrast, the combination of multiple biomarkers in a panel is likely to have clinical utility, and we have identified several strong candidate sets and patterns of expression for further development.
We have discovered unexpected gender differences in biomarker patterns and expression levels. Even though there was some gender and ethnicity imbalance in procured specimens, as men predominated in the NSCLC clinical data set and asthma was dominated by women, the sizes of respective groups allow for unbiased analysis of the experimental data.
In our study, EGF and IL-8 emerged as strong predictors of NSCLC for both men and women. The epidermal growth factor receptor (EGFR) pathway has been identified as a driving force in NSCLC pathogenesis. Diverse therapeutic strategies targeting mainly EGFR (21), but also EGF (22), have shown positive responses in the clinic. In contrast, the association of IL-8 with NSCLC is less strongly established, although IL-8/CXCL8 is known to play crucial roles in chronic inflammation, cancer, and in various other types of lung pathologies.
Several biomarkers with strong predictive value for NSCLC are different for men (sFas, MMP-9, and PAI-1) and women (CD40). The reasons for these gender differences in LC are not clear at present. Scattered information in the public domain indicates that these proteins play different roles in men and women due to differences in hormonal control. For example, estrogen-mediated reduction in macrophage MMP-9 production causes gender-related differences in development of abdominal aortic aneurysm (23). Estrogen is also implicated in the regulation of adipogenesis and adipose metabolism; reduced levels of the prothrombotic protein PAI-1 were found in post-menopausal women (24). Studies in vitro indicate that androgens prime the Fas/FasL dependent apoptotic pathway in kidney tubule cells. The mechanisms of cell death are consistent with a role for androgens in the promotion of chronic renal injury in men (25). Androgens up-regulate atherosclerosis-related genes, including CD40, in macrophages from males but not females (26).
An analysis of gene expression in somatic tissues of male and female mice covering >20,000 transcripts revealed a large degree of sexual dimorphism in the expression of thousands of genes with highly tissue-specific expression patterns (27). Thus, genome differences of <2% between men and women are also likely to translate into dramatic gender-specific differences in gene expression, with significant consequences for elaboration of different biomarkers at the proteome level both in health and disease. Indeed, gender differences were described in autoimmune disease (28). The female sex hormones may be immunostimulatory by increasing immune interactions, including production of cytokines such as IFNγ, IL-6 and IL-1. Examples of sex differences in inflammatory cytokine production have been described (29, 30). A recent report described gender differences in LC; however, transcriptome analysis linked these differences exclusively to sex chromosomes (31). Based on the results of our study, the markers we identified as having diagnostic value are products of genes residing on multiple chromosomes and are not limited to sex chromosomes.
Mass spectrometric biomarker discovery has promise in early detection of human cancer, provided there is validation of the findings through clinical studies (32). In our study, mass spectrometric discovery revealed novel markers that may have diagnostic potential. For example, expression of syntaxin 11 has not been examined in studies of LC or asthma, although a related syntaxin 1 gene was included in a three-gene prognostic classifier that can stratify early-stage NSCLC patients (33). PDE7 was found in inflammatory cells from patients with asthma and COPD, and there is evidence of altered PDE7 mRNA transcript levels in peripheral blood of asymptomatic asthmatic patients and individuals with stable COPD (34). The cAMP-specific phosphodiesterase isoenzyme family PDE4 represents the highest cAMP-hydrolyzing activity in human cancer cell lines (35). Elevated levels of phospholipase D were reported in vitro in a human lung carcinoma cell line that expresses an activated K-RAS gene (36). Detection of high levels of arginase activity in a mouse model of asthma suggested that arginine pathways are critical in the pathogenesis of asthma (37). High levels of ARG2 that correlated with tumor grade were found in NSCLC (38).
We anticipate combining the disease biomarker panels identified through immunoassays with other markers discovered through mass spectrometry in order to maximize the diagnostic power of the biomarker sets. Thus, future plans include continued validation of marker expression by immunodetection, with both commercially available and newly generated antibodies to putative target proteins, followed by development of multiplex immunoassays that encompass the strongest biomarker candidates from the present study.
Despite all efforts, no new major cancer biomarkers have been approved for clinical use for over 25 years and cautionary tales of a number of disappointing studies have been published in eminent journals. This fact has been vocally addressed in the public arena (39). Our findings are expected to have implications for the diagnosis of NSCLC and AST and assist in early detection of these diseases through simple blood tests and possible characterization of disease progression and in differentiation among the pathologies. We hope thereby to improve current clinical practice and further advance the struggle to bring personalized medicine into routine use. Our results may also find applications in the discovery and development of novel therapeutic strategies for a variety of human diseases and for monitoring responses to therapeutic interventions.
Footnotes
-
Abbreviations: AST: asthma; BCA: bicinchoninic acid assay; CD: cluster of differentiation; COPD: chronic obstructive pulmonary disorder; CRP: C-reactive protein; CT: computed tomography; ECL: enhanced chemiluminescence; EGF: epidermal growth factor; G-CSF: granulocyte colony-stimulating factor; GLP: glucagon-like peptide; GM-CSF: granulocyte macrophage colony-stimulating factor; HGF: hepatocyte growth factor; HRP: horseradish peroxidase; IL: interleukin; IP: interferon gamma-induced protein; I-TAC: interferon inducible T-cell α chemoattractant; LC: lung cancer; LC-ESI: liquid chromatography electrospray ionization; MCP: monocyte chemotactic protein; MIF: macrophage migration inhibitory factor; MIP: macrophage inflammatory protein; MMP: matrix metalloproteinase; MPO: myeloperoxidase; MS: mass spectrometry; NK: natural killer; NSCLC: non-small cell lung cancer; PAI: plasminogen activator inhibitor; RA: receptor antagonist; RBF: Radial Basis Function; sCD40: soluble CD40; SDS: sodium dodecyl sulfate; sFas: soluble Fas; sFasL: soluble Fas ligand; sICAM: soluble inter-cellular adhesion molecule; sVCAM: soluble vascular cell adhesion molecule; SVM: support vector machine; TGF: transforming growth factor; TNF: tumor necrosis factor; VEGF: vascular endothelial growth factor.
-
Author Contributions
Conception and design: E. Izbicka, R. Streeper; collection and assembly of data: R. Streeper, E. Izbicka, A. Diaz, D. Campos; data analysis and interpretation: R. Streeper, E. Izbicka, J. Michalek, C. Louden; manuscript writing and final approval: E. Izbicka, R. Streeper, J. Michalek, C. Louden.
-
Conflict of Interest
All Authors are employees or consultants for Cancer Prevention and Cure, Ltd.
-
Source of Funding
Cancer Prevention and Cure, Ltd.
- Received October 9, 2011.
- Revision received November 14, 2011.
- Accepted November 15, 2011.
- Copyright© 2012 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved