Abstract
Background/Aim: The recombination of V, D, and J immunoglobulin (IG) gene segments leads to many variations in the amino acids (AAs) encoded at the recombination junction, the complementarity determining region-3 (CDR3). Thus, cancer patients may have varying degrees of CDR3 AA binding specificity for cancer proteases, for example, matrix metalloproteinase 2 (MMP2). MMP2 in breast cancer has been found to contribute to metastasis and is used as a marker for tumor staging. Thus, this report evaluated the tumor resident, patient specific IG CDR3 binding affinities to cancer proteases to test the hypothesis that greater binding affinities would be associated with a better outcome. Materials and Methods: Using two independent bioinformatics tools, we evaluated the IG CDR3-MMP2 binding affinities throughout The Cancer Genome Atlas breast cancer (TCGA-BRCA) dataset. Results: The better the CDR3-MMP2 binding, the better the survival probability. An analogous evaluation for four other proteases, including calpain-1 and thermolysin, displayed no such associations with survival probabilities. Conclusion: This study is consistent with the possibility that patient IG-cancer protease interactions could impact outcomes and raises the question of whether therapeutic antibody targeting of MMP2 would reduce breast cancer mediated tissue destruction and breast cancer mortality rates.
The impact of proteases on proteins can influence cell proliferation and differentiation; DNA replication and transcription; and apoptosis and immunity (1, 2). In the cancer setting, proteases and their inhibitors and substrates are referred to as the cancer degradome, and this system ultimately plays a role in cancer cells spreading across tissue membranes (3). It appears that the most appropriate time to use a protease inhibitor is when the cancer is first detected (4-6). Overall, data indicate that therapeutic protease inhibitors reduce tumor growth and decrease the spread of cancer cells (7, 8).
The numerous functions of matrix metalloproteinases (MMPs) include degradation of extracellular matrix (ECM) proteins, membrane receptors, growth factors, and cytokines (9). Twenty-eight MMPs have been identified, with at least 23 distinct MMPs expressed in humans (10). MMPs can be sub-classified into collagenases, gelatinases, matrilysins, stromelysins, and membrane-type MMPs (10). MMP2 is a gelatinase, which serves to degrade the ECM and is involved in tumor cell growth, invasion, and metastasis.
Overexpression of MMP2 is a common biomarker of high-risk breast cancer, due to its strong, positive correlation with increased metastasis, presumably due to the perforation of the basement membrane, allowing for the movement of cancer cells into the circulatory system. MMP2 overexpression is highly correlated with shorter overall survival (OS) (11). When common prognostic indicators related to breast cancer are contradictory, MMP2 has still been shown to be independently linked to reduced survival probabilities. Normal breast tissue and benign breast lesions rarely express MMP2, as opposed to malignant tumor cells and tumor microenvironment cells. Likewise, invasive breast cancer has a higher level of MMP2 activity than noninvasive breast cancer (11-14).
MMPs were initially targeted utilizing MMP inhibitors (MMPIs) such as Batimastat and Marimastat, both of which are drugs that lacked specificity. This absence of specificity resulted in the inhibition of certain MMPs that were not the intended drug targets. As such, these initial MMPIs displayed significant toxicity, as seen with the development of musculoskeletal syndrome in several patients who were administered the MMPIs. Novel MMPIs, which are monoclonal antibodies, seem to effectively reduce tumor growth and metastasis; one such example is DX-2400, which is a monoclonal antibody directed against MMP14. Specifically, such recent MMPIs have been engineered to be more specific, more selective, and to have lower toxicity (9).
To obtain a better understanding of endogenous, patient immunoglobulin (IG) interactions with cancer proteases, such as MMP2, we bioinformatically assessed the binding affinities of IG complementarity determining region-3 (CDR3) amino acid (AA) sequences for cancer proteases. Results indicated that, in the breast cancer setting, a higher CDR3-MMP2 affinity was associated with a better survival probability.
Materials and Methods
Recovery of the IG recombination reads from genomics files. The recovery of the IGH (heavy chain), IGL (lambda light chain), and IGK (kappa light chain) recombination reads from The Cancer Genome Atlas (TCGA) breast cancer (BRCA) exome files (WXS files) was performed using methods previously described (15-18). The collection of recovered IG recombination reads was sourced from previously published analyses (19, 20), with data access via the database of Genotypes and Phenotypes (dbGaP; project approval number 6300). The computer code for the recovery of the recombination reads is publicly available at https://github.com/bchobrut-USF/blanck_group, with a readme file. There is a container version of the code, also with a readme file, at https://hub.docker.com/r/bchobrut/vdj. Some refinements of the basic code are available at https://github.com/kcios/2021. The data representing the recombination reads for this report are in the supporting online material (SOM, Table S1). Refer to the URL at the end of this article for accessing the SOM tables.
Programmatic collection of protease sensitivity scores based on the algorithm of the SitePrediction web tool. The TCGA-BRCA IG CDR3 AA sequences were evaluated according to the algorithm used by the SitePrediction browser-based application (21) (https://www.dmbr.ugent.be/prx/bioit2-public/SitePrediction/). Protease cleavage site scores for the IG CDR3s were established for the following proteases: caspase 3 (CASP3), calpain 1 (CAPN1), cathepsin B (CTSB), thermolysin (MME), and matrix metalloproteinase 2 (MMP2). For each CDR3 AA sequence, only the three most probable cut sites, as defined by the three highest values of the “average score” parameter generated by the SitePrediction tool, were retained and compiled for each protease. The cut site predictions for the MMP2 protease with their corresponding parameters are available in the SOM (Table S2). Additional details are presented in the Results section.
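The retention step described above can be illustrated with a minimal sketch. The input layout (one tuple per predicted cut site, holding the CDR3, the cut position, and the "average score"), the function name `top_cut_sites`, and the example sequences and scores are all assumptions made for illustration; they are not the SitePrediction output format and not data from this report.

```python
from collections import defaultdict

def top_cut_sites(rows, k=3):
    """Keep the k highest-scoring predicted cut sites for each CDR3.

    rows: iterable of (cdr3, position, average_score) tuples.
    This tuple layout is hypothetical, chosen only for the sketch.
    """
    by_cdr3 = defaultdict(list)
    for cdr3, position, score in rows:
        by_cdr3[cdr3].append((position, score))
    # Sort each CDR3's sites by score, highest first, and keep the first k.
    return {
        cdr3: sorted(sites, key=lambda site: site[1], reverse=True)[:k]
        for cdr3, sites in by_cdr3.items()
    }

# Made-up sequences and scores, for illustration only.
rows = [
    ("CARDYW", 1, 0.2), ("CARDYW", 2, 1.5), ("CARDYW", 3, 0.9), ("CARDYW", 4, 3.1),
    ("CQQYNSF", 2, 0.4), ("CQQYNSF", 5, 0.7),
]
top = top_cut_sites(rows)
```

A CDR3 with fewer than three predicted sites simply keeps all of them in this sketch; how such cases were handled in the actual analysis is not stated here.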
IG CDR3-cancer protease peptide complementarity scoring algorithm. The chemical complementarity scores (CSs) for the IG CDR3s and protease AA sequences were obtained via the algorithm first described by Chobrutskiy and colleagues (22). The computer code for application of the algorithm is freely available at https://github.com/bchobrut/brca_swcs. In addition, the complementarity scoring, along with a matchup of the CSs and survival data, can be done via the publicly available adaptivematch.com (23-25), with the web tool having been specifically benchmarked in (25). The Adaptive Match web tool has instructions for use. See also SOM Tables S3-S7 for Adaptive Match input and output files for this article. For this article, Combo CSs, which are CDR3-candidate epitope CSs based on a simultaneous evaluation of electrostatic and hydrophobic attractiveness, were primarily evaluated. However, there are also evaluations based on Electrostatic CSs, where only the AA charges at physiological pH were used in the CS calculation; and Hydro CSs, where only Uversky hydropathy (hydrophobicity) values (26) were used in the calculations. As noted in the Results section, MMP2 was fragmented for use of the Adaptive Match web tool. The individual MMP2 peptides represented: Segment 1 (Collagenase-like 1 domain, AAs 110-221), Segment 2 (Collagen binding domain, AAs 220-396), Segment 3 (Collagenase-like 2 domain, AAs 397-465), Segment 4 [“Required for inhibitor tissue inhibitor of metalloproteinases 2 (TIMP2) binding” domain, AAs 414-660], and Segment 5 (Cysteine switch domain, AAs 100-110). The parenthetical AA numbers represent MMP2 isoform 1 catalogued at www.ncbi.nlm.nih.gov/protein/NP_004521.1. In the case of CTSB, with further details also available in the Results section, the segments (peptides) used in the Adaptive Match analysis were generated by subdivision of the full-length CTSB AA sequence into six approximately equal lengths.
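The preparation of protease segments for complementarity scoring can be sketched as follows. This is an illustrative sketch only: the helper names `slice_segment` and `split_evenly` are hypothetical, the residue ranges are taken to be 1-indexed and inclusive (an assumption), the rule for distributing leftover residues across "approximately equal" pieces is also an assumption, and the sequence used is a placeholder rather than the MMP2 or CTSB sequence.

```python
def slice_segment(sequence, start, end):
    """Return residues start..end (assumed 1-indexed, inclusive)."""
    return sequence[start - 1:end]

def split_evenly(sequence, n_parts):
    """Split a sequence into n_parts contiguous pieces whose lengths differ by at most one."""
    base, extra = divmod(len(sequence), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` pieces each take one additional residue.
        length = base + (1 if i < extra else 0)
        parts.append(sequence[start:start + length])
        start += length
    return parts

# MMP2 segment boundaries as listed in the Methods text (NP_004521.1 numbering).
MMP2_SEGMENTS = {
    "Segment 1": (110, 221),
    "Segment 2": (220, 396),
    "Segment 3": (397, 465),
    "Segment 4": (414, 660),
    "Segment 5": (100, 110),
}

# Placeholder 660-residue string; this is NOT the MMP2 sequence.
toy = "ACDEFGHIKLMNPQRSTVWY" * 33
segments = {name: slice_segment(toy, s, e) for name, (s, e) in MMP2_SEGMENTS.items()}

# Six approximately equal pieces, in the manner described for CTSB.
six = split_evenly("ACDEFGHIKLMNPQRSTVWY", 6)
```

In practice the real protein sequences would be substituted for the placeholder, and each resulting peptide would be supplied to the scoring tool as a candidate epitope.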
Results
Identifying IG CDR3 MMP2 sensitivities associated with a survival distinction. The TCGA-BRCA tumor and blood WXS files were searched for IG recombination reads, and the CDR3 sequences resulting from those recombination reads were translated into AA sequences (Table S1) (19, 20). All CDR3 AA sequences for each case ID were assessed for their level of sensitivity to several proteases, including MMP2, particularly due to the strong association of microenvironment MMP2 expression with breast cancer cells (27, 28) (Tables S1 and S2). The assessment was done with the SitePrediction web tool (See Methods section). The multiple protease sensitivity scores, for a given protease, for each case ID, based on the collection of CDR3 AA sequences available for that case ID, were averaged. Then, the case IDs were grouped into the top and bottom 25th percentiles based on the average sensitivity score for their CDR3s, and the two percentile groups were compared for a survival distinction, using Kaplan-Meier (KM) analyses (29) (Table I). Results indicated that TCGA-BRCA tumor IG CDR3s with a higher level of MMP2 sensitivity also represented a higher overall survival (OS) probability (logrank p=0.039, Table I, Figure 1A). When the same analysis was repeated with lung adenocarcinoma (LUAD) primary tumor IG CDR3s and MMP2, such a survival distinction was not indicated (logrank p=0.476, Table I, Figure 1B). When the analysis was repeated with BRCA blood IG CDR3s and MMP2, the survival distinction noted with the analysis of the BRCA primary tumor IG CDR3s was also not apparent (logrank p=0.581, Table I, Figure 1C). And, with analysis of the tumor resident BRCA IG CDR3s and the MME protease, no survival distinctions were apparent (logrank p=0.4857, Table I, Figure 1D). The OS correlations were also not observed for BRCA IG CDR3 AA protease sensitivity distinctions for CAPN1, CASP3, and CTSB (Table I).
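The grouping and comparison steps in the preceding paragraph can be sketched as follows, under stated assumptions. The function names, the rank-based definition of the top and bottom quarters, the handling of ties, and all example values are hypothetical; the log-rank statistic shown is the standard two-group form with one degree of freedom, not necessarily the exact implementation used for Table I.

```python
import math
from statistics import mean

def quartile_groups(case_scores):
    """Average each case's scores, then return (top 25%, bottom 25%) case IDs.

    Quartile membership is decided by rank, which is an assumption of this sketch.
    """
    averages = {case: mean(scores) for case, scores in case_scores.items()}
    ranked = sorted(averages, key=averages.get)  # ascending by average score
    k = max(1, len(ranked) // 4)
    return ranked[-k:], ranked[:k]

def logrank_p(group_a, group_b):
    """Two-group log-rank test; each group is a list of (time, event), event=1 for death."""
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in group A
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in group B
        d_a = sum(1 for time, e in group_a if time == t and e)
        d_b = sum(1 for time, e in group_b if time == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 degree of freedom

# Made-up per-case sensitivity scores (one list of CDR3 scores per case ID).
case_scores = {"A": [1.0, 2.0], "B": [5.0], "C": [3.0, 3.0], "D": [0.5],
               "E": [4.0], "F": [2.5], "G": [6.0], "H": [0.1]}
top, bottom = quartile_groups(case_scores)

# Made-up survival data: (months, event) pairs for two clearly separated groups.
long_survivors = [(10, 1), (12, 1), (14, 1), (16, 1)]
short_survivors = [(1, 1), (2, 1), (3, 1), (4, 1)]
p_value = logrank_p(long_survivors, short_survivors)
```

In an actual analysis, the survival records for the case IDs in `top` and `bottom` would be passed to `logrank_p`; an established survival analysis package would normally be preferred over a hand-written test.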
Survival distinctions related to IG CDR3, protease sensitivities for the combined IG set (IGH, IGK, IGL) representing TCGA-BRCA IG recombination reads extracted from primary tumor WXS files.
Survival distinctions represented by breast cancer (BRCA) immunoglobulin (IG) CDR3 AA sequence protease sensitivities. Kaplan-Meier (KM) analyses of the overall survival (OS) for case IDs representing the top 25th percentile of CDR3 protease sensitivity (black) versus the bottom 25th percentile of sensitivity (gray), according to the SitePrediction algorithm (See Methods section). (A) BRCA primary tumor IG CDR3s and matrix metalloproteinase 2 (MMP2); logrank p=0.0392. (B) LUAD primary tumor IG CDR3s and MMP2; logrank p=0.4766. (C) BRCA blood derivative IG CDR3s and MMP2; logrank p=0.581. (D) BRCA primary tumor IG CDR3s and thermolysin (MME); logrank p=0.4857.
TCGA-BRCA IG CDR3-MMP2 peptide complementarity scoring associated with an OS probability distinction. Using the algorithm described in (22), the IG CDR3 AA sequences representing the BRCA primary tumor case IDs were assessed for their CSs for MMP2 AA sequences. The results were grouped into the upper and lower 50th percentiles, with regard to the CSs. We found that the Combo CSs representing MMP2 peptide AA sequence Segments 1, 4 and 5 represented OS probability distinctions (Methods section; Table S6; Figure 2). In these results, high chemical complementarity was associated with a higher survival rate, consistent with the results of the preceding subsection indicating that a greater attractiveness of the protease for the CDR3 correlated with a higher OS probability (Table I). To further assess the apparent specificity of the OS distinction based on IG CDR3 AA-MMP2 interactions, the chemical CS process was repeated for the LUAD IG CDR3-MMP2 segments. The same MMP2 Segments 1, 4 and 5, used in the BRCA analyses above, were used in the LUAD analyses. The top 50th percentile of the Combo CSs for each segment (independently) was compared to the bottom 50th percentile for OS distinctions. These results indicated that the lack of OS distinctions for LUAD based on IG CDR3-MMP2 interactions, observed using the protease sensitivity algorithm above (Table I, Figure 1), was confirmed by the complementarity scoring approach (Figure 3).
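The upper versus lower 50th percentile comparison described above can be sketched as a median split followed by a Kaplan-Meier (product-limit) estimate for each half. The function names, the assignment of cases exactly at the median to the lower group, and all values shown are assumptions made for illustration, not the Adaptive Match implementation or data from this report.

```python
from statistics import median

def median_split(case_cs):
    """Split case IDs into upper and lower halves by complementarity score (CS).

    Cases exactly at the median go to the lower half (an assumption of this sketch).
    """
    cutoff = median(case_cs.values())
    upper = [case for case, cs in case_cs.items() if cs > cutoff]
    lower = [case for case, cs in case_cs.items() if cs <= cutoff]
    return upper, lower

def kaplan_meier(samples):
    """Product-limit survival estimate; samples are (time, event) pairs, event=1 for death."""
    curve, survival = [], 1.0
    for t in sorted({time for time, event in samples if event}):
        at_risk = sum(1 for time, _ in samples if time >= t)
        deaths = sum(1 for time, event in samples if time == t and event)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Made-up Combo CS values per case ID.
case_cs = {"A": 10, "B": 40, "C": 25, "D": 5}
upper, lower = median_split(case_cs)

# Made-up (months, event) pairs; event=0 marks a censored observation.
curve = kaplan_meier([(5, 1), (8, 0), (12, 1), (20, 0)])
```

Censored cases leave the at-risk count without lowering the survival estimate, which is why only the two event times appear in `curve`.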
Survival distinction represented by chemical complementarity of breast cancer (BRCA) primary tumor immunoglobulin (IG) CDR3 AA sequences and matrix metalloproteinase 2 (MMP2) AA sequence segments. (A) Kaplan-Meier (KM) analysis for BRCA primary tumor IG CDR3 AA sequences and MMP2 Segment 1 based on the Combo complementarity score (CS) calculations (See Methods section). Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.0189. (B) KM analysis for BRCA primary tumor IG CDR3 AA sequences and MMP2 Segment 4 based on the Combo CS calculations. Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.0286. (C) KM analysis for BRCA primary tumor IG CDR3 AA sequences and MMP2 Segment 5 based on the Combo CS calculations. Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.0465.
Survival distinction represented by Combo complementarity scores (CSs) for LUAD primary tumor immunoglobulin (IG) CDR3 AA sequences and matrix metalloproteinase 2 (MMP2). (A) Kaplan-Meier (KM) analysis for LUAD primary tumor IG CDR3 AA sequences and MMP2 Segment 1 based on the Combo CS calculations. Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.951. (B) KM analysis for LUAD primary tumor IG CDR3 AA sequences and MMP2 Segment 4 based on the Combo CS calculations. Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.8288. (C) KM analysis for LUAD primary tumor IG CDR3 AA sequences and MMP2 Segment 5 based on the Combo CS calculations. Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.620.
TCGA-BRCA IG CDR3-CTSB peptide CS associated with survival distinction. To analyze the survival distinction for another protease, for the BRCA primary tumor IG CDR3 sequences, CTSB was considered, because the SitePrediction-based algorithm indicated a trend in the direction of a survival distinction for CTSB (logrank p=0.085) (Table I, Figure 4A). Thus, we repeated the chemical CS calculation process, using the BRCA tumor IG CDR3s and CTSB protein AA sequence segments. This approach indicated that the upper 50th percentile of Electrostatic CSs for the IG CDR3s and CTSB Segment 2 was associated with a higher survival probability (logrank p=0.0098, Figure 4B). Likewise, the upper 50th percentile of the Segment 5 Hydro CSs represented a higher OS, with a logrank p=0.0175 (Figure 4C). And, the upper 50th percentile of the Segment 4 Combo CSs represented a higher OS, with a logrank p=0.0286 (Figure 4D).
Survival distinctions represented by breast cancer (BRCA) primary tumor immunoglobulin (IG) CDR3-CTSB protease sensitivities and complementarity scoring. (A) Kaplan-Meier (KM) analysis for BRCA cases representing the upper and lower 25th percentiles, based on the cathepsin B (CTSB) protease sensitivity of the primary tumor IG CDR3s, in turn based on the SitePrediction algorithm. Upper 25th percentile sensitivity scores, black; bottom 25th percentile sensitivity scores, gray; logrank comparison p=0.0850 (Table I). (B) KM analysis for BRCA cases based on the primary tumor, IG CDR3-CTSB Segment 2 Electrostatic complementarity score (CS) calculations. Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.0098. (C) KM analysis for BRCA cases based on the primary tumor, IG CDR3-CTSB Segment 5 Hydro CS calculations. Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.0175. (D) KM analysis for BRCA cases based on primary tumor, IG CDR3-CTSB Segment 4 Combo CS calculations. Upper 50th percentile CSs, black; bottom 50th percentile CSs, gray; logrank comparison p=0.0098.
Discussion
Results above indicated that TCGA-BRCA tumor-resident IG CDR3s with a higher level of MMP2 binding represented a higher overall survival (OS) probability. However, LUAD primary tumor, IG CDR3-MMP2 interactions, as defined by the indicated bioinformatics approaches, did not represent a survival distinction.
As noted in the Introduction section, there is a great deal of evidence indicating that high level MMP2 expression is associated with a worse breast cancer outcome. Extensive and in some cases significantly successful efforts have been made to blunt protease actions in the cancer setting. However, to the authors’ knowledge, this is the first indication that patients may vary in the immune response to protease activity. Such an immune response may be adventitious, in the sense that CDR3s or other immune receptor (IR) features may lead to a better recognition of the proteases due to protease AA structures similar to those of pathogens to which a patient had been previously exposed. Or, the anomalous expression of proteases associated with cancer could specifically lead to an immune response.
Regardless of the source of the potential anti-MMP2 CDR3s, there is the potential for IG CDR3 connection to MMP2 to represent a biomarker for breast cancer aggressiveness.
It is important to keep in mind that the preceding analysis was based on retrospective data and on correlations. Thus, a possible next step could be a prospective clinical trial, in which confounding variables could be minimized, to determine whether the association of bioinformatically assessed IG CDR3-MMP2 binding with survival would be confirmed. Also, the CDR3 AA sequences identified here (Table S7) via the bioinformatics-based MMP2 binding approaches could be tested in vitro for MMP2 binding. In particular, this latter approach could lead to the identification of CDR3 AA sequences that are good candidates for designing anti-MMP2 therapies, for example cyclic CDR3s (30) or partial or full antibody molecules with the CDR3s pre-screened as in this report and with the further in vitro assessments.
Acknowledgements
The authors wish to acknowledge the support of USF research computing; Corinne Walters, for extensive administrative support related to NIH dataset access; and the taxpayers of the State of Florida.
Footnotes
Supplementary Online Material (SOM)
SOM can be accessed at the following link: https://usf.box.com/s/n2czsrvikk5azifpe8o4xhk1hyln3ylp
Conflicts of Interest
The Authors declare that they have no conflicts of interest.
Authors’ Contributions
SRM: Formal analysis; Methodology; Visualization; Writing - review & editing. AJT: Conceptualization; Formal analysis. ECG: Conceptualization; Methodology; Software. DNP: Formal analysis; Methodology. AC: Methodology; Software. BIC: Methodology; Software. GB: Methodology; Project administration; Resources; Supervision; Writing - review & editing.
- Received January 30, 2023.
- Revision received March 4, 2023.
- Accepted March 16, 2023.
- Copyright © 2023, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC-ND) 4.0 international license (https://creativecommons.org/licenses/by-nc-nd/4.0).