Common variants in mismatch repair genes and risk of colorectal cancer

T Koessler1, M Z Oestergaard1, H Song1, J Tyrer1, B Perkins1, A M Dunning1, D F Easton2, P D P Pharoah1

1 Strangeways Research Laboratory, Cancer Research UK Department of Oncology, University of Cambridge, Cambridge, UK
2 Cancer Research UK, Genetic Epidemiology Group, Strangeways Research Laboratories, Cambridge, UK

Correspondence to: Dr Thibaud Koessler, Strangeways Research Laboratory, Cancer Research UK Department of Oncology, University of Cambridge, Worts Causeway, Cambridge CB1 8RN, UK; Thibaud{at}srl.cam.ac.uk

Abstract

Background and aim: The mismatch repair (MMR) genes are responsible for maintaining genomic integrity. Mutations in the MMR genes cause a familial form of colorectal cancer (CRC). This syndrome accounts for only a small proportion of the excess familial risk of CRC. The characteristics of the alleles that account for the remainder of cases are unknown. To assess the putative associations between common variants in MMR genes and CRC, we performed a genetic case–control study using a single-nucleotide polymorphism (SNP) tagging approach.

Patients and methods: A total of 2299 cases and 2284 unrelated controls were genotyped for 68 tagging SNPs in seven MMR genes (MLH1, MLH3, MSH2, MSH3, MSH6, PMS1 and PMS2). Genotype frequencies were measured in cases and controls and analysed using univariate analysis. Haplotypes were constructed and analysed using logistic regression. We also carried out a two-locus interaction analysis and a global test analysis.

Results: Genotype frequencies differed marginally between cases and controls for MSH3 rs26279, with a rare homozygote OR = 1.31 [95% confidence interval (CI) 1.05 to 1.62], ptrend = 0.04. We found a rare MLH1 haplotype (frequency <5%) that increased the risk of colorectal cancer (OR = 9.76; 95% CI, 1.25 to 76.29; p = 0.03). The two-locus interaction analysis showed signs of interaction between SNPs located in MSH6 and MSH2. Global testing showed no evidence of interaction.

Conclusion: It is unlikely that common variants in MMR genes contribute significantly to colorectal cancer.

In the UK, colorectal cancer (CRC) is the third most prevalent cancer after lung and breast cancer. Each year, almost 35 000 new cases are diagnosed and there are 16 000 deaths from the disease.1 Like other complex diseases, the aetiology of CRC is linked to both environmental and hereditary factors, with environmental factors contributing two thirds of the variance of the risk of developing CRC and heritable factors one third.2

The known familial syndromes – familial adenomatous polyposis (FAP), caused by germline mutations in APC3 and hereditary nonpolyposis colorectal carcinoma (HNPCC), caused by germline mutations in the mismatch repair (MMR) genes4 – account for less than 5% of all cases and only a small proportion of the excess familial risk of CRC.5 The characteristics of the alleles that account for the remainder are unknown. Further high penetrance alleles, if they exist, are likely to be very rare, as the majority of multiple case families can be explained by the known genes. However, a range of genetic models is plausible from a handful of modest risk alleles to a very large number of small risk alleles: the so-called polygenic model of complex disease.6 7 The existence of common CRC susceptibility alleles has recently been confirmed by the initial results from two whole genome association studies (rs6983267 and rs10505477 in the region 8q24.21).8 9

The MMR pathway, first described in bacteria, is involved in the maintenance of genomic integrity. The MMR proteins are responsible for correcting base substitution mismatches and small insertion–deletion mismatches generated during DNA replication. There are seven MMR genes in humans: MLH1, MLH3, PMS1, PMS2, MSH2, MSH3 and MSH6.10 These genes are distributed over five chromosomes: MSH6, MSH2 and PMS1 are on chromosome 2, MLH1 is on chromosome 3, MSH3 on chromosome 5, PMS2 on chromosome 7, and MLH3 on chromosome 14. The genes range in size from ∼24 kb (MSH6) to ∼222 kb (MSH3). A defect in the function of one of these genes creates a mutator phenotype characterised by the frequent presence of point mutations across the genome, microsatellite instability (MSI), and loss of heterozygosity (LOH) at multiple loci.11 To date, MSI has been found in >90% of HNPCC and 15% of non-familial colorectal cancers.5 Germline mutations in MLH1, MSH2 and MSH6 are detected in the majority (>95%) of HNPCC tumours with MSI.11 12 Recently, a rare variant in MLH1 was found to confer an increased risk of non-familial CRC.13 MSI has also been found in primary cancers at other sites including prostate,14 endometrial,15 pancreatic,16 gastric17 18 and other cancers.19

The MMR system recognises and corrects DNA base pairing errors in newly replicated DNA. The recognition of mispaired DNA involves two different heterodimers, MutSα and MutSβ, each with specificity for different mutations. MutSα is composed of one molecule of MSH2 and one of MSH6. MutSβ consists of one MSH2 and one MSH3. The MSH2/6 pair recognises single base–base mispairs and insertion–deletion loops of 1–2 bases, whereas the MSH2/3 pair recognises insertion–deletion loops of >2 bases. The MSH2/3 pair can also recognise some single-base insertion–deletion loops, which gives it some redundancy with the MSH2/6 pair.10 20 The repair enzymes are three heterodimers: MutLα, MutLβ and MutLγ. These systems interact with the MutS system at the site of the mismatch. MutLα is composed of MLH1 and PMS2; MutLβ, formed of one molecule of MLH1 and one of PMS1, has an unknown function; and MutLγ is made of MLH1 and MLH3 and is involved in repairing a subset of insertion–deletion mutations.10 20

Common variants in genes involved in this pathway may also alter risk of CRC. The aim of this study was to comprehensively evaluate common variation in the MMR pathway using a SNP tagging approach in a large case–control study.

MATERIAL AND METHODS

Study subjects

Our case–control study set comprised 2299 cases and 2284 unrelated controls from the SEARCH study. The SEARCH colorectal study is an ongoing study of colorectal cancer in the region served by the Eastern Cancer Registration and Information Centre in the UK (formerly East Anglian Cancer registry). The eligible patients are those aged between 18 and 69 years and diagnosed since 1996 with CRC. The case study set also contains 120 people aged over 69 years. The controls (n = 2284) are also from the SEARCH study,21 a population-based set recruited in East Anglia since 2003 with no history of cancer. Controls were identified from age–sex registries of ten representative general practices in the region (East Anglia) and were frequency matched to cases on age. We obtained informed consent from all participants.

Tag SNP selection

To select tagging SNPs (tSNPs) for genotyping we used data from the International HapMap Project (Release number 20#)22 database for the seven genes in the MMR pathway (MSH2, MSH3, MSH6, MLH1, MLH3, PMS1, PMS2). We used the data for the CEU population (90 individuals comprising 30 parent–child trios) of western and northern European ancestry. The appropriateness of this approach has been previously discussed.23–25

The aim of the SNP tagging approach is to identify a set of tSNPs that efficiently tags all the known common variants [minor allele frequency (MAF) >0.05]. We aimed to tag all known common SNPs with a minimum value of r2 of 0.8 (where r is the pairwise correlation coefficient between SNPs). Some SNPs were poorly tagged by a single tSNP, but were well tagged by a combination of two or more (table 1). The use of multi-marker tags reduces the total number of tSNPs without a loss of tagging efficiency. tSNPs were selected using the TAGGER algorithm implemented in HAPLOVIEW26 (table 2).
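The pairwise criterion described above can be illustrated with a minimal sketch. It assumes phased haplotypes coded 0/1 at two biallelic SNPs and computes r² from allele and haplotype frequencies. It is not the TAGGER algorithm itself, which also handles multi-marker tags and tag-set selection; the function names and the example data are hypothetical.

```python
def pairwise_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    `haplotypes` is a list of (a, b) pairs, one per chromosome, where a and b
    are 0 or 1 and give the allele carried at SNP A and SNP B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n      # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                         # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def is_tagged(haplotypes, threshold=0.8):
    """True if SNP B is captured by SNP A at the chosen r^2 threshold."""
    return pairwise_r2(haplotypes) >= threshold


# Hypothetical data: the two SNPs always travel together, so r^2 = 1.
linked = [(0, 0), (1, 1), (0, 0), (1, 1)]
print(pairwise_r2(linked), is_tagged(linked))
```

The default threshold of 0.8 mirrors the minimum r² used for tag selection in this study.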

Table 1 Efficiency of tag single-nucleotide polymorphism (tSNP) selection for each gene
Table 2 Haplotype tagging single-nucleotide polymorphisms (SNPs) analysis

Genotyping

All samples were genotyped using the Taqman 7900HT Sequence Detection System according to the manufacturer’s instructions. Each assay was carried out using 10 ng genomic DNA in a 5 µl reaction using Taqman Universal PCR Master Mix (Applied Biosystems, Warrington, UK), forward and reverse primers, and FAM- and VIC-labelled probes designed by Applied Biosystems (ABI Assay-by-Design). Primer and probe sequences and assay conditions used for each polymorphism analysed are available from the corresponding author on request. All assays were carried out in 384-well arrays with 12 duplicate samples in each plate for quality control. Where discordant genotypes were observed in duplicates, the genotyping was repeated. Genotypes were determined using Allelic Discrimination Sequence Detection software (Applied Biosystems). DNA samples that did not give a clear genotype result at the first attempt were not repeated. Hence, there are variations in the number of samples successfully genotyped for each polymorphism. Call rates ranged from 98.5 to 99.9% for all the SNPs and overall concordance between duplicate samples was 100%. DNA was extracted from blood samples by Whatman International (Ely, UK) using a chloroform/phenol method.

Statistics

Deviations of the genotype frequencies in the cases and in the controls from those expected under Hardy–Weinberg equilibrium (HWE) were assessed by χ2 tests [1 degree of freedom (df)] (supplementary table 1). The primary tests of association were the univariate analyses between each tagging SNP and colorectal cancer. Genotype frequencies in cases and controls were compared using χ2 tests for heterogeneity (2 df) and trend (1 df). Genotypic specific risks with the common homozygote as the baseline comparator were estimated as odds ratios (ORs) with associated 95% confidence intervals (95% CIs) (supplementary table 2).
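As a rough illustration of the two calculations named above, the sketch below computes a 1 df χ² test for departure from HWE and an odds ratio with a 95% confidence interval from a 2×2 table. The confidence interval uses the standard log (Woolf) approximation, which is an assumption here; the source does not say how its intervals were computed. All function names and counts are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of observed genotype counts against HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)              # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))     # chi-square survival function, 1 df
    return chi2, p_value


def odds_ratio_ci(case_exposed, case_ref, control_exposed, control_ref, z=1.96):
    """Odds ratio against the reference genotype, with a Woolf-type CI."""
    odds_ratio = (case_exposed * control_ref) / (case_ref * control_exposed)
    se = math.sqrt(1 / case_exposed + 1 / case_ref
                   + 1 / control_exposed + 1 / control_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
print(hwe_chi2(25, 50, 25))
print(odds_ratio_ci(20, 80, 10, 90))
```

In `odds_ratio_ci`, the "exposed" group would be one genotype class (for example rare homozygotes) and the reference group the common homozygotes, matching the baseline comparator described above.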

We used the admixture maximum likelihood (AML) test27 as a single experiment-wise test. In brief, the AML method formulates the alternative hypothesis in terms of the proportion of SNPs (α) that are associated with disease and the effect size when an association exists (η). The parameters α and η can be estimated by maximum likelihood and the test statistic derived as a likelihood ratio test. The significance of this statistic was assessed using a permutation test.

In addition to the univariate analyses, we carried out specific haplotype tests for those combinations of alleles that tagged specific SNPs (table 1). We also carried out a general comparison of common haplotype frequencies in each gene haplotype block utilising the data from all the tSNPs in that block. Haplotype blocks were defined such that the common haplotypes (>5% frequency) accounted for at least 80% of the haplotype diversity. We considered haplotypes with greater than 2% frequency to be “common”. Rare haplotypes were pooled. For both specific haplotype marker tests and the general comparison of haplotype frequencies by haplotype block, haplotype frequencies and subject-specific expected haplotype indicators were calculated separately for each study using the program TagSNPs.28 This implements an expectation substitution approach to account for haplotype uncertainty given unphased genotype data. Subjects missing >50% genotype data in each block were excluded from haplotype analysis. We used unconditional logistic regression to test the null hypothesis of no association between specific multi-marker tagging haplotype and cancer, by comparing a model with terms for subject-specific haplotype indicator with the intercept-only model. The global null hypothesis of no association between haplotype frequency (by haplotype block) and cancer was tested by comparing a model with multiplicative effects for each common haplotype (treating the most common haplotype as the referent) to the intercept-only model. Haplotype specific odds ratios were also estimated with their associated confidence intervals (supplementary table 3).

The absence of significant single-locus main effects does not exclude the possibility of important interactions between the loci. We therefore investigated possible two-locus interactions using both a case–control and a case-only approach (Oestergaard et al, unpublished). In the case–control analysis, all two-way SNP combinations were tested by comparing the likelihood of a logistic regression model with interaction effect and main effects (4 df) to the likelihood of a model with main effects (3 df). For the case-only analysis, all two-way combinations between SNPs on different chromosomes were tested, while combinations on the same chromosome were tested only if the test for association in controls had a p-value of >0.1. A Pearson χ2 test was used when the distribution of the χ2 statistic approximated a χ2 distribution (4 df). Otherwise, Fisher’s exact test was used. The number of SNP combinations tested was 2346 with the case–control analysis and 1820 with the case-only analysis.
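The case-only test can be sketched as a Pearson χ² test of independence on the 3×3 table of genotypes at two SNPs among cases. The sketch below is a minimal version under stated assumptions: it covers only the Pearson branch, not the Fisher's exact fallback, and the function name and counts are hypothetical. The p-value uses the closed form of the χ² survival function for 4 df.

```python
import math


def case_only_chi2(table):
    """Pearson chi-square test (4 df) on a 3x3 genotype-by-genotype table.

    table[i][j] is the number of cases carrying genotype i at the first SNP
    and genotype j at the second SNP (0, 1 or 2 copies of the rare allele).
    """
    if len(table) != 3 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 3x3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 4 df the chi-square survival function is exp(-x/2) * (1 + x/2).
    p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
    return chi2, p_value


# Hypothetical counts in which the two genotypes are independent among cases.
independent = [[10, 20, 10], [20, 40, 20], [10, 20, 10]]
print(case_only_chi2(independent))
```

A departure from independence among cases is read as evidence of interaction only if the two SNPs are uncorrelated in the source population, which is why same-chromosome pairs were first screened in controls.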

For the case–control analysis, the problem of multiple testing and calculation of an experiment-wise significance is addressed by permutation testing. The most significant test statistic derived from the original data set is compared to an empirical null distribution of the test statistic, which is created by permuting or shuffling the labels of cases and controls. The assumption is that the shuffling of the case–control label will break any possible association between genotype and the disease while maintaining the correlation structure of the genetic markers. The proportion of permutation samples in which the test statistic is at least as significant as the test statistic in the original data set is the significance level.
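The permutation procedure can be sketched as follows. To keep the example short, the likelihood-ratio statistic from logistic regression is replaced by a much simpler stand-in: the largest absolute difference in mean rare-allele count between cases and controls across SNPs. That substitution, the function names and the toy data are all assumptions made for illustration.

```python
import random


def max_statistic(genotypes, labels):
    """Largest |mean allele count in cases - mean in controls| over all SNPs.

    genotypes[k][s] is the rare-allele count (0, 1, 2) of subject k at SNP s;
    labels[k] is 1 for a case and 0 for a control.
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    best = 0.0
    for s in range(len(genotypes[0])):
        case_sum = sum(g[s] for g, y in zip(genotypes, labels) if y == 1)
        ctrl_sum = sum(g[s] for g, y in zip(genotypes, labels) if y == 0)
        best = max(best, abs(case_sum / n_case - ctrl_sum / n_ctrl))
    return best


def permutation_p_value(genotypes, labels, n_perm=1000, seed=0):
    """Experiment-wise p-value: share of label shuffles at least as extreme."""
    rng = random.Random(seed)
    observed = max_statistic(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)        # breaks genotype-disease association only
        if max_statistic(genotypes, shuffled) >= observed:
            hits += 1
    return hits / n_perm


# Toy data: 10 cases homozygous for the rare allele, 10 controls without it.
toy_genotypes = [[2]] * 10 + [[0]] * 10
toy_labels = [1] * 10 + [0] * 10
print(permutation_p_value(toy_genotypes, toy_labels, n_perm=200, seed=1))
```

Because only the labels are shuffled, each subject keeps their full genotype vector, so the correlation between markers is preserved in every permuted data set.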

For the case-only analysis, p-values were compared to the Bonferroni correction threshold and to expected values with a probability plot. For the 1820 tests with the case-only analysis the Bonferroni threshold is 2.7×10⁻⁵. However, this threshold is very conservative as tests are highly dependent.
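The Bonferroni threshold quoted above follows from dividing the nominal significance level by the number of tests. The short sketch below reproduces the arithmetic, assuming a nominal level of 0.05. The source does not state this value, but it is consistent with the reported threshold.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# alpha = 0.05 is an assumption; 1820 is the number of case-only tests.
threshold = bonferroni_threshold(0.05, 1820)
print(f"{threshold:.2e}")
```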

RESULTS

tSNPs selection for colorectal cancer association study

The HapMap data included 391 common variants (MAF>0.05) in total with a mean density ranging from 1.03 kb/SNP for MSH3 to 4.58 kb/SNP for PMS2. Seventy-five SNPs were chosen as tSNPs, of which seven failed assay design (table 2). These seven were not correlated with any other SNP and so alternative tags could not be selected. Thus 68 tSNPs were genotyped. These tagged the 323 remaining common SNPs with a mean r2 of 0.94. Ninety-five per cent of SNPs were tagged with r2>0.8 (table 2). These figures include tagging of 11 SNPs by seven combinations of tSNPs used as multi-marker tags.

Association analysis of SNPs and colorectal cancer

Supplementary table 1 shows the genotype frequency in cases and controls for the 68 successfully genotyped tSNPs. Six SNPs (rs2059520, rs13408008, rs330792, rs3136329, rs6463524, rs10040849) deviated from HWE in controls or cases (p<0.05). These are likely to be chance findings as none were highly significant and none deviated in both cases and controls. Moreover, the discrimination of genotypes for all six assays was good. Tests for association and the genotype-specific risks for each SNP are shown in supplementary table 2. There was no difference in genotype frequency between CRC cases and controls for 65 of the 68 tSNPs.

Three tSNPs showed some evidence for association. The rare allele of MSH2 rs1981928 was associated with a reduced risk of disease (phet = 0.052). The best fitting genetic model was recessive (OR tt v ta/aa = 0.76; 95% CI, 0.61 to 0.96; p = 0.021). The rare allele of MSH3 rs1979005 was also associated with a protective effect (OR = 0.41; 95% CI, 0.18 to 0.94; phet = 0.053). The best fitting genetic model was a recessive model (OR = 0.42; 95% CI, 0.18 to 0.95; p = 0.03). Finally, the rare allele of MSH3 rs26279 was associated with an increased risk of developing the disease (phet = 0.045). The best fitting model was again recessive (OR = 1.29; 95% CI, 1.04 to 1.57; p = 0.019). None of the seven SNP tagging multimarker haplotypes was associated with CRC (table 1).

The AML global test of association was not significant using either the heterogeneity test (p-value = 0.442) or the trend test (p-value = 0.67).

Two-locus interaction

No evidence of interaction was found using the case–control approach (data not shown).

Using the case-only approach, the distribution of the test statistics for the 1820 combinations deviated from that expected (supplementary fig 1). The 10 most significant two-locus interactions are listed in supplementary table 4. The majority of these pairs of SNPs are located on chromosome 2. The two most significant pairs were MSH6-rs3136326/MSH2-rs1981928 and MSH6-rs1800936/MSH2-rs13408008. However, these two MSH6 SNPs are correlated with each other (r2 = 0.57) and the two MSH2 SNPs are correlated with each other (r2 = 0.41). This suggests that both pairs of interacting SNPs are reporting the same interaction. Only MSH6-rs3136326/MSH2-rs1981928 had a significant interaction after Bonferroni correction. Supplementary table 5 shows the risks of CRC (ORs) for the possible genotype combinations, with the double common homozygote as the baseline. Individuals who carry two copies of the common allele at MSH6 and two copies of the rare allele at MSH2 were at reduced risk (OR = 0.70; 95% CI, 0.53 to 0.92; p = 0.01).

Global haplotype analyses

There were 10 haplotype blocks in total. Five of the seven MMR genes (MSH2, PMS1, PMS2, MLH1 and MLH3) each fell within a single block. The other two genes, MSH6 and MSH3, had two and three blocks, respectively (supplementary table 6). The results of the comparison of haplotype frequencies in cases and controls for each block are presented in supplementary table 3. No significant difference was found for the nine blocks covering MLH3, PMS1, PMS2, MSH2, MSH3 and MSH6. However, the global test for MLH1 was significant at the 5% level (p5df = 0.049), with the pooled rare haplotypes associated with an increased risk (OR = 9.76; 95% CI, 1.25 to 76.29; p = 0.03).

This result for MLH1 appeared to be due to a substantial difference in the frequency of the pooled rare haplotypes, which were carried by 10 cases and only one control. Six of these haplotypes were each carried by a single subject, but one haplotype was carried by five cases and no controls. It is possible that this haplotype carries a single rare variant that confers a moderate risk of disease.

DISCUSSION

This is the first comprehensive candidate gene study to systematically tag all the known common variants in the MMR genes and to test the tag SNPs for association with colorectal cancer susceptibility. We found little evidence for association of common variants in these genes (global test p value = 0.44).

Our study is well powered to detect alleles with modest effect at stringent levels of significance. For example, we have 90% power to detect a variant with a minor allele frequency of 10%, conferring a risk of 1.18 under a co-dominant model, with a type I error rate of 10⁻⁴. The power to detect recessive alleles is lower.

The most plausible association was for MSH3 rs26279, a non-synonymous (threonine to alanine) variant in exon 23 (ptrend = 0.04), which could alter the function of the protein. Notably, rs26279 was also associated with colorectal cancer in a recent case–cohort study (rare homozygote, RR = 1.65; 95% CI, 1.01 to 2.70; p-value = 0.02). The same group also reported an association for rs184967,29 which was not significant in our study (phet = 0.23, ptrend = 0.14). Three SNPs showed borderline associations (including rs26279) but none was significant at the stringent levels required for genetic association studies. The most likely explanation for these associations is chance.

The tSNP set captured most of the known common variation in these genes reasonably well. However, some SNPs were poorly captured and so true associations between these SNPs and disease may have been missed. For example, rs1799977 was selected as a tSNP but failed assay design and was not correlated with any alternative tSNPs. This singleton is a non-synonymous variant (Val219Ile) in MLH1. It lies in a transducer domain homologous to the second domain of the DNA gyrase B subunit and is important in nucleotide hydrolysis.

We found some evidence for a gene–gene interaction between SNPs in MSH2 and MSH6. We also found evidence for association of a rare haplotype in MLH1. However, this association was driven by the difference in frequencies of the pooled rare haplotypes which conferred a high risk (OR = 9.8) and had a combined frequency of 0.23% in cases and 0.03% in controls. It is possible that this haplotype is a marker for a single, rare, deleterious, high-penetrance mutation in our population. Both these findings need to be replicated in independent studies to be confirmed.

Recently, two genome-wide association studies8 9 reported a handful of loci associated with colorectal cancer, but none of them lies close to any of our genes, even where they are on the same chromosome. Furthermore, because the first stage of these studies had less power than our study, it is unlikely that the SNPs taken forward from that stage would have the same characteristics as those we targeted.

In conclusion, we have found little evidence to support the hypothesis that common variants in the mismatch repair genes are associated with low-to-moderate penetrance colorectal cancer susceptibility. The possibility of important rare alleles or two-locus interactions cannot be excluded.

This study was approved by the Eastern Multi-Centre Research Ethics Committee (Eastern MREC) on 28 November 2000.

Acknowledgments

We thank all those who participated in this study: the SEARCH investigators included Jean Abraham, Fiona Blows, Don Conroy, Gary Dew, Kristy Driver, Helen Field, Patricia Harrington, Craig Luccarini, Hannah Munday, Barbara Perkins, Mitul Shah and Judy West, all at the Department of Oncology, University of Cambridge. DFE is a Principal Fellow, and PDPP is a Senior Clinical Research Fellow of Cancer Research United Kingdom.


Footnotes

  • Funding: This work was funded by Cancer Research United Kingdom. TK is funded by the Foundation Dr Henri Dubois-Ferriere Dinu Lipatti.

  • Competing interests: None.