Abstract
Background: Replication impediments can produce helicase-polymerase uncoupling, allowing lagging strand synthesis to continue for as much as 6 kb from the site of the impediment. Materials and Methods: We developed a cloning procedure designed to recover fragments from the lagging strand near the helicase halt site. Results: A total of 62% of clones from a p53-deficient tumor cell line (PC3) and 33% of the clones from a primary cell line (HPS-19I) were within 5 kb of a G-quadruplex forming sequence. Analyses of a RACK7 gene sequence that was cloned multiple times from the PC3 line revealed multiple deletions in a region about 1 kb from the cloned region that was present in a non-B conformation. Sequences from the region formed G-quadruplex and i-motif structures under physiological conditions. Conclusion: Defects in components of non-B structure suppression systems (e.g. p53 helicase targeting) promote replication-linked damage selectively targeted to sequences prone to G-quadruplex and i-motif formation.
Non-B DNA structure formation must be suppressed during replication in order for replication to proceed properly. Many such structures occur at DNA sequences that exhibit GC-skew, where cytidine residues are largely confined to one strand and guanosine residues are largely confined to the other. In addition to R-Loops (1), these sequences are capable of forming G-quadruplexes (2, 3), triple helices (4) and i-motifs (5). At physiological pH, certain sequences are capable of forming several different structures (6, 7); however, the most commonly seen structures in these regions are the R-Loop, G-quadruplex and, less frequently, the triple helix (7) and the i-motif (8). DNA damage associated with R-loop formation is generally associated with transcription blocks (9, 10), but can be associated with replication. There is also considerable evidence that G-quadruplex formation can cause both epigenetic (11, 12) and genetic damage associated with DNA replication when G-quadruplex suppression systems are compromised (13-15). Here it is important to point out that G4-seq (60) results confirm the earlier report (16) that the frequency of G-quadruplex forming sequences increases with biological complexity, suggesting that these sequences have been under positive selective pressure during evolution of higher organisms. To accommodate this increased frequency, the systems that promote replication through G-quadruplex structures or unwind these structures during replication and transcription of DNA appear to have evolved increased sophistication and redundancy in higher organisms. For example, the Y polymerases REV1, Pol κ and Pol η are known to process G-quadruplex motifs when replication is slowed by them (17). Further, several helicases including Pif1 (15), and the superfamily 2 helicases WRN (18), BLM (19) and FANCJ (20) have all been shown to unwind G-quadruplex DNA structures in vitro.
Even so, WRN (21), BLM (22), Pif1 (23), and FANCJ (24) are also capable of unwinding RNA:DNA hybrids in vitro.
In one of the most carefully studied instances, loss of function in the FANCJ homolog dog-1 in C. elegans produces selective deletion mutagenesis at sites where persistent G-quadruplex and i-motif structures can occur (14). In this case, deletion was apparently produced via a polymerase theta end-joining mechanism (25, 26). These data and those of Paeschke et al. (15) suggest that when non-B suppression is compromised, the resulting structures can produce strong impediments to leading strand progress during replication. Leading strand impediments are known to cause the Cdc45-Mcm2-7-GINS (CMG) helicase complex and lagging strand synthesis to become uncoupled from leading strand synthesis (27), permitting lagging strand synthesis to continue for an extended distance beyond a stalled replication fork (28). We hypothesized that the structure formed by non-B structure induced uncoupling could be exploited to identify sites of recurring non-B structure. As can be seen in Figure 1, short double-stranded DNA fragments originating from the lagging strand at or near the CMG halt site should be produced after shear fragmentation or mild chemical fragmentation of genomic DNA.
In this report, we tested this hypothesis using ligation-mediated PCR to clone spontaneously and chemically induced DNA fragments from isolated genomic DNA and then mapped the isolated sequences to the human genome. A comparison of cloned sequences from a primary prostate stromal cell line (HPS-19I) with a normal karyotype with those of a prostate tumor cell line with a heavily rearranged karyotype (PC3) showed that the tumor cell line yielded about twice as many clones adjacent to a sequence with G-quadruplex forming potential as the normal cell line. Further, overlapping clones originating from the same genomic sequences were obtained only in the tumor cell line. One sequence, which was obtained multiple times in the screen of the tumor cell line, was examined in detail. A region upstream from the cloned sequences was shown to be accessible to bisulfite modification under non-denaturing conditions, indicating that it was in a non-B conformation in genomic DNA (29). In vitro analysis of oligodeoxynucleotides corresponding to the site of chemical attack showed that a stable G-quadruplex structure can form on the G-rich strand and that an i-motif structure can form on the C-rich strand under physiological conditions. Sequencing analyses of the region showed that the sequence was subject to deletion, as seen in C. elegans (14), as well as apparent slippage and deamination mutagenesis downstream from the site of quadruplex formation. Models of processes that can generate these effects are discussed.
Materials and Methods
Cell culture and DNA isolation. Human prostate cancer cells (PC3) were cultured as previously described (30). Human prostate stromal cells (HPS-19I) were a gift of Prof. David R. Rowley, Baylor University College of Medicine, Houston, TX, USA. They were cultured as described (31). DNA was isolated from each cell line using the QIAamp® Blood Mini kit (Qiagen Sciences, Frederick, MD, USA) according to the instructions supplied by the manufacturer.
Karyotyping. Cells were harvested, fixed with Carnoy fixative, dropped onto glass slides and aged overnight at 60°C. Following standard GTG banding, 50 metaphases were captured with GenASIs Bandview software (Applied Spectral Imaging, Carlsbad, CA, USA). A 24-color SKY spectral karyotypic analysis was performed using a standard protocol. Briefly, after RNAse and pepsin treatment, DNA in metaphases was denatured at 70°C for 2 min followed by three ice-cold ethanol washes with 70%, 80%, and 90% ethanol. Denatured SKY™ 24-color probes were hybridized to the slide preparation for 48 h at 37°C. After a wash with formamide-containing SSC and incubations with 1xSSC and 4xSSC, images were captured under Vectashield DAPI using the HiSKY system (Applied Spectral Imaging, Carlsbad, CA, USA).
Preparation of random libraries with clones from uncoupled replication forks. Ligation-mediated PCR was used to amplify DNA fragments spontaneously generated during DNA isolation, or after bisulfite treatment of genomic DNA at 37°C (29) to promote chemical breakage of the DNA (32) at single stranded regions. Under these conditions, bisulfite cannot attack duplex DNA so the final treatment with sodium hydroxide was omitted to preserve duplex structure.
Two DNA preparations from each cell line were treated with CIP (Calf Intestinal Phosphatase, New England Biolabs, USA) in a 30 μl reaction containing 10 U of CIP in 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/ml BSA, pH 7.9, for 1 h at 37°C to prevent ligation concatemers. However, concatemers were not observed when CIP treatment was omitted, indicating that CIP treatment was unnecessary. Two oligodeoxynucleotides were employed to form the duplex linker: Linker_1 (5’-AGAAGCTTGAATTCGAGCAGTCAG-3’) was annealed to linker_2 (5’-CTGCTCGAATTCAAGCTTCT-3’). A total of 2.5 μl of 100 μM of each oligodeoxynucleotide was brought to a final volume of 45 μl and incubated for 2 min at 94°C, 5 min at 70°C, and 5 min at 50°C. The duplex thus formed was allowed to cool to room temperature. The duplex was then diluted to a final volume of 250 μl to give a 1 μM duplex linker and ligated to 15 μl isolated DNA that had been treated with T4 polymerase to repair ends for blunt end ligation to the linkers, using 5 units of T4 ligase (New England Biolabs), 1.5 μl 10x ligase buffer (New England Biolabs) and 1 μl 1 μM linker, in a final volume of 15 μl. The reaction was then incubated overnight at 4°C. The ligated DNA was then treated with the Qiagen Reaction cleanup kit, and the DNA was eluted in 15 μl of pure water. The eluted DNA was then used for PCR, with 0.25 U of Hotstar Taq (Qiagen), 10 μl of 10x Taq Buffer (Qiagen), 8 μl 25 mM MgCl2, 1.6 μl of 10 mM dNTPs (Roche) and 1 μl 100 μM linker 2 in a final volume of 100 μl. The PCR conditions were 55°C 2 min, 72°C 5 min, 94°C 10 min, 24 cycles of 94°C 1 min, 55°C 1 min, 72°C 1 min, and then 72°C for 5 min, with a 4°C hold. After amplification, 2 μl of the PCR reaction was cloned using the TOPO® TA cloning® Kit (Invitrogen, Waltham, MA, USA) following the manufacturer's instructions. The resulting colonies were grown in liquid culture, and plasmids were isolated and sequenced as previously described (11).
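The linker design described above can be checked with a short script. The following is a minimal sketch, not part of the published protocol: it confirms that linker_2 is the reverse complement of part of Linker_1, reports any single-stranded overhang left on the longer oligodeoxynucleotide, and reproduces the dilution arithmetic for the 1 μM duplex. The function names are illustrative assumptions.

```python
# Sketch: verify that the two linker oligos quoted in the text can anneal,
# and reproduce the dilution arithmetic. Function names are illustrative.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

LINKER_1 = "AGAAGCTTGAATTCGAGCAGTCAG"  # 24-mer, as given in the text
LINKER_2 = "CTGCTCGAATTCAAGCTTCT"      # 20-mer, as given in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def linker_duplex(long_oligo: str, short_oligo: str) -> dict:
    """Locate where the short oligo pairs on the long one and report overhangs."""
    paired = reverse_complement(short_oligo)
    start = long_oligo.find(paired)
    if start < 0:
        raise ValueError("oligos are not complementary")
    return {
        "paired_bp": len(paired),
        "overhang_5p": long_oligo[:start],
        "overhang_3p": long_oligo[start + len(paired):],
    }


def diluted_concentration_uM(stock_uM: float, stock_ul: float, final_ul: float) -> float:
    """Concentration after diluting stock_ul of stock into final_ul total volume."""
    return stock_uM * stock_ul / final_ul


duplex = linker_duplex(LINKER_1, LINKER_2)
print(duplex)
# 2.5 ul of 100 uM oligo carried through to a final volume of 250 ul
print(diluted_concentration_uM(100, 2.5, 250), "uM")
```

Run as written, the check shows a 20 bp paired region with a short single-stranded tail on Linker_1, and the dilution gives the 1 μM duplex concentration stated in the text.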
Cleavage frequency. A scanned profile of the molecular weight distribution of purified PC3 DNA separated by capillary electrophoresis was obtained by using a DNA 7500 microfluidic chip (Agilent Technologies). The number average molecular weight of the DNA preparation was determined from the scan as previously described (30).
Direct PCR from the RACK7 region (bisulfite-treated PC3 DNA). Each of the colored regions in Figure 2 was amplified from PC3 cell line DNA after non-denaturing bisulfite treatment (29) to detect possible open DNA structures. In each case, 1 μg of native DNA from the cell line was treated with bisulfite using the EZ DNA Methylation Kit (Zymo Research) as previously described (32), except that the initial sodium hydroxide denaturation step was omitted, and the temperature of the 16 h incubation period was reduced from 55°C to 37°C so as to preserve the native secondary structure of the isolated DNA. Initial exposure of the DNA was to the bisulfite reagent adjusted to pH 5.3, as described in (11). The reaction was cleaned up by use of the Zymo spin columns. Apart from these modifications, the manufacturer's instructions were followed exactly in these experiments. The bisulfite-treated DNA was then eluted in 15 μl of pure water, and all of it was used for PCR amplification. Pretreatment of DNA with RNAse H was carried out in some cases as described in (33).
PCR primers for the region 3’ to the cloned region (Bright Green in Figure 2) were as follows: Forward: 5’-CGTTCCATTGGAACAGCACCCTGACTATG-3’; Reverse: 5’-ATTTATGGTTTGGGAGTTATTGGGGAAGAC-3’. PCR primers for the cloned region (Yellow in Figure 2) were: Forward 5’-ACTCCTGAGAGTACTGTACAG-3’; and Reverse: 5’-GTCTACCTTCTAATTCTGCCT-3’. PCR primers for the region 5’ to the cloned region (Turquoise in Figure 2) were: Forward 5’-GGTTGGCCAGTTGTGTACCTCACAGTGG-3’; and Reverse 5’-GCTCCAGCCTGGGCAACAAAGCAACAGT-3’. Forward and reverse primers were used to form a solution containing 9 μM of each primer. The PCR amplification master mix contained 0.25 U of Hotstar Taq (Qiagen), 2.5 μl of 10x Taq Buffer (Qiagen), 0.95 μl 5X Q solution (Qiagen), 2 μl 25 mM MgCl2, 0.8 μl of 10 mM dNTPs (Roche) and 2.5 μl 9 μM primers to obtain a final volume of 9 μl. Each reaction contained 9 μl master mix and 200 ng of target DNA brought to a final volume of 25 μl with Molecular Biology Water (Sigma). The PCR conditions were 95°C 10 min, 35 cycles of 95°C 1 min, 64.5°C 1 min, 72°C 1 min, and then 72°C for 5 min, with a 4°C hold.
Direct sequencing of the RACK7 region (native DNA). DNA from the 5’-region (Turquoise in Figure 2) of RACK7 was amplified without bisulfite treatment from PC3 DNA specimens as follows: PCR primers for this region were: Forward: 5’-GGTTGGCCAGTTGTGTACCTCACAGTGG-3’; and Reverse 5’-GCTCCAGCCTGGGCAACAAAGCAACAGT-3’. PCR reactions contained 12.5 μl 2X KAPA2G Fast Multiplex mix (Kapa Biosystems, Wilmington, MA, USA), 0.55 μl 9 μM of each primer and 200 ng target DNA, made up to a final volume of 25 μl with Molecular Biology Water (Sigma). PCR cycling conditions were 95°C for 3 min, followed by 35 cycles of 95°C for 1 min, 60°C for 30 sec, 72°C for 38 sec, followed by a final extension at 72°C for 26 sec.
Cloning and sequencing of RACK7 region PCR products. DNA amplicons from 20 μl of each RACK7 region PCR reaction were separated on 2% agarose gels (40 mM Tris-Acetate, 10 mM EDTA pH 7.8) and stained with 4 μg/ml Ethidium Bromide and destained in water. Excised fragments of the expected length (710 bp for the 3’ turquoise region in Figure 2, 640 bp for the yellow cloned region in Figure 2, and 430 bp for the bright green 5’-region in Figure 2) were purified using the Qiagen Gel Extraction Kit as described by the manufacturer. Lastly, 2 μl of the purified amplicon was cloned using the TOPO® TA cloning® Kit (Invitrogen, Waltham MA, USA) following the manufacturer's instructions. The resulting colonies were grown in liquid culture, and plasmids were isolated and sequenced as previously described (11).
DNA sequence alignment software. Cloned DNA sequences in FASTA format were aligned against the reference sequence for RACK7 from Homo sapiens chromosome 20, GRCh38.p12 Primary Assembly, using two software packages, MAFFT (34) and T-Coffee (35), with similar results showing variable length deletions and TCC→TTC or TAC mutations. Sequence data sets depicted in Figure 3 were constructed from MView (36) renderings of the sequence alignments obtained with MAFFT.
Circular dichroism and UV melting/annealing analysis of model oligodeoxynucleotides. CD spectra and melting and annealing data were collected as previously described (37). Briefly, oligodeoxynucleotide sequences (10 μM for CD, 2.5 μM for melting/annealing) were diluted in buffer containing 10 mM sodium cacodylate and either 100 mM or 25 mM potassium chloride at the specified pH, thermally annealed by heating in a heat block at 95°C for 5 min, and slowly cooled to room temperature overnight. CD data were recorded using a Jasco J-810 spectropolarimeter and values were corrected for the molar ellipticity reading at 320 nm. The transitional pH (pHT) of the i-motif was calculated from the inflection point of the fitted ellipticity at 288 nm. All UV melting/annealing experiments were carried out at pH 6.5 (i-motif) or 7.0 (G-quadruplex) and were recorded using a Jasco V-730 spectrophotometer. Melting (Tm) and annealing (Ta) temperatures were determined using the first-derivative method (37).
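As an illustration of the first-derivative method mentioned above, the sketch below estimates a transition temperature as the point of steepest change in a melting curve, using central differences. The curve is a synthetic two-state sigmoid generated inside the script, not measured data, and the function names, the 60°C midpoint and the 0.5°C sampling step are assumptions chosen only for the example; the published analysis followed reference (37).

```python
import math


def first_derivative_tm(temps, signal):
    """Return the temperature where |d(signal)/dT| is largest.

    Uses central differences on interior points; assumes temps are sorted
    and the curve contains a single dominant transition.
    """
    best_temp, best_slope = None, -1.0
    for i in range(1, len(temps) - 1):
        slope = abs((signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


def synthetic_curve(temps, midpoint, width=2.0):
    """Synthetic two-state transition (fraction unfolded); illustration only."""
    return [1.0 / (1.0 + math.exp(-(t - midpoint) / width)) for t in temps]


temps = [20 + 0.5 * i for i in range(141)]      # 20 to 90 degrees C, 0.5 degree steps
signal = synthetic_curve(temps, midpoint=60.0)  # hypothetical midpoint
print(first_derivative_tm(temps, signal))
```

With real absorbance data, smoothing before differentiation is usually needed, and a curve with two transitions (as reported for the i-motif) would require locating more than one derivative extremum.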
Results
A total of 96 clones were sequenced and mapped to chromosomal locations in the human genome: 43 from HPS-19I and 53 from PC3 (Tables I and II). Motifs with non-B structure potential were observed within 5 kb of the cloned sequence in 60% of the clones from HPS-19I and 74% of the clones from PC3. However, clones within 5 kb of a sequence conforming to the G-quadruplex folding rule (2) were twice as frequent in PC3 compared to HPS-19I (Table III). This was consistent with the karyotypes of the two cell lines, showing that HPS-19I possessed a normal karyotype (Figure 4) suggesting a normal phenotype, while the hyperdiploid and heavily rearranged karyotype of the p53-deficient PC3 cell line (Figures 5 and 6) indicated that ongoing chromosomal damage might be occurring in this tumor cell line. Moreover, clones containing a chromosomal rearrangement (a fusion between Chromosome 20 and Chromosome 7), and clones originating from the same chromosomal location were observed only in the PC3 cell line (Table II).
In the PC3 cell line, two clones mapped to the same chromosomal location on chromosome 7, two clones mapped to the same location on chromosome 11 and seven clones mapped to the same chromosomal location in an intron of the RACK7 gene on chromosome 20 (Figure 2). Four of the seven clones were obtained from untreated DNA. Three of the seven clones were obtained from DNA that had been treated with sodium bisulfite to facilitate breakage. We estimated the probability that clones could originate from the same location by random shear as follows. Using the method described in (30), we determined the spontaneous cleavage frequency for the PC3 DNA preparation to be 1/8800 bp from the scanned molecular weight profile for purified PC3 DNA used in the preparation of the PC3 DNA library (not shown).
For a genome of length Λ, it has been shown (38, 39) that the weight fraction (FW) between DNA fragment lengths L1 and L2 that is produced by random breakage at a frequency f is given by: FW = (1 + f·L1)·exp(−f·L1) − (1 + f·L2)·exp(−f·L2), which is the continuous form that applies when Λ is much larger than 1/f. Since the procedure used for pCR™2.1-TOPO® cloning is designed to clone fragments less than 1,000 bp, the inserts we observed were all in this range. The weight fraction predicted by random breakage in this range for a cleavage frequency f=1/8800 is about 0.006 and could represent almost 1% of the genome. Thus, the twenty independent isolates obtained without bisulfite or CIP treatment could be produced by random breakage. At the measured cleavage frequency, a hyperdiploid PC3 genome (Figure 5) comprising about 7.5×10⁹ bp of DNA would have at least 2×10¹¹ fragments/μg DNA. Of these, about 1.7×10⁹ bp representing a random sampling of the genome would be in the size range clonable by pCR™2.1-TOPO®. Consequently, each of the single isolates could be the result of random cleavage unless biologically frangible sites associated with replication stress (e.g. non-B sequence potential) are more frequent.
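The weight fractions quoted in this part of the Results can be reproduced numerically. The sketch below is a hedged illustration rather than the authors' calculation: it assumes the standard continuous approximation for random breakage, in which the cumulative weight fraction of DNA in fragments up to length L at cleavage frequency f is 1 − (1 + f·L)·exp(−f·L), and it neglects end effects from the finite genome length Λ. Under that assumption, the values for the 0 to 1,000 bp and 0 to 656 bp windows come out close to the 0.006 and 0.0026 given in the text.

```python
import math


def weight_fraction(f: float, l1: float, l2: float) -> float:
    """Weight fraction of DNA in fragments between l1 and l2 bp.

    Assumes random breakage at frequency f per bp and a genome much longer
    than 1/f (continuous approximation; an assumption of this sketch).
    """
    def cumulative(length: float) -> float:
        # Weight fraction of DNA found in fragments no longer than `length`
        return 1.0 - (1.0 + f * length) * math.exp(-f * length)

    return cumulative(l2) - cumulative(l1)


f = 1 / 8800  # measured spontaneous cleavage frequency from the text

print(f"0-1000 bp: {weight_fraction(f, 0, 1000):.4f}")  # clonable size range
print(f"0-656 bp:  {weight_fraction(f, 0, 656):.4f}")   # span of the RACK7 clones
```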
On the other hand, sequences obtained more than once cannot be the result of random cleavage. This is most clearly seen for the four sequences obtained from untreated DNA that appear to have been produced by random shear from the RACK7 gene. In this case, four independently isolated overlapping fragments originated from the same 656 bp region of RACK7. The weight fraction of the PC3 genome that falls between 0 bp and 656 bp for randomly cleaved DNA is expected to be 0.0026. Since the cloned sequences all fall within a 656 bp region of RACK7, if we assume that there are only 6 copies of the region present in the PC3 hyperdiploid genome based on Giemsa (Figure 5) and SKY spectral analysis (Figure 6), then the random probability of cloning fragments ranging in size from 0 bp to 656 bp from that 656 bp region of the RACK7 gene sequence with 100% cloning efficiency is given by:
The probability of cloning it m times in n trials is given by: For 4 clones in 20 trials p=1.2×10⁻²³
The observed cloning frequency of 4/20=0.25 clearly rules this out.
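To make the scale of this comparison concrete, the following sketch assumes that the probability of m clones in n trials follows the binomial distribution (an assumption of this sketch; the expression used in the original calculation is not reproduced here). It back-calculates the per-trial probability implied by the reported value of 1.2×10⁻²³ for 4 clones in 20 trials and compares it with the observed frequency. The function names are illustrative.

```python
from math import comb


def binomial_pmf(m: int, n: int, p: float) -> float:
    """Probability of exactly m successes in n independent trials."""
    return comb(n, m) * p**m * (1.0 - p) ** (n - m)


def implied_single_trial_probability(m: int, n: int, reported: float) -> float:
    """Per-trial probability implied by a reported binomial probability.

    Small-p approximation: the (1 - p)**(n - m) factor is treated as 1.
    """
    return (reported / comb(n, m)) ** (1.0 / m)


reported = 1.2e-23  # value quoted in the text for 4 clones in 20 trials
p_implied = implied_single_trial_probability(4, 20, reported)

print(f"implied per-trial probability: {p_implied:.2e}")
print(f"binomial check: {binomial_pmf(4, 20, p_implied):.2e}")
print(f"observed frequency: {4 / 20}")
```

Under these assumptions the implied per-trial probability is several orders of magnitude below the observed cloning frequency of 0.25, which is the contrast the text relies on.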
Given that the probability that random breakage during DNA isolation would produce multiple clones from the same site is vanishingly small (see Materials and Methods), it is likely that this region represents a site of recurring DNA damage that renders it particularly frangible, and thus subject to repeated isolation by the cloning scheme used here. Further, six of the seven clones had a common breakpoint at the 3’ end and a randomly placed breakpoint at the 5’ end (Figure 2), suggesting that a recurring biologically induced gap occurs at the 3’ site and that the clonable duplex fragments were produced by hydrodynamic shear during DNA isolation or by bisulfite-induced breakage at the gap in native DNA prior to DNA isolation.
Regions that persist as non-B structures in native DNA have been shown to be accessible to chemical modification (6, 40, 41). When we used published methodology (29) for the chemical deamination of cytosine residues by sodium bisulfite under non-denaturing conditions, we found that deamination only occurred in the region upstream of the cloned region (Turquoise in Figure 2) and not in the cloned region itself (Yellow in Figure 2) or the region downstream (Green in Figure 2) of the cloned region (data not shown). The sequence in this intronic region of RACK7 was not rearranged in the prostate cancer cell line; however, the bisulfite accessible region contained deletions of different lengths (Figure 3). Variation among individual isolates of the region suggests that the deletion process may be ongoing and cumulative. Further, deletions did not block bisulfite access, since deletions were observed even in sequences containing heavily deaminated regions after treatment with bisulfite under non-denaturing conditions (Figure 3A). We also noted TCC→TTC or TAC mutations characteristic of the AID and APOBEC family of single-strand specific DNA cytosine deaminases downstream from the deletions in untreated DNA (Figure 3C).
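For readers who wish to tally such context-specific changes from their own alignments, a minimal sketch follows. It assumes a reference and a clone that have already been aligned to equal length without gaps in the window examined, and it reports positions where the middle C of a reference TCC reads T or A in the clone. The two short sequences are invented toy examples, not RACK7 sequence, and the function name is illustrative.

```python
def tcc_context_mutations(reference: str, clone: str):
    """List (position, change) where reference TCC reads TTC or TAC in the clone.

    Both strings must be the same length (pre-aligned, gap-free window).
    Positions are 0-based and refer to the mutated middle base.
    """
    if len(reference) != len(clone):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for i in range(1, len(reference) - 1):
        in_tcc_context = reference[i - 1:i + 2] == "TCC"
        flanks_intact = clone[i - 1] == "T" and clone[i + 1] == "C"
        if in_tcc_context and flanks_intact and clone[i] in ("T", "A"):
            hits.append((i, "TCC->T" + clone[i] + "C"))
    return hits


# Toy sequences for illustration only
toy_reference = "GATCCGTCCA"
toy_clone = "GATTCGTACA"
print(tcc_context_mutations(toy_reference, toy_clone))
```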
Studies of dog-1 induced deletions in C. elegans (42) found that deletions are generally sharply confined to the 3’ end of the G-rich strand conforming to the general G-quadruplex folding rule (2) (G3+N1-7G3+N1-7G3+N1-7G3+). The data in Figure 3 display the C-rich strand from this region of the human RACK7 gene, since the extreme GC-skew in the region precludes extensive bisulfite mediated C→T deamination on the G-rich strand. Deletions tended to be more commonly confined to the 5’ end of the (TCCC)9 repeat of the C-rich strand (Figure 3). On the complementary G-rich strand the deletions are confined to what would be the 3’ end of the complementary (GGGA)9 repeat that conforms to the general G-quadruplex folding rule (2). Thus, the positioning of the deletions is similar to those observed in dog-1 deficient C. elegans.
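The folding rule quoted above translates directly into a regular expression. The sketch below is an illustrative search, with names chosen for the example: it reports non-overlapping matches to G3+N1-7G3+N1-7G3+N1-7G3+ on the strand supplied, so the complementary strand would need to be searched separately (for instance via its reverse complement).

```python
import re

# G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ , with N = any of A, C, G, T
G4_RULE = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(seq: str):
    """Return (start, end) spans of non-overlapping folding-rule matches."""
    return [(m.start(), m.end()) for m in G4_RULE.finditer(seq.upper())]


g_rich_repeat = "GGGA" * 9   # the (GGGA)9 repeat discussed in the text
c_rich_repeat = "TCCC" * 9   # its complement, which contains no G runs

print(find_g4_motifs(g_rich_repeat))
print(find_g4_motifs(c_rich_repeat))
```

Because the expression is greedy and reports only non-overlapping matches, a long repeat is returned as a small number of spans rather than every possible four-run combination.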
Our cloning strategy was based on the possibility that a variety of non-B structures could uncouple lagging strand replication at a leading strand impediment (Figure 1). Although non-B DNA structures can produce this effect, hybrid structures like the R-Loop are also candidates. DRIP methods report a weak R-loop signal in this region in epithelial cells, but not in fibroblast or leukemia cell lines (43). RDIP-Seq reports weak R-loop signal in both epithelial and fibroblast cell lines (44), and surprisingly DRIPc reports weak R-Loop signal only on the minus strand in a pluripotent cell line (45). Even so, an R-Loop in RACK7 would have to violate the general rule that the nascent RNA strand is the G-rich strand in the region of GC-skew (10, 46), because transcription of RACK7 produces a nascent C-rich strand RNA. Nevertheless, we tested the region for bisulfite sensitivity after pretreating the DNA with RNAse H. The intense bisulfite-induced deamination seen on the C-rich strand under non-denaturing conditions (Figure 3A) is only moderately reduced by pretreatment of the DNA with RNAse H (Figure 3B), suggesting that the open structure in this region is not completely abolished by treatment with RNAse H, as is generally seen with R-Loops. The most intense deamination occurs within the (CCTG)8-CC-(TCCC)9 region of the repetitive element (Figure 3A and 3B) with or without RNAse H pretreatment. Assuming that each sequence is an independent isolate of a representative pool, analysis of the sequence sets produced by bisulfite-mediated deamination with or without RNAse H pretreatment shows that the difference between the two sets is not statistically significant at the p<0.05 level in either one-sided or two-sided t-tests.
In order to study the nature of the structures that can form in the region of bisulfite accessibility, we performed biophysical analysis on model oligodeoxynucleotides corresponding to each strand from this region using circular dichroism (CD) and UV thermal melting/annealing analysis (Figure 7). The CD spectrum of the model oligodeoxynucleotide from the G-rich strand shows a strong positive signal at 263 nm, indicative of a parallel G-quadruplex structure (Figure 7B). Consistent with this observation, the UV melting/annealing experiments (Figure 7A) revealed a highly stable structure in physiological-like conditions, such that it would not melt under the conditions of the experiment. Consequently, we had to reduce the potassium cation concentration from 100 mM, similar to physiological conditions, to 25 mM. Only then were we able to observe a melting transition (Tm=81°C). This suggests the G-quadruplex is highly stable in conditions that mimic physiological pH and cation concentration. The data also clearly support the formation of a G-quadruplex associated with deletions at its 3’ end under physiological conditions. The stability of the structure indicated by its high Tm is consistent with its formation even within sequences that have undergone short deletion near its 3’ end, as evidenced by the bisulfite accessibility of the region in those sequences carrying deletions depicted in Figure 3.
We also studied a model oligodeoxynucleotide from the C-rich strand for i-motif structures, which are most commonly detected in C-rich strand sequences. Such structures involve C:C+ base pairs that are frequently most stable at acidic pH. Consequently, we expected it to carry a lower potential for non-B structure formation under physiological conditions than is expected for the complementary G-rich strand. Analysis of a 37-mer from this region indicated that it could fold into an i-motif structure, as shown by a strong positive signal at 288 nm. Plotting the ellipticity of this peak at 288 nm against pH provided a transitional pH of 6.9, indicating that this sequence forms an i-motif at neutral pH (Figure 7D). This structure also appears to be stable under near-physiological conditions (Figure 7C) with melting transitions at 37.6 and 41.6°C. Clearly both G-quadruplex and i-motif structures can potentially form in the same region, although previous biophysical experiments indicate that the formation of each structure is mutually exclusive (47) in a constrained duplex, suggesting that only one of the two structures would be present at this specific site at a given time.
Discussion
The model used in the cloning design (Figure 1) is based on evidence demonstrating that CMG uncoupling permits lagging strand synthesis to continue past a replication impediment for as much as 6 kb beyond the leading strand halt site (27, 28). The model is supported by the cloning results reported herein. Those results show that a sequence conforming to the G-quadruplex folding rule (2) within a region of GC-skew was present within 5 kb of the cloned sequence in 33% of the sequences cloned from the normal cell line and 62% of the sequences from the tumor cell line. Several studies (13-15) suggest that DNA Quadruplex formation can provide an impediment to DNA replication that can lead to DNA damage in cells compromised in one or more of the numerous systems that aid replication and repair at these structure prone sequences. The two-fold increase in G-quadruplex linked clones in the tumor cell line compared to the normal cell line could be associated with the absence of functional p53 in the tumor cell line (48). Functional p53 is important in binding at least one quadruplex resolving helicase (BLM) (19) and perhaps another (WRN) (18) at sites of DNA damage and repair (49, 50), where they appear to cooperate with FANCJ to resolve G-quadruplex replication impediments (51). However, both BLM and WRN have also been shown to resolve RNA:DNA hybrids in vitro (21, 22) so the p53 lesion in the PC3 cell line does not exclude the possibility that a fraction of the sites we have cloned reside near replication impediments caused by R-Loops.
In addition to Quadruplex forming structures, other non-B structures that are not characterized by GC-skew are capable of causing DNA damage and influencing repair (52). Consistent with this possibility, our data showed that 60% of clones were within 5 kb of a sequence capable of non-B structure formation in the normal cell line while 74% of the clones from the tumor cell line were within 5 kb of a non-B structure forming sequence. Of particular interest are the representatives of the multiply cloned sequences from the tumor cell line. Only the repair compromised tumor cell line yielded multiple clones from the same region, suggesting that these sites represent recurring replication impediments in the tumor cell line. Consistent with this possibility seven overlapping clones from the same region of RACK7 were obtained from the tumor cell line. Furthermore, six of those seven clones carried an identical breakpoint at one end. Since this breakpoint (gray arrows in Figure 2) is about 1 kb 3’ to the site of the open structures detected with bisulfite treatment of native DNA (red region in Figure 2) it is also consistent with our cloning model where the common breakpoint would represent a site at or very near the uncoupled CMG halt site with the leading strand replication impediment at the site of the non-B structure.
The involvement of quadruplex structures at the putative replication impediment in RACK7 is suggested by the data in Figures 2 and 3, and the biophysical studies on the representative oligodeoxynucleotides from the region (Figure 7). First, nearly all sequences recovered from this region contain a deletion in the region of bisulfite accessibility in native DNA, and neither the deletions nor the RNAse H treatment fully block the formation of the open structure detected by bisulfite treatment of native DNA. Those sequences recovered from bisulfite treated DNA (with or without RNAse H pretreatment) for which bisulfite mediated deamination was detected retain a nearly full copy of the (GGGA)9/(TCCC)9 duplex capable of G-quadruplex and i-motif quadruplex formation.
In many cases of R-Loop formation the shortening of the duplex region caused by the RNA:DNA hybrid's A-Form structure is thought to bring regions of the looped out G-rich strand into juxtaposition where G-quadruplex formation can occur (53). Given the stability of the G-quadruplex, it may persist after an R-Loop or transcription process has been resolved. In the case of the structure at RACK7 under study here, if the RNAse H results are interpreted to mean that a non-canonical R-Loop forms between the nascent C-rich strand RNA transcript and the G-rich strand DNA, generating a C-rich loop, then a quadruplex in the form of the i-motif structure that is stable at neutral pH and physiological temperature at the (TCCC)9 sequence in the loop would again be involved. RNAse H treatment could remove the hybridized RNA while permitting the i-motif to persist in sequences with deletions that do not remove the (TCCC)9 region. Alternatively, the i-motif structure in the loop might slow RNAse H action. Here again the data are consistent with our cloning model, where the putative i-motif and an R-Loop provide the generic replication impediment depicted in Figure 1.
Additional support for the single-stranded nature of the C-rich strand comes from the detection of TCC→TTC or TAC transitions downstream from the deletions. Transition mutations of this type are characteristic of the APOBEC family (54), and it is reasonable to suspect that AID and the APOBEC family of deaminases may be attacking single-stranded DNA in the cells that contain these extensively remodeled sequences. Moreover, these enzymes are known to prefer low pH (55) and single-stranded DNA. The observation is also consistent with the preference of APOBEC for single-stranded DNA at replication forks and DSBs (56), and the preference of the AID DNA deaminase for single-stranded DNA formed near G-quadruplex model oligodeoxynucleotides (57). Clearly, deletion of the quadruplex forming sequences would preclude their formation, but it is also important to note that dC→dU mutagenesis would have a similar effect. For example, the i-motif is destabilized by the introduction of a damaged base (37).
Finally, in nearly all aspects, the data from the human RACK7 region mirror the data obtained from dog-1 deficient C. elegans (42, 58). Those data rule out XPF-1 mediated excision repair or MUS-81 linked homologous recombination repair in favor of a bypass mechanism (28, 59) or a replication fork merging mechanism that would generate a persistent gap on the leading strand opposite the putative G-quadruplex (42, 58). Subsequent replication of the region would generate double-strand breaks that were shown to require DNA polymerase theta (25) end joining repair to create the observed deletions (26). Although the enzymatic properties of DOG-1 do not appear to have been studied in detail, human FANCJ is known to unwind RNA:DNA hybrids (24). Moreover, R-Loops have been observed to impair replication in C. elegans (60). Hence a very similar mechanism in which dog-1 mutants fail to resolve R-Loop replication impediments is not ruled out by the data on deletions in regions conforming to the G-quadruplex folding rule in C. elegans. Models for the genesis of deletions at RACK7 also include bypass as well as template switching mechanisms that can generate the deletions associated with replication impediments at sites of non-B structure formation (Figure 8).
Comparison with G4-seq. G4-seq (61, 62) provides an effective way of globally mapping G-quadruplex sequence potential by using high-throughput next-generation sequencing to detect sites that can adopt G-quadruplex structures when single-stranded DNA is exposed to K+ and G-quadruplex-stabilizing ligands during sequencing in vitro. On the other hand, G4-seq does not offer a means for determining which of the sequences identified in vitro actually play a role in vivo. In this regard, it is important to compare the number of sequences identified by G4-seq with the immunohistochemistry results demonstrating the presence of G-quadruplex in the nucleus of mammalian cells (63). In those experiments, the highest number of fluorescent foci in the nucleus was seen during S-phase, consistent with replication-dependent formation of G-quadruplex (63). Even so, only about 35 foci/nucleus were detected at any one time in S-phase, whereas G4-seq returns more than 716,000 distinct genomic sites capable of G-quadruplex formation (61). Consequently, even if the fluorescent foci represent replication factories (64) containing several hundred growing replication forks each, fewer than 2% of the sequences identified with G4-seq would be present at a fork at any one time during replication, and most of these would be resolved properly by G-quadruplex suppression systems.
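The arithmetic behind this upper bound can be sketched as follows. The site count (716,000) and the foci count (35 per nucleus) are taken from the text above; the specific values tried for "several hundred" forks per focus, the function name, and the simplifying assumption that every fork sits on a distinct G4-seq site are illustrative choices only, not measurements.

```python
# Values quoted in the text.
G4_SEQ_SITES = 716_000      # distinct G4-seq sites (61)
FOCI_PER_NUCLEUS = 35       # approximate foci per S-phase nucleus (63)


def fraction_of_sites_at_forks(forks_per_focus,
                               foci=FOCI_PER_NUCLEUS,
                               sites=G4_SEQ_SITES):
    """Upper bound on the fraction of G4-seq sites located at a fork.

    Assumes (hypothetically) that every active fork in every focus
    coincides with a different G4-seq site, which overestimates the
    true fraction.
    """
    return foci * forks_per_focus / sites


# "Several hundred" forks per focus is not specified further in the
# text; the values below are assumed for illustration.
for forks in (100, 200, 300, 400):
    bound = fraction_of_sites_at_forks(forks)
    print(f"{forks} forks/focus: at most {bound:.2%} of G4-seq sites")
```

Under these assumptions the bound stays below 2% for up to 400 forks per focus, in line with the estimate given above.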
The approach we describe herein offers a much-needed method for supplementing G4-seq data by directly detecting those quadruplex-forming sequences that have blocked DNA polymerase progression in vivo in a given cell type. Moreover, while G4-seq is focused on G-quadruplex, and our results support the idea that G-quadruplex formation is a major contributor to replication stress, our method also has the potential to identify other non-B structure-forming sequences that induce helicase uncoupling linked to replication stress.
Finally, G4-seq requires access to next-generation sequencing, which may not be available to many labs at reasonable cost, while the method we describe can be performed with simple low-cost cloning and sequencing tools.
Conclusion
All of the data presented above are consistent with the cloning model depicted in Figure 1. Moreover, the elevated frequency of cloned sequences adjacent to sites of G-quadruplex-forming potential in the repair-compromised tumor cell line supports the idea that endogenous G-quadruplex and i-motif formation is an important source of genetic instability in this cell line. Further, the detailed analysis of the site within RACK7 from the tumor cell line strongly suggests that deletions and mutations are linked to recurrent quadruplex DNA formation at that site. This suggests that the cloning procedure we describe permits the in vivo identification of sequences that often produce DNA polymerase-helicase uncoupling during replication, where the intercession of damage-prone repair processes is required. As such, the method provides a valuable approach to identifying those sequences with G-quadruplex potential that actually produce replication impediments in vivo.
Acknowledgements
This work was supported by the Biotechnology and Biological Sciences Research Council (BB/L02229X/1) to Z.A.E.W., by grant 5R01-CA102521 to S.S.S. from the U.S. National Cancer Institute of the National Institutes of Health, and by the Ensign Foundation. E.F.W. was supported by a Wellcome Trust grant (204515/Z/16/Z). Research reported in this publication also included work performed in the Integrative Genomics and Bioinformatics Core and the Cytogenetics Core supported by the National Cancer Institute of the National Institutes of Health under award number P30CA033572. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
Authors' Contributions
C.A., J.C., V.B., J.M., M.M., E.W. and M.A. conducted experiments; F.P. designed experiments; Z.A.E.W. designed and conducted experiments; S.S.S. designed and conducted experiments and wrote the paper.
This article is freely accessible online.
Conflicts of Interest
None to be declared.
- Received January 7, 2020.
- Revision received January 25, 2020.
- Accepted January 27, 2020.
- Copyright© 2020, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved