Evolutionary Expansion of Structurally Complex DNA Sequences

STEVEN S. SMITH

Abstract

The observed number per base pair (i.e. the frequency) of G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motifs has increased rapidly in the eumetazoa for which complete genomic sequences are available. This increase appears to be under positive selective pressure since it exceeds the frequency expected for a random sequence genome in every case. Since the motif is capable of forming several non-B DNA structures including quadruplexes, triplexes and hairpins, the expansion has been enabled by the presence of systems capable of suppressing non-B DNA conformations during normal replication and repair and by the emergence of proteins that promote the formation of unusual structures at these sites. Positive selection for these motifs suggests that they are not merely associated with their negative effects on genome stability, but may be useful in increasing the number of structural states in nucleic acids that are available for the elaboration of epigenetic states.

The evolutionary progression among the eumetazoan animals has generated increasing developmental complexity. The serial emergence of eumetazoans, bilaterians, protostomes, deuterostomes, chordates, vertebrates and mammals marks a progressive increase in developmental complexity and therefore in the underlying epigenetic potential of the respective genomes. The well-known expansion of genome size in this progression is consistent with evidence for combinatorial models of epigenetic complexity in which multiple inputs from regulatory proteins and regulatory RNAs in expanded genomes modulate transcription of structural and metabolic proteins to enhance epigenetic complexity (1, 2). While the evidence for gene regulatory networks and the attendant requirement for a general genome expansion and a general increase in transcription factors (3-5) is incontrovertible, additional modulating mechanisms are being recognized. For example, DNA methylation patterning in vertebrate genomes has been proposed to have important epigenetic functions in gene regulatory networks (6-8).

Non-B DNA structure potential has also been proposed as a component of epigenetic systems (9-21). Searches of genomic sequences from bacteria and partial sequences of the yeast and human genomes have suggested that the homopurine mirror repeats characteristic of the H-DNA triplex, cruciform and slipped DNA structures (Figure 1) are overrepresented in eukaryotic genomes (22).

Sequence elements defined as G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motifs residing on either DNA strand (14) carry the potential for G-quadruplex formation (Figure 1). In addition, they identify a subset of the imperfect homopurine mirror repeats capable of triplex formation (23) and C-strand i-motif formers (17) (Table I). Moreover, slipped and foldback structures are implicit intermediates in the formation of both quadruplex (24) and triplex (25) structures at these motifs. This report describes the prevalence of this motif in the eight eukaryotic genomes for which complete genomic sequences are currently available.

Materials and Methods

Evolutionary distance. No single bioinformatics approach that is based on nucleic acid or protein sequencing linking eukaryotic organisms in an evolutionary hierarchy has been universally adopted (26). When comparing sequenced eukaryotic genomes one is often tempted to employ a modern version of the Aristotelian concept of a graded scale of existence (i.e. the scala naturae or the Great Chain of Being) as refined by Plotinus (27). Certainly, roundworms are less highly evolved than birds or mice, but quantifying the evolutionary distance between them based on genomic parameters is difficult. A variety of molecular clocks and paleontological evidence (1, 28-30) tend to provide measures of evolutionary distance, and the molecular and paleontological time scales now tend to agree on the timing of the major evolutionary divergences (26, 31). In this report, the paleontological view based primarily on the fossil record was adopted as the measure of evolutionary distance (32-34).

Genomic searches, background data gathering. Genomic sequences for the different organisms were obtained through links provided by the NCBI website (http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome). Human sequence (Homo sapiens, reference build, release 2.1, build 36.3) and the mouse sequence (Mus musculus, build 36.1) were obtained from the NCBI site. Chicken sequence (Gallus gallus, release 2.1, build 2.1) was obtained from the WUSTL website (http://genome.wustl.edu/genomes/view/gallus_gallus/). Zebrafish (Danio rerio) and medaka (Oryzias latipes) sequences were obtained from the Ensembl FTP site (http://uswest.ensembl.org/info/data/ftp/index.html). The Drosophila sequence (Drosophila melanogaster, release 5.9) was obtained from Flybase (ftp://ftp.flybase.net/genomes/). C. elegans (Caenorhabditis elegans, WS170, build 7.1) was obtained from the Sanger project website (http://www.sanger.ac.uk/Projects/C_elegans/Genomic_Sequence.shtml). To calculate the quantity of each nucleotide, chromosomal sequences were opened using EditPad Pro (Just Great Software, Phuket, Thailand) and a search was made through each of the files. Microsoft Excel (Microsoft Corp., Redmond, Washington, USA) was used to create the table with the total genome size and also to calculate the GC fraction by dividing the number of G+C bases by the number of G+C+A+T bases. Unknown bases, labeled N, were not included in this calculation but still counted towards the total genome size.
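The base-counting step described above was performed with EditPad Pro and Excel. As an illustration only, the same arithmetic can be sketched in Python; the function name gc_fraction is hypothetical, and the sketch assumes the sequence contains only the letters A, C, G, T and N.

```python
from collections import Counter


def gc_fraction(sequence):
    """Return (total_length, gc_fraction) for one chromosome sequence.

    Assumption: the sequence contains only A, C, G, T and N.
    Unknown bases (N) count toward the total length but are left out
    of the G+C / (G+C+A+T) ratio, as described in the text.
    """
    counts = Counter(sequence.upper())
    gc = counts["G"] + counts["C"]
    acgt = gc + counts["A"] + counts["T"]
    total = len(sequence)
    # Guard against a sequence made up entirely of unknown bases.
    fraction = gc / acgt if acgt else 0.0
    return total, fraction
```

For a whole genome, the per-chromosome counts would be summed before the ratio is taken, rather than averaging per-chromosome fractions.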

Figure 1.

Conformation space available at the G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motif. Ribbon drawings of canonical non-B-DNA structures formed at G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ are depicted: (A) B-DNA, (B) an intramolecular triplex linked by base triplet formation, (C) an asymmetrical hairpin linked by base pairing, (D) an intramolecular G-quadruplex linked by Hoogsteen paired tetrads, and (E) an intramolecular C-quadruplex (i-motif) linked by intercalated C:C+ pairs.

Data collection

Complete genome searches. All counts of unusual structures were placed into Excel, and the frequencies were calculated by dividing the number of structures found by the total genome size, as determined previously. G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ density was calculated using the Huppert algorithm (35) with the program Quadparser, available from http://www.quadruplex.org/. General quadruplex frequencies were calculated with the Maizels algorithm (36) with the program G4P, available from http://depts.washington.edu/maizels9/G4calc.php. Similar trends were detected with both algorithms. Data from Quadparser were depicted graphically.

Quadparser was placed in the same folder as the FASTA-formatted sequence files, and the program was executed through the command prompt. Output text files were then individually examined and the total number of unique quadruplexes was counted for each file. The G4P calculator used Microsoft.NET (Microsoft) and sequence files were used as input for the algorithm. The output was the density of regions that contained a quadruplex.
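The motif definition used here translates directly into a regular expression. The following Python sketch is not Quadparser and is offered only to make the definition concrete; the names G_MOTIF, C_MOTIF, count_motifs and motif_frequency are hypothetical. It assumes that loop bases may be any of A, C, G or T, that a C-rich match on the given strand stands for a G-rich motif on the complementary strand, and that matches are counted by a simple left-to-right, non-overlapping scan, which may differ from the way Quadparser resolves overlapping candidates.

```python
import re

# Four runs of three or more G separated by loops of 1 to 7 bases.
G_MOTIF = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
# The same motif on the complementary strand appears as C runs here.
C_MOTIF = re.compile(r"(?:C{3,}[ACGT]{1,7}){3}C{3,}")


def count_motifs(sequence):
    """Count non-overlapping motif matches on both strands."""
    seq = sequence.upper()
    g_hits = sum(1 for _ in G_MOTIF.finditer(seq))
    c_hits = sum(1 for _ in C_MOTIF.finditer(seq))
    return g_hits + c_hits


def motif_frequency(sequence):
    """Return motif occurrences per base pair (count / sequence length)."""
    return count_motifs(sequence) / len(sequence)
```

Because unknown bases (N) are excluded from the loop character class, runs of N cannot contribute to a match in this sketch.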

Tandem repeat density was calculated using Tandem Repeats Finder (TRF) (37), available from http://tandem.bu.edu/trf/trf.html. Alignment parameters for match, mismatch, and indel were set at 2, 7, and 7 respectively. Alignment scores above 50 were reported, and the maximum period size was 500. Sequences were opened and scanned by the program. Output files were opened and counts were tallied in Excel.

Inverted repeat density was calculated using Inverted Repeats Finder (IRF) (38), available from http://tandem.bu.edu/irf/irf.download.html. Default parameters for this program were used. Alignment parameters for match, mismatch, and indel were set at 2, 3, and 5 respectively. Matching and indel probabilities were 0.8 and 0.1, respectively. Alignment scores above 40 were reported, and the maximum period size was 2000. The program was executed through the command prompt and was placed in the same folder as the sequence files. Mirror repeat frequencies were also calculated using IRF.

Sampled genome searches. Because the algorithms for finding Z-DNA and palindromes take much longer to run on long sequences, random segments of the genome were sampled to estimate the density for the entire genome. Fifty sequences totaling 1.5% of the entire genome were sampled. Numbers were generated using a random number generator that converts atmospheric noise to numbers (available from http://www.random.org) and then mapped to the starting locations of the sequences. To verify the accuracy of this approach, results from random samples taken from smaller genomes were compared to those of the entire genome and were found to be within about 3% of the actual results.
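The sampling scheme can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: it substitutes Python's pseudo-random generator for the atmospheric-noise source, assumes the fifty segments are of equal length, and allows segments to overlap. The names sample_segments and estimate_density are hypothetical, and count_fn stands for whatever motif counter is applied to each segment.

```python
import random


def sample_segments(genome_length, n_segments=50, fraction=0.015, seed=None):
    """Return sorted (start, end) pairs for equal-length random segments.

    Assumptions: segments have equal length, may overlap, and start
    positions come from a pseudo-random generator rather than random.org.
    """
    rng = random.Random(seed)
    seg_len = max(1, round(genome_length * fraction / n_segments))
    starts = [rng.randrange(0, genome_length - seg_len + 1)
              for _ in range(n_segments)]
    return [(s, s + seg_len) for s in sorted(starts)]


def estimate_density(sequence, count_fn, **kwargs):
    """Estimate occurrences per base pair from the sampled segments only."""
    segments = sample_segments(len(sequence), **kwargs)
    hits = sum(count_fn(sequence[s:e]) for s, e in segments)
    sampled = sum(e - s for s, e in segments)
    return hits / sampled
```

Comparing estimate_density with the full-sequence density on a small genome mirrors the accuracy check described in the text.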

Table I.

Biological sequences containing the G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ and observed structures. Both quadruplex and triplex forms have been reported for this motif. The table lists the forms observed in vitro and the associated reference.

Z-Hunt was used to determine the density of Z-DNA in each of the genomes (39), available from http://gac-web.cgrb.oregonstate.edu/zDNA/. Segments of the genome were uploaded to the server, which then processed the data and displayed the output file listing all the potential Z-DNA sequences. Output sequences were copied to a text file and the number of Z-DNA sequences was counted.

Palindromes were counted using the palindrome function in the EMBOSS package mEMBOSS (40), available from http://emboss.sourceforge.net/. Sequences were used as input to mEMBOSS for analysis. The minimum length of the palindrome was set at 15 and the maximum at 100 base pairs, with two possible mismatches allowed in the palindrome. The maximum gap between each of the palindromes was set to be 100. The program generated a text file containing the palindromes, and the total number was tallied in Excel.

Expected density. Expected densities for quadruplex, tandem repeats, inverted repeats, Z-DNA, mirror repeats and palindromes were determined by using a randomly generated sequence (with equal probability for G, C, A, T). Each nucleotide was assigned a number and the random numbers were generated using the aforementioned random number generator. Ten sequences of 1 million bp were generated, as well as one sequence of 10 million bp, for a total of 20 million bp. The sequences were then converted to FASTA format and run through each of the algorithms to determine the density. A normal distribution of densities among the sequences was assumed, and the 99% confidence interval for the density of random sequences was determined from their standard deviation.
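A minimal sketch of this baseline calculation is given below. It is an illustration only: it uses Python's pseudo-random generator in place of the atmospheric-noise source, and it reads "99% confidence interval" as the mean plus or minus z standard deviations of the per-sequence densities, which is one plausible interpretation of the description above. The names random_sequence and expected_density are hypothetical, and count_fn stands for any motif counter.

```python
import random
import statistics


def random_sequence(length, rng):
    """Generate a sequence with equal probability for A, C, G and T."""
    return "".join(rng.choices("ACGT", k=length))


def expected_density(count_fn, n_sequences=10, length=1_000_000, seed=0):
    """Return (mean_density, (lower, upper)) over random sequences.

    Assumption: the interval is mean +/- z * sd, with z taken from the
    normal distribution for two-sided 99% coverage.
    """
    rng = random.Random(seed)
    densities = [count_fn(random_sequence(length, rng)) / length
                 for _ in range(n_sequences)]
    mean = statistics.mean(densities)
    sd = statistics.stdev(densities)
    z = statistics.NormalDist().inv_cdf(0.995)
    return mean, (mean - z * sd, mean + z * sd)
```

An observed genomic density falling outside the returned interval would then be read as departing from the random expectation.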

Data analysis. All data were entered into Mathematica (Wolfram Research Inc., Champaign, IL, U.S.A.) and processed with Mathematica subroutines. Plots were generated with the ListPlot subroutine. The subroutines LinearModelFit or NonlinearModelFit were used to model the data and to obtain fitted parameter confidence intervals and R² values. Parameter confidence interval shading was obtained with the MeanPredictionBands subroutine.
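The fits themselves were obtained in Mathematica. For readers without access to it, an ordinary least-squares line and its extrapolation to zero can be sketched in plain Python; this is a stand-in for LinearModelFit, not a reproduction of it, and it does not compute the confidence bands. The names linear_fit and x_intercept are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


def x_intercept(slope, intercept):
    """Value of x at which the fitted line crosses y = 0."""
    return -intercept / slope
```

With time at divergence as x and a genomic quantity as y, x_intercept gives the kind of linear extrapolation to the onset of a trend that is quoted in the figure legends.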

Results

Available genomic sequence information allowed us to calculate genome size with a high degree of accuracy. In order to detect evolutionary trends, the data for the sequenced eukaryotic genomes were plotted on a geologic time scale that links the genome to its point of divergence from the main eumetazoan lineage based on paleontological dating from the fossil record (32-34). The data are plotted in Figure 2.

The same genomic sequence information allowed calculation of the G+C content with a high degree of accuracy. The data (Figure 3) suggested a very weak tendency toward increased G+C content in the evolutionary progression of the eumetazoans. Since the frequency of the (G₃₊N_1-11G₃₊N_1-11G₃₊N_1-11G₃₊) motif is expected to increase with increasing G+C content, a baseline expectation was calculated for a random genome of 20 Mb with equal frequencies for each of the four nucleotides. This baseline expectation for random sequences was calculated from a direct search of 20 Mb of random sequence for the (G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊) motif. Since the random sequence was 50% G+C, the expectation exceeded that for the highest G+C content in the sequenced genomes studied here (42%).

Corresponding scans of each of the sequenced genomes yielded the data given in Figure 4. The motif was overrepresented relative to the random expectation in each organism and the frequency in number/bp increased steadily as the epigenetic potential of the genome increased.

Interestingly, inverted repeat frequency, tandem repeat frequency and Z-DNA sequence frequency did not show smooth increases as a function of the time of divergence (Table II). With the exception of Z-DNA frequency, each of these motifs was present in every eukaryotic genome tested at a frequency that exceeded the expectation for a random genome as previously suggested from sampling data (22). However, only the G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motif approximated a linear increase as a function of time at divergence (Figure 4).

Figure 2.

Genome size for fully-sequenced organisms vs. time at divergence node. The line represents the best linear fit to the data (in the least square sense). The shaded area marks the 68% confidence interval on the fitted line. The linear extrapolation to the origin of the eumetazoans is about 0.584 Byr before present, corresponding to the Ediacaran period, well after the origin of the eukaryotes. Sizes are the sum of the sequenced bases for each chromosome in each organism. Similar plots were obtained with 2N male and 2N female genome sizes. Byr: Billion years.

Discussion

Developmental complexity is synonymous with genetic and epigenetic complexity. Since each stage in a developmental program manipulates the same genomic sequence as new cell types and stably differentiated states are elaborated, developmental complexity results from the combined epigenetic and genetic complexity of an organism. While it is clear that the developmental complexity of a mammal is significantly greater than that of a roundworm, it is difficult to assign a measure of developmental complexity to the respective genomes. In general, the Aristotelian Chain of Being has been used to order organisms qualitatively in demonstrating that genome size (C-value paradox) (41) and gene number (G-value paradox) (42, 43) cannot account for evolutionary complexity. Current best evidence suggests that transcription factor numbers (3), or increasingly complex cis regulatory elements and multiprotein transcription complexes scale with the Chain of Being (4, 44).

Scaling developmental and epigenetic complexity. While the Chain of Being places organisms in an intuitive order, the underlying logic of this order can be seen in paleontology. Paleontology, interpreted through Darwinian evolution, offers the best approach to an unbiased scale of developmental complexity against which candidate processes can be measured. Among the animals, the date at which a given species diverged from the main evolutionary lineage can be taken as its position in the genetic and epigenetic hierarchy. This approach places the Aristotelian Chain of Being on a paleontological time scale that not only orders genetic and epigenetic complexity but also permits genomic analysis of the trends in the emergence of that complexity.

Figure 3.

Percent G+C content of fully-sequenced organisms vs. time at divergence node. The line represents the best linear fit to the data (in the least square sense). The shaded area marks the 90% confidence interval on the fitted line. Byr: Billion years.

Figure 4.

G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motif frequency vs. time at divergence node. The line represents the best linear fit to the data (in the least square sense). The frequency is the average over male and female genomes for each organism. The shaded area marks the 68% confidence interval on the fitted line. Random sequence expectation was calculated from random sequence scans (red). Eukaryotic genomes appear to have evolved about 1.4-2.5 Byr ago (26). The linear extrapolation for the beginning of the expansion of this motif in the eumetazoa is about 0.688 Byr before present, corresponding to the Ediacaran period, well after the origin of the eukaryotes. Byr: Billion years.

Table II.

Average repeat sequence frequencies. Observed frequencies for each of the canonical repeats studied here are given for each organism in occurrences/base-pair. The expectation in occurrences/base-pair is calculated by searching random DNA sequences for each motif.

Based on paleontological evidence (34) and molecular clocks (26, 28, 30) that encompass the major laboratory organisms (34), the genomes of the eukaryotic animals can now be placed on a paleontological time scale with a high degree of confidence.

When the eumetazoan genome sizes studied here are placed on this paleontological time scale, it is clear that these particular genomes scale linearly from about 584 million years ago as one might expect for a lineage thought to have originated at about the time of the Cambrian Explosion. Obviously this linear relationship for genome size holds only for the laboratory species studied here. Additional organisms plotted on this scale would obscure this linear relationship (27): nematodes can have genomes with sizes comparable to mammals. Nevertheless, these organisms can be traced to the divergence points shown in Figure 2 and, thus, provide an evolutionary series of organisms with monotonically increasing genome sizes that permit further analysis.

Given the evidence demonstrating that sequence motifs associated with non-B conformations can undermine genomic stability by promoting mutagenesis (45), dynamic mutation (46-48) and gene rearrangement (24, 49), it is reasonable to conclude that the maintenance of the larger genomes is enabled by the evolution of suppressors of non-B structure like the RecQ (50) helicase family represented in modern yeast by Sgs1 (51). By enabling the genomes to incorporate high conformation space sequence motifs (52) like those associated with the formation of quadruplex, triplex and hairpin structures, the emergence of large genomes is allowed to go forward unhindered. Given neutral selection, the aggregate frequency (namely the number per base pair) at which these motifs should occur in a random DNA sequence is expected to be a constant, independent of genome size. The present analysis showed that Z-DNA is maintained below the random expectation in every organism tested (Table II), suggesting that it is under negative selective pressure. In contrast, the observed frequency of every motif studied except that of Z-DNA fell above the associated random expectation for every organism studied (Table II), suggesting that at least some of these motifs have been subject to positive selection. This is consistent with the results on the maintenance of cruciforms on Y (53) and the abundance of simple repeats (54) if one assumes that the positively selected palindromes, inverted repeats and tandem repeats form a class with low annealing temperatures.

Of the sequences that occurred at frequencies above the respective random expectation, only the (G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊) motif exhibited a monotonic increase in frequency on the paleontological scale (Figure 4). Since it is capable of forming quadruplex, triplex and hairpin structures (Figure 1 and Table I), the expanding frequency of this motif can be taken as a measure of the expansion of non-B-DNA forming potential that is linked to the expansion of developmental potential (Figure 4). Importantly, these rather large changes in sequence motif representation occurred without dramatic changes in the overall G+C content. Although the progression encompassed poikilothermic (cold blooded) eukaryotes and homeothermic (warm blooded) metazoans, none are extremophiles (requiring physically extreme conditions). The organisms studied here develop in temperatures that range from 10°C to 44°C, thus ranging in G+C content from about 35% to 42%. While the range from 35% to 42% suggests a trend toward a higher density of G-rich motifs in more highly evolved genomes, none of the G+C contents present in sequenced organisms approached the 50% G+C content present in a completely random genome. Clearly, the increase in the frequency of linked G-rich motifs (Figure 4) exceeded the expectation for a random sequence and cannot be due to the small increase in G+C content associated with the evolutionary progression.

Positive selection for the G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motif. The question of whether positive selection for this form of non-B structure potential is a reflection of the expansion of the developmental potential of the genome or is a reflection of the accumulation of junk DNA (55) sequences is an important one. While human Alu sequences lack the motif, the consensus sequence of the human L1 retrotransposon contains one copy of the motif near the polyA sequence. Thus at least some of the instances of the motif (as much as a third of the human occurrences) can be attributed to retrotransposition within the L1 family in humans. Although retrotransposition may actually be promoted by the formation of quadruplex, hairpin and triplex structures, it does not appear to be required for transposition, since P-elements in Drosophila lack the motif. Thus, the proliferation of repetitive elements does not appear to account for the observed smooth expansion of the motif frequency with developmental complexity. In short, the observed increase in the frequency of the G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motif was more consistent with positive selection for functionality (perhaps via retrotransposition) than with neutral selection associated with the accumulation of junk DNA sequences.

Positive selection is also consistent with the appearance of proteins that promote the formation of unusual structures in DNA (e.g. the meiotic pairing gene Hop1 (11) and Nucleolin (56)). Moreover, positive selection is difficult to rationalize in terms of the coding capacity of these sequences for proteins given the redundancy of the genetic code and its capacity to mold codon usage (57). The analysis suggests that sequences capable of quadruplex, hairpin and triplex formation are not merely associated with harmful effects as one may expect from their association with sites of dynamic mutation (46-48, 58) and gene rearrangement (49) but also serve useful developmental functions since they appear to have been under strong positive selective pressure during the emergence of developmental complexity.

Roles for the G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motif. In vitro studies of the formation of quadruplex DNAs have generally been performed with short oligodeoxynucleotides representing regions of biologically important sequences. A key example is the nuclease hypersensitive element (NHE III₁) located between promoter P0 and promoter P1 in the human MYC gene (59). This element has been shown to form an intramolecular quadruplex (21) in vitro. However, mutagenesis studies and the presence of an imperfect homopurine mirror repeat present in the sequence suggest that it can also form an intramolecular triplex (60). Consistent with these findings, the region has also been shown to carry a canonical motif (G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊) characteristic of both quadruplex sequences (14) and the G-rich subset of triplex sequences formed by imperfect homopurine mirror repeats (23).

Several well-studied examples of quadruplex or triplex formation (Table I) exhibit this motif and thus carry the capacity for the formation of either or both types of non-B DNA structure. Moreover, other similar motifs are known to form quadruplex sequences. For example, motifs such as (G₃₊N_1-11G₃₊N_1-11G₃₊N_1-11G₃₊) with extended loop size are expected to form quadruplex DNA (10, 61). However, longer loops reduce stability (62). In addition, it is well known that the (GGC)_n motif in triplet repeat sequences from the FMR1 gene has been observed to form quadruplex DNA (47), and is associated with the formation of a spontaneous slippage intermediate on the complementary C-rich strand (46, 48, 58).

Although quadruplex formation in oligodeoxynucleotides is not often studied in the presence of both complementary strands, quadruplexes have been observed to form in vitro in the presence of their complementary strands (58). Moreover, the extended loop size (Lamparska-Kupsik and Smith, unpublished) or the presence of the complementary strand (63) can result in stable trimolecular triple helices. The full structural complexity of these motifs is depicted schematically in Figure 1.

A significant number of proposals for the function of these sequence motifs are consistent with the expectation that they should expand in frequency in concert with developmental complexity (35, 64). Since the intermolecular quadruplex is a molecular mimic of the synaptonemal alignment of homologues during meiosis, an active role in meiosis has been suggested (10). Support for this proposal has been adduced from the KEM1/SEP1 nuclease system (65) in yeast which can block meiosis at pachytene when mutated (66), and from the presence of quadruplex binding proteins of similar function in human cells (67). These proposals are also consistent with the expansion of the G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motif, because homologous pairing is expected to require a higher density of interstrand links as chromosome size increases.

Gene expression. Proposals for a role in the control of gene expression through the formation of intramolecular triplex (60) or quadruplex (21) at promoter sequences in the human c-MYC (59), KRAS (68) and c-KIT (69) genes are consistent with the presence of these elements in the 5' UTRs of a significant number of human genes (14) and with the capacity of Hop1 (11) and Nucleolin (56) to induce these structures in DNA.

These proposals require the generation of the structure in DNA; however, it is also possible that the RNA transcript would carry the folded structure encoded by a B-DNA sequence as discussed by Huppert et al. (14). For example, transcription from start site P0 at the human MYC gene would produce a 5' UTR capable of intramolecular quadruplex or triplex formation, while transcription from start site P1 would not. Thus, the presence of these motifs in 5' and 3' UTRs is consistent with a role in RNA processing (14) and the mechanism of action of non-coding RNAs.

The combinatorial limit. Based on Ohno's original suggestion (55) that mutation rate limits the total number of genes to between 15,000 and 20,000 genes per genome, it is tempting to speculate that once this combinatorial limit (2) is reached (apparently already at the roundworms), additional epigenetic mechanisms involving phenomena like unusual DNA structure formation and DNA methylation must come into play.

Structural complexity and epigenetic potential. A structurally complex motif like the (G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊) motif necessarily generates a dramatic increase in the epigenetic potential of the sequence at a given site. A simple B-DNA sequence offers one DNA conformation that can be recognized by a protein or nucleic acid modulator. A (G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊) motif can present six distinct DNA conformations that can be uniquely recognized by protein or nucleic acid modulators (Figure 1). Thus, even the pair-wise interaction potential of these motifs as sites of protein or nucleic acid recognition will be increased by fifteen times (= 6!/[2!(6-2)!]) the number of motifs present in the genome. Consequently, the increase in (G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊) density links the positive selection for these non-B conformations with the emergence of epigenetic complexity through a clear enhancement of information content. In other words, the different intermolecular and intramolecular conformations that can be adopted or encoded in RNA by a single DNA sequence form different epigenetic signals that effectively increase the epigenetic information content of the genome (12-14, 52, 59, 70, 71) by increasing the number of protein and nucleic acid binding sites available from a single sequence. The data reported here suggest that eumetazoans have enhanced their developmental and epigenetic potential by selectively incorporating structurally complex motifs in their genomes in spite of the potential that these motifs have for chromosomal damage (50, 63).
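The factor of fifteen is the number of unordered pairs that can be drawn from six conformations, i.e. the binomial coefficient "6 choose 2". A short Python check is given below for illustration; the function name pairwise_states is hypothetical.

```python
from math import comb


def pairwise_states(n_conformations):
    """Number of unordered pairs among n conformations: n! / (2! (n-2)!)."""
    return comb(n_conformations, 2)


# Six conformations per motif give fifteen possible pairs.
print(pairwise_states(6))  # prints 15
```

By the same formula, a site restricted to a single B-DNA conformation offers no such pairs, since pairwise_states(1) is 0.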

In conclusion, similar to transcription factors and cis-regulatory elements, non-B DNA forming motifs like the G₃₊N_1-7G₃₊N_1-7G₃₊N_1-7G₃₊ motif appeared to scale with organismic complexity when placed on a paleontological scale that quantifies the Aristotelian Chain of Being. The observed steady increase in the frequency of these motifs associated with the emergence of complexity was consistent with the proposed roles for this motif in amplifying the number of structural states in nucleic acids that are available for molecular recognition during the elaboration of epigenetic states.

Acknowledgments

Yu-Loung Chang and Jarrod Clark are thanked for their help in compiling data and illustrating figures.

Received March 20, 2010.
Revision received May 12, 2010.
Accepted May 14, 2010.

References

↵
1. Britten RJ,
2. Davidson EH
: Gene Regulation for Higher Cells: A theory. Science 165: 349-357, 1969.
OpenUrl FREE Full Text
↵
1. Davidson EH,
2. Levine MS
: Properties of developmental gene regulatory networks. Proc Natl Acad Sci USA 105: 20063-20066, 2008.
OpenUrl Abstract/FREE Full Text
↵
1. van N,
2. imwegen E
: Scaling laws in the functional content of genomes. Trends Genet 19: 479-484, 2003.
OpenUrl CrossRef PubMed
↵
1. Levine M,
2. Tjian R
: Transcription regulation and animal diversity. Nature 424: 147-151, 2003.
OpenUrl CrossRef PubMed
↵
1. Chen K,
2. Rajewsky N
: The evolution of gene regulation by transcription factors and microRNAs. Nat Rev Genet 8: 93-103, 2007.
OpenUrl CrossRef PubMed
↵
1. Jones PA,
2. Baylin SB
: The epigenomics of cancer. Cell 128: 683-692, 2007.
OpenUrl CrossRef PubMed
1. Miranda TB,
2. Jones PA
: DNA methylation: the nuts and bolts of repression. J Cell Physiol 213: 384-390, 2007.
OpenUrl CrossRef PubMed
↵
1. Schaefer CB,
2. Ooi SK,
3. Bestor TH,
4. Bourc'his D
: Epigenetic decisions in mammalian germ cells. Science 316: 398-399, 2007.
OpenUrl Abstract/FREE Full Text
↵
1. Smith SS
: DNA methylation in eukaryotic chromosome stability. Mol Carcinog 4: 91-92, 1991.
OpenUrl PubMed
↵
1. Sen D,
2. Gilbert W
: Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis. Nature 334: 364-366, 1988.
OpenUrl CrossRef PubMed
↵
1. Muniyappa K,
2. Anuradha S,
3. Byers B
: Yeast meiosis-specific protein Hop1 binds to G4 DNA and promotes its formation. Mol Cell Biol 20: 1361-1369, 2000.
OpenUrl Abstract/FREE Full Text
↵
1. Hershman SG,
2. Chen Q,
3. Lee JY,
4. Kozak ML,
5. Peng Y,
6. Wang L-S,
7. F,
8. Brad Johnson FB
: Genomic distribution and functional analyses of potential G-quadruplex-forming sequences in Saccharomyces cerevisiae. Nucleic Acids Res 36: 144-156, 2008.
OpenUrl Abstract/FREE Full Text
Goñi JP, Vaquerizas JM, Dopazo J, Orozco M: Exploring the reasons for the large density of triplex-forming oligonucleotide target sequences in the human regulatory regions. BMC Genomics 7: 63-73, 2006.
Huppert JL, Bugaut A, Kumari S, Balasubramanian S: G-quadruplexes: the beginning and end of UTRs. Nucleic Acids Res 36: 6260-6268, 2008.
Gonzalez V, Guo K, Hurley L, Sun D: Identification and characterization of nucleolin as a c-myc G-quadruplex-binding protein. J Biol Chem 284: 23622-23635, 2009.
Grand CL, Powell TJ, Nagle RB, Bearss DJ, Tye D, Gleason-Guzman M, Hurley LH: Mutations in the G-quadruplex silencer element and their relationship to c-MYC overexpression, NM23 repression, and therapeutic rescue. Proc Natl Acad Sci USA 102: 516, 2005.
Guo K, Gokhale V, Hurley LH, Sun D: Intramolecularly folded G-quadruplex and i-motif structures in the proximal promoter of the vascular endothelial growth factor gene. Nucleic Acids Res 36: 4598-4608, 2008.
Palumbo SL, Ebbinghaus SW, Hurley LH: Formation of a unique end-to-end stacked pair of G-quadruplexes in the hTERT core promoter with implications for inhibition of telomerase by G-quadruplex-interactive ligands. J Am Chem Soc 131: 10878-10891, 2009.
Palumbo SL, Memmott RM, Uribe DJ, Krotova-Khan Y, Hurley LH, Ebbinghaus SW: A novel G-quadruplex-forming GGA repeat region in the c-myb promoter is a critical regulator of promoter activity. Nucleic Acids Res 36: 1755-1769, 2008.
Rangan A, Fedoroff OY, Hurley LH: Induction of duplex to G-quadruplex transition in the c-myc promoter region by a small molecule. J Biol Chem 276: 4640-4646, 2001.
Yang D, Hurley LH: Structure of the biologically relevant G-quadruplex in the c-MYC promoter. Nucleosides Nucleotides Nucleic Acids 25: 951-968, 2006.
Cox R, Mirkin SM: Characteristic enrichment of DNA repeats in different genomes. Proc Natl Acad Sci USA 94: 5237-5242, 1997.
Frank-Kamenetskii MD, Mirkin SM: Triplex DNA structures. Annu Rev Biochem 64: 65-95, 1995.
Sundquist WI, Klug A: Telomeric DNA dimerizes by formation of guanine tetrads between hairpin loops. Nature 342: 825-829, 1989.
Mirkin SM, Lyamichev VI, Drushlyak KN, Dobrynin VN, Filippov SA, Frank-Kamenetskii MD: DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature 330: 495-497, 1987.
Hedges SB, Blair JE, Venturi ML, Shoe JL: A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol 4: 2-9, 2004.
Gregory TR: Macroevolution, hierarchy theory, and the C-value enigma. Paleobiology 30: 179-202, 2004.
Doolittle RF, Feng D-F, Tsang S, Cho G, Little E: Determining divergence times of the major kingdoms of living organisms with a protein clock. Science 271: 470-477, 1996.
Wray GA: Dating branches on the tree of life using DNA. Genome Biol 3: REVIEWS0001, 2002.
Roger AJ, Hug LA: The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation. Phil Trans R Soc B 361: 1039-1054, 2006.
Benton MJ, Ayala FJ: Dating the Tree of Life. Science 300: 1698-1700, 2003.
Benton MJ, Donoghue PCJ: Paleontological evidence to date the Tree of Life. Mol Biol Evol 24: 26-53, 2006.
Benton MJ, Donoghue PCJ: Paleontological evidence to date the Tree of Life. (Erratum). Mol Biol Evol 24: 26-53, 2007.
Donoghue PCJ, Benton MJ: Rocks and clocks: calibrating the Tree of Life using fossils and molecules. Trends Ecol Evol 22: 424-431, 2007.
Huppert JL, Balasubramanian S: Prevalence of quadruplexes in the human genome. Nucleic Acids Res 33: 2908-2916, 2005.
Eddy J, Maizels N: Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res 34: 3887-3896, 2006.
Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27: 573-580, 1999.
Warburton PE, Giordano J, Cheung F, Gelfand Y, Benson G: Inverted repeat structure of the human genome: the X-chromosome contains a preponderance of large, highly homologous inverted repeats that contain testes genes. Genome Res 14: 1861-1869, 2004.
Ho PS, Ellison MJ, Quigley GJ, Rich A: A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences. EMBO J 5: 2737-2744, 1986.
Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16: 276-277, 2000.
Thomas CA Jr.: The genetic organization of chromosomes. Annu Rev Genet 5: 237-256, 1971.
Hodgkin J: What does a worm want with 20,000 genes? Genome Biol 2: COMMENT2008, 2001.
Hahn MW, Wray GA: The G-value paradox. Evol Dev 4: 73-75, 2002.
Davidson EH: The Regulatory Genome. San Diego: Academic Press, 2006.
Wang G, Vasquez KM: Naturally occurring H-DNA-forming sequences are mutagenic in mammalian cells. Proc Natl Acad Sci USA 101: 13448-13453, 2004.
Chen X, Mariappan SV, Catasti P, Ratliff R, Moyzis RK, Laayoun A, Smith SS, Bradbury EM, Gupta G: Hairpins are formed by the single DNA strands of the fragile X triplet repeats: structure and biological implications. Proc Natl Acad Sci USA 92: 5199-5203, 1995.
Fry M, Loeb LA: The fragile X syndrome d(CGG)n nucleotide repeats form a stable tetrahelical structure. Proc Natl Acad Sci USA 91: 4950-4954, 1994.
Gacy AM, Goellner G, Juranic N, Macura S, McMurray CT: Trinucleotide repeats that expand in human disease form hairpin structures in vitro. Cell 81: 533-540, 1995.
Raghavan SC, Swanson PC, Wu X, Hsieh CL, Lieber MR: A non-B-DNA structure at the Bcl-2 major breakpoint region is cleaved by the RAG complex. Nature 428: 88-93, 2004.
Johnson JE, Cao K, Ryvkin P, Wang LS, Johnson FB: Altered gene expression in the Werner and Bloom syndromes is associated with sequences having G-quadruplex forming potential. Nucleic Acids Res 38: 1114-1122, 2010.
Sun H, Bennett RJ, Maizels N: The Saccharomyces cerevisiae sgs1 helicase efficiently unwinds G-G paired DNAs. Nucleic Acids Res 27: 1978-1984, 1999.
Smith SS, Crocitto L: DNA methylation in eukaryotic chromosome stability revisited: DNA methyltransferase in the management of DNA conformation space. Mol Carcinog 26: 1-9, 1999.
Rozen S, Skaletsky H, Marszalek JD, Minx PJ, Cordum HS, Waterston RH, Wilson RK, Page DC: Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423: 873-876, 2003.
Bacolla A, Larson JE, Collins JR, Li J, Milosavljevic A, Stenson PD, Cooper DN, Wells RD: Abundance and length of simple repeats in vertebrate genomes are determined by their structural properties. Genome Res 18: 1545-1553, 2008.
Ohno S: So much "junk" DNA in our genome. Brookhaven Symposia in Biology 23: 366-370, 1972.
Hanakahi LA, Sun H, Maizels N: High affinity interactions of nucleolin with G-G-paired rDNA. J Biol Chem 274: 15908-15912, 1999.
Smith SS: Species-specific differences in tumorigenesis and senescence. Trends Genet 10: 305-306, 1994.
Smith SS, Laayoun A, Lingeman RG, Baker DJ, Riley J: Hypermethylation of telomere-like foldbacks at codon 12 of the human c-Ha-ras gene and the trinucleotide repeat of the FMR-1 gene of fragile X. J Mol Biol 243: 143-151, 1994.
Siddiqui-Jain A, Grand CL, Bearss DJ, Hurley LH: Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc Natl Acad Sci USA 99: 11593-11598, 2002.
Belotserkovskii BP, De Silva E, Tornaletti S, Wang G, Vasquez KM, Hanawalt PC: A triplex-forming sequence from the human c-MYC promoter interferes with DNA transcription. J Biol Chem 282: 32433-32441, 2007.
Clark J, Smith SS: Secondary structure at a hot spot for DNA methylation in DNA from human breast cancers. Cancer Genomics Proteomics 5: 241-251, 2008.
Hazel P, Huppert J, Balasubramanian S, Neidle S: Loop-length-dependent folding of G-quadruplexes. J Am Chem Soc 126: 16405-16415, 2004.
Raghavan SC, Chastain P, Lee JS, Hegde BG, Houston S, Langen R, Hsieh CL, Haworth IS, Lieber MR: Evidence for a triplex DNA conformation at the bcl-2 major breakpoint region of the t(14;18) translocation. J Biol Chem 280: 22749-22760, 2005.
Todd AK, Johnston M, Neidle S: Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res 33: 2901-2907, 2005.
Liu Z, Frantz JD, Gilbert W, Tye BK: Identification and characterization of a nuclease activity specific for G4 tetrastranded DNA. Proc Natl Acad Sci USA 90: 3157-3161, 1993.
Tishkoff DX, Rockmill B, Roeder GS, Kolodner RD: The sep1 mutant of Saccharomyces cerevisiae arrests in pachytene and is deficient in meiotic recombination. Genetics 139: 495-509, 1995.
Creacy SD, Routh ED, Iwamoto F, Nagamine Y, Akman SA, Vaughn JP: G4 resolvase 1 binds both DNA and RNA tetramolecular quadruplex with high affinity and is the major source of tetramolecular quadruplex G4-DNA and G4-RNA resolving activity in HeLa cell lysates. J Biol Chem 283: 34626-34634, 2008.
Cogoi S, Quadrifoglio F, Xodo LE: G-rich oligonucleotide inhibits the binding of a nuclear protein to the Ki-ras promoter and strongly reduces cell growth in human carcinoma pancreatic cells. Biochemistry 43: 2512-2523, 2004.
Bejugam M, Sewitz S, Shirude PS, Rodriguez R, Shahid R, Balasubramanian S: Trisubstituted isoalloxazines as a new class of G-quadruplex binding ligands: small molecule regulation of c-kit oncogene expression. J Am Chem Soc 129: 12926-12927, 2007.
Catasti P, Chen X, Moyzis RK, Bradbury EM, Gupta G: Structure-function correlations of the insulin-linked polymorphic region. J Mol Biol 264: 534-545, 1996.
Sun D, Guo K, Rusche JJ, Hurley LH: Facilitation of a structural transition in the polypurine/polypyrimidine tract within the proximal promoter region of the human VEGF gene by the presence of potassium and G-quadruplex-interactive agents. Nucleic Acids Res 33: 6070-6080, 2005.
Simonsson T, Pecinka P, Kubista M: DNA tetraplex formation in the control region of c-myc. Nucleic Acids Res 26: 1167-1172, 1998.
Firulli AB, Maibenco DC, Kinniburgh AJ: Triplex forming ability of a c-myc promoter element predicts promoter strength. Arch Biochem Biophys 310: 236-242, 1994.
Lew A, Rutter WJ, Kennedy GC: Unusual DNA structure of the diabetes susceptibility locus IDDM2 and its effect on transcription by the insulin promoter factor Pur-1/MAZ. Proc Natl Acad Sci USA 97: 12508-12512, 2000.
Qin Y, Rezler EM, Gokhale V, Sun D, Hurley LH: Characterization of the G-quadruplexes in the duplex nuclease hypersensitive element of the PDGF-A promoter and modulation of PDGF-A promoter activity by TMPyP4. Nucleic Acids Res 35: 7698-7713, 2007.


