Research Article

Position Dominant Sequence Elements in Experimentally Verified Human Promoters and their Putative Relation to Cancer

KONSTANTINOS VOUGAS, ATHINA SAMARA, GEORGE SPYROU and GEORGE TH. TSANGARIS
Cancer Genomics & Proteomics November 2009, 6 (6) 337-355;

Abstract

Promoter regions of the human genome play a key role in our understanding of the regulatory mechanisms related to physiological and disease states. The aim of this study was to investigate the sequence positional properties of experimentally verified human promoters. To that end, we determined short sequence elements, ranging from 4mers to 9mers, that present position dominance close to, or away from, the transcription start site (TSS). Strict statistical criteria were used for this purpose, and we examined whether position dominance was in any way related to transcription control. We designed and implemented a dedicated filtering method to detect, on a large scale, position-dominant sequence elements embedded in the promoter set. Additionally, via a high-throughput procedure, we gathered data on the majority of the publicly available transcription factor-binding sites (TFBSs) and matched them to our findings, aiming to accomplish a large-scale correlation between position-dominant sequence elements and TFBSs. In this analysis, we present unique compositional and conservational perturbations at the TSS and the core promoter region. Using our filtering method, 7,088 short sequences ranging from 4mers to 9mers were found to present strong positional dominance close to or away from the TSS, and these short sequences were matched to a large number of known TFBSs. Moreover, using probability theory, evidence is presented showing that TFBSs tend to present strong positional preferences. In addition, we demonstrate that the actual TFBS copy number is related to the transcription regulatory process. On the basis of the last argument, it is suggested that all the detected short sequences which did not match any known TFBS have a high potential for being novel transcription control elements.
Furthermore, using a well-described 'high potential cancer biomarker resource', we attempted to identify position-dominant sequence elements associated with cancer, as derived from their presence in the respective promoters of cancer-related proteins.

  • Human promoters
  • sequence analysis
  • positional motifs

The human genome is a massive store of information, approximately 3 billion bases long, which encodes the instructions for the synthesis of the majority of the molecules that form each human cell and organise them into tissues and organs. The Human Genome Sequencing Project has provided highly accurate DNA sequences for all the human chromosomes. However, our understanding of the protein-coding regions of the genome is limited, and our understanding of non-protein-coding transcripts and of the genomic elements that regulate gene expression is more limited still. In order to understand how the human genome functions physiologically or pathologically and reacts to environmental signals, we need a clearer view of its core regulatory mechanisms. To shed light on this issue, one needs to identify the genome control elements and define their localization patterns. Although the largest and most conserved control elements are readily identified, the vast majority of non-coding functional elements remain unknown (1).

The idea of analysing the positional distribution of sequence motifs was developed recently, suggesting that several motifs occurring in natural sequences have strong positional preferences (2, 3). For example, it is acknowledged that in eukaryotes, the best-characterized core promoter elements consist of three control regions: a TATA-box located approximately 30 nucleotides upstream from the start site, an initiator element located at the transcription start site (TSS) and a downstream promoter element (DPE) located approximately 30 nucleotides downstream from the TSS. Various groups have worked in this direction, such as Xie et al., who created a catalogue of 'common regulatory motifs' occurring many times in the human genome (4). Moreover, FitzGerald et al. looked for 8mers that cluster around the TSS and examined whether these sequences match known transcription factor binding sites (TFBS) of 8 transcription factors, and the ENCODE pilot project attempted to identify functional elements in the human genome (5, 6).

However, regarding the detailed analysis of nucleotide composition and positional conservation combined with motif positional distribution around the TSS, further work is necessary. The goal of the present work was to investigate the sequence positional properties of experimentally verified human promoters and, using strict statistical criteria, to determine short sequence elements ranging from 4mers to 9mers that present position dominance close to, or away from, the TSS; these are termed position-dominant sequence elements (PDSEs). This would enable us to determine whether position dominance is in any way related to transcriptional control.

Thus, we designed and implemented a dedicated filtering method to massively detect PDSEs embedded in the promoter set. Additionally, via a high throughput procedure, we gathered the majority of the publicly available TFBS and matched them to our findings, aiming to accomplish a large-scale correlation between PDSEs and TFBS.

The data gathered from this study (a large amount of publicly available TFBS, along with the detected PDSEs) form a unique resource for further investigation and querying. In order to facilitate its exploitation, we have already initiated the construction of a publicly available database accessible from the web to be included shortly in our database server (http://bioserver-1.bioacademy.gr).

Moreover, we attempted to juxtapose the data gathered against a well-described 'high potential cancer biomarker resource' (7). We endeavored to identify PDSEs associated with cancer, as derived from the presence of these PDSEs in the respective promoters of cancer-related proteins. We thus suggest potential targets for future promoter mutation analysis, aiming at a further understanding of the disease state.

Materials and Methods

Dataset. A total of 1,871 experimentally verified human promoter sequences were retrieved from the European Promoter Database (EPD) (8). The maximum sequence for each promoter stored in the database was retrieved (9,999 bases upstream of the TSS and 6,000 bases downstream). All the sequences were stored in a single text file in FASTA format (9).

Calculation of the nucleotide content positional distribution. A composition matrix was constructed having four columns, one for each nucleotide and 16,000 rows, one for each promoter position. The composition of each nucleotide for every promoter position was calculated using the GeneR package (10); the dinucleotide composition was also calculated using GeneR. Based on the promoter mononucleotide compositional matrix the sample variance for each position was calculated (11).
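The composition matrix and the per-position variance described above can be sketched as follows. This is a minimal Python illustration rather than the GeneR-based procedure used in the study; the function names and the toy sequences are hypothetical, all sequences are assumed to be aligned on the TSS and of equal length, and the variance is taken over the four nucleotide fractions at each position without any further normalization (the scaling of the variance index is not specified in the text).

```python
from collections import Counter
from statistics import variance

NUCLEOTIDES = "ACGT"


def composition_matrix(aligned_promoters):
    """Return one row per position holding the fractions of A, C, G and T."""
    length = len(aligned_promoters[0])
    rows = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_promoters)
        # Only the four unambiguous nucleotides are counted (N is ignored).
        total = sum(counts[n] for n in NUCLEOTIDES)
        rows.append([counts[n] / total if total else 0.0 for n in NUCLEOTIDES])
    return rows


def positional_variance(matrix):
    """Sample variance of the four nucleotide fractions at each position."""
    return [variance(row) for row in matrix]


# Toy data (hypothetical): four aligned sequences of length 4.
promoters = ["ACGT", "ACGA", "ACTT", "AGGC"]
matrix = composition_matrix(promoters)
var = positional_variance(matrix)
# Position 0 is fully conserved (always A), so its variance is the largest.
print(matrix[0], var[0], var[3])
```

A fully conserved position gives the maximum variance and a position with equal fractions gives zero, which is the qualitative behaviour the variance index relies on.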

Figure 1.

Logical diagram of the methodology used to select for statistically significant PDSEs.

PDSE discovery. Mapping sequences on EPD human promoters was accomplished using an in-house Perl script which recorded the positions of all the possible nmer sequences on each EPD human promoter, utilizing a sliding window of 'X' nucleotides (where 'X' ranges from 4 to 9) and a slide step of 1. Following this, the R Language for Statistical Computing was used for constructing density histograms from the output of the previous procedure (12). We chose to construct the histograms using the density attribute, which is the normalized frequency distribution, instead of the number of occurrences, since the total area of the density histogram always equals 1.
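The sliding-window mapping and the density histogram can be sketched as below. This is an illustrative Python stand-in for the in-house Perl script and the R histogram step, not the original code; the function names are hypothetical, positions are recorded as offsets from the start of each sequence (the TSS is assumed to sit at the same offset in every promoter), and windows containing N are skipped as an assumption.

```python
from collections import defaultdict

VALID = set("ACGT")


def kmer_positions(promoters, k):
    """Map every k-mer to the start positions at which it occurs."""
    positions = defaultdict(list)
    for seq in promoters:
        for start in range(len(seq) - k + 1):  # slide step of 1
            kmer = seq[start:start + k]
            if set(kmer) <= VALID:  # assumption: skip windows containing N
                positions[kmer].append(start)
    return positions


def density_histogram(starts, length, bin_size):
    """Normalised histogram: the bar areas (density * bin width) sum to 1."""
    n_bins = -(-length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for s in starts:
        counts[s // bin_size] += 1
    total = sum(counts)
    return [c / (total * bin_size) if total else 0.0 for c in counts]


# Toy data (hypothetical): two 8-base "promoters", 4mers, bins of 4 bases.
toy = ["TATAAGCG", "GGTATACC"]
tata = kmer_positions(toy, 4)["TATA"]
print(tata, density_histogram(tata, length=8, bin_size=4))
```

With the settings reported in the text, `bin_size` would be 200 over the full 16,000 bases, or 20 over the 1,000 bases centred on the TSS.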

Two approaches of the same methodology (Figure 1) were used for selecting sequence elements presenting significant non-random promoter localization. The first one, termed 'Rough View', evaluates sequence density histograms ranging over the full promoter length (16,000 bases) using a bin size of 200 bases, and the second one, termed 'Fine View', evaluates density histograms ranging from 500 bases upstream to 500 bases downstream of the TSS using a bin size of 20 bases.

In order to define objective dynamic criteria for determining randomness in sequence density histograms, 1,871 sequences of 16 kbases each were selected from randomly chosen positions of the human genome (NCBI reference assembly build 36, HGSC Finished Genome v4.0), for a total of 29,936,000 bases, of which 97,206 are unidentified nucleotides (marked as N). These sequences are stored in a text file following the FASTA schema (Supplementary Material). The decoy data were constructed using the R Language. More specifically, random sampling of the human genome was carried out using the output of a uniform distribution random number generator utilizing the Mersenne-Twister algorithm, first picking random human chromosomes and then picking random starting positions on these chromosomes (13). Manipulation of the human genome and recording of the random sequences using the FASTA schema was accomplished using the GeneR package of the Bioconductor Platform installed in the R Language (10, 12, 14).
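A minimal sketch of the decoy sampling step is given below, in Python rather than R. Python's `random` module also uses the Mersenne Twister generator. The function name, the toy chromosome lengths and the seed are hypothetical, and choosing chromosomes with equal probability (rather than weighting them by length) is an assumption, since the text does not state which was done.

```python
import random


def sample_decoy_windows(chrom_lengths, n_windows, window, seed=0):
    """Pick a random chromosome, then a random start, for each decoy window."""
    rng = random.Random(seed)  # Mersenne Twister generator
    eligible = [c for c in sorted(chrom_lengths) if chrom_lengths[c] >= window]
    if not eligible:
        raise ValueError("no chromosome is long enough for the window size")
    windows = []
    for _ in range(n_windows):
        chrom = rng.choice(eligible)  # assumption: uniform over chromosomes
        start = rng.randint(0, chrom_lengths[chrom] - window)
        windows.append((chrom, start, start + window))  # half-open interval
    return windows


# Toy chromosome lengths (hypothetical); the study used 1,871 windows of 16,000 bases.
toy_lengths = {"chrA": 50_000, "chrB": 20_000}
for chrom, start, end in sample_decoy_windows(toy_lengths, 3, 16_000, seed=1):
    print(chrom, start, end)
```

Fixing the seed makes the decoy set reproducible, which matters because the significance thresholds are derived from it.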

Figure 2.

Nucleotide content (%) of the EPD human promoter dataset. The TSS (position 0) has been magnified for in-depth visualization of the compositional fluctuation around this locus. Detailed nucleotide composition for the whole length of the human promoters (16 kb) can be found in Supplementary Table I. A: adenine; B: cytosine; C: thymine; D: guanine.

Figure 3.

Promoter sequence conservation visualization using variance and Logo figures. (A) Promoter variance. TSS is on position 0. The region from position –50 to position +50 is zoomed. (B) Sequence logo of the area –50 to +50 visualised using the web-based 'Weblogo' service (33, 34). Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack is proportional to the information content at that position, which is an indication of sequence conservation; it is measured in bits and ranges from 0 to 2, with 0 indicating a position with no conservation (9). The height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position. (C) Sequence logo of a region 5000 nucleotides upstream of the TSS. (D) Sequence logo of a region 5000 nucleotides downstream of the TSS. Areas labeled as 1, 2 and 3 on (A) and (B) are equivalent areas around the TSS.

The positions of all the possible nmer (4mer to 9mer) sequences were mapped on the decoy dataset using the aforementioned Perl script. Density histograms were created for each sequence, and the ratio (maximum density)/(mean density), termed 'S/N', was calculated and recorded along with the total number of instances of the corresponding sequence in the decoy dataset, termed 'Counts'. This task was performed for both the 'rough' and the 'fine' views. The procedure was then repeated for all the nmers against the promoter dataset. The next step was to evaluate the PDSE validity. For each nmer sequence, the following steps were carried out. The sequence's S/N and counts for the promoter dataset were retrieved. All the S/Ns from the decoy dataset were retrieved for all the entries having counts within ±25% of the promoter dataset counts. The standard error was calculated for this set of S/N values and the significance threshold was set as the maximum S/N plus the standard error. If the S/N of the particular sequence for the promoter dataset was greater than the significance threshold, the sequence was considered to present strong non-random dominance on the promoter dataset.

For sequences with a large number of counts, the significance threshold is close to 1, so that even minor fluctuations of S/N would be considered significant; this was empirically determined not to be valid. To circumvent this, the minimum value of the significance threshold was empirically set to 1.6 for 'Rough View' and 2.4 for 'Fine View' by manually evaluating the 4mer set of sequences, which present the largest number of counts.
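The filtering rule described above can be sketched in a few lines of Python. The function names and the toy decoy values are hypothetical. Two points are assumptions rather than statements from the text: the standard error is taken to be the standard error of the mean (sample standard deviation divided by the square root of the number of matched decoy entries), and when fewer than two decoy entries fall in the ±25% counts window the floor value alone is used.

```python
from math import sqrt
from statistics import stdev


def signal_to_noise(density):
    """S/N of a density histogram: maximum density over mean density."""
    mean = sum(density) / len(density)
    return max(density) / mean if mean > 0 else 0.0


def significance_threshold(promoter_counts, decoy, floor):
    """decoy is a list of (counts, s_n) pairs measured on the decoy dataset."""
    low, high = 0.75 * promoter_counts, 1.25 * promoter_counts
    matched = [sn for counts, sn in decoy if low <= counts <= high]
    if len(matched) < 2:
        return floor  # assumption: not enough decoy entries, use the floor
    std_err = stdev(matched) / sqrt(len(matched))  # assumption: SE of the mean
    return max(max(matched) + std_err, floor)


def is_pdse(promoter_sn, promoter_counts, decoy, floor):
    """True when the promoter S/N exceeds the decoy-derived threshold."""
    return promoter_sn > significance_threshold(promoter_counts, decoy, floor)


# Toy decoy entries (hypothetical); floors of 1.6 (rough) and 2.4 (fine) are from the text.
decoy = [(100, 1.2), (110, 1.4), (90, 1.3), (500, 1.05)]
print(is_pdse(2.0, 100, decoy, floor=1.6))
print(is_pdse(1.5, 100, decoy, floor=1.6))
```

The floor only matters for high-count sequences, where the decoy S/N values cluster near 1 and the data-driven threshold would otherwise be too permissive.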

The application of 'Fine View' revealed sequences peaking significantly around the TSS which were missed by 'Rough View'. For example, the TATA 4mer sequence, when examined under 'Rough View', presents a density histogram shaped like a valley with its local minimum around the TSS. This histogram was therefore rejected by the 'Rough View' filter. Nevertheless, when the same sequence is examined by the 'Fine View' filter, a clean peak near the TSS appears (Figure 4). These two approaches can be thought of as different lenses of a microscope, providing complementary information for the sample under examination. This methodology provides a strict way of detecting dominance of sequence elements in genomic regions aligned against specific elements such as the TSS.

Aggregation of the majority of publicly available TFBS. Three sources were used for the collection of known transcription factor binding sites:

(i) Transcriptional Regulatory Element Database (TRED), which is an integrated repository for both cis and trans regulatory elements in mammals (15). A total of 1,247 sequences ranging from 4mers to 95mers, present in human chromosomes, were collected. All sequences contained nucleotides from the 4-letter code only (hence no ambiguity code).

(ii) TRANSFAC 7.0 Public 2005, which is the public version of the commercially available TRANSFAC Database and contains experimentally verified TFBS (16). It should be noted that the sequence motifs are consensus motifs derived from position weight matrices (PWMs) and are written using the ambiguity code. In order to use the ambiguity code-bearing sequences in TRANSFAC, they must be expanded into non-ambiguous sequences, i.e. 4-letter code sequences. This is a computationally intensive task, especially for long sequences with many ambiguous nucleotides. In order to make this task feasible, TRANSFAC motifs were searched for flanking Ns, i.e. sequences starting and/or ending with any number of the letter code N, which stands for any nucleotide. The sequence motif of interest is contained within the flanking Ns, which cause the number of possible non-ambiguous sequences to rise dramatically on expansion without adding any information. To avoid this, the flanking Ns were removed. A total of 211 sequence motifs detected in humans were collected and processed. After processing, the motifs ranged from 4mers to 29mers. Only motifs from 4mers to 23mers were expanded into all the possible non-ambiguous sequences; motifs longer than 23 nucleotides could not be expanded due to limited computational resources. The expansion resulted in 40,651 non-ambiguous sequences.

(iii) Literature available through PubMed and PMC was searched for TFBS not present in the two previous sources; 26 sequence motifs (both ambiguous and unambiguous) were collected. Expansion of the ambiguous motifs resulted in a total of 103 sequences ranging from 5mers to 9mers (5, 17-19).

After merging all the sources and removing the redundancies, 41,042 sequences were identified (Supplementary Table III).
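The expansion of an ambiguity-coded consensus motif into 4-letter sequences, after trimming flanking Ns, can be sketched as follows. The mapping is the standard IUPAC nucleotide code; the function name and the example motif are hypothetical and the sketch is not the procedure actually run on TRANSFAC.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_motif(motif):
    """Strip flanking Ns, then list every non-ambiguous sequence of the motif."""
    core = motif.upper().strip("N")  # internal Ns are kept and expanded
    if not core:
        return []
    return ["".join(p) for p in product(*(IUPAC[c] for c in core))]


# Hypothetical motif: two flanking Ns on the left and one on the right.
print(expand_motif("NNCAYGN"))
```

Each flanking N would multiply the number of expanded sequences by four, which is why trimming them first keeps the expansion tractable; the number of sequences is the product of the option counts at each remaining position.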

Calculating the non-randomness of PDSEs on the known TFBS. We needed to calculate the probability of two unrelated groups A and B (both belonging to the same sample space Ω) overlapping at various degrees. By applying the basic principles of probability theory we end up with the following equation:

Embedded Image

This equation allows us to calculate the probability of the PDSE and TFBS groups having ν common elements, for a specific nmer group, given that these two groups are totally unrelated.
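For orientation, one standard way to compute such an overlap probability is the hypergeometric distribution. The sketch below assumes, and this is an assumption not taken from the equation of the original article, that group B is drawn uniformly at random without replacement from Ω, independently of A, so that P(|A ∩ B| = ν) = C(|A|, ν) · C(|Ω| − |A|, |B| − ν) / C(|Ω|, |B|). The function and parameter names are hypothetical.

```python
from math import comb


def overlap_probability(omega, size_a, size_b, nu):
    """P(|A ∩ B| = nu) for unrelated groups A and B drawn from omega elements.

    Hypergeometric form (assumed): C(a, nu) * C(omega - a, b - nu) / C(omega, b).
    """
    if nu < 0 or nu > min(size_a, size_b) or size_b - nu > omega - size_a:
        return 0.0
    favourable = comb(size_a, nu) * comb(omega - size_a, size_b - nu)
    return favourable / comb(omega, size_b)


# Toy numbers (hypothetical): 10 possible elements, |A| = 3, |B| = 4.
print(overlap_probability(10, 3, 4, 1))
# For nmers, omega would be 4**n, e.g. 4**9 = 262,144 possible 9mers.
```

Python integers are arbitrary precision, so the binomial coefficients stay exact even for the 9mer sample space, and only the final division is converted to floating point.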

Clustering the PDSEs. One matrix was constructed for each nmer (4mers to 9mers), with one row for each selected PDSE and one column for each promoter examined. The content of each cell in the matrix is the number of times the specific sequence is observed in the specific promoter. The Pearson correlation coefficient was calculated for all the possible row combinations and a 'distance matrix' was constructed (11). A dendrogram was created based on the distance matrix using the complete hierarchical clustering algorithm, which calculates the maximum distance between elements of each cluster (20). This clustering algorithm was selected in order for the dendrogram construction to be as stringent as possible. Several authors have offered guidelines for the interpretation of a correlation coefficient; we follow Cohen's suggestion for correlation interpretation in psychological research (21). It must be noted that for the tree construction, the distance of each element based on the correlation was calculated as follows: distance = 1 – correlation coefficient. Therefore, 100% positively correlated elements have a distance of 0 and 100% negatively correlated elements have a distance of 2. The clustering procedure was carried out using the R Language and the 'proxy' package (22).

Cancer related hypothesis driven data mining. Using a database of candidate biomarkers which were reported as being differentially expressed in previous human cancer studies (7), the highest priority candidates were located within our human promoter dataset (Table I). We looked for the presence of 9mer PDSEs in the region ±500 around the TSS of each promoter of Table I. For each of the resulting 9mer PDSEs, the promoters (of the original dataset) containing that 9mer were recorded in a separate data file. In each of these files, the occurrence of members of Table I was noted. The respective sequences were ranked according to this number of occurrences. All the above tasks were carried out using the R-Project for statistical computing and MySQL Server.
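The clustering distance described in the Methods (distance = 1 – Pearson correlation, with complete linkage) can be sketched in plain Python as follows. This is a naive illustration, not the R 'proxy'-based procedure used in the study; the function names and the toy count vectors are hypothetical, and every count vector is assumed to be non-constant so that the correlation is defined.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long count vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither vector is constant


def correlation_distance(x, y):
    """0 for perfectly correlated rows, 2 for perfectly anti-correlated rows."""
    return 1.0 - pearson(x, y)


def complete_linkage(rows, n_clusters):
    """Naive agglomerative clustering; rows maps a sequence to its count vector."""
    clusters = [[name] for name in rows]

    def linkage(c1, c2):
        # Complete linkage: the largest pairwise distance between two clusters.
        return max(correlation_distance(rows[a], rows[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy counts per promoter (hypothetical): four 4mers over four promoters.
counts = {"CCCG": [5, 1, 4, 0], "CCGG": [6, 1, 5, 1],
          "TATA": [0, 4, 1, 5], "ATAT": [1, 5, 0, 4]}
print(complete_linkage(counts, 2))
```

Complete linkage merges two clusters only when even their least similar members are close, which is the sense in which it is the most stringent of the common linkage rules.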

Results

Nucleotide content positional distribution. We analyzed the positional nucleotide content of human promoters included in the EPD on a large scale, ranging from 6,000 bases downstream up to 10,000 bases upstream of the TSS (8) (Supplementary Table I). Beyond the well-known fact that the core promoter region is GC rich, our data present a considerable spike at the TSS area, indicating an unusual fluctuation of the nucleotide content in that area (5). Detailed analysis of an area ranging from 50 nucleotides upstream to 50 nucleotides downstream of the TSS reveals that there are two areas where spikes are present. The first is the area around position 0, in which the TSS is located, and the second is the area around the 25th nucleotide upstream of the TSS, termed position –25 (Figure 2 and Table II).

At the TSS, the positions ranging from 1 nucleotide upstream to 3 nucleotides downstream (termed positions –1 to +3) are of special interest because of their large deviations from the mean values. In a GC-rich environment around the TSS, where G seems to be the dominant nucleotide (Table II and Figure 2B), position –1 is strongly dominated by C with a composition of 50.13%, followed by T with a composition of 25%. At the TSS (position 0), there is an abrupt change in relation to the previous position. The dominant nucleotide at this position is A with a composition of 44.36%, followed by G with a composition of 33.78%, T being almost absent at a composition of 3.01%. The next three positions (+1 to +3) are of interest because they have almost equal compositions for all four nucleotides, unlike the previous two positions and unlike the nucleotide mean values (Table II). Identical compositional behavior is also noticed around position –25.

We then calculated the variance matrix based on compositional information of these promoters (see Materials and Methods). When a given position in the promoters is dominated by a specific nucleotide, indicating that this is a position of high conservation, the variance index for this position is close to 1. In contrast, when the composition of the four nucleotides at the given position is close to 25%, this indicates that this is a position of low conservation and the variance index for this position is close to 0. The variance index is interesting for both the TSS and position –25. At the TSS, a spike in variance is noted at positions –1 and 0 (TSS), indicating a high degree of sequence conservation at these positions (Figure 3A), followed by an abrupt drop at the next three positions (+1 to +3), indicative of almost no sequence conservation. Such a drop in the variance index, and therefore in sequence conservation, is also noted around position –25, where the core promoter region is reported to reside (20). The variance index information is consistent with the information content index which is integrated into the visualization of sequence logo graphs (Figure 3A, 3B) (23).

PDSEs. Promoter sequence localization analysis for nmers was the next step of this study. Since transcription control elements shorter than 4 nucleotides were absent from all the public resources used in the current study, we focused our analysis on all the possible 4mer to 9mer sequences in order to detect PDSEs. The detection of PDSEs was carried out by applying a dedicated detection filter that we developed, functioning in both rough view and fine view settings (see Materials and Methods) (Figure 1). Application of these filters resulted in a total of 7,088 4mer to 9mer PDSEs (Table III and Supplementary Table II). The PDSEs were screened for palindromic sequences, for complementary sequences (sequences that are present in the PDSEs along with their complementary counterparts) and for unique sequences (sequences that are not present in the PDSEs along with their complementary counterparts) (Table IV).
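The screening into palindromic, complementary and unique sequences can be sketched as below. Two interpretive assumptions are made that the text does not confirm: the "complementary counterpart" is taken to be the reverse complement, and a palindromic sequence is one that equals its own reverse complement and is reported separately from the complementary class. The function names and the toy list are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written in the 4-letter code."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_pdses(pdses):
    """Split PDSEs into palindromic, complementary and unique sequences."""
    pdse_set = set(pdses)
    classes = {"palindromic": [], "complementary": [], "unique": []}
    for seq in sorted(pdse_set):
        rc = reverse_complement(seq)
        if rc == seq:
            classes["palindromic"].append(seq)  # assumption: own reverse complement
        elif rc in pdse_set:
            classes["complementary"].append(seq)  # counterpart is also a PDSE
        else:
            classes["unique"].append(seq)
    return classes


# Toy PDSE list (hypothetical).
print(classify_pdses(["CCGG", "CCCG", "CGGG", "TATA", "GCCC"]))
```

If the plain (non-reversed) complement was intended instead, only `reverse_complement` would need to change.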

Characteristic PDSE density histograms for both rough view and fine view are shown in Figure 4 and the top 20 PDSEs from each nmer sorted against S/N and number of occurrences are tabulated in Table V.

The mononucleotide composition of all detected PDSEs ranging from 4mers to 9mers was analysed and was found to be clearly GC rich (Figure 5A). Furthermore, it is noted that while moving from 4mers to 9mers, the gap between the A, T content and the C, G content increases. Exactly the same trend is maintained by dinucleotides (Figure 5B), where dinucleotides containing only G or C are well separated from all the rest and present an 'uptrend' while moving from 4mer to 9mer sequences, in contrast to the remaining ones, which present a 'downtrend'.

PDSE mapping on known TFBS. A total of 41,042 TFBS were aggregated from TRED, TRANSFAC 7.0 Public 2005, and an independent literature search, as described in Materials and Methods (Supplementary Table III). Initially, nmers characterised as PDSEs were mapped on the corresponding TFBS nmers (for n=4 to 9). Subsequently, the remaining unmatched nmers were mapped on the available TFBS of length 10 bases or more, a procedure termed higher order matching. We observed that 4mer to 6mer PDSEs were matched to the total of TFBS sequences; 7mers were matched to 32,871, 8mers to 9,138 and 9mers to 1,999 TFBS sequences. After the higher order matching, 2,404 7mer to 9mer sequences remained unmatched to any known TFBS (Table VI).
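The two-stage matching can be sketched as follows. The sketch assumes that a direct match means the PDSE is identical to a known TFBS sequence of the same length, and that higher order matching means the PDSE occurs as a substring of a TFBS of 10 bases or more; the second reading is an interpretation of the text. The function name and the toy sequences are hypothetical.

```python
def match_pdses(pdses, tfbs):
    """Return (direct, higher_order, unmatched) lists of PDSEs."""
    tfbs_set = set(tfbs)
    longer_sites = [site for site in tfbs_set if len(site) >= 10]
    direct, higher_order, unmatched = [], [], []
    for seq in pdses:
        if seq in tfbs_set:
            direct.append(seq)  # identical to a known TFBS of the same length
        elif any(seq in site for site in longer_sites):
            higher_order.append(seq)  # assumption: contained in a longer TFBS
        else:
            unmatched.append(seq)
    return direct, higher_order, unmatched


# Toy data (hypothetical sequences).
pdses = ["GGGCGG", "TATAAA", "ACGTAC"]
tfbs = ["GGGCGG", "GCTATAAAAGGC"]
print(match_pdses(pdses, tfbs))
```

Sequences left in the third list correspond to the PDSEs that the text proposes as candidate novel control elements.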

Figure 4.

Characteristic PDSE density histograms. Position 10000 on the density histograms is the TSS. (A) Density histogram in 'Rough View' of an SP1 binding site peaking at 100 nucleotides upstream of the TSS. (B) Density histogram in 'Rough View' of a DPE binding site peaking at 300 nucleotides downstream of the TSS. (C) Density histogram in 'Rough View' of a TBP binding site (TATA box). The histogram forms a 'pit' rather than a peak at the TSS. (D) Density histogram in 'Fine View' of the previous TBP binding site. The peak 30 nucleotides upstream of the TSS is now clear.

Figure 5.

PDSE nucleotide content (%). A: Mononucleotide content; B: Dinucleotide content.

Figure 6.

Cluster dendrogram of 4mer PDSEs. Four groups (groups 1-4) are well separated as indicated on the figure. Characteristic density histograms from each group are displayed. Position 10000 on the density histograms is the TSS. CGTG and GCGA, which have been mapped as HIF1A and E2F+p107 binding sites respectively, belong to group 2 and are indicated on the figure with green boxes.

Figure 7.

A: Box plots of the number of occurrences of the four PDSE groups as generated from the clustering dendrogram in the human promoter dataset. Each box is indicative of the standard deviation, the line inside the box is the mean value and the horizontal lines connected with dotted lines to the box are indicative of the range of the distribution. B: Mononucleotide content of the four groups.

In order to investigate whether a sequence being simultaneously a TFBS and a PDSE was a random event, the probability of such a coincidence occurring randomly was calculated for each group of nmers. Suppose there is a total number of possible nmers and two groups of sequences are selected out of this total based on two different criteria. The first criterion concerns sequences characterized as PDSEs on the total of the promoters studied and the second concerns sequences documented as TFBSs. The probability of random occurrence for the observed overlaps between the several groups of nmers has been calculated and it is extremely low (Table VI), leading to the conclusion that a sequence element may indeed be both a PDSE and a TFBS.

PDSE clustering. All sequences of the same order (e.g. all 5mers) were correlated to one another by correlating their number of instances in each promoter. Full PDSE clustering was performed only for 4mers (Figure 6). For the remaining nmers, the dendrograms produced were overcomplicated because of the higher number of list members. For these nmers, only sequences which were directly matched to known TFBSs were analyzed with the clustering method (Supplementary Figures 1-5).

Table I.

High priority cancer biomarker candidates in the human promoter subset of EPD; EPD accession number and gene name are provided for each of the candidates.

Table II.

Nucleotide composition around the TSS. The mean is the mean composition percentage of 100 nucleotides around the TSS, while SD is the standard deviation of the sample from the mean value. TSS Comp is the nucleotide composition percentage of the TSS. Rows labelled as Pos-1 and Pos-25 contain the corresponding values of the position 1 nucleotide and 25 nucleotides upstream of the TSS respectively.

As far as the 4mers are concerned, four major groups are easily distinguishable, groups 1-4 (Figure 6). Group 4 is negatively correlated to groups 1 to 3 and comprises only sequences containing adenine and thymine (e.g. TATA). These sequences were picked as PDSEs using the fine view settings of our methodology, in contrast to those belonging to the first three groups, which were picked using the rough view settings. These latter sequences present a high degree of correlation to one another. The major difference between the first three groups is the number of occurrences of each of their members in the human promoter dataset used. Group 1 seems to be the high, group 2 the medium and group 3 the low copy number group (Figure 7A).

In group 1, G is the dominant nucleotide with a content of 54%, A and T content is similar at 8.33% and 9.72% respectively while C has a content of 27.77%. In group 2, the composition of the four nucleotides reaches a perfect symmetry with A and T at 8.33% and G and C at 41.42%. In group 3, the four nucleotide compositions are uniform having composition of 25%. In group 4, adenine and thymine are the vastly dominant nucleotides with compositions of 55% and 40% respectively, while cytosine is barely present at a composition of 5% and guanine is totally absent.

Nine out of ten unique sequences belong to group 1, which seems to be the high copy number group. A closer look at the dendrogram shows that pairs of highly correlated sequences most often form a 5mer through a 3-nucleotide overlap. For example, CCCG and CCGG can form the CCCGG 5mer by overlapping their common CCG trinucleotide. This 5mer is a straightforward and concrete indication that these 4mers are present in common promoters in a highly associative manner. To elaborate further, we present the following example. CCCG and CCGG are highly correlated and can form the 5mer CCCGG. In contrast, CCCG and GCCC are weakly correlated and can form the 5mer GCCCG. According to our hypothesis, the 5mer CCCGG should have a higher copy number (at least in relation to the copy numbers of its constituent 4mers) than the 5mer GCCCG. Indeed, CCCGG has a copy number of 25,399, originating from CCCG with a copy number of 67,101 and CCGG with a copy number of 59,347, while GCCCG has a lower copy number of 18,876, originating from CCCG with a copy number of 67,101 and GCCC with a copy number of 134,922, the latter having a copy number almost double that of the other 4mers. Other examples confirm this observation. From the above, we conclude that highly correlated sequences tend to appear at the same promoters in similar copy numbers. It must be noted that only two 4mer PDSEs were directly matched to TFBSs: CGTG and GCGA, which have been mapped as HIF1A and E2F+p107 binding sites, respectively, and have been found to reside in group 2 of the clustering dendrogram, presenting a high degree of correlation to one another (Figure 6).
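The 3-nucleotide overlap argument can be checked with a short Python sketch. The copy numbers are those quoted in the paragraph above; the function name is hypothetical, and expressing each 5mer copy number relative to the rarer of its two constituent 4mers is an illustrative normalization chosen here, not one stated in the text.

```python
def merge_by_overlap(left, right, overlap=3):
    """Join two k-mers when the suffix of one equals the prefix of the other."""
    if left[-overlap:] == right[:overlap]:
        return left + right[overlap:]
    if right[-overlap:] == left[:overlap]:
        return right + left[overlap:]
    return None  # no overlap of the requested length in either orientation


# Copy numbers quoted in the text.
copy_number = {"CCCG": 67_101, "CCGG": 59_347, "GCCC": 134_922,
               "CCCGG": 25_399, "GCCCG": 18_876}

for a, b in [("CCCG", "CCGG"), ("CCCG", "GCCC")]:
    merged = merge_by_overlap(a, b)
    # Assumption: compare against the rarer constituent 4mer.
    ratio = copy_number[merged] / min(copy_number[a], copy_number[b])
    print(a, b, merged, round(ratio, 3))
```

Under this normalization the 5mer built from the highly correlated pair retains a larger share of its constituents' occurrences than the one built from the weakly correlated pair, in line with the hypothesis in the text.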

Table III.

Number of PDSEs detected using the described methodology. The detailed results are tabulated on Supplementary Table II.

As far as the clustering of the remaining nmers is concerned, the extended results are given in the Supporting Material Section (Supplementary Figures 1-5). In brief, among 5mer PDSEs, only DPE motifs seem to be significantly correlated to one another, while 6mer PDSEs that matched TFBSs present no significant correlation to one another. The 7mers CCCCTCC and GAGGGGA are moderately correlated, the latter being an MZF1-binding site. The 8mer CCGCCCCC, which is a JUN-binding site, is moderately correlated to a group of highly correlated sequences (GCCCCGCC, CCCCGCCC and CCCGCCCC), which are SP-1-binding sites. Finally, three groups of 9mer sequences stand out from the rest, presenting a moderate correlation. The first group comprises GCGGGGGCG and GGGGCGGGG, which are a WT1-binding site and a GC-box motif respectively; the second comprises CGCCCCCGC and CCCCGCCCC, which are WT1- and SP-1-binding sites respectively; and the third comprises GCCAATGGG and AGCCAATGG, which are an AP-2a-binding site and a CCAAT motif respectively. However, the WT1 – GC-box correlation is equivalent to the WT1 – SP-1 correlation, since GC-boxes are actually SP-binding sites. We note here that the first two groups of 9mers are complementary to one another (the two WT1-binding sites are complementary to each other, as are the two SP motifs).
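The complementarity noted for the 9mer groups can be checked with a short Python sketch. The text does not state whether "complementary" refers to the base-wise complement or to the reverse complement; the sketch assumes the reverse complement, and for these particular 9mers the two coincide because each sequence reads the same in both directions. The helper names are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """A sequence is palindromic when it equals its own reverse complement."""
    return seq == reverse_complement(seq)


# 9mer pairs described as complementary in the text.
pairs = [
    ("GCGGGGGCG", "CGCCCCCGC"),  # the two WT1-binding sites
    ("GGGGCGGGG", "CCCCGCCCC"),  # GC-box and SP-1-binding site
]
for first, second in pairs:
    print(first, second, reverse_complement(first) == second)
```

The same `is_palindromic` test is one way to screen a list of sequences for palindromes, for example `is_palindromic("CCGG")` holds while `is_palindromic("CCCG")` does not.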

Table IV.

PDSEs screened for complementary, unique and palindromic sequences. Detailed results are presented in Supplementary Table II.

Gene products whose promoters contain all of the above correlated sequences within 800 nucleotides around the TSS were assembled in lists and searched against the names of the corresponding transcription factors in PubMed using PubMatrix 3.0 (24). This procedure resulted in 7 tables, one for each highly correlated pair of sequences. Each table contains the gene products along with their EPD accession numbers and the number of references (hyperlinked to the actual results page) found in PubMed when each gene product was searched against each transcription factor (Supplementary Table 5).

Mining for cancer associations. Using the database of candidate biomarkers (7), the highest priority candidates were located within our human promoter subset of EPD, as described in Table I. We explored the presence of 9mer PDSEs in the region ±500 nucleotides around the TSS of each of the promoters listed in Table I. For each 9mer PDSE found, the promoters of the full dataset containing that 9mer were recorded in a separate data file. In each of these files, the number of promoters also present in Table I was noted, and the sequences were ranked according to this number of occurrences.

Shortlisting our dataset, we further present a list of the most frequently encountered PDSEs in promoters of high priority cancer biomarkers (Table VII). The total entries correspond to the number of EPD promoters containing the sequence within a ±500 radius around the TSS. Additionally, in Table VII we provide the hit number, which indicates how many of the total entries belong to the high priority cancer biomarker list.
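The counting behind the total entries and the hit number can be sketched in Python as follows. This is an illustration on toy data, not the pipeline used in the study: the function name `rank_pdse_hits`, the promoter identifiers, the toy sequences and the choice to scan a single strand are all assumptions.

```python
from typing import Dict, List, Set, Tuple


def rank_pdse_hits(
    promoters: Dict[str, str],
    pdses: List[str],
    biomarker_ids: Set[str],
    tss_index: int,
    radius: int = 500,
) -> List[Tuple[str, int, int]]:
    """For each PDSE return (sequence, hit number, total entries).

    Total entries: promoters containing the PDSE within `radius`
    positions of the TSS. Hit number: how many of those promoters
    are on the biomarker list. Rows are sorted by hit number.
    """
    rows = []
    for pdse in pdses:
        containing = set()
        for promoter_id, seq in promoters.items():
            window = seq[max(0, tss_index - radius): tss_index + radius + 1]
            if pdse in window:
                containing.add(promoter_id)
        hits = len(containing & biomarker_ids)
        rows.append((pdse, hits, len(containing)))
    # Highest hit number first; ties broken by total entries, then name.
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


# Toy data: four short "promoters" with the TSS at index 7.
toy_promoters = {
    "P1": "AAACCCCGCCCCAAA",
    "P2": "TTTCCCCGCCCCTTT",
    "P3": "GGGGCGGGGATATAT",
    "P4": "ATATATATATATATA",
}
toy_biomarkers = {"P1", "P3"}
ranking = rank_pdse_hits(
    toy_promoters, ["CCCCGCCCC", "GGGGCGGGG"], toy_biomarkers,
    tss_index=7, radius=7,
)
print(ranking)
```

With real data the promoters would be the EPD sequences, `tss_index` the TSS position within each sequence and `radius` 500.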

Discussion

In this study, EPD was used as the source for human RNA polymerase II promoters for which the transcription start site has been determined experimentally. EPD was chosen firstly because it is non-redundant and secondly because the annotation part of a promoter entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references (8). Such an extensive annotation greatly facilitates the attribution of biological relevance to the findings of this study as well as any subsequent ones.

Detailed promoter compositional analysis is the first step towards an in-depth comprehension of the basic transcription control mechanisms. Such an analysis has revealed two regions of interest in terms of sequence conservation located around the TSS: the TSS itself and the area around position –25. The greatest variance fluctuation is noted around the TSS. At positions –1 and 0 there is an abrupt increase, followed by an equally abrupt decrease at the next three positions (+1, +2, +3). The variance fluctuation around position –30 relative to the TSS (–23 to –35) is of interest because of two seemingly contradictory facts. The first is the fluctuation itself, which indicates that this region is a potential host for transcription regulation elements, considering that the only other position presenting such an abrupt variance fluctuation is the TSS itself. The second is that the direction of the fluctuation is opposite to that observed for the TSS. While variance around the TSS increases, around position –30 it drops abruptly towards 0, forming a pit, as seen in Figure 3. Low variance simply means that the degree of sequence conservation is low. This is backed by the compositional data, which reveal that in this region the compositions of the four nucleotides tend towards 25%. These two facts seem to contradict one another, since one would expect a crucial segment to present a high degree of sequence conservation. Nevertheless, this finding is backed by the literature.
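The link between low variance and a uniform 25% composition can be made concrete with a small Python sketch. It assumes that the variance at each position is taken across the four nucleotide fractions observed at that position in TSS-aligned promoters, which is consistent with the description above but is not confirmed by this section; the function name and toy alignment are hypothetical.

```python
from statistics import pvariance
from typing import List


def composition_variance(aligned: List[str]) -> List[float]:
    """Variance of the A, C, G and T fractions at each alignment column.

    `aligned` holds equal-length sequences aligned on the TSS. A column
    with all four nucleotides at 25% gives a variance of 0; a fully
    conserved column gives the maximum value of 0.1875.
    """
    n_sequences = len(aligned)
    variances = []
    for column in zip(*aligned):
        fractions = [column.count(base) / n_sequences for base in "ACGT"]
        variances.append(pvariance(fractions))
    return variances


# Toy alignment: column 0 is uniform, column 1 is fully conserved.
toy_alignment = ["AC", "CC", "GC", "TC"]
variances = composition_variance(toy_alignment)
print(variances)
```

Under this definition a pit towards 0 corresponds to a position where no nucleotide is preferred, and a spike corresponds to a position dominated by one or two nucleotides.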

Table V.

A: Twenty PDSEs from each group, from 4mers to 9mers, with the largest S/Ns. B: Twenty PDSEs from each group, from 4mers to 9mers, with the largest recorded presence in the human promoter dataset. Total_Promoter_No is the number of promoters the sequence was recorded in. 1000D_Promoter_No is the number of promoters the sequence was recorded in within a diameter of 1,000 nucleotides around the TSS. Total_Copy_No is the number of occurrences of a sequence in the human promoter dataset. 1000D_Copy_No is the number of occurrences of a sequence in the human promoter dataset within a diameter of 1,000 nucleotides around the TSS. Spike Position is the promoter position where a peak is formed on the density histograms (position 10,000 on the histograms is the TSS).

Table VI.

Overview of PDSE matching to TFBSs. Direct and higher order matches of PDSEs to TFBSs are shown along with the probabilities of these matches being random events, as well as the unmatched PDSEs. The detailed results are shown in Supplementary Tables 4.1 (direct matches), 4.2 (higher order matches) and 4.3 (unmatched).

The segment extending ∼35 bp upstream and/or downstream of the TSS is called the core promoter (20). The core promoter includes elements such as the TATA-box, the TFIIB recognition element (BRE), the initiator (Inr) sequence and the downstream promoter element (DPE), which interact directly with components of the basal transcription machinery and are crucial for transcription. More specifically, the segment presenting the abrupt variance fluctuation in our study (–23 to –35) is the host of the TATA-box and BRE elements (20).

Although core promoters for RNA polymerase II were originally thought to be invariant, they have been found to possess considerable structural and functional diversity (25, 26). Furthermore, it appears that core promoter diversity makes an important contribution to the combinatorial regulation of gene expression (27, 28).

Although not all promoters contain a TATA-box, it has been demonstrated that the –30 region has a profound influence on promoter strength even if it has little resemblance to the TATA consensus sequence (29, 30).

When we applied our dedicated PDSE detection filter, 7,087 PDSEs were detected from 4mers to 9mers. These PDSEs were screened for unique, complementary and palindromic sequences.

We have shown that it is extremely unlikely that PDSEs and TFBSs are unrelated. This argument is experimentally supported by recent studies showing that the TFBS distributions of 9 transcription factors all peaked within 300 bases upstream of the TSS (31). Taking the above argument into account, we can deduce that those members of the PDSE lists which have not been matched to any known TFBSs have a high potential of being novel TFBSs or transcription control elements. Given that certain TFBSs tend to occur with an increased frequency close to the TSS, we suggest that the actual TFBS or control element copy number may play a role in the whole transcription regulatory process.

Table VII.

A list of PDSEs frequently encountered in promoters of high priority cancer biomarkers. For each sequence, the hit number and total entries are provided. Total entries correspond to the sum of EPD promoters containing the sequence at a ±500 radius around the TSS. The hit number indicates how many of the total entries belong to the high priority cancer biomarker list.

It is worth mentioning that a batch search of the unmatched sequences over PubMed and PubMed Central yielded almost no results. Most of these sequences show a localization preference close to the TSS (most of them slightly upstream and the remainder slightly downstream). Nevertheless, we found 17 sequences presenting their localization maxima from 6,500-6,300 bases upstream of the TSS (far upstream) and 17 sequences presenting their localization maxima from 3,000-6,000 bases downstream of the TSS (far downstream). Finally, we report the detection of six DPE sequences presenting their strongest localization immediately downstream of the TSS, as expected (Supplementary Table IV), with the exception of the sequence GGTCC, which, quite interestingly, peaks 300 bases downstream of the TSS. At this point, it is worth mentioning that GGTCC and GGACC, which peak close to the TSS and are complementary to one another, present a very strong correlation in clustering analysis. We would like to draw specific attention to the DPE sequences because, as stated in the literature, ‘downstream core promoter motifs occur frequently in Drosophila; however, their occurrence in humans remains to be seen’ (4). Only two groups have reported the detection of DPE elements in human promoters, namely the CD30 gene promoter and the FBN1 gene promoter (32, 33). Given the above facts, the DPE sequences we detected and report can be considered novel.

The scope of the clustering analysis was firstly to present a rigorous methodology capable of grouping together sequences which present similar localization patterns and secondly to detect TFBSs presenting a high degree of correlation to one another. Since the Pearson correlation coefficient is calculated, a high degree of correlation can be readily translated to a high degree of co-occurrence combined with a high degree of co-fluctuation. One could argue that transcription factors whose binding sites present a high or even moderate correlation to one another are candidates for gene co-regulation effects, either synergistic or antagonistic. It is of interest to observe which TFBSs tend to come together in our dataset, because these transcription factors could be part of a specific transcription regulatory circuitry. Furthermore, the genes regulated by such a circuitry may belong to a group or groups with specific characteristics, e.g. tissue specificity.
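As an illustration of how a Pearson correlation between two sequences can be obtained from copy numbers alone, the Python sketch below correlates per-promoter occurrence counts. The count vectors are invented toy values (one entry per promoter), and the function name `pearson` is hypothetical; the study's own clustering was performed on its full dataset.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Toy copy numbers of three sequences across the same five promoters.
toy_counts = {
    "CCCG": [4, 0, 2, 5, 1],
    "CCGG": [3, 0, 2, 4, 1],
    "TATA": [0, 3, 1, 0, 2],
}
r_similar = pearson(toy_counts["CCCG"], toy_counts["CCGG"])
r_opposed = pearson(toy_counts["CCCG"], toy_counts["TATA"])
print(round(r_similar, 3), round(r_opposed, 3))
```

Sequences that rise and fall together across promoters give a coefficient near +1, while sequences that tend to exclude each other give a negative coefficient; a distance such as 1 minus the coefficient could then feed a hierarchical clustering.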

It is quite interesting to note that if we gather the numbers of occurrences of a given sequence across all promoters and co-examine them against the respective sets of other sequences using methodologies such as clustering, valuable information can be mined. For instance, group 4 seems not only to be well separated from the rest of the sequences, but also negatively correlated to them. This finding is also confirmed by a rough inspection of the density distributions of the corresponding sequences. Density histograms in group 4 form a pit at the TSS, while the density histograms of all the remaining sequences form a peak at that locus. Since such information was obtained by examining solely the copy number of the sequences across the promoters, this indicates that the copy number alone possibly carries biologically relevant information, which is in accordance with our prior suggestion that the actual TFBS or control element copy number plays a distinct role in the whole transcription regulatory process.
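A density histogram of the kind referred to above can be approximated by counting where a sequence starts relative to the TSS. The Python sketch below is a simplified illustration on toy sequences; the function name `position_density`, the bin width and the single-strand scan are assumptions rather than the settings used in the study.

```python
from collections import Counter
from typing import Dict, List


def position_density(
    promoters: List[str], kmer: str, tss_index: int, bin_width: int
) -> Dict[int, int]:
    """Count k-mer start positions, binned by offset from the TSS.

    Keys are the lower edge of each bin (negative = upstream);
    overlapping occurrences are counted.
    """
    density: Counter = Counter()
    for seq in promoters:
        start = seq.find(kmer)
        while start != -1:
            offset = start - tss_index
            density[(offset // bin_width) * bin_width] += 1
            start = seq.find(kmer, start + 1)
    return dict(sorted(density.items()))


# Toy promoters with the TSS at index 4.
toy_sequences = ["AAAACCGGAAAA", "TTTTCCGGTTTT", "CCGGAAAATTTT"]
histogram = position_density(toy_sequences, "CCGG", tss_index=4, bin_width=4)
print(histogram)
```

A sequence whose counts concentrate in the bins around offset 0 forms a peak at the TSS, whereas a sequence depleted there forms a pit.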

However, in silico studies need to provide data that can be verified by in vitro and in vivo studies. In this work, we present novel putative cis-acting transcription elements that may occur as independent entities or as regions overlapping known cancer-related cis-acting elements.

In Table VII, we presented a short list of the most frequently encountered PDSEs in promoters of high priority cancer biomarkers. We searched PubMed in order to locate and, where possible, correlate these PDSEs to functional elements associated with cancer or tumourigenesis. Several of these sequences were found to be crucial for the cancer process. More specifically, the sequence CCCCGCCCC (34), a known Sp-1 transcription factor-binding motif, was identified as a major player in the regulatory pathway of ornithine decarboxylase, an enzyme which is critical for, among other processes, carcinogenesis (35). Quite interestingly, our clustering analysis revealed that this Sp-1-binding site is highly correlated to CGCCCCCGC, which is a WT1-binding site. It is worth mentioning that while WT1 is a transcription factor important for normal cellular development and cell survival, its role in cancer is quite ambiguous. Although WT1 was initially identified as a tumor suppressor gene, this view is not in keeping with the frequent finding of wild-type, full-length WT1 in human leukemia, breast cancer and several other types of cancer, including the majority of Wilms' tumors (36). Moreover, the sequence GGGGCGGGG, which is complementary to CCCCGCCCC and is also considered an SP-binding site, was described as being part of the GC-rich region through which the HPV16 E6 oncoprotein stimulates the transforming growth factor-beta 1 promoter in fibroblasts (37). Finally, Li et al. (35) recognized CCCCTCCCC as a “new protein-binding motif”, which we have also identified in Table VII. Furthermore, another study (38) identified the above sequence as an instability hotspot region in endometrial carcinomas, located in the hypervariable regions I and II of the D-loop and the 12S rRNA gene.

Conclusion

Evidently, these findings corroborate that in silico studies are a valuable tool for promoter-focused research, as they have proven successful in identifying key elements in the regulation of genes that are important in cancer progression. It must be noted that the full version of Table VII, available at http://bioserver-1.bioacademy.gr/DataRepository/Project_cgp_1/, consists of a great number of PDSEs for which our PubMed searches were inconclusive. Undoubtedly, further in vitro work needs to be carried out to provide solid associations of these PDSEs with trans-acting elements and cellular functions. We thus suggest that these potentially novel cis-acting elements will provide a valuable tool for meta-analysis based on site-directed mutagenesis studies.

Acknowledgments

The supplementary tables 1-5, the full version of Table VII, the human promoter dataset, the related decoy dataset in fasta format and the supplementary figures 1-5 are available at: http://bioserver-1.bioacademy.gr/DataRepository/Project_cgp_1/

  • Received September 28, 2009.
  • Revision received November 9, 2009.
  • Accepted November 15, 2009.
  • Copyright© 2009 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved

References

  1. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D: Ultraconserved elements in the human genome. Science 304: 1321-1325, 2004.
  2. Li N, Tompa M: Analysis of computational approaches for motif discovery. Algorithms Mol Biol 1: 8, 2006.
  3. Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205-1214, 2000.
  4. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M: Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434: 338-345, 2005.
  5. FitzGerald PC, Shlyakhtenko A, Mir AA, Vinson C: Clustering of DNA sequences in human promoters. Genome Res 14: 1562-1574, 2004.
  6. ENCODE Project Consortium: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799-816, 2007.
  7. Polanski M, Anderson NL: A list of candidate cancer biomarkers for targeted proteomics. Biomark Insights 1: 1-48, 2007.
  8. Cavin Périer R, Junier T, Bucher P: The Eukaryotic Promoter Database EPD. Nucleic Acids Res 26: 353-357, 1998.
  9. FASTA format description. URL: http://www.ncbi.nlm.nih.gov/blast/fasta.shtml
  10. Cottret L, Lucas A, Marrakchi E, Rogier O, Lefort V, Durosay P, Viari A, Thermes C, d'Aubenton-Carafa Y: GeneR: R for genes and sequences analysis. R package version 2.8.0. http://www.cgm.cnrs-gif.fr, 2006.
  11. Moore D, McCabe GP: Basic Practice of Statistics. 4th ed, pp. 90-114. WH Freeman Company, New York, 2006.
  12. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org, 2008.
  13. Matsumoto M, Nishimura T: Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transact. Model Comp Simul 8: 3-30, 1998.
  14. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol 5: R80, 2004.
  15. Jiang C, Xuan Z, Zhao F, Zhang MQ: TRED: a transcriptional regulatory element database, new entries and other development. Nucleic Acids Res 35: D137-140, 2007.
  16. Wingender E, Dietze P, Karas H, Knüppel R: TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res 24: 238-241, 1996.
  17. Smale ST, Kadonaga JT: The RNA Polymerase II Core Promoter. Annu Rev Biochem 72: 449-479, 2003.
  18. Gershenzon NI, Stormo GD, Ioshikhes IP: Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res 33: 2290-2301, 2005.
  19. Casimiro AC, Vinga S, Freitas AT, Oliveira AL: An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance. BMC Bioinformatics 9: 89, 2008.
  20. Härdle W, Simar L: Applied Multivariate Statistical Analysis. Berlin, Springer Verlag, 2007.
  21. Cohen J: Statistical power analysis for the behavioral sciences. 2nd edition. Lawrence Erlbaum Associates, Hillsdale, NJ, 1988.
  22. Meyer D, Buchta C: proxy: Distance and Similarity Measures. R package version 0.3, 2007.
  23. Schneider TD, Stormo GD, Gold L, Ehrenfeucht A: Information content of binding sites on nucleotide sequences. J Mol Biol 188: 415-431, 1986.
  24. Becker KG, Hosack DA, Dennis G Jr., Lempicki RA, Bright TJ, Cheadle C, Engel J: PubMatrix: a tool for multiplex literature mining. BMC Bioinformatics 4: 61, 2003.
  25. Smale ST: Transcription: Mechanisms and Regulation. Conaway RC and Conaway JW (eds.). pp. 63-81. Raven, New York, 1994.
  26. Butler JE, Kadonaga JT: The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev 16: 2583-2592, 2002.
  27. Smale ST: Core promoters: active contributors to combinatorial gene regulation. Genes Dev 15: 2503-2508, 2001.
  28. Zenzie-Gregory B, Khachi A, Garraway IP, Smale ST: Mechanism of initiator-mediated transcription: evidence for a functional interaction between the TATA-binding protein and DNA in the absence of a specific recognition sequence. Mol Cell Biol 13: 3841-3849, 1993.
  29. Martinez E, Zhou Q, L'Etoile ND, Oelgeschläger T, Berk AJ, Roeder RG: Core promoter-specific function of a mutant transcription factor TFIID defective in TATA-box binding. Proc Natl Acad Sci USA 92: 11864-11868, 1995.
  30. Koudritsky M, Domany E: Positional distribution of human transcription factor binding sites. Nucleic Acids Res 36: 6795-6805, 2008.
  31. Franchina M, Woo AJ, Dods J, Karimi M, Ho D, Watanabe T, Spagnolo DV, Abraham LJ: The CD30 gene promoter microsatellite binds transcription factor Yin Yang 1 (YY1) and shows genetic instability in anaplastic large cell lymphoma. J Pathol 214: 65-74, 2008.
  32. Guo G, Bauer S, Hecht J, Schulz MH, Busche A, Robinson PN: A short ultraconserved sequence drives transcription from an alternate FBN1 promoter. Int J Biochem Cell Biol 40: 638-650, 2008.
  33. Schneider TD, Stephens RM: Sequence Logos: A New Way to Display Consensus Sequences. Nucleic Acids Res 18: 6097-6100, 1990.
  34. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: A sequence logo generator. Genome Res 14: 1188-1190, 2004.
  35. Li RS, Abrahamsen MS, Johnson RR, Morris DR: Complex interactions at a GC-rich domain regulate cell type-dependent activity of the ornithine decarboxylase promoter. J Biol Chem 269: 7941-7949, 1994.
  36. Yang L, Han Y, Suarez Saiz F, Minden MD: A tumor suppressor and oncogene: the WT1 story. Leukemia 21: 868-876, 2007.
  37. Dey A, Atcha IA, Bagchi S: HPV16 E6 oncoprotein stimulates the transforming growth factor-beta 1 promoter in fibroblasts through a specific GC-rich sequence. Virology 228: 190-199, 1997.
  38. Liu VW, Yang HJ, Wang Y, Tsang PC, Cheung AN, Chiu PM, Ng TY, Wong LC, Nagley P, Ngan HY: High frequency of mitochondrial genome instability in human endometrial carcinomas. Br J Cancer 89: 697-701, 2003.