Genomics

Volume 83, Issue 4, April 2004, Pages 727-734
TAFA: a novel secreted family with conserved cysteine residues and restricted expression in the brain

https://doi.org/10.1016/j.ygeno.2003.10.006

Abstract

We have discovered a family of small secreted proteins in Homo sapiens and Mus musculus using a novel database searching strategy. The family is composed of five highly homologous genes referred to as TAFA-1 to -5. The TAFA genes encode proteins of approximately 100 amino acids that contain conserved cysteine residues at fixed positions. TAFA-1 to -4 are more closely related to each other than to TAFA-5, which lacks a conserved CC-containing motif found in TAFA-1 to -4. In H. sapiens, TAFA-3 has two isoforms formed by alternative splicing. Sequence homology analyses suggest that TAFA proteins are distantly related to MIP-1α, a member of the CC-chemokine family. TAFA mRNAs are highly expressed in specific brain regions, with little expression seen in other tissues.

TAFA genes encode small secreted proteins

Our search strategy starts from EST/cDNA sequences, to which we applied an assembly algorithm that generates sequence contigs [9]. A novel putative gene was defined as a collection of connected contigs with no significant BLAST homology to any known gene. We then clustered these genes by protein sequence similarity using a TBLASTX-based algorithm [2]. Putative coding regions were defined by the TBLASTX alignments. The clusters of candidate proteins were then evaluated for the
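The grouping step described above, in which putative genes are clustered by pairwise protein similarity, can be illustrated with a minimal sketch. The snippet does not specify the clustering algorithm, so single-linkage grouping over a list of similar pairs is an assumption made here for illustration. The function name `cluster_by_similarity` and the gene identifiers are hypothetical, and the pairs stand in for alignment hits that would already have passed some significance threshold.

```python
def cluster_by_similarity(items, similar_pairs):
    """Group items into single-linkage clusters (assumed method, not the paper's).

    items: iterable of hypothetical gene identifiers.
    similar_pairs: iterable of (a, b) tuples judged similar by some upstream
    alignment step; the threshold used there is outside this sketch.
    """
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the cluster representative, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), []).append(x)

    # Largest clusters first; ties broken alphabetically for stable output.
    return sorted((sorted(m) for m in clusters.values()),
                  key=lambda m: (-len(m), m))
```

Clusters with two or more members would be the candidates for a multigene family under this reading; singletons would be set aside.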

Discussion

We applied a new strategy to search databases for novel gene families comprising small secreted proteins. Genomic and EST databases (public and private) were searched for transcripts that (1) have no significant homology to known sequences, (2) cluster into multigene families, (3) encode predicted signal sequences but lack transmembrane domains, and (4) have orthologs in other species. Using this method we have identified a new gene family named TAFA.
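The four search criteria can be read as a conjunction of filters over candidate transcripts. The sketch below is one way to express that logic, not the authors' pipeline. The `Candidate` record, its field names, and the minimum family size of two are all assumptions introduced for illustration; in practice each field would be filled by upstream tools such as a homology search or a signal peptide predictor.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-transcript summary; every field is an assumption."""
    name: str
    has_known_homolog: bool    # significant homology to a known sequence
    family_size: int           # members in its similarity cluster, itself included
    has_signal_peptide: bool   # predicted signal sequence
    has_transmembrane: bool    # predicted transmembrane domain
    ortholog_species: tuple    # other species in which an ortholog was found


def passes_filters(c: Candidate) -> bool:
    """Apply criteria (1) to (4) from the text as a single conjunction."""
    return (
        not c.has_known_homolog             # (1) no homology to known sequences
        and c.family_size >= 2              # (2) part of a multigene family (assumed cutoff)
        and c.has_signal_peptide            # (3) predicted signal sequence...
        and not c.has_transmembrane         #     ...but no transmembrane domain
        and len(c.ortholog_species) >= 1    # (4) ortholog in at least one other species
    )


def select_candidates(candidates):
    """Return the names of candidates that satisfy all four criteria."""
    return [c.name for c in candidates if passes_filters(c)]
```

Keeping each criterion as a separate boolean makes it easy to report which filter rejected a given transcript.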

Sequence assembly and gene discovery

DNA sequences were assembled from internal Nuvelo EST sequences and from sequences available through the NCBI. Similar methods have been used previously by others to estimate the number of independent genes within the human genome [14]. Chromatograms from cDNA clones were obtained using dideoxy sequencing and resolution on ABI 377/3700 sequencers or by downloading chromatograms from the public domain dbEST database. Phred was used for base-calling and to assign quality scores [15], [16]. The
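For readers unfamiliar with Phred quality scores, the standard definition relates a score Q to the estimated base-calling error probability P by Q = -10 log10(P), so Q20 corresponds to a 1 in 100 chance of error. The short sketch below shows that conversion. The function names are hypothetical, and nothing here reflects quality thresholds used in the study, which the snippet does not state.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def expected_errors(qualities) -> float:
    """Sum of per-base error probabilities: the expected miscalls in a read."""
    return sum(phred_to_error_probability(q) for q in qualities)
```

For example, a read with per-base scores of 10, 20, and 30 has an expected error count of 0.1 + 0.01 + 0.001.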

References (40)

  • R Hughey et al.

    Hidden Markov models for sequence analysis: extension and analysis of the basic method

    CABIOS

    (1996)
  • C.R Mackay

    Chemokines: immunology's high impact factors

    Nature Immunol.

    (2001)
  • V Krishna et al.

    The Chemokine Facts Book

    (1997)
  • J Tillinghast et al.

    Clustering and assembly of a large number of EST and cDNA sequences using Hyseq's proprietary software

  • H Nielsen et al.

    A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites

    Int. J. Neural Syst.

    (1997)
  • J Thompson et al.

    CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

    Nucleic Acids Res.

    (1994)
  • L Brodsky et al.

    GeneBee-NET: internet-based server for analyzing biopolymers structure

    Biochemistry

    (1995)
  • L.I. Brodsky, TreeTop—phylogenetic tree prediction:...
  • B Ewing et al.

    Analysis of expressed sequence tags indicates 35,000 human genes

    Nat. Genet.

    (2000)
  • B Ewing et al.

    Base-calling of automated sequencer traces using phred. I. Accuracy assessment

    Genome Res.

    (1998)