Designed Armadillo Repeat Proteins as General Peptide-Binding Scaffolds: Consensus Design and Computational Optimization of the Hydrophobic Core

https://doi.org/10.1016/j.jmb.2007.12.014

Abstract

Armadillo repeat proteins are abundant eukaryotic proteins involved in several cellular processes, including signaling, transport, and cytoskeletal regulation. They are characterized by an armadillo domain, composed of tandem armadillo repeats of approximately 42 amino acids, which mediates interactions with peptides or parts of proteins in extended conformation. The conserved binding mode of the peptide in extended form, observed for different targets, makes armadillo repeat proteins attractive candidates for the generation of modular peptide-binding scaffolds. Taking advantage of the large number of repeat sequences available, a consensus-based approach combined with a force field-based optimization of the hydrophobic core was used to derive soluble, highly expressed, stable, monomeric designed proteins with improved characteristics compared to natural armadillo proteins. These sequences constitute the starting point for the generation of designed armadillo repeat protein libraries for the selection of peptide binders, exploiting their modular structure and their conserved binding mode.

Introduction

In recent years, as an alternative to raising monoclonal antibodies by immunization, recombinant antibodies1 and an increasing number of other protein scaffolds2 have been investigated as novel binding molecules. However, neither antibodies themselves nor any of these alternative protein scaffolds were specifically designed to bind peptides. Target-specific binding molecules are, in general, obtained from large protein libraries by in vitro selection or, in the case of monoclonal antibodies, through traditional immunization procedures. Both approaches require that, for each target, each new binding molecule is individually generated and characterized for specificity and cross-reactivity, making the generation of binders against a large number of peptide targets (e.g., representing a full proteome) an almost prohibitive task.

The aim of the present study was to develop a scaffold for the generation of peptide-specific binding proteins. More specifically, we wanted to develop proteins that are stable under various conditions and have the intrinsic ability to bind peptides in a conserved fashion. To recognize peptides in a sequence-selective manner, the specificity of binding should ideally be conferred through specific interactions with the peptide side chains.

Natural peptide-binding scaffolds can be grouped in different classes. Antibodies are known to be able to bind peptides and have been well characterized.3, 4, 5, 6 Although peptide-binding antibodies have certain structural features in common, the mode of binding is not conserved. Thus, the information acquired through studies of antibody–peptide complexes cannot easily be applied to the general design of peptide-binding antibodies or extended to other proteins.

Small adaptor domains (e.g., SH2, SH3, and PDZ)7 show specific binding to their targets, usually in a conserved binding mode within one family, but their affinity is generally low. The recognition sequence is very short and biased toward certain amino acid types, posttranslational modifications, or free N- or C-termini. While several such domains could be linked together by flexible peptides to recognize longer peptide sequences, covering arbitrary sequences would still be very difficult, since these small domains might not be adaptable to every sequence. Furthermore, because of the entropy loss upon binding, such flexibly linked constructs would not necessarily reach high affinities.

The major histocompatibility complex proteins (MHC I and MHC II)8 possess a higher intrinsic variability and the ability to recognize a broad range of peptides, but the difficulties in their handling reduce their attractiveness as a scaffold candidate.

Repeat proteins, in particular tetratricopeptide repeats (TPRs),9 armadillo,10 and WD4011 proteins, have been shown to possess an intrinsic ability to bind peptides, taking advantage of their repetitive structure. Thus, for our purpose, a scaffold based on repeat proteins seemed to constitute a promising candidate. For reasons outlined below, we chose the armadillo repeat protein family as the basis for our scaffold candidate.

Armadillo repeat proteins12,13 are abundant in eukaryotes, where they are involved in a broad range of biological processes (e.g., transcription regulation,14 cell adhesion,15 tumor suppressor activity,16 and nucleocytoplasmic transport17). These proteins are characterized by tandem repeats of approximately 42 amino acids that were first discovered in the product of the Drosophila melanogaster segmentation polarity gene Armadillo, which is homologous to mammalian β-catenin.18,19 Armadillo repeat proteins participate in protein–protein interactions, and the armadillo domain is usually involved in the recognition process. The domain forms a right-handed superhelix20,21 (Fig. 1a), as shown by the crystal structures of β-catenin22 and importin-α.23 Every repeat is composed of three α-helices, named H1, H2, and H3 (Fig. 1b), and several repeats stack to form the compact domain. Specialized repeats are present at the N- and C-termini of the protein, protecting the hydrophobic core from solvent exposure (Fig. 1a).

Armadillo repeat proteins are able to bind different types of peptides while relying on a conserved binding mode of the peptide backbone. Reported dissociation constant (Kd) values as low as 10–20 nM24 indicate that high affinities can be achieved. Crystal structures of armadillo repeat proteins in complex with bound peptides have revealed that most peptide targets are bound in an extended conformation along the surface, inside the groove formed by the H3 helices. The superhelical armadillo domain winds around the peptide, which is oriented in the opposite N- to C-terminal direction (Fig. 1a), thus forming a double-helical complex, topologically similar to the DNA double strand. An asparagine residue, conserved in almost every repeat at the C-terminal part of H3, makes hydrogen bonds to the main chain of the target peptide, thereby keeping it in an extended conformation. Additional interactions with the target side chains are provided by neighboring residues, mostly in H3 (Fig. 1c). To a first approximation, each dipeptide unit of the target peptide is specifically recognized by one repeat in the armadillo domain (Fig. 2a).

In theory, the possibility of developing individual repeats that specifically bind a two-amino-acid sequence unit is very attractive. Given that the individual repeats are based on the same optimized scaffold and, thus, compatible with each other, any given number of repeats can be directly stacked to extend the recognition to much longer peptide sequences. In contrast to flexibly linked small adaptor domains mentioned above, armadillo repeats directly stack on each other in a rather rigid manner, allowing binding to uninterrupted longer peptides. This would exploit the specificity of the individual repeats to provide a peptide-binding designed armadillo protein with high and predetermined specificity, governed by the individual repeats. Such an approach (Fig. 2b), using armadillo proteins assembled from previously selected “building blocks,” could effectively bypass the current in vitro selection procedures for individual peptides. However, this requires such individual peptide-specific repeats first to be developed, using a library-based approach.
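The building-block idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the module names, and the toy peptide are hypothetical, the mapping of exactly one repeat per dipeptide is the first approximation given in the text, and no dipeptide-specific repeats exist yet at this stage of the work. The reversal of module order reflects the antiparallel orientation of the peptide relative to the armadillo domain.

```python
from typing import Dict, List


def split_into_dipeptides(peptide: str) -> List[str]:
    """Split a peptide (one-letter codes) into consecutive two-residue units."""
    if len(peptide) % 2 != 0:
        # Simplification for this sketch: only whole dipeptide units are handled.
        raise ValueError("this sketch assumes an even number of residues")
    return [peptide[i:i + 2] for i in range(0, len(peptide), 2)]


def assemble_binder(peptide: str, module_library: Dict[str, str]) -> List[str]:
    """Return one (hypothetical) repeat module per dipeptide unit.

    The modules are listed from the N- to the C-terminus of the designed
    protein. Because the peptide binds antiparallel to the armadillo domain,
    the first repeat pairs with the last dipeptide of the peptide.
    """
    units = split_into_dipeptides(peptide)
    missing = [u for u in units if u not in module_library]
    if missing:
        raise KeyError(f"no module available for dipeptide(s): {missing}")
    return [module_library[u] for u in reversed(units)]


# Hypothetical library of pre-selected building blocks (names are made up).
library = {"KR": "module_KR", "KA": "module_KA"}
print(assemble_binder("KRKA", library))  # ['module_KA', 'module_KR']
```

In such a scheme, the size of the required library scales with the number of dipeptide types rather than with the number of target peptides, which is the motivation for the modular approach.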

In the present study, we have, as a first step, designed armadillo repeat modules based on consensus sequences. Proteins containing different types of modules were assembled and characterized; initially, this yielded only stable dimeric proteins or monomeric molten-globule-like proteins. We subsequently used a combination of molecular dynamics and minimization to improve the hydrophobic core packing and convert the consensus-designed armadillo repeat protein with molten-globule-like properties into a stable, folded monomeric protein. Finally, the protein characteristics were evaluated to explore the possibility of generating a modular peptide-binding scaffold. We succeeded in developing a stable, monomeric consensus protein that can now be used to generate peptide-specific individual armadillo repeat proteins.

Armadillo repeat protein design

A consensus design strategy25 was applied to generate armadillo repeat proteins that express at high levels as soluble protein in Escherichia coli, are monomeric and thermodynamically stable, and contain no cysteines, for convenient expression and handling.

This design procedure was aimed at the generation of self-compatible repeat modules; therefore, consensus sequences were derived from multiple alignments of single armadillo repeats from the Swiss-Prot database.26 A consensus

Consensus design

Consensus design was successfully applied in this work to generate designed armadillo repeat proteins. Similar to leucine-rich repeat proteins,30 but in contrast to ankyrin repeat proteins27 and tetratricopeptide repeats,29 different subfamilies can be clearly defined in the case of armadillo repeat proteins, based on sequences and available structures. Out of 42 signature positions, 12 are characteristic of armadillo repeats, but the conservation at other positions is relatively low.32
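As a rough illustration of what deriving a consensus from aligned repeats involves, the following Python sketch takes the most frequent residue in each alignment column and reports how strongly that column is conserved. It is a minimal sketch under assumptions: the function name is hypothetical, the four-residue sequences are toy data rather than real armadillo repeats, and the actual procedure used in this work (database retrieval, subfamily definition, and any refinement of the consensus) is not reproduced here.

```python
from collections import Counter
from typing import List, Tuple


def consensus_with_conservation(aligned_repeats: List[str]) -> Tuple[str, List[float]]:
    """Derive a per-column consensus from pre-aligned repeat sequences.

    Returns the consensus string and, for each column, the fraction of all
    sequences (gapped ones included in the denominator) that carry the
    consensus residue. Gaps are written as '-'.
    """
    if not aligned_repeats:
        raise ValueError("at least one sequence is required")
    length = len(aligned_repeats[0])
    if any(len(seq) != length for seq in aligned_repeats):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    conservation = []
    for column in zip(*aligned_repeats):
        residues = [r for r in column if r != "-"]
        if not residues:
            # Column consists of gaps only: no residue can be assigned.
            consensus.append("-")
            conservation.append(0.0)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(column))
    return "".join(consensus), conservation


# Toy alignment (hypothetical data, far shorter than a 42-residue repeat).
toy_alignment = ["LNAL", "LNGL", "LQAL", "INAL"]
sequence, scores = consensus_with_conservation(toy_alignment)
print(sequence)  # LNAL
print(scores)    # [0.75, 0.75, 0.75, 1.0]
```

Columns with a score near 1.0 would correspond to the strongly conserved signature positions, while low-scoring columns are those where the consensus choice is least certain.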

Conclusions

This work focused on the generation of designed armadillo repeat proteins for the construction of a general modular peptide-binding scaffold. An initial consensus-based design led to well-expressed and stable but dimeric proteins or molten globules. A stable, well-expressed monomeric protein was obtained using a force field-based approach for the stabilization of the hydrophobic core of the molten globule variant.

In a library perspective, a monomeric protein allows a better evaluation of the

Sequence analysis and modeling

SMART,33,34 Swiss-Prot,26 and PDB62 were used as the starting databases for our analysis. GCG (Wisconsin Package Version 10.3, Accelrys Inc., San Diego, CA), BLAST,63,64 and ClustalW35 were used for sequence retrieval and alignment. Structure analysis and modeling were performed with Swiss-Pdb Viewer,65 MOLMOL,66

Acknowledgements

The authors want to thank W.I. Weis, M. Köhler, and E. Conti for kindly providing the plasmids containing the natural armadillo repeat protein genes. We thank Dr. P. Kolb for valuable suggestions, Dr. A. Honegger for EXCEL macros, and the other members of the Plückthun laboratory for fruitful discussions. The calculations were performed on Matterhorn, a Beowulf Linux cluster at the Informatikdienste of the University of Zürich. We thank C. Bolliger, Dr. T. Steenbock, and Dr. A. Godknecht for

References (75)

  • E. Conti et al. Crystallographic analysis of the recognition of a nuclear localization signal by the nuclear import factor karyopherin alpha. Cell (1998)
  • B. Catimel et al. Biophysical characterization of interactions involving importin-alpha during nuclear import. J. Biol. Chem. (2001)
  • P. Forrer et al. A novel strategy to design binding molecules harnessing the modular nature of repeat proteins. FEBS Lett. (2003)
  • H.K. Binz et al. Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J. Mol. Biol. (2003)
  • E.R. Main et al. Design of stable alpha-helical arrays from an idealized TPR motif. Structure (2003)
  • M.T. Stumpp et al. Designing repeat proteins: modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family. J. Mol. Biol. (2003)
  • G. Interlandi et al. Characterization and further stabilization of designed ankyrin repeat proteins by combining molecular dynamics simulations and experiments. J. Mol. Biol. (2008)
  • M.A. Andrade et al. Comparison of ARM and HEAT protein repeats. J. Mol. Biol. (2001)
  • A. Lange et al. Classical nuclear localization signals: definition, function, and interaction with importin alpha. J. Biol. Chem. (2007)
  • S.W. Leung et al. Dissection of the karyopherin alpha nuclear localization signal (NLS)-binding groove: functional requirements for NLS binding. J. Biol. Chem. (2003)
  • N. Sreerama et al. Computation and analysis of protein circular dichroism spectra. Methods Enzymol. (2004)
  • J. Slavik. Anilinonaphthalene sulfonate as a probe of membrane composition and function. Biochim. Biophys. Acta (1982)
  • O.B. Ptitsyn. Molten globule and protein folding. Adv. Protein Chem. (1995)
  • C.A. Voigt et al. Trading accuracy for speed: a quantitative comparison of search algorithms in protein sequence design. J. Mol. Biol. (2000)
  • E. Conti et al. Crystallographic analysis of the specific yet versatile recognition of distinct nuclear localization signals by karyopherin alpha. Structure (2000)
  • D. Kalderon et al. A short amino acid sequence able to specify nuclear location. Cell (1984)
  • H.J. Choi et al. Structure of the armadillo repeat domain of plakophilin 1. J. Mol. Biol. (2005)
  • J.R. Desjarlais et al. Side-chain and backbone flexibility in protein core design. J. Mol. Biol. (1999)
  • S.F. Altschul et al. Basic local alignment search tool. J. Mol. Biol. (1990)
  • H. Inoue et al. High efficiency transformation of Escherichia coli with plasmids. Gene (1990)
  • M.G. Cull et al. Biotinylation of proteins in vivo and in vitro using small peptide tags. Methods Enzymol. (2000)
  • G. Bodenhausen et al. Natural abundance N-15 NMR by enhanced heteronuclear spectroscopy. Chem. Phys. Lett. (1980)
  • H.R. Hoogenboom. Selecting and screening recombinant antibody libraries. Nat. Biotechnol. (2005)
  • H.K. Binz et al. Engineering novel binding proteins from nonimmunoglobulin domains. Nat. Biotechnol. (2005)
  • J.C. Almagro. Identification of differences in the specificity-determining residues of antibodies that recognize antigens of different size: implications for the rational design of antibody repertoires. J. Mol. Recognit. (2004)
  • J.J. Marchalonis et al. Exquisite specificity and peptide epitope recognition promiscuity, properties shared by antibodies from sharks to humans. J. Mol. Recognit. (2001)
  • I.A. Wilson et al. Structure of anti-peptide antibody complexes. Res. Immunol. (1994)

1 Present addresses: A.P. Larsen, Department of Biomedical Sciences, University of Copenhagen, Blegdamsvej 3, DK-2200 Copenhagen, Denmark; M.T. Stumpp, Molecular Partners AG, Grabenstrasse 11a, CH-8952 Schlieren, Switzerland.