Journal of Molecular Biology
Designed Armadillo Repeat Proteins as General Peptide-Binding Scaffolds: Consensus Design and Computational Optimization of the Hydrophobic Core
Introduction
In recent years, as an alternative to raising monoclonal antibodies by immunization, recombinant antibodies1 and an increasing number of other protein scaffolds2 have been investigated as novel binding molecules. However, neither antibodies themselves nor any of these alternative protein scaffolds were specifically designed to bind peptides. Target-specific binding molecules are, in general, obtained from large protein libraries by in vitro selection or, in the case of monoclonal antibodies, through traditional immunization procedures. Both approaches require that, for each target, each new binding molecule is individually generated and characterized for specificity and cross-reactivity, making the generation of binders against a large number of peptide targets (e.g., representing a full proteome) an almost prohibitive task.
The aim of the present study was to develop a scaffold for the generation of peptide-specific binding proteins. More specifically, we wanted to develop proteins that are stable under various conditions and have the intrinsic ability to bind peptides in a conserved fashion. To recognize peptides in a sequence-selective manner, the specificity of binding should ideally be conferred through specific interactions with the peptide side chains.
Natural peptide-binding scaffolds can be grouped in different classes. Antibodies are known to be able to bind peptides and have been well characterized.3, 4, 5, 6 Although peptide-binding antibodies have certain structural features in common, the mode of binding is not conserved. Thus, the information acquired through studies of antibody–peptide complexes cannot easily be applied to the general design of peptide-binding antibodies or extended to other proteins.
Small adaptor domains (e.g., SH2, SH3, and PDZ)7 show specific binding to their targets, usually in a conserved binding mode within one family, but their affinity is generally low. The recognition sequence is very short and biased toward certain amino acid types, posttranslational modifications, or free N- or C-termini. While several such domains could be linked together by flexible peptides to recognize longer peptide sequences, covering any arbitrary sequence would still be very difficult, since these small domains might not be adaptable to every sequence. Furthermore, because of the entropy loss upon binding, such flexibly linked constructs would not necessarily achieve high affinities.
The major histocompatibility complex proteins (MHC I and MHC II)8 possess a higher intrinsic variability and the ability to recognize a broad range of peptides, but the difficulties in their handling reduce their attractiveness as a scaffold candidate.
Repeat proteins, in particular tetratricopeptide repeats (TPRs),9 armadillo,10 and WD4011 proteins, have been shown to possess an intrinsic ability to bind peptides, taking advantage of their repetitive structure. Thus, for our purpose, a scaffold based on repeat proteins seemed to constitute a promising candidate. For reasons outlined below, we chose the armadillo repeat protein family as the basis for our scaffold candidate.
Armadillo repeat proteins12,13 are abundant in eukaryotes, where they are involved in a broad range of biological processes (e.g., transcription regulation,14 cell adhesion,15 tumor suppressor activity,16 and nucleocytoplasmic transport17). These proteins are characterized by tandem repeats of approximately 42 amino acids that were first discovered in the product of the Drosophila melanogaster segmentation polarity gene Armadillo, which is homologous to mammalian β-catenin.18,19 Armadillo repeat proteins participate in protein–protein interactions, and the armadillo domain is usually involved in the recognition process. The domain forms a right-handed superhelix20,21 (Fig. 1a), as shown by the crystal structures of β-catenin22 and importin-α.23 Every repeat is composed of three α-helices, named H1, H2, and H3 (Fig. 1b), and several repeats stack to form the compact domain. Specialized repeats are present at the N- and C-termini of the protein, protecting the hydrophobic core from solvent exposure (Fig. 1a).
Armadillo repeat proteins are able to bind different types of peptides while relying on a conserved binding mode for the peptide backbone. Reported dissociation constant (Kd) values as low as 10–20 nM24 indicate that high affinities can be achieved. Crystal structures of armadillo repeat proteins in complex with bound peptides have revealed that most peptide targets are bound in an extended conformation along the surface, inside the groove formed by the H3 helices. The superhelical armadillo domain winds around the peptide, which is oriented in the opposite N- to C-terminal direction (Fig. 1a), thus forming a double-helical complex, topologically similar to the DNA double strand. An asparagine residue, conserved in almost every repeat at the C-terminal part of H3, makes hydrogen bonds to the main chain of the target peptide, thereby keeping it in an extended conformation. Additional interactions with the target side chains are provided by neighboring residues, mostly in H3 (Fig. 1c). In a first approximation, each dipeptide unit of the target peptide is specifically recognized by one repeat in the armadillo domain (Fig. 2a).
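To put the reported affinities in energetic terms, a dissociation constant can be converted to a standard binding free energy with ΔG° = RT ln(Kd / 1 M). The short Python sketch below is an illustration only; the temperature of 298.15 K and the function name are assumptions made here, not values taken from the study.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def binding_free_energy_kj(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from a dissociation constant in mol/L.

    Uses dG = R*T*ln(Kd / 1 M). The 298.15 K default is an assumption of this sketch.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R * temperature_k * math.log(kd_molar) / 1000.0


if __name__ == "__main__":
    # The two ends of the 10-20 nM range quoted in the text
    for kd in (10e-9, 20e-9):
        print(f"Kd = {kd * 1e9:.0f} nM -> dG = {binding_free_energy_kj(kd):.1f} kJ/mol")
```

Under these assumptions, the 10–20 nM range corresponds to roughly -46 to -44 kJ/mol, so a twofold change in Kd shifts the free energy by less than 2 kJ/mol.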
In theory, the possibility of developing individual repeats that specifically bind a two-amino-acid sequence unit is very attractive. Given that the individual repeats are based on the same optimized scaffold and, thus, compatible with each other, any given number of repeats can be directly stacked to extend the recognition to much longer peptide sequences. In contrast to flexibly linked small adaptor domains mentioned above, armadillo repeats directly stack on each other in a rather rigid manner, allowing binding to uninterrupted longer peptides. This would exploit the specificity of the individual repeats to provide a peptide-binding designed armadillo protein with high and predetermined specificity, governed by the individual repeats. Such an approach (Fig. 2b), using armadillo proteins assembled from previously selected “building blocks,” could effectively bypass the current in vitro selection procedures for individual peptides. However, this requires such individual peptide-specific repeats first to be developed, using a library-based approach.
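The building-block idea can be illustrated with a small Python sketch. It assumes the first-approximation rule stated above (one internal repeat per dipeptide unit), the antiparallel orientation of peptide and protein, and capping repeats at both termini. The function names, the `repeat[XY]` labels, and the restriction to even-length peptides are hypothetical simplifications introduced here, not part of the described design.

```python
from typing import List


def dipeptide_units(peptide: str) -> List[str]:
    """Split a peptide (one-letter codes, N- to C-terminus) into dipeptide units."""
    if len(peptide) == 0 or len(peptide) % 2 != 0:
        # Simplifying assumption of this sketch: only even-length peptides
        raise ValueError("this sketch assumes a non-empty, even-length peptide")
    return [peptide[i:i + 2] for i in range(0, len(peptide), 2)]


def repeat_layout(peptide: str) -> List[str]:
    """Return a hypothetical module order (N- to C-terminus of the binder).

    The peptide runs antiparallel to the protein, so the first internal
    repeat is paired with the last dipeptide unit of the peptide.
    """
    internal = [f"repeat[{unit}]" for unit in reversed(dipeptide_units(peptide))]
    return ["N-cap"] + internal + ["C-cap"]


if __name__ == "__main__":
    print(repeat_layout("KRKV"))
    # With 20 amino acids there are 20**2 possible dipeptide units
    print(20 ** 2)
```

For the four-residue example, the layout contains two internal repeats between the caps. A complete set of dipeptide-specific modules would need to cover 20² = 400 units, which is a simple count rather than a figure from the study.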
In the present study, we have, as a first step, designed armadillo repeat modules based on consensus sequences. Proteins containing different types of modules have been assembled and characterized, initially only leading to stable dimeric proteins or monomeric molten-globule-like proteins. We subsequently used a combination of molecular dynamics and minimization to improve the hydrophobic core packing and convert the consensus-designed armadillo repeat protein with molten-globule-like properties to a monomeric, stable folded protein. Finally, the protein characteristics were evaluated to explore the possibility of generating a modular peptide-binding scaffold. We succeeded in developing a stable, monomeric consensus protein that can now be used for the generation of peptide-specific individual armadillo repeat proteins.
Armadillo repeat protein design
A consensus design strategy25 has been applied in order to generate armadillo repeat proteins with high expression levels of soluble protein in Escherichia coli, monomeric state, high thermodynamic stability, and absence of cysteines for convenient expression and handling.
This design procedure was aimed at the generation of self-compatible repeat modules; therefore, consensus sequences were derived from multiple alignments of single armadillo repeats from the Swiss-Prot database.26 A consensus
Consensus design
Consensus design has been successfully applied in this work to generate designed armadillo repeat proteins. Similar to leucine-rich repeat proteins,30 but in contrast to ankyrin repeat proteins27 and tetratricopeptide repeats,29 different subfamilies can be clearly defined in the case of armadillo repeat proteins, based on sequences and available structures. Out of 42 signature positions, 12 are characteristic for armadillo repeats, but the conservation at other positions is relatively low.32
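As a rough illustration of how a consensus and per-position conservation can be read off a multiple alignment, the Python sketch below takes the most frequent residue in each column and reports its frequency. The toy sequences are invented for the example and are not armadillo repeats, and the 0.9 cutoff is arbitrary. The actual procedure, which relied on curated alignments and subfamily definitions, is more involved than this majority rule.

```python
from collections import Counter
from typing import List, Tuple


def consensus(aligned: List[str], gap: str = "-") -> List[Tuple[str, float]]:
    """Return (most frequent residue, frequency among non-gap residues) per column."""
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            # Column consists only of gaps
            result.append((gap, 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(residues)))
    return result


if __name__ == "__main__":
    toy_alignment = ["LNAL", "LSAV", "LNGL"]  # invented sequences
    columns = consensus(toy_alignment)
    print("".join(residue for residue, _ in columns))
    # Positions whose top residue reaches an (arbitrary) 0.9 frequency cutoff
    print([i for i, (_, freq) in enumerate(columns) if freq >= 0.9])
```

In this toy case only the first column is fully conserved, which mirrors the situation described above in which a minority of positions carries the family signature while the rest vary.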
Conclusions
This work focused on the generation of designed armadillo repeat proteins for the construction of a general modular peptide-binding scaffold. An initial consensus-based design led to well-expressed and stable but dimeric proteins or molten globules. A stable, well-expressed monomeric protein was obtained using a force field-based approach for the stabilization of the hydrophobic core of the molten globule variant.
From a library perspective, a monomeric protein allows a better evaluation of the
Sequence analysis and modeling
SMART,33,34 Swiss-Prot,26 and PDB62 were used as the starting databases for our analysis. GCG (Wisconsin Package Version 10.3, Accelrys Inc., San Diego, CA), BLAST,63,64 and ClustalW35 were used for sequence retrieval and alignment. Structure analysis and modeling were performed with Swiss-Pdb Viewer,65 MOLMOL,66
Acknowledgements
The authors want to thank W.I. Weis, M. Köhler, and E. Conti for kindly providing the plasmids containing the natural armadillo repeat protein genes. We thank Dr. P. Kolb for valuable suggestions, Dr. A. Honegger for EXCEL macros, and the other members of the Plückthun laboratory for fruitful discussions. The calculations were performed on Matterhorn, a Beowulf Linux cluster at the Informatikdienste of the University of Zürich. We thank C. Bolliger, Dr. T. Steenbock, and Dr. A. Godknecht for
References (75)
et al. Antibody–antigen interactions: contact analysis and binding site topography. J. Mol. Biol. (1996)
et al. Directed evolution of soluble single-chain human class II MHC molecules. J. Mol. Biol. (2004)
Armadillo repeat proteins: beyond the animal kingdom. Trends Cell Biol. (2003)
et al. The WD repeat: a common architecture for diverse functions. Trends Biochem. Sci. (1999)
et al. A repeating amino acid motif shared by proteins with diverse cellular roles. Cell (1994)
et al. Decisions, decisions: beta-catenin chooses between adhesion and transcription. Trends Cell Biol. (2005)
et al. Importin alpha: a multipurpose nuclear-transport receptor. Trends Cell Biol. (2004)
et al. Topological characteristics of helical repeat proteins. Curr. Opin. Struct. Biol. (1999)
et al. When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci. (2000)
et al. Three-dimensional structure of the armadillo repeat region of beta-catenin. Cell (1997)
Crystallographic analysis of the recognition of a nuclear localization signal by the nuclear import factor karyopherin alpha. Cell
Biophysical characterization of interactions involving importin-alpha during nuclear import. J. Biol. Chem.
A novel strategy to design binding molecules harnessing the modular nature of repeat proteins. FEBS Lett.
Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J. Mol. Biol.
Design of stable alpha-helical arrays from an idealized TPR motif. Structure
Designing repeat proteins: modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family. J. Mol. Biol.
Characterization and further stabilization of designed ankyrin repeat proteins by combining molecular dynamics simulations and experiments. J. Mol. Biol.
Comparison of ARM and HEAT protein repeats. J. Mol. Biol.
Classical nuclear localization signals: definition, function, and interaction with importin alpha. J. Biol. Chem.
Dissection of the karyopherin alpha nuclear localization signal (NLS)-binding groove: functional requirements for NLS binding. J. Biol. Chem.
Computation and analysis of protein circular dichroism spectra. Methods Enzymol.
Anilinonaphthalene sulfonate as a probe of membrane composition and function. Biochim. Biophys. Acta
Molten globule and protein folding. Adv. Protein Chem.
Trading accuracy for speed: a quantitative comparison of search algorithms in protein sequence design. J. Mol. Biol.
Crystallographic analysis of the specific yet versatile recognition of distinct nuclear localization signals by karyopherin alpha. Structure
A short amino acid sequence able to specify nuclear location. Cell
Structure of the armadillo repeat domain of plakophilin 1. J. Mol. Biol.
Side-chain and backbone flexibility in protein core design. J. Mol. Biol.
Basic local alignment search tool. J. Mol. Biol.
High efficiency transformation of Escherichia coli with plasmids. Gene
Biotinylation of proteins in vivo and in vitro using small peptide tags. Methods Enzymol.
Natural abundance N-15 NMR by enhanced heteronuclear spectroscopy. Chem. Phys. Lett.
Selecting and screening recombinant antibody libraries. Nat. Biotechnol.
Engineering novel binding proteins from nonimmunoglobulin domains. Nat. Biotechnol.
Identification of differences in the specificity-determining residues of antibodies that recognize antigens of different size: implications for the rational design of antibody repertoires. J. Mol. Recognit.
Exquisite specificity and peptide epitope recognition promiscuity, properties shared by antibodies from sharks to humans. J. Mol. Recognit.
Structure of anti-peptide antibody complexes. Res. Immunol.
1 Present addresses: A.P. Larsen, Department of Biomedical Sciences, University of Copenhagen, Blegdamsvej 3, DK-2200 Copenhagen, Denmark; M.T. Stumpp, Molecular Partners AG, Grabenstrasse 11a, CH-8952 Schlieren, Switzerland.