Emerald BioSystems
7869 NE Day Road West
Bainbridge Island, WA, 98110
Website: http://www.emeraldbiosystems.com





A Novel Protein Design Tool

Peter Nollert
Introduction
Welcome to the era of synthetic biology! Protein crystallography laboratories are beginning to dispense with tedious gene cloning and to shortcut the molecular biology process. They design optimized genes from scratch and have them made by de novo gene synthesis, accelerating their research.(1) Working this way allows heterologous protein expression to be tested more quickly while minimizing failure rates in the molecular biology procedures that remain.

Over the years, two recipes for success in protein structure determination have emerged: (i) the ‘divide and conquer’ approach and (ii) the paradigm of ‘the more the better’. Indeed, dividing proteins into domains and expressing them separately has yielded many structures of protein fragments, because domains can often be overexpressed in heterologous expression systems, readily purified and crystallized, and their structures determined by X-ray crystallography. However, the way many crystallographers define such smaller target proteins does not scale well, because the design process requires input from various sources such as functional characterization data, sequence alignments or 3D structures of related proteins. This is particularly true when multiple variants of a protein, such as homologs or orthologs, are fed into the structure determination pipeline, a strategy that acknowledges that the protein is the most important variable in protein expression and crystallization experiments.(2) Finally, selective surface residue mutagenesis has added another layer of complexity.(3) The successful application of different surface mutation schemes has demonstrated the utility of this approach. A prominent example is HIV integrase, which crystallized only after several point mutations had been introduced.(4) The catch, however, is that it is usually advisable to test the expression of multiple surface-mutated protein variants. Combined with multiple domain boundary choices, designing these variants with conventional tools becomes a substantial bottleneck.
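To make the bottleneck concrete, the short Python sketch below enumerates constructs as every combination of a few domain boundaries with Lys-to-Ala mutation sets, the kind of surface mutation described in reference (3). The sequence, the boundaries and the list of surface lysines are hypothetical placeholders, and deciding which lysines are surface exposed is assumed to have been done beforehand; this is an illustration of the combinatorics, not how Gene Composer generates variants.

```python
from itertools import combinations

# Hypothetical example inputs (not from any real target).
SEQ = "MSKLEKAIDKLGKEVK"
BOUNDARIES = [(1, 16), (2, 14)]   # candidate domain constructs, 1-based inclusive
SURFACE_LYS = [3, 10, 13]         # assumed surface-exposed lysine positions


def make_variant(seq, start, end, mutations):
    """Return seq[start..end] (1-based, inclusive) with Lys->Ala at each mutated position."""
    residues = list(seq)
    for pos in mutations:
        if residues[pos - 1] != "K":
            raise ValueError(f"position {pos} is not a lysine")
        residues[pos - 1] = "A"
    return "".join(residues[start - 1:end])


def enumerate_constructs(seq, boundaries, surface_lys, max_mutations):
    """List (boundaries, mutated positions, sequence) for every domain/mutation-set combination."""
    constructs = []
    for start, end in boundaries:
        # Only lysines inside this construct can be mutated.
        in_domain = [p for p in surface_lys if start <= p <= end]
        for k in range(max_mutations + 1):
            for muts in combinations(in_domain, k):
                constructs.append(((start, end), muts, make_variant(seq, start, end, muts)))
    return constructs


if __name__ == "__main__":
    constructs = enumerate_constructs(SEQ, BOUNDARIES, SURFACE_LYS, max_mutations=2)
    print(len(constructs))  # 2 boundaries x (1 + 3 + 3) mutation sets = 14
    for bounds, muts, variant in constructs[:3]:
        print(bounds, muts, variant)
```

Even this toy case yields 14 constructs; with realistic numbers of boundaries and candidate mutations the count grows combinatorially, which is the design burden described above.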

Figure 1. Screenshot of the protein design module within Gene Composer. Multiple amino acid sequence alignments are displayed with accessory information available from experimental protein structures, such as secondary structure, interactions of particular residues with solvent or ligands, and B-factors, as well as calculated properties such as crystal contacts.
Design of expression-optimized genes
Emerald BioSystems has developed a software database application called Gene Composer to address this bottleneck. Fundamentally, Gene Composer shifts the focus to the design of amino acid sequences, while the nucleic acid sequences are optimized 'on the fly' and final outputs are generated for online ordering of complete genes. Depending on the user's preferences, the only remaining molecular biology procedure is either PCR-based gene assembly from oligonucleotides or simple PCR-based generation of expression templates prior to expression testing. Gene Composer runs on Windows computers and uses a network-based SQL server database that users populate as they design genes. This arrangement lets multiple users return to a project at a later time and, if necessary, design new variants or improve existing designs by incorporating new information.
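PCR-based gene assembly starts from a set of overlapping oligonucleotides that together tile both strands of the gene. The Python sketch below shows one simple way to split a designed gene into such oligos: fixed-length pieces with fixed overlaps, alternating between sense-strand pieces and reverse complements. The 40-base length and 20-base overlap are arbitrary placeholders, and the random sequence stands in for a designed gene; real assembly designs, presumably including Gene Composer's, often tune each overlap by melting temperature, which this sketch does not attempt.

```python
from __future__ import annotations

import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_into_oligos(gene: str, length: int = 40, overlap: int = 20) -> list[str]:
    """Tile a gene with fixed-length oligos whose neighbours overlap by `overlap` bases.

    Even-indexed oligos are sense-strand pieces; odd-indexed oligos are the reverse
    complement of their piece, so neighbouring oligos anneal through the overlap.
    """
    if not 0 < overlap < length:
        raise ValueError("overlap must be positive and shorter than the oligo length")
    step = length - overlap
    oligos, start = [], 0
    while True:
        piece = gene[start:start + length]
        oligos.append(piece if len(oligos) % 2 == 0 else reverse_complement(piece))
        if start + length >= len(gene):
            return oligos
        start += step


def reassemble(oligos: list[str], overlap: int) -> str:
    """Rebuild the sense strand, checking that each pair of neighbours overlaps."""
    sense = [o if i % 2 == 0 else reverse_complement(o) for i, o in enumerate(oligos)]
    gene = sense[0]
    for piece in sense[1:]:
        if gene[-overlap:] != piece[:overlap]:
            raise ValueError("neighbouring oligos do not overlap")
        gene += piece[overlap:]
    return gene


if __name__ == "__main__":
    rng = random.Random(0)  # stand-in for a designed gene
    gene = "".join(rng.choice("ACGT") for _ in range(100))
    oligos = split_into_oligos(gene)
    print(len(oligos))  # 4 oligos covering positions 0-40, 20-60, 40-80, 60-100
    assert reassemble(oligos, overlap=20) == gene
```

The `reassemble` helper is only a sanity check that the tiling is gapless; it does not simulate the annealing and extension steps of an actual assembly PCR.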

Walking through the gene design process
A typical gene design cycle starts with defining a desired target protein, for example the full-length sequence of a multi-domain human protein. In Gene Composer, the researcher can pull in additional information from multiple sources, such as existing sequence alignments, FASTA files or plain text files containing homologous sequences of related proteins or orthologs. Gene Composer automatically creates familiar CLUSTALW multiple sequence alignments, highlighting areas of conservation, gaps and dissimilar regions. Adding structural information is simple: coordinate files of related proteins or domains, from the PDB for example, can be added to the alignment and used to display experimental information. Secondary structure is annotated, and amino acids that participate in ligand binding sites, are solvent exposed, or form crystal contacts are labeled. At this point the researcher may decide that expressing only the activity-bearing domain is sufficient and that multiple amino acid sequence variants should be generated, including variants with surface mutations and with tags at either terminus.
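As an illustration of what highlighting conservation and gaps can mean computationally, the Python sketch below scores each column of an existing multiple sequence alignment by the fraction of sequences sharing its most common residue and by its gap fraction. It does not reproduce CLUSTALW's alignment algorithm or its conservation symbols; the sequences are assumed to be already aligned (equal length, "-" for gaps), the thresholds are arbitrary, and the example fragments are made up.

```python
from collections import Counter


def column_profile(aligned):
    """Per alignment column: (most common residue, fraction of sequences with it, gap fraction)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(aligned)
    profile = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        gap_fraction = (n - len(residues)) / n
        if residues:
            top, count = Counter(residues).most_common(1)[0]
            profile.append((top, count / n, gap_fraction))  # gaps count against identity
        else:
            profile.append(("-", 0.0, gap_fraction))
    return profile


def classify(profile, identical=1.0, gappy=0.5):
    """Label columns 'gap', 'conserved' or 'variable' using simple, arbitrary thresholds."""
    labels = []
    for _, identity, gap_fraction in profile:
        if gap_fraction >= gappy:
            labels.append("gap")
        elif identity >= identical:
            labels.append("conserved")
        else:
            labels.append("variable")
    return labels


if __name__ == "__main__":
    # Made-up, already-aligned fragments of three homologs.
    aligned = ["MKV-LA", "MKIELA", "MRV-LG"]
    print(classify(column_profile(aligned)))
    # ['conserved', 'variable', 'variable', 'gap', 'conserved', 'variable']
```

Columns labeled "conserved" are natural candidates to leave untouched when choosing domain boundaries or surface mutations, while "variable" columns are where homologs already tolerate change.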

Gene Composer 2.0 makes designing these variants straightforward by presenting protein researchers with an information-rich graphical environment that provides comprehensive decision support. In a nutshell, Gene Composer allows researchers to view, understand, annotate, create and share designed proteins. The next step is back-translating the designed amino acid sequences into nucleic acid sequences. Because the genetic code is degenerate, a myriad of DNA sequences can encode a given amino acid sequence, and not all of them lead to high expression levels. Gene Composer optimizes genes for a chosen expression system, for example E. coli, mammalian cells or a cell-free wheat germ-based system, by introducing ‘silent mutations’: changes to individual nucleotides that do not alter the amino acid sequence. Optimized nucleic acid sequences are generated according to user-defined rules such as codon optimization (using Gene Composer's built-in codon usage tables for highly expressed proteins), suppression of elements that form strong mRNA secondary structure, and avoidance of undesired restriction sites. Finally, either the complete gene sequences or optimized oligonucleotides for PCR-based assembly are output for online ordering from third-party vendors, at prices currently below $1 per base pair.
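The back-translation step can be sketched in a few lines of Python. The sketch below covers only two of the rules named above: it prefers codons in a fixed order and avoids a handful of restriction sites, backing up to the next-best codon when a choice would create a site. The codon preference order and the site list are illustrative assumptions, not Gene Composer's built-in tables or real host statistics, and mRNA secondary structure is not modeled at all.

```python
# Illustrative codon preference order, most-preferred first. This is an assumption for
# the sketch, not Gene Composer's built-in codon usage tables or real host statistics.
PREFERRED_CODONS = {
    "A": ["GCG", "GCC", "GCT", "GCA"], "R": ["CGT", "CGC"], "N": ["AAC", "AAT"],
    "D": ["GAT", "GAC"], "C": ["TGC", "TGT"], "Q": ["CAG", "CAA"],
    "E": ["GAA", "GAG"], "G": ["GGC", "GGT"], "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC"], "L": ["CTG", "TTA", "CTT"], "K": ["AAA", "AAG"],
    "M": ["ATG"], "F": ["TTC", "TTT"], "P": ["CCG", "CCA"],
    "S": ["AGC", "TCT", "TCC"], "T": ["ACC", "ACG"], "W": ["TGG"],
    "Y": ["TAT", "TAC"], "V": ["GTG", "GTT"], "*": ["TAA", "TGA"],
}
CODON_TO_AA = {c: aa for aa, codons in PREFERRED_CODONS.items() for c in codons}

# Example sites to avoid; all four are palindromic, so checking one strand covers both.
AVOID_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "NdeI": "CATATG", "XhoI": "CTCGAG"}


def back_translate(protein, codons=PREFERRED_CODONS, avoid=AVOID_SITES):
    """Pick the most-preferred codons that keep every site in `avoid` out of the DNA.

    Depth-first search: if no codon at a position avoids all sites, back up and try
    the next-best codon at the previous position.
    """
    sites = list(avoid.values())
    # Any site created by appending a codon lies within the last (longest site + 2) bases.
    window = max((len(s) for s in sites), default=0) + 2

    def extend(dna, i):
        if i == len(protein):
            return dna
        for codon in codons[protein[i]]:
            candidate = dna + codon
            if any(site in candidate[-window:] for site in sites):
                continue  # this codon would create an unwanted site
            result = extend(candidate, i + 1)
            if result is not None:
                return result
        return None  # dead end: the caller tries its next codon

    result = extend("", 0)
    if result is None:
        raise ValueError("no codon combination avoids all listed sites")
    return result


def translate(dna):
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    dna = back_translate("MHMEF*")
    print(dna)  # ATGCACATGGAATTTTAA
    assert translate(dna) == "MHMEF*"
```

Without the site filter (`avoid={}`), the same peptide back-translates to ATGCATATGGAATTCTAA, which contains both an NdeI site (CATATG) and an EcoRI site (GAATTC); the filter swaps in the second-choice codons for His and Phe instead. Because the search is recursive, this sketch is limited by Python's default recursion limit to proteins under roughly 1,000 residues; a production tool would use an iterative search and score additional features.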
Figure 2. Example of the effect of expression optimization using Gene Composer-based design and gene synthesis: B. subtilis FtsZ expression in E. coli, engineered with HEXP CUT for E. coli. Crystals of FtsZ with GTPγS bound and the electron density map for this ligand are shown.

Gene Composer at deCODE biostructures
Besides speeding up the design of multiple genes, Gene Composer 2.0 was developed to exploit the speed and low cost of gene synthesis, helping to avoid cumbersome molecular biology procedures. The software has been so successful at deCODE biostructures that very few projects there start from cDNA clones. Gene Composer has been used to successfully design genes encoding bacterial, viral and human proteins that have been engineered to crystallization competency. Consequently, molecular biology processing times have decreased, speeding up one significant portion of the X-ray crystallographic structure determination process. Ellen Wallace, Senior Research Associate at deCODE biostructures, agrees that "keeping track of all the genes we're going through, this would be very difficult to do without Gene Composer." "We can quickly play through different scenarios generating gene variants. There's quick turnaround from design to expression testing, with virtually no lab work on our side because we can buy the customized synthetic genes according to our specific design," says Don Lorimer, Associate Director. "And by starting with a tailored nucleic acid sequence in the first place, we are set up for success in construct generation," he adds. "I feel empowered using Gene Composer because I get exactly the amino acid sequence I want. No more compromising the C- or N-termini with 'cloning artifacts,'" says Bart Staker, Sr. Research Scientist. Indeed, Gene Composer presents the protein to crystallographers in a language they understand: amino acid sequences rather than nucleic acids. On top of this, crystallographers enjoy discovering clues from existing structural homologs and assessing exactly which amino acids should be left untouched because they participate in ligand, water or crystal contacts.
As a result, the design process focuses on the amino acid sequences, while the nucleic acid sequences are optimized 'on the fly' for online ordering of complete genes. "We have seen quite a few cases where the expression level of proteins was increased substantially, owing to the expression-optimized nature of the nucleic acid sequence," says Alex Burgin, COO of deCODE biostructures. "While we're just starting to understand some of the rules of gene design for optimized expression, our capability to deliver to our clients X-ray structures of protein-ligand co-structures has been accelerated quite a bit with Gene Composer."

About the author
Peter Nollert, Ph.D., is Director of Emerald BioSystems. His interests include smart high-throughput protein crystallization, membrane protein technology development, microscopic imaging and bioinformatics software development.

Emerald BioSystems
888-780-8535
www.emeraldbiosystems.com

References
1. Stewart, L., Burgin, A.B. Whole gene synthesis: a gene-o-matic future. Frontiers in Drug Design & Discovery 1, 297-341 (2005).
2. Dale, G.E., Oefner, C., D’Arcy, A. The protein as a variable in protein crystallization. J. Struct. Biol. 142(1), 88-97 (2003).
3. Longenecker, K.L., Garrard, S.M., Sheffield, P.J., Derewenda, Z.S. Protein crystallization by rational mutagenesis of surface residues: Lys to Ala mutations promote crystallization of RhoGDI. Acta Cryst. D57, 679-688 (2001).
4. Chen, J.C.-H., Krucinski, J., Miercke, L.J.W., Finer-Moore, J.S., Tang, A.H., Leavitt, A.D., Stroud, R.M. Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding. Proc. Natl. Acad. Sci. USA 97(15), 8233-8238 (2000).



© 2008 Advantage Business Media. All rights reserved.