A Novel Protein Design Tool
Introduction
Welcome to the era of synthetic biology! Protein crystallography laboratories are starting to dispense with tedious gene cloning and to shortcut the molecular biology process by designing optimized proteins from scratch, using de novo gene synthesis to accelerate their research.(1) Adopting this mode of work allows heterologous expression of proteins to be tested more quickly while minimizing failure rates in the remaining molecular biology procedures.

Over the years, two recipes for success in protein structure determination have been discovered: (i) the ‘divide and conquer’ approach and (ii) the paradigm of ‘the more the better’. Indeed, dividing proteins into domains and expressing them separately has provided many structures of protein portions, because domains can often be over-expressed in heterologous expression systems and can readily be purified, crystallized, and solved by X-ray crystallography. However, the way many crystallographers go about defining such smaller target proteins does not scale up well, because the design process requires input from various sources such as functional characterization data, sequence alignments, or 3D structures of related proteins. This is particularly true when multiple variants of a protein, such as homologs or orthologs, are to be fed into the structure determination pipeline. Testing multiple variants acknowledges that, in protein expression and crystallization experiments, the protein is the most important variable.(2)

Finally, an additional layer of complexity has been added since the advent of selective surface residue mutagenesis.(3) The successful application of different surface mutation schemes has provided evidence for the utility of this approach.
A prominent example of the success of such an approach is HIV integrase, where the protein crystallized only after several point mutations had been introduced.(4) The tricky side of this proposition, however, is that it is usually advisable to test the expression of multiple surface-mutagenized protein variants. Combine this with multiple domain selections, and designing these variants with conventional tools becomes a substantial bottleneck.
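To make the scale of this bottleneck concrete, here is a minimal Python sketch that enumerates constructs from a few domain boundaries, surface mutation sets, and tag options. The target sequence, boundaries, mutations, and tags are all hypothetical placeholders chosen for illustration; none of them come from Gene Composer or from the cited studies.

```python
from itertools import product

# Hypothetical 40-residue target sequence (illustration only).
TARGET = "MSEKKLAEELKKRGIDVSKLEEAAKKYLEENPELKKAIEG"

# Hypothetical domain selections as 1-based, inclusive (start, end) pairs.
DOMAINS = [(1, 40), (5, 40), (5, 30)]

# Hypothetical surface mutation sets (Lys to Ala), keyed by full-length position.
MUTATION_SETS = [{}, {11: "A"}, {11: "A", 12: "A"}]

# Hypothetical tag options as (N-terminal tag, C-terminal tag).
TAGS = [("", ""), ("MHHHHHH", ""), ("", "HHHHHH")]


def build_variant(target, domain, mutations, tags):
    """Cut out one domain, apply the mutations that fall inside it, add tags.

    Start-codon handling for truncated constructs is ignored for brevity.
    """
    start, end = domain
    residues = list(target[start - 1:end])
    for position, new_residue in mutations.items():
        if start <= position <= end:
            residues[position - start] = new_residue
    n_tag, c_tag = tags
    return n_tag + "".join(residues) + c_tag


# Every combination of domain, mutation set, and tag option is one construct.
variants = [
    build_variant(TARGET, domain, mutations, tags)
    for domain, mutations, tags in product(DOMAINS, MUTATION_SETS, TAGS)
]

print(len(variants), "constructs from", len(DOMAINS), "domain selections")
```

Even three choices along each axis already yield 27 constructs for a single target, and the count multiplies again for every homolog or ortholog added to the pipeline.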
Design of expression-optimized genes
Emerald BioSystems has developed a software database application called Gene Composer to address this issue. Fundamentally, Gene Composer shifts the focus towards the design of amino acid sequences, while the nucleic acid sequences are optimized 'on the fly' and final outputs are generated for online ordering of complete genes. Depending on the user's preferences, the only remaining molecular biology procedure is either PCR-based gene assembly from oligonucleotides or simple PCR-based generation of expression templates prior to expression testing. Gene Composer runs on Windows computers and uses a network-based SQL Server database that is populated by users as they design genes. This arrangement makes it possible for multiple users to return at a later time and, if necessary, design new variants or improve existing designs by including new information.

Walking through the gene design process
A typical gene design cycle starts with defining a desired target protein. For example, this can be the full-length sequence of a multi-domain human protein. In Gene Composer the researcher can pull in additional information from multiple sources, such as existing sequence alignments, FASTA files, or simple txt files with sequences of related proteins or orthologs. Gene Composer automatically creates the familiar CLUSTALW multiple sequence alignments, pointing out areas of conservation, gaps, and dissimilar regions. Adding structural information is simple. Coordinate files of related proteins or domains, from the PDB for example, can be added to the alignment and used to display experimental information. Secondary structure is annotated, and amino acids that participate in ligand binding sites, are water exposed, or form crystal contacts are labeled.
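As a rough illustration of what pointing out conservation, gaps, and dissimilar regions involves, the sketch below labels each column of an already-aligned set of sequences. The toy alignment, the 0.75 threshold, and the function name are hypothetical; this simple majority rule is a stand-in and is not how CLUSTALW or Gene Composer score conservation.

```python
from collections import Counter


def classify_columns(aligned, threshold=0.75):
    """Label each alignment column as 'conserved', 'gap', or 'variable'.

    A column containing any '-' is a gap column; otherwise it is conserved
    when the most common residue reaches the given fraction of sequences.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    labels = []
    for column in zip(*aligned):
        if "-" in column:
            labels.append("gap")
            continue
        _, count = Counter(column).most_common(1)[0]
        labels.append("conserved" if count / len(column) >= threshold else "variable")
    return labels


# Hypothetical pre-aligned sequences (illustration only).
ALIGNMENT = [
    "MKT-LLV",
    "MKSALLV",
    "MRA-LIV",
    "MKTGLLI",
]

labels = classify_columns(ALIGNMENT)
for index, label in enumerate(labels, start=1):
    print(index, label)
```

Columns flagged as gap or variable are natural candidates for domain boundaries or mutations, whereas conserved columns are usually left untouched.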
At this point the researcher may decide that it is sufficient to express only the activity-bearing domain and that multiple amino acid sequence variants should be generated, including variants with surface mutations and with tags at either end of the protein.

Gene Composer 2.0 makes designing these variants easy by presenting protein researchers with an information-rich graphical environment that provides comprehensive decision support. In a nutshell, Gene Composer allows researchers to view, understand, annotate, create, and share designed proteins. The next step is back-translating the designed amino acid sequences into nucleic acid sequences. Owing to the degeneracy of the genetic code, a myriad of possible DNA sequences correspond to a given amino acid sequence, and not all of them necessarily lead to high expression levels. Gene Composer optimizes genes
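The degeneracy argument can be made concrete: the number of DNA sequences encoding a peptide is the product of the synonymous codon counts of its residues. The Python sketch below counts these possibilities under the standard genetic code and back-translates a short peptide by choosing one preferred codon per amino acid. The preferred-codon table and the example peptide are hypothetical, loosely modeled on commonly cited E. coli preferences, and picking one codon per residue is a deliberately simple substitute; it is not Gene Composer's optimization algorithm, which is not described here.

```python
from math import prod

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Invert the table: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)

# Hypothetical preferred codons (assumption: loosely E. coli-like choices).
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(len(SYNONYMS[residue]) for residue in protein)


def back_translate(protein):
    """Back-translate using one preferred codon per residue."""
    return "".join(PREFERRED[residue] for residue in protein)


def translate(dna):
    """Translate DNA (length divisible by 3) back into protein."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = "MKTAYIAK"  # hypothetical 8-residue peptide
gene = back_translate(peptide)
print(count_encodings(peptide), "possible encodings; chosen:", gene)
```

Even this eight-residue peptide has over a thousand possible encodings, so for a full-length protein exhaustive search is out of the question and some selection rule is required.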
Gene Composer at deCODE biostructures
While speeding up the design of multiple genes, Gene Composer 2.0 was also developed to exploit the speed and low cost of gene synthesis, helping to avoid cumbersome molecular biology procedures. The software has been so successful at deCODE biostructures that very few projects now start from cDNA clones. Gene Composer has been used to design genes encoding bacterial, viral, and human proteins that have been engineered to crystallization competency. Consequently, molecular biology processing times have decreased, speeding up one significant portion of the X-ray crystallographic structure determination process. Ellen Wallace, Senior Research Associate at deCODE biostructures, agrees that "keeping track of all the genes we're going through

About the author
Peter Nollert, Ph.D., is Director of Emerald BioSystems and is interested in smart high-throughput protein crystallization, membrane protein technology development, microscopic imaging, and bioinformatics software development.
www.emeraldbiosystems.com

References
1. Stewart, L., Burgin, A.B. Whole gene synthesis: a gene-o-matic future. Frontiers in Drug Design & Discovery 1, 297-341 (2005).
2. Dale, G.E., Oefner, C., D’Arcy, A. The protein as a variable in protein crystallization. J. Struct. Biol. 142(1), 88-97 (2003).
3. Longenecker, K.L., Garrard, S.M., Sheffield, P.J., Derewenda, Z.S. Protein crystallization by rational mutagenesis of surface residues: Lys to Ala mutations promote crystallization of RhoGDI. Acta Cryst. D57, 679-688 (2001).
4. Chen, J.C.-H., Krucinski, J., Miercke, L.J.W., Finer-Moore, J.S., Tang, A.H., Leavitt, A.D., Stroud, R.M. Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: A model for viral DNA binding. PNAS 97(15), 8233-8238 (2000).
© 2006 Advantage Business Media. All rights reserved.